CS 5/7320
Artificial Intelligence
Local Search
AIMA Chapters 4.1 & 4.2
Slides by Michael Hahsler
based on slides by Svetlana Lazebnik
with figures from the AIMA textbook.
This work is licensed under a Creative Commons
Attribution-ShareAlike 4.0 International License.
Recap: Uninformed and Informed Search
• Tries to plan the best path from a given initial state to a given goal state.
• Typically searches a large portion of the search space (needs time and memory).
• Often comes with optimality guarantees.
[Figure: a search tree growing from the initial state to the goal state]
Local Search Algorithms
[Figure: local moves from state to state in a two-dimensional state space with coordinates 𝑥 and 𝑦]
• What if we do not know the goal state, but the utility of different states is given by a utility function 𝑈 = 𝑢(𝑠)?
• We use a factored state description. Here 𝑠 = (𝑥, 𝑦).
• We could try to identify the best or at least a “good” state.
• This is the optimization problem:
𝑠* = argmax_{𝑠∈𝑆} 𝑢(𝑠)
• We need a fast and memory-efficient way to find the best or a good state.
Idea:
Start with a current solution (a state) and improve the solution by moving from the
current state to a “neighboring” better state (a.k.a. performing a series of local moves).
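To make this improvement loop concrete, here is a minimal Python sketch. It assumes hypothetical problem-specific functions `neighbors(s)` (generating the local neighborhood) and `utility(s)`; it illustrates the idea and is not code from the textbook.

```python
def local_search(initial_state, neighbors, utility):
    """Greedy improvement: keep moving to a better neighboring state
    until no neighbor improves the utility (a local optimum)."""
    current = initial_state
    while True:
        candidates = neighbors(current)       # local moves from the current state
        best = max(candidates, key=utility)   # best neighboring state
        if utility(best) <= utility(current):
            return current                    # no improving local move left
        current = best
```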
Local Search Algorithms
Differences from search in the previous chapter:
a) The goal state is unknown, but we know or can calculate the utility of each state.
We want to identify the state with the highest utility.
b) There is often no explicit initial state, and the path to the goal and the path cost are not important.
c) There is no search tree. We just store the current state and move to a “better” state if possible.
Use in AI
• Goal-based agent: Identify a good goal state with a good utility before planning
the path to that state.
• Utility-based agent: Always move to neighboring higher utility states. A simple
greedy method used for complicated/large state spaces or online search.
• General optimization: 𝑢(𝑠) can be replaced by a general objective function.
Local search is an effective heuristic to find good solutions in large or continuous
search spaces. E.g., gradient descent is used to train neural networks.
Example:
n-Queens Problem
Goal: Put n queens on an n × n board with no
two queens on the same row, column, or
diagonal.
Defining the search problem:
• State space: All possible n-queen
configurations. How many are there?
• State representation: How do we define a
structured representation?
• Objective function: What is a possible utility
function given the state representation?
• Local neighborhood: What states are close to
each other?
[Figure: example configuration with 2 conflicts = utility of −2]
Example: n-Queens Problem
[Figure: a 4 × 4 board with rows 1–4 and columns A–D]
Defining the search problem:
• State space: All possible n-queen configurations. How many are there?
For the 4-queens problem: (16 choose 4) = 1820
• State representation: How do we define a structured representation?
E.g., (𝐴2, 𝐵3, 𝐵4, 𝐶1)
• Objective function: What is a possible utility function given the state representation?
Maximizing utility means minimizing the number of pairwise conflicts based on the state representation.
• Local neighborhood: What states are close to each other?
Move a single queen.
[Figure: a solved board with 0 conflicts = utility of 0]
Example: Traveling salesman problem
• Goal: Find the shortest tour connecting a given set of cities
• State space: all possible tours (states are not individual cities!)
• State representation: Order of cities in the tour.
• Objective function: minimize the length of the tour
• Local neighborhood: Change the order of visiting a few cities.
Note: We have solved a different problem with uninformed/informed search! Each city was defined as a state
and the path was the solution.
Hill-Climbing Search
aka Greedy Local Search
Idea: keep a single “current” state and try to find better
neighboring states.
Example: n-Queens Problem
• Goal: Put n queens on an n × n board with no two queens on the same row, column, or diagonal.
• State space: all possible n-queen configurations. We can restrict the state space: only one queen per column. This reduces the state space from 1820 to 4⁴ = 256 states (for the 4-queens problem).
• State representation: row position of each queen in its column (e.g., 2, 3, 2, 3)
• Objective function: minimize the number of pairwise conflicts.
• Local neighborhood: Move one queen anywhere in its column.
Improvement strategy:
• Find a local neighboring state (move one queen within its column) to reduce conflicts.
Example: n-Queens Problem
To find the best local move, we must evaluate all local neighbors (moving a queen in its column) and calculate the objective function.
[Figure: board annotated with the objective value after moving each queen to each square in its column; current objective value ℎ = 17, best local improvement ℎ = 12]
Notes:
• There are many options with ℎ = 12. We must choose one!
• Calculating all the objective values may be expensive! (See the sketch below.)
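A sketch of this neighbor evaluation in Python, using the one-queen-per-column representation (a tuple with the row of each queen). The helper names `conflicts` and `best_neighbor` are illustrative, not from the textbook.

```python
from itertools import combinations

def conflicts(state):
    """Count pairwise conflicts; state[i] is the row of the queen in column i."""
    return sum(1 for (i, ri), (j, rj) in combinations(enumerate(state), 2)
               if ri == rj or abs(ri - rj) == abs(i - j))  # same row or diagonal

def best_neighbor(state, n):
    """Evaluate all local moves (one queen anywhere within its column)
    and return a neighbor with the fewest conflicts."""
    neighbors = [state[:col] + (row,) + state[col + 1:]
                 for col in range(len(state))
                 for row in range(1, n + 1) if row != state[col]]
    return min(neighbors, key=conflicts)

state = (2, 3, 2, 3)              # 4-queens example, rows numbered 1..4
print(conflicts(state))           # objective value of the current state
print(best_neighbor(state, 4))    # the best local move
```

Note that a single call to `best_neighbor` evaluates 𝑛(𝑛−1) neighbors, which is exactly why computing all the objective values can be expensive.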
Example: n-Queens Problem
Formulation as an optimization problem:
Find the best state 𝑠* representing an arrangement of queens:
𝑠* = argmin_{𝑠∈𝑆} conflicts(𝑠)
subject to: 𝑠 has one queen per column.
Remember: This restriction makes the problem a lot easier.
Example: Traveling Salesman Problem
• Goal: Find the shortest tour connecting n cities
• State space: all possible tours
• State representation: tour (order in which to visit the cities) = a permutation
• Objective function: length of tour
• Local neighborhood: reverse the order of visiting a few cities
Local move to reverse the order of cities C, E, and D:
[Figure: two tours over cities A–E, before and after the move]
State representation: ABDEC → ABCED
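A minimal Python sketch of this reversal move and a tour-length objective; the coordinate-based helpers `tour_length` and `reverse_segment` are illustrative assumptions.

```python
import math

def tour_length(tour, coords):
    """Length of the closed tour; coords maps a city to its (x, y) position."""
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def reverse_segment(tour, i, j):
    """Local move: reverse the order of the cities at positions i..j."""
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]

# Reversing the segment D, E, C of tour ABDEC yields ABCED (as in the figure).
print(reverse_segment("ABDEC", 2, 4))   # -> ABCED
```

This reversal is the basic move behind the well-known 2-opt improvement heuristic for the TSP.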
Example: Traveling Salesman Problem
Formulation as an optimization problem:
Find the best tour 𝜋*:
𝜋* = argmin_𝜋 tourLength(𝜋)
s.t. 𝜋 is a valid permutation (i.e., sub-tour elimination)
Hill-Climbing Search (= Greedy Local Search)
Typically, we start with a random state.
Variants (sketched in code below):
Steepest-ascent hill climbing
• Check all possible successors and choose the highest-valued successor.
Stochastic hill climbing
• Choose randomly among all uphill moves, or
• Generate one random new successor at a time until a better one is found (= first-choice hill climbing). This is the most popular variant and what people usually mean when they say “stochastic hill climbing.”
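Both variants as Python sketches for a maximization problem, again assuming hypothetical `neighbors(s)` and `value(s)` functions:

```python
import random

def steepest_ascent(state, neighbors, value):
    """Check all successors and move to the highest-valued one."""
    while True:
        best = max(neighbors(state), key=value)
        if value(best) <= value(state):
            return state                 # local maximum reached
        state = best

def first_choice(state, neighbors, value, max_tries=1000):
    """Generate random successors until a better one is found."""
    while True:
        for _ in range(max_tries):
            candidate = random.choice(neighbors(state))
            if value(candidate) > value(state):
                state = candidate        # take the first uphill move
                break
        else:                            # no uphill move found: stop
            return state
```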
Local Optima
Hill-climbing search is like greedy best-first search with the objective function as a (maybe not admissible) heuristic and no frontier (it just stops in a dead end).
Is it complete/optimal?
• No – can get stuck in local optima
Example: a local optimum for the 8-queens problem with ℎ = 1. No single queen can be moved within its column to improve the objective function.
[Figure: the stuck 8-queens configuration]
Simple approach that can help with local optima:
Random-restart hill climbing
• Restart hill-climbing many times with random initial states and return the best
solution.
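Random restarts are easy to add on top of any hill-climbing routine; here is a short sketch with hypothetical `hill_climb` and `random_state` callables:

```python
def random_restart(hill_climb, random_state, value, restarts=50):
    """Run hill climbing from many random initial states; return the best result."""
    return max((hill_climb(random_state()) for _ in range(restarts)), key=value)
```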
The State Space “Landscape”
We can get the utility (objective function value) from the state description using 𝑈 = 𝑢(𝑠).
[Figure: the state-space landscape plotting 𝑢(𝑠) over states 𝑠 (neighbors placed next to each other), with the global maximum marked]
How to escape local maxima?
→ Random restart hill-climbing can help.
What about “shoulders” (called “ridges” in higher-dimensional spaces)?
→ Hill climbing that allows sideways moves and uses momentum.
Minimization vs. Maximization
• The name hill climbing implies maximizing a function.
• Optimizers like to state problems as minimization problems and call hill
climbing gradient descent instead.
• Both types of problems are equivalent:
max_𝑥 𝑓(𝑥) ⟺ min_𝑥 −𝑓(𝑥)
Convex vs. Non-Convex Optimization Problems
Minimization problems:
• Convex problem: one global optimum + smooth function → calculus makes it easy.
• Non-convex problem: many local optima → hard. Many discrete optimization problems are like this.
Simulated Annealing
Use heat to escape local optima…
Simulated Annealing
• Idea: First-choice stochastic hill climbing + escape local minima by
allowing some “bad” moves but gradually decrease their frequency.
• Inspired by the process of tempering or hardening metals by
decreasing the temperature (chance of accepting bad moves)
gradually.
[Figure: an objective function (to minimize) over the state space; a “bad” local move goes uphill to escape a local minimum]
Simulated Annealing
• Idea: First-choice stochastic hill climbing + escape local minima
by allowing some “bad” moves but gradually decreasing their
frequency as we get closer to the solution.
• The probability of accepting “bad” moves follows an annealing
schedule that reduces the temperature 𝑇 over time 𝑡.
[Figure: the simulated-annealing algorithm from the AIMA textbook, with annotations:]
• Typically, we start with a random state.
• Δ𝐸 = VALUE(next) − VALUE(current); good moves (Δ𝐸 > 0) are always taken.
• “Bad” moves are accepted with probability 𝑒^(Δ𝐸/𝑇), the Metropolis acceptance criterion.
• Note: Use Δ𝐸 = VALUE(current) − VALUE(next) for minimization.
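A minimal Python sketch of this loop for a maximization problem, with hypothetical `random_neighbor(s)` and `value(s)` functions and a cooling `schedule(t)` supplied by the caller:

```python
import math
import random

def simulated_annealing(state, random_neighbor, value, schedule, eps=1e-6):
    """First-choice hill climbing that also accepts 'bad' moves with
    probability exp(delta_e / T) (the Metropolis acceptance criterion)."""
    t = 0
    while True:
        T = schedule(t)
        if T < eps:                               # temperature is ~0: stop
            return state
        nxt = random_neighbor(state)
        delta_e = value(nxt) - value(state)       # > 0 means a good move
        if delta_e > 0 or random.random() < math.exp(delta_e / T):
            state = nxt                           # good moves always, bad sometimes
        t += 1
```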
The Effect of Temperature
[Figure: the probability of accepting a worse state, exp(Δ𝐸/𝑇), plotted against Δ𝐸 (negative means the solution gets worse) for several temperatures 𝑇]
The lower the temperature, the less likely the algorithm will accept a worse state.
Cooling Schedule
The cooling schedule is very important.
Popular schedules for the temperature at time 𝑡 (sketched in code after the notes below):
• Classic simulated annealing: 𝑇_𝑡 = 𝑇_0 / log(1 + 𝑡)
• Fast simulated annealing (Szu and Hartley, 1987): 𝑇_𝑡 = 𝑇_0 / (1 + 𝑡)
• Exponential cooling (Kirkpatrick, Gelatt, and Vecchi, 1983): 𝑇_𝑡 = 𝑇_0 𝛼^𝑡 for 0.8 < 𝛼 < 1
Notes:
• The best schedule is typically determined by trial-and-error.
• Choose 𝑇0 to provide a high probability that any move will be accepted at time 𝑡 = 0.
• 𝑇_𝑡 will never reach 0, only become very small. Stop when 𝑇 < 𝜖 (𝜖 is a very small constant).
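The three schedules as short Python functions; 𝑇₀ = 100 and 𝛼 = 0.95 are arbitrary illustrative choices:

```python
import math

T0, alpha = 100.0, 0.95      # tune by trial and error

def classic(t):     return T0 / math.log(1 + t)   # use t >= 1 (log 1 = 0)
def fast(t):        return T0 / (1 + t)
def exponential(t): return T0 * alpha ** t

for t in (1, 10, 100, 1000):
    print(t, round(classic(t), 1), round(fast(t), 1), round(exponential(t), 1))
```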
Simulated Annealing Search
Guarantee: If temperature decreases slowly enough,
then simulated annealing search will find a global
optimum with probability approaching one.
However:
• This usually takes impractically long.
Evolutionary Algorithms
A Population-based Metaheuristic
Evolutionary Algorithms / Genetic Algorithms
• A metaheuristic for population-based optimization.
• Uses mechanisms inspired by biological evolution (genetics):
• Reproduction: random selection with probability based on a fitness function.
• Random recombination (crossover)
• Random mutation
• Repeated for many generations
• Example: 8-queens problem
[Figure: 8 × 8 boards; an individual = state represented as a chromosome giving the row of the queen in each column; selection, crossover, and mutation produce the next generation]
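A compact sketch of a genetic algorithm for the 8-queens problem. It reuses the `conflicts` function from the earlier n-queens sketch; the population size, mutation rate, and fitness definition are illustrative choices.

```python
import random

def fitness(ind):
    return 28 - conflicts(ind)    # 28 = number of non-attacking pairs in a solution

def genetic_algorithm(pop_size=100, generations=1000, p_mutate=0.1):
    # Individual = chromosome: the row (1..8) of the queen in each column.
    population = [tuple(random.randint(1, 8) for _ in range(8))
                  for _ in range(pop_size)]
    for _ in range(generations):
        weights = [fitness(ind) for ind in population]   # selection probabilities
        next_gen = []
        for _ in range(pop_size):
            mom, dad = random.choices(population, weights=weights, k=2)
            cut = random.randrange(1, 8)                 # crossover point
            child = mom[:cut] + dad[cut:]                # recombination
            if random.random() < p_mutate:               # random mutation
                col = random.randrange(8)
                child = child[:col] + (random.randint(1, 8),) + child[col + 1:]
            next_gen.append(child)
        population = next_gen
        best = max(population, key=fitness)
        if conflicts(best) == 0:                         # solution found
            return best
    return max(population, key=fitness)
```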
Search in Continuous Spaces
Discretization of Continuous Space
• Use atomic states and create a graph as the transition function.
• Use a grid with spacing of size 𝛿.
Note: You probably need a way finer grid!
[Figure: a coarse grid laid over a continuous 2D space with axes x and y]
Discretization of Continuous Space
How did we discretize this space?
[Figure: a continuous 2D space discretized into a graph, with an initial state and a goal state]
Search in Continuous Spaces: Gradient Descent
[Figure: the surface of 𝑓(𝒙) plotted over 𝑥_1 and 𝑥_2]
State space: infinite
State representation: 𝒙 = (𝑥_1, 𝑥_2, …, 𝑥_𝑘)
Objective function: min 𝑓(𝒙) = 𝑓(𝑥_1, 𝑥_2, …, 𝑥_𝑘)
Local neighborhood: small changes in 𝑥_1, 𝑥_2, …, 𝑥_𝑘
Gradient at point 𝒙: ∇𝑓(𝒙) = (∂𝑓(𝒙)/∂𝑥_1, ∂𝑓(𝒙)/∂𝑥_2, …, ∂𝑓(𝒙)/∂𝑥_𝑘)
(= evaluation of the Jacobian matrix at 𝒙)
Find the optimum by solving ∇𝑓(𝒙) = 0.
• Gradient descent (= steepest-ascent hill climbing, applied to minimization) with step size 𝛼:
Repeat: 𝒙 ← 𝒙 − 𝛼∇𝑓(𝒙)
• Newton-Raphson method:
uses the inverse of the Hessian matrix (the second-order partial derivatives of 𝑓(𝒙)), 𝐻_𝑖𝑗 = ∂²𝑓 / (∂𝑥_𝑖 ∂𝑥_𝑗), for the step size 𝛼.
Repeat: 𝒙 ← 𝒙 − 𝑯_𝑓^{−1}(𝒙) ∇𝑓(𝒙)
Note: Gradient descent may get stuck in a local optimum if the search space is non-convex! Use simulated annealing, momentum, or other methods to escape local optima.
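A small Python sketch of the gradient-descent update rule on an illustrative convex function (the function, step size, and stopping rule are arbitrary choices for the example):

```python
def gradient_descent(grad, x, alpha=0.1, tol=1e-8, max_iter=10_000):
    """Repeat x <- x - alpha * grad(x) until the update becomes tiny."""
    for _ in range(max_iter):
        step = [alpha * g for g in grad(x)]
        x = [xi - si for xi, si in zip(x, step)]
        if sum(s * s for s in step) < tol:       # converged
            break
    return x

# f(x) = (x1 - 3)^2 + (x2 + 1)^2 has gradient (2(x1 - 3), 2(x2 + 1)).
grad_f = lambda x: [2 * (x[0] - 3), 2 * (x[1] + 1)]
print(gradient_descent(grad_f, [0.0, 0.0]))      # converges near [3, -1]
```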
Search in Continuous Spaces:
Empirical Gradient Methods
• What if the mathematical formulation of the objective function is not
known?
• We may have objective values at fixed points, called the training data.
• In this case, we can estimate the gradient using the data and use
empirical gradient search.
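One common way to estimate the gradient empirically is with central finite differences, which only needs objective values at nearby points. A minimal sketch under that assumption:

```python
def empirical_gradient(f, x, h=1e-5):
    """Estimate the gradient of f at x using central finite differences."""
    grad = []
    for i in range(len(x)):
        x_plus  = x[:i] + [x[i] + h] + x[i + 1:]
        x_minus = x[:i] + [x[i] - h] + x[i + 1:]
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad

f = lambda x: (x[0] - 3) ** 2 + (x[1] + 1) ** 2
print(empirical_gradient(f, [0.0, 0.0]))   # approx. [-6.0, 2.0]
```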
→ We will talk more about search in continuous spaces with loss functions using gradient descent when we discuss parameter learning for machine learning.