DIT411/TIN175, Artificial Intelligence
Peter Ljunglöf
23 January, 2018
A* always finds an optimal solution first, provided that:
the branching factor is finite,
arc costs are bounded above zero
(i.e., there is some \(\epsilon>0\)
such that all
of the arc costs are greater than \(\epsilon\)), and
\(h(n)\) is admissible,
i.e., \(h(n)\) is nonnegative and an underestimate of
the cost of the shortest path from \(n\) to a goal node.
Graph search keeps track of visited nodes, so we don’t visit the same node twice.
Suppose that the first time we visit a node is not via an optimal path
\(\Rightarrow\) then graph search may return a suboptimal path
Under which circumstances can we guarantee that A* graph search is optimal?
A heuristic function \(h\) is consistent (or monotone) if
\( |h(m)-h(n)| \leq cost(m,n) \)
for every arc \((m,n)\)
(This is a form of the triangle inequality)
If \(h\) is consistent, then A* graph search will always find
the shortest path to a goal.
This is a stronger requirement than admissibility.
A* tree search is optimal if \(h(n)\) is admissible.
A* graph search is optimal if \(h(n)\) is consistent.
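As a concrete illustration, here is a minimal A* graph search sketch in Python (my own, not from the lecture; the `neighbors` callback yielding `(node, cost)` pairs and the re-expansion policy are assumptions):

```python
import heapq

def astar(start, goal, neighbors, h):
    """A* graph search: pop the frontier node with minimal f(n) = g(n) + h(n).
    `neighbors(n)` yields (next_node, arc_cost) pairs; `h(n)` is the heuristic.
    Returns (cost, path) for the first goal popped, or None."""
    frontier = [(h(start), 0, start, [start])]   # (f, g, node, path)
    best_g = {}                                  # node -> cheapest g seen so far
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path                       # optimal if h is consistent
        if node in best_g and best_g[node] <= g:
            continue                             # already expanded more cheaply
        best_g[node] = g
        for nxt, cost in neighbors(node):
            heapq.heappush(frontier, (g + cost + h(nxt), g + cost, nxt, path + [nxt]))
    return None
```

With a consistent heuristic every node is expanded at most once; the `best_g` check also keeps the search correct (though possibly slower) when \(h\) is merely admissible, since cheaper rediscoveries get re-expanded.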
Tree search:

Search strategy | Frontier selection | Halts if solution? | Halts if no solution? | Space usage |
---|---|---|---|---|
Depth first | Last node added | No | No | Linear |
Breadth first | First node added | Yes | No | Exp |
Greedy best first | Minimal \(h(n)\) | No | No | Exp |
Uniform cost | Minimal \(g(n)\) | Optimal | No | Exp |
A* | \(f(n)=g(n)+h(n)\) | Optimal* | No | Exp |
*Provided that \(h(n)\) is admissible.
Graph search:

Search strategy | Frontier selection | Halts if solution? | Halts if no solution? | Space usage |
---|---|---|---|---|
Depth first | Last node added | (Yes)** | Yes | Exp |
Breadth first | First node added | Yes | Yes | Exp |
Greedy best first | Minimal \(h(n)\) | No | Yes | Exp |
Uniform cost | Minimal \(g(n)\) | Optimal | Yes | Exp |
A* | \(f(n)=g(n)+h(n)\) | Optimal* | Yes | Exp |
**On finite graphs with cycles, not infinite graphs.
*Provided that \(h(n)\) is consistent.
If (admissible) \(h_{2}(n)\geq h_{1}(n)\) for all \(n\),
then \(h_{2}\) dominates \(h_{1}\) and is better for search.
Typical search costs (for the 8-puzzle):

Solution depth | DFS | A* (\(h_1\)) | A* (\(h_2\)) |
---|---|---|---|
14 | ≈ 3,000,000 nodes | 539 nodes | 113 nodes |
24 | ≈ 54,000,000,000 nodes | 39,135 nodes | 1,641 nodes |
Given any admissible heuristics \(h_{a}\), \(h_{b}\),
their pointwise maximum \(h(n)\)
is also admissible and dominates both:
\[ h(n) = \max(h_{a}(n),h_{b}(n)) \]
Admissible heuristics can be derived from the exact solution cost of
a relaxed problem:
If the rules of the 8-puzzle are relaxed so that a tile can move anywhere,
then \(h_{1}(n)\) gives the shortest solution
If the rules are relaxed so that a tile can move to any adjacent square,
then \(h_{2}(n)\) gives the shortest solution
Key point: the optimal solution cost of a relaxed problem is
never greater than
the optimal solution cost of the real problem
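For the 8-puzzle these two relaxations give the classic heuristics \(h_1\) = number of misplaced tiles and \(h_2\) = total Manhattan distance. A small sketch (the tuple board encoding is my own choice):

```python
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)   # 0 marks the blank square

def h1(state):
    """Relaxation: a tile may move anywhere -> count misplaced tiles (blank excluded)."""
    return sum(1 for i, tile in enumerate(state) if tile != 0 and tile != GOAL[i])

def h2(state):
    """Relaxation: a tile may move to any adjacent square -> total Manhattan distance."""
    total = 0
    for i, tile in enumerate(state):
        if tile == 0:
            continue
        goal_i = GOAL.index(tile)
        total += abs(i // 3 - goal_i // 3) + abs(i % 3 - goal_i % 3)
    return total

def h_max(state):
    """The maximum of admissible heuristics is admissible and dominates both."""
    return max(h1(state), h2(state))
```

Here \(h_2\) already dominates \(h_1\) (every misplaced tile has Manhattan distance at least 1), so `h_max` coincides with `h2`; it is shown only to illustrate the max-combination.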
A* tree (graph) search with an admissible (consistent) heuristic is optimal.
But what happens if the heuristic is non-admissible (non-consistent)?
Why would we want to use a non-admissible heuristic?
* for graph search, \( |h(m)-h(n)| > cost(m,n) \), for some \((m,n)\)
Here is an example demo of several different search algorithms, including A*.
Furthermore, you can experiment with different heuristics:
http://qiao.github.io/PathFinding.js/visual/
Note that this demo is tailor-made for planar grids,
which are a special case of general search graphs.
BFS is guaranteed to halt but uses exponential space.
DFS uses linear space, but is not guaranteed to halt.
Idea: take the best from BFS and DFS — recompute elements of the frontier rather than saving them.
Iterative deepening search calls depth-bounded DFS with increasing bounds:
Depth bound = 0, 1, 2, 3, …
Complexity with solution at depth \(k\) and branching factor \(b\):
level | # nodes | BFS node visits | ID node visits |
---|---|---|---|
\(1\) | \(b\) | \(1\cdot b^{1}\) | \(k\cdot b^{1}\) |
\(2\) | \(b^{2}\) | \(1\cdot b^{2}\) | \((k{-}1)\cdot b^{2}\) |
\(3\) | \(b^{3}\) | \(1\cdot b^{3}\) | \((k{-}2)\cdot b^{3}\) |
\(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) |
\(k\) | \(b^{k}\) | \(1\cdot b^{k}\) | \(1\cdot b^{k}\) |
total | | \({}\geq b^{k}\) | \({}\leq b^{k}\left(\frac{b}{b-1}\right)^{2}\) |
Numerical comparison for \(k=5\) and \(b=10\):
BFS visits \(10+100+1{,}000+10{,}000+100{,}000 = 111{,}110\) nodes,
while IDS visits \(50+400+3{,}000+20{,}000+100{,}000 = 123{,}450\) nodes.
Note: IDS recalculates shallow nodes several times,
but this doesn’t have a big effect compared to BFS!
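Iterative deepening can be sketched as follows (a hypothetical minimal version; the `neighbors` interface and the depth cap are my own assumptions):

```python
def depth_bounded_dfs(node, goal, neighbors, bound, path):
    """DFS that never goes deeper than `bound` arcs below `node`."""
    if node == goal:
        return path
    if bound == 0:
        return None
    for nxt in neighbors(node):
        found = depth_bounded_dfs(nxt, goal, neighbors, bound - 1, path + [nxt])
        if found is not None:
            return found
    return None

def iterative_deepening(start, goal, neighbors, max_depth=50):
    """Call depth-bounded DFS with bounds 0, 1, 2, ...:
    linear space like DFS, finds a shallowest goal like BFS."""
    for bound in range(max_depth + 1):
        result = depth_bounded_dfs(start, goal, neighbors, bound, [start])
        if result is not None:
            return result
    return None
```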
(will not be in the written examination, but could be used in Shrdlite)
The definition of searching is symmetric: find a path from the start nodes to a goal node, or from a goal node to the start nodes.
Forward branching factor: number of arcs going out from a node.
Backward branching factor: number of arcs going into a node.
Search complexity is \(O(b^{k})\), where \(b\) is the branching factor and \(k\) the solution depth.
Therefore, we should use forward search if the forward branching factor is less than the backward branching factor, and vice versa.
Note: if a graph is dynamically constructed, the backwards graph may not be available.
Idea: search backward from the goal and forward from the start simultaneously.
This can result in an exponential saving, because \(2b^{k/2}\ll b^{k}\).
The main problem is making sure the frontiers meet.
One possible implementation:
Use BFS to gradually search backwards from the goal,
building a set of locations that will lead to the goal.
Interleave this with forward heuristic search (e.g., A*)
that tries to find a path to these interesting locations.
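As a simpler illustration than the BFS-plus-A* interleaving sketched above, here is plain bidirectional BFS in Python (an assumption of mine: the graph is undirected, so one `neighbors` function serves both directions):

```python
from collections import deque

def bidirectional_bfs(start, goal, neighbors):
    """Expand BFS frontiers from both ends; stop when they meet."""
    if start == goal:
        return [start]
    parents_f, parents_b = {start: None}, {goal: None}
    frontier_f, frontier_b = deque([start]), deque([goal])

    def extract(meet):
        """Join the forward path to `meet` with the backward path from it."""
        path, n = [], meet
        while n is not None:
            path.append(n)
            n = parents_f[n]
        path.reverse()
        n = parents_b[meet]
        while n is not None:
            path.append(n)
            n = parents_b[n]
        return path

    while frontier_f and frontier_b:
        # expand the smaller frontier first, one BFS level at a time
        if len(frontier_f) <= len(frontier_b):
            frontier, parents, other = frontier_f, parents_f, parents_b
        else:
            frontier, parents, other = frontier_b, parents_b, parents_f
        for _ in range(len(frontier)):
            node = frontier.popleft()
            for nxt in neighbors(node):
                if nxt not in parents:
                    parents[nxt] = node
                    if nxt in other:        # the frontiers have met
                        return extract(nxt)
                    frontier.append(nxt)
    return None
```

Expanding the smaller frontier first is what realises the \(2b^{k/2}\) saving in practice.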
(will not be in the written examination, but could be used in Shrdlite)
A big problem with A* is space usage — is there an iterative deepening version?
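One standard answer is IDA* (iterative deepening A*), which runs depth-first searches bounded by the \(f\)-value instead of the depth, raising the bound to the smallest \(f\) that exceeded it on the previous round. This sketch is my own, not from the notes:

```python
def ida_star(start, goal, neighbors, h):
    """IDA*: f-bounded DFS with the bound raised between rounds.
    `neighbors(n)` yields (next_node, arc_cost) pairs; `h` is admissible."""
    def dfs(node, g, bound, path):
        f = g + h(node)
        if f > bound:
            return f, None                 # report how far we overshot
        if node == goal:
            return f, path
        minimum = float('inf')             # smallest f beyond the bound
        for nxt, cost in neighbors(node):
            t, found = dfs(nxt, g + cost, bound, path + [nxt])
            if found is not None:
                return t, found
            minimum = min(minimum, t)
        return minimum, None

    bound = h(start)
    while True:
        t, found = dfs(start, 0, bound, [start])
        if found is not None:
            return found
        if t == float('inf'):
            return None                    # no goal reachable
        bound = t
```

Like A*, IDA* is optimal with an admissible heuristic, but it stores only the current path, so its space usage is linear in the search depth.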
Put \(n\) queens on an \(n\times n\) board, in separate columns
Move a queen to reduce the number of conflicts;
repeat until no move reduces the number of conflicts
\(\Rightarrow\) then we are at a local optimum; hopefully it is also global
This almost always solves \(n\)-queens problems
almost instantaneously
for very large \(n\)
(e.g., \(n\) = 1 million)
Move a queen within its column to the square with the minimum number of conflicts
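This is the min-conflicts heuristic; a sketch of it (tie-breaking and the step limit are my own assumptions):

```python
import random

def conflicts(rows, col):
    """Number of queens attacking the queen in `col` (one queen per column)."""
    r = rows[col]
    return sum(1 for c, rc in enumerate(rows)
               if c != col and (rc == r or abs(rc - r) == abs(c - col)))

def min_conflicts_queens(n, max_steps=100_000, rng=random):
    rows = [rng.randrange(n) for _ in range(n)]   # queen's row in each column
    for _ in range(max_steps):
        conflicted = [c for c in range(n) if conflicts(rows, c) > 0]
        if not conflicted:
            return rows                           # no conflicts: solved
        col = rng.choice(conflicted)              # pick a random conflicted queen

        def n_conf(r):                            # conflicts if moved to row r
            old, rows[col] = rows[col], r
            k = conflicts(rows, col)
            rows[col] = old
            return k

        rows[col] = min(range(n), key=n_conf)     # min-conflicts move
    return None
```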
Start with any complete tour, and perform pairwise exchanges
Variants of this approach can very quickly get
within 1% of optimal solution for thousands of cities
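A minimal sketch of pairwise exchange (2-opt) for the travelling salesperson problem; the first-improvement restart strategy is a simplification of mine:

```python
def tour_length(tour, dist):
    """Total length of the closed tour under the distance matrix `dist`."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def two_opt(tour, dist):
    """Repeat pairwise exchanges (reversing a segment) while they shorten the tour."""
    tour = list(tour)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 1):
            for j in range(i + 1, len(tour)):
                candidate = tour[:i] + tour[i:j][::-1] + tour[j:]
                if tour_length(candidate, dist) < tour_length(tour, dist):
                    tour, improved = candidate, True
    return tour
```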
Also called (gradient/steepest) (ascent/descent),
or greedy local search
Problems for hill climbing: local maxima, ridges, plateaux
Consider two methods to find a maximum value:
Greedy ascent: start from some position,
keep moving upwards, and report maximum value found
Pick values at random, and report maximum value found
Which do you expect to work better to find a global maximum?
As well as upward steps we can allow for:
Random steps: (sometimes) move to a random neighbor.
Random restart: (sometimes) reassign random values to all variables.
Both variants can be combined!
Two 1-dimensional search spaces; you can step right or left:
(these sections will not be in the written examination)
Simulated annealing is an implementation of random steps:
T is the “cooling temperature”, which decreases slowly towards 0
The cooling speed is decided by the schedule
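A minimal sketch, assuming we phrase the problem as minimizing an "energy" and are handed the cooling schedule as a sequence of temperatures (all names are my own):

```python
import math
import random

def simulated_annealing(state, energy, neighbor, schedule, rng=random):
    """Random steps with temperature: always accept a downhill move; accept an
    uphill move with probability exp(-dE/T).  `schedule` yields the
    temperatures T > 0, decreasing slowly towards 0."""
    best = current = state
    for t in schedule:
        nxt = neighbor(current, rng)
        delta = energy(nxt) - energy(current)
        if delta < 0 or rng.random() < math.exp(-delta / t):
            current = nxt
        if energy(current) < energy(best):
            best = current                 # remember the best state seen
    return best
```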
Idea: maintain a population of \(k\) states in parallel, instead of one.
The value of \(k\) lets us limit space and parallelism.
Note: this is not the same as \(k\) searches run in parallel!
Stochastic beam search is similar to beam search, but it chooses the next \(k\) individuals probabilistically.
The probability that a neighbor is chosen is proportional to its heuristic value.
This maintains diversity amongst the individuals.
The heuristic value reflects the fitness of the individual.
Similar to natural selection:
each individual mutates and the fittest ones survive.
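The probabilistic selection step might look like this in Python (the fitness interface is my assumption):

```python
import random

def stochastic_select(candidates, fitness, k, rng=random):
    """Choose the next k individuals with probability proportional to fitness
    (sampling with replacement, as in stochastic beam search)."""
    weights = [fitness(c) for c in candidates]
    return rng.choices(candidates, weights=weights, k=k)
```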
(will not be in the written examination)
How can you compare three algorithms A, B and C, when
A solves the problem 30% of the time very quickly but doesn’t halt
for the other 70% of the cases
B solves 60% of the cases reasonably quickly but doesn’t solve the rest
C solves the problem in 100% of the cases, but slowly?
Summary statistics, such as mean run time or median run time,
don't make much sense.
A runtime distribution plots, for each runtime, the proportion of the runs that are solved within that runtime.
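A small sketch of how the points of such a plot could be computed from observed runtimes (unsolved runs are represented as runtimes beyond the timeout; all names are mine):

```python
def runtime_distribution(runtimes, timeout):
    """Empirical runtime distribution: for each observed runtime t (sorted),
    the fraction of all runs solved within time t.  Runs that never halted
    should be recorded with a runtime above `timeout`."""
    solved = sorted(t for t in runtimes if t <= timeout)
    n = len(runtimes)
    return [(t, (i + 1) / n) for i, t in enumerate(solved)]
```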