DIT410/TIN174, Artificial Intelligence
Peter Ljunglöf
28 March, 2017
A* always finds an optimal solution first, provided that:
the branching factor is finite,
arc costs are bounded above zero
(i.e., there is some \(\epsilon>0\)
such that all
of the arc costs are greater than \(\epsilon\)), and
\(h(n)\) is admissible
i.e., \(h(n)\) is nonnegative and an underestimate of
the cost of the shortest path from \(n\) to a goal node.
Graph search keeps track of visited nodes, so we don’t visit the same node twice.
Suppose that the first time we visit a node is not via an optimal path
\(\Rightarrow\) then graph search will return a suboptimal path
Under which circumstances can we guarantee that A* graph search is optimal?
If \(h\) is consistent, then A* graph search is optimal:
Consistency is defined as: \(h(n’) \leq cost(n’, n) + h(n)\) for all arcs \((n’, n)\)
The \(f\) values in A* are nondecreasing; therefore, for any cost value \(C\):

first | A* expands all nodes with \( f(n) < C \) |
then | A* expands all nodes with \( f(n) = C \) |
finally | A* expands all nodes with \( f(n) > C \) |

A* will not expand any nodes with \( f(n) > C^* \),
where \(C^*\) is the cost of an optimal solution.
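To make the expansion order concrete, here is a minimal A* graph search sketch in Python (the graph encoding, `neighbors` callback, and the zero heuristic in the example are assumptions for illustration, not fixed by the slides):

```python
import heapq

def astar(start, goal, neighbors, h):
    """A* graph search. `neighbors(n)` yields (m, cost) pairs and `h` is
    the heuristic. With a consistent h the first time a node is popped it
    is via a cheapest path, so closed nodes can safely be skipped."""
    frontier = [(h(start), 0, start, [start])]     # (f, g, node, path)
    closed = set()
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        if node in closed:
            continue
        closed.add(node)
        for m, cost in neighbors(node):
            if m not in closed:
                heapq.heappush(frontier, (g + cost + h(m), g + cost, m, path + [m]))
    return None
```

With a consistent heuristic, nodes come off the priority queue in nondecreasing \(f\) order, which is exactly the contour behaviour described above.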
A* tree search is optimal if \(h(n)\) is admissible.
A* graph search is optimal if \(h(n)\) is consistent.
Search strategy | Frontier selection | Halts if solution? | Halts if no solution? | Space usage |
---|---|---|---|---|
Depth first | Last node added | No | No | Linear |
Breadth first | First node added | Yes | No | Exp |
Greedy best first | Minimal \(h(n)\) | No | No | Exp |
Uniform cost | Minimal \(g(n)\) | Optimal | No | Exp |
A* | \(f(n)=g(n)+h(n)\) | Optimal* | No | Exp |
*Provided that \(h(n)\) is admissible.
If (admissible) \(h_{2}(n)\geq h_{1}(n)\) for all \(n\),
then \(h_{2}\) dominates \(h_{1}\) and is better for search.
Typical search costs (for the 8-puzzle):

depth | DFS | A*(\(h_1\)) | A*(\(h_2\)) |
---|---|---|---|
14 | ≈ 3,000,000 nodes | 539 nodes | 113 nodes |
24 | ≈ 54,000,000,000 nodes | 39,135 nodes | 1,641 nodes |
Given any admissible heuristics \(h_{a}\) and \(h_{b}\),
their pointwise maximum \(h(n)\)
is also admissible and dominates both:
\[ h(n) = \max(h_{a}(n),h_{b}(n)) \]
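This combination is a one-liner in practice; a small sketch (the heuristics `ha` and `hb` in the usage example are hypothetical):

```python
def max_heuristic(*hs):
    """Combine admissible heuristics pointwise: the maximum is still an
    underestimate of the true cost (hence admissible) and dominates each
    of the combined heuristics."""
    return lambda n: max(h(n) for h in hs)
```

For example, `max_heuristic(ha, hb)(n)` returns whichever of `ha(n)` and `hb(n)` is larger, so the combined heuristic is at least as informed as either one alone.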
Admissible heuristics can be derived from the exact solution cost of
a relaxed problem:
If the rules of the 8-puzzle are relaxed so that a tile can move anywhere,
then \(h_{1}(n)\) (the number of misplaced tiles) gives the length of the shortest solution.
If the rules are relaxed so that a tile can move to any adjacent square,
then \(h_{2}(n)\) (the total Manhattan distance) gives the length of the shortest solution.
Key point: the optimal solution cost of a relaxed problem is
never greater than
the optimal solution cost of the real problem
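The two relaxed-problem heuristics for the 8-puzzle can be sketched for a 3×3 board stored as a 9-tuple, with the blank encoded as 0 (an assumed encoding, not fixed by the slides):

```python
def h1(state, goal):
    """Misplaced-tiles heuristic: count tiles not in their goal position
    (the blank, encoded as 0, is not counted)."""
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)

def h2(state, goal):
    """Manhattan-distance heuristic: total grid distance of each tile
    from its goal position, for a 3x3 board stored row by row."""
    total = 0
    for tile in range(1, 9):
        i, j = divmod(state.index(tile), 3)
        gi, gj = divmod(goal.index(tile), 3)
        total += abs(i - gi) + abs(j - gj)
    return total
```

Since moving a tile one square changes its Manhattan distance by at most one, \(h_2\) counts at least one move per misplaced tile, so \(h_2\) dominates \(h_1\).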
A* search with an admissible (consistent) heuristic is optimal.
But what happens if the heuristic is non-admissible?
Why would we want to use a non-admissible heuristic?
Here is a demo of several different search algorithms, including A*.
You can also experiment with different heuristics:
http://qiao.github.io/PathFinding.js/visual/
Note that this demo is tailor-made for planar grids,
which is a special case of all possible search graphs.
BFS is guaranteed to halt but uses exponential space.
DFS uses linear space, but is not guaranteed to halt.
Idea: take the best from BFS and DFS — recompute elements of the frontier rather than saving them.
Iterative deepening search calls depth-bounded DFS with increasing bounds:
Depth bound = 0, 1, 2, 3, …
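A sketch of the idea in Python (recursive depth-bounded DFS; the `max_bound` cutoff is an assumption added so the sketch halts on graphs with no solution):

```python
def depth_bounded_dfs(node, goal, neighbors, bound, path):
    """DFS that gives up below depth `bound`; returns a path or None."""
    if node == goal:
        return path
    if bound == 0:
        return None
    for m in neighbors(node):
        result = depth_bounded_dfs(m, goal, neighbors, bound - 1, path + [m])
        if result is not None:
            return result
    return None

def iterative_deepening(start, goal, neighbors, max_bound=50):
    """Call depth-bounded DFS with bounds 0, 1, 2, ...: linear space
    like DFS, but like BFS it finds a shallowest solution first."""
    for bound in range(max_bound + 1):
        result = depth_bounded_dfs(start, goal, neighbors, bound, [start])
        if result is not None:
            return result
    return None
```

Only the current path is stored at any time, so space usage stays linear in the depth bound.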
Complexity with solution at depth \(k\) and branching factor \(b\):
level | breadth-first | iterative deepening | # nodes |
---|---|---|---|
\(1\) | \(1\) | \(k\) | \(b\) |
\(2\) | \(1\) | \(k-1\) | \(b^{2}\) |
\(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) |
\(k-1\) | \(1\) | \(2\) | \(b^{k-1}\) |
\(k\) | \(1\) | \(1\) | \(b^{k}\) |
total | \({}\geq b^{k}\) | \({}\leq b^{k}\left(\frac{b}{b-1}\right)^{2}\) | |
Numerical comparison for \(k=5\) and \(b=10\):
Note: IDS recalculates shallow nodes several times,
but this doesn’t have a big effect compared to BFS!
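This can be checked by summing the per-level counts from the table above (a small sketch; the closed-form bound is the one stated in the table):

```python
def bfs_nodes(b, k):
    """Breadth-first generates each level once: b + b^2 + ... + b^k."""
    return sum(b**i for i in range(1, k + 1))

def ids_nodes(b, k):
    """Iterative deepening regenerates level i on (k - i + 1) of the
    k depth-bounded runs: sum of (k - i + 1) * b^i."""
    return sum((k - i + 1) * b**i for i in range(1, k + 1))
```

For \(k=5\) and \(b=10\) these sums give 111,110 (BFS) versus 123,450 (IDS): the recalculated shallow levels add only about 11% overhead.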
The definition of searching is symmetric: find a path from the start nodes to a goal node, or from a goal node to the start nodes.
Forward branching factor: number of arcs going out from a node.
Backward branching factor: number of arcs going into a node.
Search complexity is \(O(b^{n})\). Therefore, we should use forward search if the forward branching factor is less than the backward branching factor, and vice versa.
Note: when a graph is dynamically constructed, the backwards graph may not be available.
Idea: search backward from the goal and forward from the start simultaneously.
This can result in an exponential saving, because \(2b^{k/2}\ll b^{k}\).
The main problem is making sure the frontiers meet.
One possible implementation:
Use BFS to gradually search backwards from the goal,
building a set of locations that will lead to the goal.
Interleave this with forward heuristic search (e.g., A*)
that tries to find a path to these interesting locations.
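One way to sketch the two-ended idea in Python, using plain BFS from both ends on an unweighted, undirected graph (an assumed setting; a production version would keep track of the best meeting point found before stopping, and would need reversed arcs for directed graphs):

```python
from collections import deque

def bidirectional_bfs(start, goal, neighbors):
    """BFS from both ends, one full layer at a time, always expanding
    the smaller frontier; stops as soon as the two searches meet."""
    if start == goal:
        return 0
    dist_f, dist_b = {start: 0}, {goal: 0}
    frontier_f, frontier_b = deque([start]), deque([goal])
    while frontier_f and frontier_b:
        # expand the smaller frontier: keeps both search trees shallow
        if len(frontier_f) <= len(frontier_b):
            frontier, dist, other = frontier_f, dist_f, dist_b
        else:
            frontier, dist, other = frontier_b, dist_b, dist_f
        for _ in range(len(frontier)):      # one whole layer
            node = frontier.popleft()
            for m in neighbors(node):
                if m in other:              # the frontiers meet here
                    return dist[node] + 1 + other[m]
                if m not in dist:
                    dist[m] = dist[node] + 1
                    frontier.append(m)
    return None
```

Each side only needs to reach depth about \(k/2\), which is where the \(2b^{k/2}\ll b^{k}\) saving comes from.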
Idea: for statically stored graphs,
build a table of the actual distance \(dist(n)\)
of the shortest path from node \(n\) to a goal.
\[ dist(n) = \begin{cases} 0 & \text{if } isGoal(n) \\ \min_{(n,m)\in G}\left(\left|(n,m)\right| + dist(m)\right) & \text{otherwise} \end{cases} \]
The calculation of \(dist\) can be interleaved with a forward heuristic search.
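The table can be computed with Dijkstra's algorithm run backwards from the goals; a sketch assuming a statically stored graph exposing an `incoming(m)` accessor (an assumed interface):

```python
import heapq

def goal_distances(goals, incoming):
    """Compute dist(n), the cost of the shortest path from n to a goal.
    `incoming(m)` yields (n, cost) pairs for every arc n -> m, so the
    search runs backwards over the arcs, starting from the goals."""
    dist = {g: 0 for g in goals}
    heap = [(0, g) for g in goals]
    while heap:
        d, m = heapq.heappop(heap)
        if d > dist.get(m, float('inf')):
            continue                         # stale heap entry
        for n, cost in incoming(m):
            if d + cost < dist.get(n, float('inf')):
                dist[n] = d + cost
                heapq.heappush(heap, (d + cost, n))
    return dist
```

Since \(dist(n)\) is the exact cost to a goal, it is the perfect (and trivially consistent) heuristic for the forward search.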
The biggest problem with A* is the space usage.
Can we make an iterative deepening version?
Put \(n\) queens on an \(n\times n\) board, in separate columns
Move a queen to reduce the number of conflicts;
repeat until we cannot move any queen anymore
\(\Rightarrow\) then we are at a local maximum, hopefully it is global too
This almost always solves \(n\)-queens problems
almost instantaneously
for very large \(n\)
(e.g., \(n\) = 1 million)
Move a queen within its column, choosing the row with the minimum number of conflicts
Start with any complete tour, and perform pairwise exchanges
Variants of this approach get within 1% of optimal
very quickly with thousands of cities
Also called gradient/steepest ascent/descent,
or greedy local search.
Local maxima — Ridges — Plateaux
Which do you expect to work better to find a global maximum?
As well as upward steps we can allow for:
Random steps: (sometimes) move to a random neighbor.
Random restart: (sometimes) reassign random values to all variables.
Both variants can be combined!
Two 1-dimensional search spaces; you can step right or left:
Simulated annealing is an implementation of random steps:
\(T\) is the “cooling temperature”, which decreases slowly towards 0
The cooling speed is decided by the schedule
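A generic sketch of the procedure (the energy function, the neighbor move, and the geometric cooling schedule in the example are assumptions for illustration):

```python
import math, random

def simulated_annealing(state, energy, neighbor, schedule, steps=10000):
    """Random-step local search: always accept improvements, and accept
    a worsening step with probability exp(-delta / T), where the
    temperature T = schedule(t) decreases slowly towards 0."""
    current, best = state, state
    for t in range(1, steps + 1):
        T = schedule(t)
        if T <= 0:
            break
        candidate = neighbor(current)
        delta = energy(candidate) - energy(current)
        if delta < 0 or random.random() < math.exp(-delta / T):
            current = candidate
        if energy(current) < energy(best):
            best = current
    return best
```

At high \(T\) the walk is nearly random (escaping local optima); as \(T\to 0\) it degenerates into plain hill climbing.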
Idea: maintain a population of \(k\) states in parallel, instead of one.
The value of \(k\) lets us limit space and parallelism.
Note: this is not the same as \(k\) searches run in parallel!
Similar to beam search, but it chooses the next \(k\) individuals probabilistically.
The probability that a neighbor is chosen is proportional to its heuristic value.
This maintains diversity amongst the individuals.
The heuristic value reflects the fitness of the individual.
Similar to natural selection:
each individual mutates and the fittest ones survive.
The \(n\) queens problem can be encoded as \(n\) numbers \(1\ldots n\):
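A minimal genetic-algorithm sketch using this encoding, with fitness = number of non-attacking queen pairs (the population size, mutation rate, and one-point crossover are assumed parameters, not fixed by the slides):

```python
import random

def fitness(queens):
    """Number of non-attacking queen pairs; a solution for n queens
    scores n*(n-1)/2 (28 for n = 8)."""
    n = len(queens)
    attacks = sum(1 for a in range(n) for b in range(a + 1, n)
                  if queens[a] == queens[b] or abs(queens[a] - queens[b]) == b - a)
    return n * (n - 1) // 2 - attacks

def genetic_nqueens(n=8, pop_size=100, generations=1000, p_mutate=0.1):
    """Each individual is a list of n row numbers, one per column.
    Parents are chosen with probability proportional to fitness,
    combined by one-point crossover, and occasionally mutated."""
    best_score = n * (n - 1) // 2
    pop = [[random.randrange(n) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        weights = [fitness(ind) + 1 for ind in pop]   # +1 keeps weights positive
        next_pop = []
        for _ in range(pop_size):
            x, y = random.choices(pop, weights=weights, k=2)
            cut = random.randrange(1, n)
            child = x[:cut] + y[cut:]                 # one-point crossover
            if random.random() < p_mutate:
                child[random.randrange(n)] = random.randrange(n)
            next_pop.append(child)
        pop = next_pop
        top = max(pop, key=fitness)
        if fitness(top) == best_score:
            return top                                # full solution found
    return max(pop, key=fitness)
```

Fitness-proportional selection is the "fittest ones survive" step; mutation keeps diversity in the population.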
How can you compare three algorithms A, B and C, when
A solves the problem 30% of the time very quickly but doesn’t halt for the other 70% of the cases
B solves 60% of the cases reasonably quickly but doesn’t solve the rest
C solves the problem in 100% of the cases, but slowly?
Summary statistics, such as mean run time or median run time
don’t make much sense.
A runtime distribution plots the runtime against the proportion of the runs that are solved within that runtime.
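Computing such a distribution from observed runtimes is straightforward (a sketch; recording timeouts as `None` is an assumed convention):

```python
def runtime_distribution(runtimes, timeout):
    """Empirical runtime distribution: for each observed runtime, the
    proportion of all runs (including unsolved ones, recorded as None)
    that finished within that time. Returns (runtime, proportion) pairs."""
    solved = sorted(t for t in runtimes if t is not None)
    n = len(runtimes)
    return [(t, (i + 1) / n) for i, t in enumerate(solved) if t <= timeout]
```

Unlike a mean or median, the resulting curve shows at a glance that algorithm A solves 30% of the cases very quickly, B 60% reasonably quickly, and C everything, slowly.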