Chapters 4–5: Nonclassical and adversarial search
DIT411/TIN175, Artificial Intelligence
Peter Ljunglöf
2 February, 2018
Repetition
 Search problems, graphs, states, arcs, goal test,
generic search algorithm, tree search, graph search,
depth-first search, breadth-first search, uniform-cost search,
iterative deepening, bidirectional search, …
Heuristic search (R&N 3.5–3.6)
 Greedy best-first search, A* search,
heuristics, admissibility, consistency, dominating heuristics, …
Local search (R&N 4.1)
 Hill climbing / gradient descent, random moves, random restarts, beam search, simulated annealing, …
Nonclassical search
Nondeterministic search (R&N 4.3)
Partial observations (R&N 4.4)
Nondeterministic search (R&N 4.3)

 Contingency plan / strategy
 And-or search trees (not in the written exam)
An erratic vacuum cleaner

The eight possible states of the vacuum world; states 7 and 8 are goal states.

There are three actions: Left, Right, Suck.
Assume that the Suck action works as follows:
 if the square is dirty, it is cleaned, but sometimes the adjacent square is cleaned too
 if the square is clean, the vacuum cleaner sometimes deposits dirt
Nondeterministic outcomes, contingency plans
 Assume that the Suck action is nondeterministic:
 if the square is dirty, it is cleaned, but sometimes the adjacent square is cleaned too
 if the square is clean, the vacuum cleaner sometimes deposits dirt

 Now we need a more general result function:
 instead of returning a single state, it returns a set of possible outcome states
 e.g., \(\textsf{Results}(1, \textsf{Suck}) = \{5, 7\}\) and \(\textsf{Results}(5, \textsf{Suck}) = \{1, 5\}\)

 We also need to generalise the notion of a solution:
 instead of a single sequence (path) from the start to the goal,
we need a strategy (or a contingency plan)
 i.e., we need if-then-else constructs
 this is a possible solution from state 1:
 [Suck, if State=5 then [Right, Suck] else []]
How to find contingency plans
(will not be in the written examination)
 We need a new kind of node in the search tree:
 and nodes:
these are used whenever an action is nondeterministic
 normal nodes are called or nodes:
they are used when we have several possible actions in a state

 A solution for an and-or search problem is a subtree that:
 has a goal node at every leaf
 specifies exactly one action at each of its or nodes
 includes every branch at each of its and nodes
A solution to the erratic vacuum cleaner
(will not be in the written examination)
The solution subtree is shown in bold, and corresponds to the plan:
[Suck, if State=5 then [Right, Suck] else []]
An algorithm for finding a contingency plan
(will not be in the written examination)
This algorithm does a depth-first search in the and-or tree,
so it is not guaranteed to find the best or shortest plan:
 function AndOrGraphSearch(problem):
 return OrSearch(problem.InitialState, problem, [])

 function OrSearch(state, problem, path):
 if problem.GoalTest(state) then return []
 if state is on path then return failure
 for each action in problem.Actions(state):
 plan := AndSearch(problem.Results(state, action), problem, [state] ++ path)
 if plan ≠ failure then return [action] ++ plan
 return failure

 function AndSearch(states, problem, path):
 for each \(s_i\) in states:
 \(plan_i\) := OrSearch(\(s_i\), problem, path)
 if \(plan_i\) = failure then return failure
 return [if \(s_1\) then \(plan_1\) else if \(s_2\) then \(plan_2\) else … if \(s_n\) then \(plan_n\)]
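
Below is a minimal runnable Python sketch of the same depth-first and-or search. The problem interface (initial_state, goal_test, actions, results) is an assumption mirroring the pseudocode above, and the if-then-else cascade is represented as a dict from outcome state to sub-plan:

def and_or_graph_search(problem):
    # depth-first search in the and-or tree; returns a conditional plan or None
    return or_search(problem.initial_state, problem, [])

def or_search(state, problem, path):
    if problem.goal_test(state):
        return []                                  # empty plan: already at a goal
    if state in path:
        return None                                # cycle: fail on this branch
    for action in problem.actions(state):
        plan = and_search(problem.results(state, action), problem, [state] + path)
        if plan is not None:
            return [action] + plan                 # prepend the chosen action
    return None

def and_search(states, problem, path):
    subplans = {}
    for s in states:
        plan = or_search(s, problem, path)
        if plan is None:
            return None                            # every outcome needs a sub-plan
        subplans[s] = plan
    return [subplans]      # the if-then-else cascade: outcome state -> sub-plan

For the erratic vacuum world, the plan returned from state 1 begins [Suck, {5: …, 7: []}], which is the dict form of the if-then-else plan above.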
While loops in contingency plans
(will not be in the written examination)
 If the search graph contains cycles, if-then-else is not enough in a contingency plan:
 we need while loops instead

 In the slippery vacuum world above, the cleaner doesn’t always move when told:
 the solution above translates to [Suck, while State=5 do Right, Suck]
Partial observations (R&N 4.4)

 Belief states: goal test, transitions, …
 Sensorless (conformant) problems
 Partially observable problems
Observability vs determinism
 A problem is nondeterministic if there are several possible outcomes of an action
 deterministic — nondeterministic (chance)

 It is partially observable if the agent cannot tell exactly which state it is in
 fully observable (perfect info.) — partially observable (imperfect info.)

 A problem can be nondeterministic, partially observable, or both:
Belief states
 Instead of searching in a graph of states, we use belief states
 A belief state is a set of states

 In a sensorless (or conformant) problem, the agent has no information at all
 The initial belief state is the set of all problem states
 e.g., for the vacuum world the initial belief state is {1,2,3,4,5,6,7,8}

 The goal test has to check that all members of the belief state are goal states
 e.g., for the vacuum world, the following are goal belief states: {7}, {8}, and {7,8}

 The result of performing an action is the union of all possible results
 i.e., \(\textsf{Predict}(b,a) = \{\textsf{Result}(s,a)\) for each \(s\in b\}\)
 if the problem is also nondeterministic:
 \(\textsf{Predict}(b,a) = \bigcup\{\textsf{Results}(s,a)\) for each \(s\in b\}\)
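
As a minimal sketch, both versions of Predict are one-line set comprehensions in Python (here result and results are assumed callables implementing the single-state transition model):

def predict(belief, action, result):
    # deterministic actions: the image of the belief state under the action
    return {result(s, action) for s in belief}

def predict_nondet(belief, action, results):
    # nondeterministic actions: the union of every member's outcome set
    return set().union(*(results(s, action) for s in belief))

For the sensorless vacuum world, predict({1,2,3,4,5,6,7,8}, Right, result) yields {2,4,6,8}, as in panel (a) on the next slide.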
Predicting belief states in the vacuum world

(a) Predicting the next belief state for the sensorless vacuum world
with a deterministic action, Right.

(b) Prediction for the same belief state and action in the nondeterministic
slippery version of the sensorless vacuum world.
The deterministic sensorless vacuum world
Partial observations: state transitions
 With partial observations, we can think of belief state transitions in three stages:
 Prediction, the same as for sensorless problems:
 \(b' = \textsf{Predict}(b,a) = \{\textsf{Result}(s,a)\) for each \(s\in b\}\)
 Observation prediction, determines the percepts that can be observed:
 \(\textsf{PossiblePercepts}(b') = \{\textsf{Percept}(s)\) for each \(s\in b'\}\)
 Update, filters the predicted states according to the percepts:
 \(\textsf{Update}(b',o) = \{s\) for each \(s\in b'\) such that \(o = \textsf{Percept}(s)\}\)

 Belief state transitions:
 \(\textsf{Results}(b,a) = \{\textsf{Update}(b',o)\) for each \(o\in\textsf{PossiblePercepts}(b')\}\)
where \(b' = \textsf{Predict}(b,a)\)
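
The three stages fit in a few lines of Python; this is a sketch under the assumption that result and percept are the single-state model functions:

def belief_results(belief, action, result, percept):
    # successor belief states: Predict, then one Update per possible percept
    b1 = {result(s, action) for s in belief}           # 1. Prediction
    observations = {percept(s) for s in b1}            # 2. Observation prediction
    return [{s for s in b1 if percept(s) == o}         # 3. Update
            for o in observations]

Each element of the returned list is one possible successor belief state, corresponding to one percept the agent might receive.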
Transitions in partially observable vacuum worlds
 The percepts return the current position and the dirtiness of that square.


 The deterministic world: Right always succeeds.

 The slippery world: Right sometimes fails.
Example: Robot Localisation
The percepts return whether there is a wall in each of the directions.
 Possible initial positions of the robot, after \(E_1\) = North, South, West.
 After moving right and observing \(E_2\) = North, South, there’s only one possible position left.
Adversarial search
Types of games (R&N 5.1)
Minimax search (R&N 5.2–5.3)
Imperfect decisions (R&N 5.4–5.4.2)
Stochastic games (R&N 5.5)
Types of games (R&N 5.1)

 cooperative, competitive, zero-sum games
 game trees, ply/plies, utility functions
Types of games (again)

Perfect information games are solvable in a manner similar to
fully observable single-agent systems, e.g., using forward search.

If two agents compete, so that a positive reward for one is a negative reward
for the other agent, we have a two-agent zero-sum game.

The value of a zero-sum game can be characterised by a single number that one agent is trying to maximise and the other agent is trying to minimise.

This leads to a minimax strategy:
 A node is either a MAX node (if it is controlled by the maximising agent),
 or a MIN node (if it is controlled by the minimising agent).
Minimax search (R&N 5.2–5.3)

 Minimax algorithm
 αβ pruning
Minimax search for zero-sum games
 Given two players called MAX and MIN:
 MAX wants to maximise the utility value,
 MIN wants to minimise the same value.
 \(\Rightarrow\) MAX should choose the alternative that maximises, assuming MIN minimises.

 Minimax gives perfect play for deterministic, perfect-information games:
 function Minimax(state):
 if TerminalTest(state) then return Utility(state)
 A := Actions(state)
 if state is a MAX node then return \(\max_{a\in A}\) Minimax(Result(state, a))
 if state is a MIN node then return \(\min_{a\in A}\) Minimax(Result(state, a))
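
As a sketch, the algorithm is a direct transcription into Python; the game interface (terminal_test, utility, actions, result, to_move) is an assumption, with to_move telling whether a state is a MAX or a MIN node:

def minimax(game, state):
    # minimax value of `state`: perfect play for two-player zero-sum games
    if game.terminal_test(state):
        return game.utility(state)
    values = [minimax(game, game.result(state, a)) for a in game.actions(state)]
    return max(values) if game.to_move(state) == 'MAX' else min(values)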
Minimax search: tic-tac-toe
Minimax example
The Minimax algorithm gives perfect play for deterministic, perfect-information games.
Can Minimax be wrong?
 Minimax gives perfect play, but is that always the best strategy?
 Perfect play assumes that the opponent is also a perfect player!
3player minimax
(will not be in the written examination)
Minimax can also be used on multiplayer games
\(\alpha{}\beta\,\) pruning
Minimax(root)
= \(\max(\min(3,12,8),\; \min(2,x,y),\; \min(14,5,2))\)
= \(\max(3,\; \min(2,x,y),\; 2)\)
= \(\max(3,\; z,\; 2)\) where \(z = \min(2,x,y) \leq 2\)
= \(3\)
 I.e., we don’t need to know the values of \(x\) and \(y\)!
\(\alpha{}\beta\,\) pruning, general idea

 The general idea of αβ pruning is this:
 if \(m\) is better than \(n\) for Player, we don’t want to pursue \(n\)
 so, once we know enough about \(n\), we can prune it
 sometimes it’s enough to examine just one of \(n\)’s descendants



 αβ pruning keeps track of the possible range
of values for every node it visits;
the parent range is updated when the child has been visited.
The \(\alpha{}\beta\) algorithm
 function AlphaBetaSearch(state):
 v := MaxValue(state, \(-\infty\), \(+\infty\))
 return the action in Actions(state) that has value v

 function MaxValue(state, α, β):
 if TerminalTest(state) then return Utility(state)
 v := \(-\infty\)
 for each action in Actions(state):
 v := max(v, MinValue(Result(state, action), α, β))
 if v ≥ β then return v
 α := max(α, v)
 return v

 function MinValue(state, α, β):
 same as MaxValue, but reverse the roles of α/β, min/max, and \(-\infty/{+}\infty\)
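
Spelled out in full, here is a hedged Python version of αβ search, using the same assumed game interface as the Minimax sketch earlier:

import math

def alphabeta_search(game, state):
    # best action for MAX; subtrees that cannot affect the choice are pruned
    best_action, alpha, beta = None, -math.inf, math.inf
    for action in game.actions(state):
        v = min_value(game, game.result(state, action), alpha, beta)
        if v > alpha:
            alpha, best_action = v, action
    return best_action

def max_value(game, state, alpha, beta):
    if game.terminal_test(state):
        return game.utility(state)
    v = -math.inf
    for action in game.actions(state):
        v = max(v, min_value(game, game.result(state, action), alpha, beta))
        if v >= beta:
            return v                  # MIN would never allow this branch: prune
        alpha = max(alpha, v)
    return v

def min_value(game, state, alpha, beta):
    if game.terminal_test(state):
        return game.utility(state)
    v = math.inf
    for action in game.actions(state):
        v = min(v, max_value(game, game.result(state, action), alpha, beta))
        if v <= alpha:
            return v                  # MAX would never allow this branch: prune
        beta = min(beta, v)
    return v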
How efficient is \(\alpha{}\beta\) pruning?
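With perfect move ordering, αβ pruning only needs to examine \(O(b^{m/2})\) nodes to find the best move, compared to \(O(b^m)\) for plain Minimax; in effect this doubles the search depth that is feasible in the same time. With random move ordering the number of examined nodes is roughly \(O(b^{3m/4})\).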
Imperfect decisions (R&N 5.4–5.4.2)

 H-minimax algorithm
 evaluation function, cutoff test
 features, weighted linear function
 quiescence search, horizon effect
H-minimax algorithm
 The Heuristic Minimax algorithm is similar to normal Minimax
 it replaces TerminalTest with CutoffTest, and Utility with Eval
 the cutoff test needs to know the current search depth
 function HMinimax(state, depth):
 if CutoffTest(state, depth) then return Eval(state)
 A := Actions(state)
 if state is a MAX node then return \(\max_{a\in A}\) HMinimax(Result(state, a), depth+1)
 if state is a MIN node then return \(\min_{a\in A}\) HMinimax(Result(state, a), depth+1)
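
As with Minimax, a Python sketch is almost a line-by-line transcription; cutoff_test and eval are assumed to be supplied as part of the game, replacing terminal_test and utility:

def h_minimax(game, state, depth=0):
    # minimax with a depth-aware cutoff and heuristic evaluation at the frontier
    if game.cutoff_test(state, depth):
        return game.eval(state)
    values = [h_minimax(game, game.result(state, a), depth + 1)
              for a in game.actions(state)]
    return max(values) if game.to_move(state) == 'MAX' else min(values)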
Chess positions: how to evaluate
Weighted linear evaluation functions
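A common choice (the concrete features and weights below are illustrative assumptions, not prescribed by the slide) is a weighted linear function of numerical board features:

\(\textsf{Eval}(s) = w_1 f_1(s) + w_2 f_2(s) + \cdots + w_n f_n(s) = \sum_{i=1}^{n} w_i f_i(s)\)

where each feature \(f_i(s)\) is a number computed from the position (e.g., the number of white queens minus the number of black queens) and each weight \(w_i\) reflects its importance (e.g., 9 for a queen, 1 for a pawn).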
Evaluation functions
A naive weighted sum of features will not see the difference between these two states.
Problems with cutoff tests
 Too simplistic cutoff tests and evaluation functions can be problematic:
 e.g., if the cutoff is only based on the current depth
 then it might cut off the search in unfortunate positions
(such as (b) on the previous slide)

 We want more sophisticated cutoff tests:
 only cut off search in quiescent positions
 i.e., in positions that are “stable”, unlikely to exhibit wild swings in value
 non-quiescent positions should be expanded further

 Another problem is the horizon effect:
 if a bad position is unavoidable (e.g., loss of a piece), but the system can
delay it from happening, it might push the bad position “over the horizon”
 in the end, the resulting delayed position might be even worse
Deterministic games in practice
 Chess:
 IBM Deep Blue beats world champion Garry Kasparov, 1997.
 Google AlphaZero beats the best chess program Stockfish, December 2017.

 Checkers/Othello/Reversi:
 Logistello beats the world champion in Othello/Reversi, 1997.
 Chinook plays checkers perfectly, 2007. It uses an endgame database
defining perfect play for all 8-piece positions on the board
(a total of 443,748,401,247 positions).

 Go:
 First Go programs to reach low dan levels, 2009.
 Google AlphaGo beats the world’s best Go player, Ke Jie, May 2017.
 Google AlphaZero beats AlphaGo, December 2017.
 AlphaZero learns board game strategies by playing against itself; it does not use a database of previous matches, opening books, or endgame tables.
Stochastic games (R&N 5.5)
Note: this section will be presented Tuesday 6th February!