Chapters 4–5: Nonclassical and adversarial search
DIT410/TIN174, Artificial Intelligence
Peter Ljunglöf
21 April, 2017
Repetition
 Search problems, graphs, states, arcs, goal test,
generic search algorithm, tree search, graph search,
depth-first search, breadth-first search, uniform cost search,
iterative deepening, bidirectional search, …
Heuristic search (R&N 3.5–3.6)
 Greedy best-first search, A* search,
heuristics, admissibility, consistency, dominating heuristics, …
Local search (R&N 4.1)
 Hill climbing / gradient descent, random moves, random restarts, beam search, simulated annealing, …
Nonclassical search
Nondeterministic search (R&N 4.3)
Partial observations (R&N 4.4)
Nondeterministic search (R&N 4.3)

 Contingency plan (strategy)
 And-or search trees
 And-or graph search algorithm
The vacuum cleaner world, again

The eight possible states of the vacuum world; states 7 and 8 are goal states.

There are three actions: Left, Right, Suck
An erratic vacuum cleaner
 Assume that the Suck action works as follows:
 if the square is dirty, it is cleaned, but sometimes the adjacent square is cleaned as well
 if the square is clean, the vacuum cleaner sometimes deposits dirt

 Now we need a more general result function:
 instead of returning a single state, it returns a set of possible outcome states
 e.g., \(\textsf{Results}(1, \textsf{Suck}) = \{5, 7\}\) and \(\textsf{Results}(5, \textsf{Suck}) = \{1, 5\}\)

 We also need to generalise the notion of a solution:
 instead of a single sequence (path) from the start to the goal,
we need a strategy (or a contingency plan)
 i.e., we need if-then-else constructs
 this is a possible solution from state 1:
 [Suck, if State=5 then [Right, Suck] else []]
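The set-valued result function above can be sketched in Python. This is a minimal illustration, not the lecture's own code: the names `STATES`, `NUMBER` and `results` are hypothetical, and the state numbering 1 to 8 is assumed to follow the usual R&N figure (odd numbers have the robot in the left square, 7 and 8 are the clean goal states).

```python
# Assumed numbering: state -> (robot location, left square dirty?, right square dirty?)
STATES = {
    1: ("L", True, True),   2: ("R", True, True),
    3: ("L", True, False),  4: ("R", True, False),
    5: ("L", False, True),  6: ("R", False, True),
    7: ("L", False, False), 8: ("R", False, False),
}
NUMBER = {description: number for number, description in STATES.items()}


def results(state, action):
    """Return the set of states that `action` may lead to from `state`."""
    loc, dirt_l, dirt_r = STATES[state]
    if action == "Left":
        outcomes = {("L", dirt_l, dirt_r)}
    elif action == "Right":
        outcomes = {("R", dirt_l, dirt_r)}
    elif action == "Suck":
        here_dirty = dirt_l if loc == "L" else dirt_r
        if here_dirty and loc == "L":
            # Cleans the left square; sometimes the right one too.
            outcomes = {("L", False, dirt_r), ("L", False, False)}
        elif here_dirty:
            # Cleans the right square; sometimes the left one too.
            outcomes = {("R", dirt_l, False), ("R", False, False)}
        elif loc == "L":
            # Clean square: sometimes dirt is deposited on it.
            outcomes = {("L", False, dirt_r), ("L", True, dirt_r)}
        else:
            outcomes = {("R", dirt_l, False), ("R", dirt_l, True)}
    else:
        raise ValueError(f"unknown action: {action}")
    return {NUMBER[o] for o in outcomes}


print(results(1, "Suck"))  # the two outcomes named on the slide: 5 and 7
print(results(5, "Suck"))  # the two outcomes named on the slide: 1 and 5
```

Returning a set even for the deterministic actions keeps the interface uniform: a deterministic action is simply one whose result set has a single element.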
How to find contingency plans
 We need a new kind of node in the search tree:
 and nodes:
these are used whenever an action is nondeterministic
 normal nodes are called or nodes:
they are used when we have several possible actions in a state

 A solution for an and-or search problem is a subtree that:
 has a goal node at every leaf
 specifies exactly one action at each of its or nodes
 includes every branch at each of its and nodes
A solution to the erratic vacuum cleaner
The solution subtree is shown in bold, and corresponds to the plan:
[Suck, if State=5 then [Right, Suck] else []]
An algorithm for finding a contingency plan
This algorithm does a depth-first search in the and-or tree,
so it is not guaranteed to find the best or shortest plan:
 function AndOrGraphSearch(problem):
     return OrSearch(problem.InitialState, problem, [])

 function OrSearch(state, problem, path):
     if problem.GoalTest(state) then return []
     if state is on path then return failure
     for each action in problem.Actions(state):
         plan := AndSearch(problem.Results(state, action), problem, [state] ++ path)
         if plan ≠ failure then return [action] ++ plan
     return failure

 function AndSearch(states, problem, path):
     for each \(s_i\) in states:
         \(plan_i\) := OrSearch(\(s_i\), problem, path)
         if \(plan_i\) = failure then return failure
     return [if \(s_1\) then \(plan_1\) else if \(s_2\) then \(plan_2\) else … if \(s_n\) then \(plan_n\)]
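The pseudocode can be turned into a small runnable Python sketch. Several things here are assumptions made for illustration: the class `ErraticVacuum`, the state numbering (as in the usual R&N figure), the action order, and the plan encoding, where a conditional "if \(s_1\) then \(plan_1\) else …" is represented as a dictionary from outcome state to sub-plan.

```python
FAILURE = None  # sentinel; a successful plan is always a list or a dict


class ErraticVacuum:
    """Hypothetical problem class for the erratic vacuum world (states 1..8)."""
    STATES = {1: ("L", 1, 1), 2: ("R", 1, 1), 3: ("L", 1, 0), 4: ("R", 1, 0),
              5: ("L", 0, 1), 6: ("R", 0, 1), 7: ("L", 0, 0), 8: ("R", 0, 0)}
    NUMBER = {v: k for k, v in STATES.items()}
    initial_state = 1

    def goal_test(self, state):
        return state in (7, 8)

    def actions(self, state):
        return ["Suck", "Right", "Left"]

    def results(self, state, action):
        loc, dl, dr = self.STATES[state]
        if action == "Left":
            out = {("L", dl, dr)}
        elif action == "Right":
            out = {("R", dl, dr)}
        elif loc == "L":  # Suck in the left square
            out = {("L", 0, dr), ("L", 0, 0)} if dl else {("L", 0, dr), ("L", 1, dr)}
        else:             # Suck in the right square
            out = {("R", dl, 0), ("R", 0, 0)} if dr else {("R", dl, 0), ("R", dl, 1)}
        return {self.NUMBER[o] for o in out}


def and_or_graph_search(problem):
    return or_search(problem.initial_state, problem, [])


def or_search(state, problem, path):
    if problem.goal_test(state):
        return []
    if state in path:  # a cycle: give up on this branch
        return FAILURE
    for action in problem.actions(state):
        plan = and_search(problem.results(state, action), problem, [state] + path)
        if plan is not FAILURE:
            return [action, plan]
    return FAILURE


def and_search(states, problem, path):
    plans = {}
    for s in states:  # every possible outcome needs its own sub-plan
        plan = or_search(s, problem, path)
        if plan is FAILURE:
            return FAILURE
        plans[s] = plan
    return plans  # read as: if State=s then plans[s]


plan = and_or_graph_search(ErraticVacuum())
print(plan)
```

With this action order the search returns `["Suck", {5: ["Right", {6: ["Suck", {8: []}]}], 7: []}]`, which is the plan [Suck, if State=5 then [Right, Suck] else []] with every outcome spelled out. A different action order may produce a different, equally valid plan, since the search is depth-first.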
While loops in contingency plans
 If the search graph contains cycles, if-then-else is not enough in a contingency plan:
 we need while loops instead

 In the slippery vacuum world above, the cleaner doesn’t always move when told:
 the solution is a subgraph (not a subtree), shown in bold above
 this solution translates to [Suck, while State=5 do Right, Suck]
Partial observations (R&N 4.4)

 Belief states: goal test, transitions, …
 Sensorless (conformant) problems
 Partially observable problems
Observability vs determinism
 A problem is nondeterministic if there are several possible outcomes of an action
 deterministic — nondeterministic (chance)
 It is partially observable if the agent cannot tell exactly which state it is in
 fully observable (perfect info.) — partially observable (imperfect info.)
 A problem can be either nondeterministic, or partially observable, or both:
Belief states
 Instead of searching in a graph of states, we use belief states
 A belief state is a set of states
 In a sensorless (or conformant) problem, the agent has no information at all
 The initial belief state is the set of all problem states
 e.g., for the vacuum world the initial belief state is {1,2,3,4,5,6,7,8}
 The goal test has to check that every member of the belief state is a goal state
 e.g., for the vacuum world, the following are goal belief states: {7}, {8}, and {7,8}
 The result of performing an action is the union of all possible results
 i.e., \(\textsf{Predict}(b,a) = \{\textsf{Result}(s,a)\) for each \(s\in b\}\)
 if the problem is also nondeterministic:
 \(\textsf{Predict}(b,a) = \bigcup\{\textsf{Results}(s,a)\) for each \(s\in b\}\)
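As a rough illustration of the deterministic Predict function, the sketch below tracks the belief state of a sensorless vacuum cleaner through a fixed action sequence. The state numbering (as in the usual R&N figure), the helper names, and the chosen action sequence are assumptions made for this example.

```python
# Assumed numbering: state -> (robot location, left dirty?, right dirty?)
STATES = {1: ("L", True, True),   2: ("R", True, True),
          3: ("L", True, False),  4: ("R", True, False),
          5: ("L", False, True),  6: ("R", False, True),
          7: ("L", False, False), 8: ("R", False, False)}
NUMBER = {v: k for k, v in STATES.items()}


def result(state, action):
    """Deterministic transition model: one successor state per action."""
    loc, dirt_l, dirt_r = STATES[state]
    if action == "Left":
        loc = "L"
    elif action == "Right":
        loc = "R"
    elif action == "Suck":
        if loc == "L":
            dirt_l = False
        else:
            dirt_r = False
    else:
        raise ValueError(f"unknown action: {action}")
    return NUMBER[(loc, dirt_l, dirt_r)]


def predict(belief, action):
    """Predict(b, a) = { Result(s, a) for each s in b }."""
    return frozenset(result(s, action) for s in belief)


def goal_test(belief):
    """A belief state is a goal only if every member is a goal state."""
    return all(s in (7, 8) for s in belief)


belief = frozenset(range(1, 9))  # sensorless: any state is possible
trace = [belief]
for action in ["Right", "Suck", "Left", "Suck"]:
    belief = predict(belief, action)
    trace.append(belief)
    print(action, sorted(belief))
```

The belief state shrinks from all eight states to {2,4,6,8}, then {4,8}, {3,7} and finally {7}, so this sequence reaches a goal belief state without any sensing.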
Predicting belief states in the vacuum world

(a) Predicting the next belief state for the sensorless vacuum world
with a deterministic action, Right.

(b) Prediction for the same belief state and action in the nondeterministic
slippery version of the sensorless vacuum world.
The deterministic sensorless vacuum world
Partial observations: state transitions
 With partial observations, we can think of belief state transitions in three stages:
 Prediction, the same as for sensorless problems:
 \(b' = \textsf{Predict}(b,a) = \{\textsf{Result}(s,a)\) for each \(s\in b\}\)
 Observation prediction, determines the percepts that can be observed:
 \(\textsf{PossiblePercepts}(b') = \{\textsf{Percept}(s)\) for each \(s\in b'\}\)
 Update, filters the predicted states according to the percepts:
 \(\textsf{Update}(b',o) = \{s\) for each \(s\in b'\) such that \(o = \textsf{Percept}(s)\}\)

 Belief state transitions:
 \(\textsf{Results}(b,a) = \{\textsf{Update}(b',o)\) for each \(o\in\textsf{PossiblePercepts}(b')\}\)
where \(b' = \textsf{Predict}(b,a)\)
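The three stages can be sketched in Python for a vacuum world with local sensing. Everything named here is an assumption for illustration: the state numbering (as in the usual R&N figure), a percept consisting of the robot's position and whether that square is dirty, and a `predict_slippery` variant in which a move may leave the robot where it was.

```python
# Assumed numbering: state -> (robot location, left dirty?, right dirty?)
STATES = {1: ("L", True, True),   2: ("R", True, True),
          3: ("L", True, False),  4: ("R", True, False),
          5: ("L", False, True),  6: ("R", False, True),
          7: ("L", False, False), 8: ("R", False, False)}
NUMBER = {v: k for k, v in STATES.items()}


def result(state, action):
    loc, dirt_l, dirt_r = STATES[state]
    if action == "Left":
        loc = "L"
    elif action == "Right":
        loc = "R"
    elif loc == "L":  # Suck
        dirt_l = False
    else:
        dirt_r = False
    return NUMBER[(loc, dirt_l, dirt_r)]


def percept(state):
    """Local sensing: current position and whether that square is dirty."""
    loc, dirt_l, dirt_r = STATES[state]
    return (loc, dirt_l if loc == "L" else dirt_r)


def predict(belief, action):
    return frozenset(result(s, action) for s in belief)


def predict_slippery(belief, action):
    """Moves sometimes fail, so the old state stays possible."""
    stay = belief if action in ("Left", "Right") else frozenset()
    return predict(belief, action) | stay


def possible_percepts(belief):
    return {percept(s) for s in belief}


def update(belief, observation):
    return frozenset(s for s in belief if percept(s) == observation)


def results(belief, action, predict=predict):
    """One successor belief state per percept that may be observed."""
    predicted = predict(belief, action)
    return {update(predicted, o) for o in possible_percepts(predicted)}


b = frozenset({1, 3})
print(results(b, "Right"))                    # deterministic world
print(results(b, "Right", predict_slippery))  # slippery world
```

From the belief state {1,3}, the deterministic Right yields the two singleton belief states {2} and {4}, while the slippery Right yields {2}, {4} and {1,3}: if the move fails, the percept cannot tell states 1 and 3 apart.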
Transitions in partially observable vacuum worlds

 The percepts return the current position and the dirtiness of that square.

 (a) The deterministic world:
Right always succeeds.

 (b) The slippery world:
Right sometimes fails.
Example: Robot Localisation

The percepts report whether there is a wall in each of the directions.

(a) Possible initial positions of the robot, after one observation.

(b) After moving right and a new observation, there is only one possible position left.
Adversarial search
Types of games (R&N 5.1)
Minimax search (R&N 5.2–5.3)
Imperfect decisions (R&N 5.4–5.4.2)
Stochastic games (R&N 5.5)
Types of games (R&N 5.1)

 cooperative, competitive, zero-sum games
 game trees, ply/plies, utility functions
Types of games (again)

Perfect information games are solvable in a manner similar to
fully observable single-agent systems, e.g., using forward search.

If two agents are competing so that a positive reward for one is a negative reward
for the other agent, we have a two-agent zero-sum game.

The value of a zero-sum game can be characterized by a single number that one agent is trying to maximize and the other agent is trying to minimize.

This leads to a minimax strategy:
 A node is either a MAX node (if it is controlled by the maximising agent),
 or is a MIN node (if it is controlled by the minimising agent).
Minimax search (R&N 5.2–5.3)

 Minimax algorithm
 α-β pruning
Minimax search for zero-sum games
 Given two players called MAX and MIN:
 MAX wants to maximize the utility value,
 MIN wants to minimize the same value.
 \(\Rightarrow\) MAX should choose the alternative that maximizes assuming that MIN minimizes.

 Minimax gives perfect play for deterministic, perfect-information games:
 function Minimax(state):
     if TerminalTest(state) then return Utility(state)
     A := Actions(state)
     if state is a MAX node then return \(\max_{a\in A}\) Minimax(Result(state, a))
     if state is a MIN node then return \(\min_{a\in A}\) Minimax(Result(state, a))
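A minimal Python sketch of the same recursion, under a simplifying assumption: the game is given as an explicit tree where an inner node is a list of children, a leaf is its utility value, and MAX and MIN alternate by depth. The tree values below are illustrative only.

```python
def minimax(node, is_max=True):
    """Return the minimax value of `node`; leaves are plain numbers."""
    if not isinstance(node, list):  # terminal test: a leaf holds its utility
        return node
    values = [minimax(child, not is_max) for child in node]
    return max(values) if is_max else min(values)


# A MAX root with three MIN children, each with three leaves (made-up values).
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
root_value = minimax(tree)
print(root_value)  # 3
```

The three MIN nodes evaluate to 3, 2 and 2, and MAX picks the largest of these. A real game would generate children from `Actions` and `Result` instead of storing the whole tree.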
Minimax search: tic-tac-toe
Minimax example
The Minimax algorithm gives perfect play for deterministic, perfect-information games.
Can Minimax be wrong?
 Minimax gives perfect play, but is that always the best strategy?
 Perfect play assumes that the opponent is also a perfect player!
3-player minimax
Minimax can also be used on multiplayer games
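The slide does not say how the generalisation works. One common approach, assumed here, is to let every leaf carry a vector with one utility per player, and to let the player to move pick the child whose vector is best in that player's own component. The function name, the tree encoding and the utility values are all made up for illustration.

```python
def multi_minimax(node, player=0, num_players=3):
    """Back up utility vectors; leaves are tuples, inner nodes are lists."""
    if isinstance(node, tuple):  # leaf: one utility per player
        return node
    next_player = (player + 1) % num_players
    values = [multi_minimax(child, next_player, num_players) for child in node]
    # The player to move maximises its own component of the vector.
    return max(values, key=lambda utilities: utilities[player])


# Players 0, 1 and 2 move in turn; the leaf vectors are hypothetical.
tree = [
    [[(1, 2, 6), (4, 2, 3)], [(6, 1, 2), (7, 4, 1)]],
    [[(5, 1, 1), (0, 5, 2)], [(7, 7, 1), (5, 4, 5)]],
]
root_vector = multi_minimax(tree)
print(root_vector)  # (1, 2, 6)
```

With two players and vectors of the form (u, -u) this reduces to ordinary zero-sum minimax. Ties are broken here by taking the first best child, which is an arbitrary choice.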
\(\alpha{-}\beta\) pruning
Minimax(root) = \( \max(\min(3,12,8), \min(2,x,y), \min(14,5,2)) \)
= \( \max(3, \min(2,x,y), 2) \)
= \( \max(3, z, 2) \) where \(z\leq 2\)
= \( 3 \)
 I.e., we don’t need to know the values of \(x\) and \(y\)!
\(\alpha{-}\beta\) pruning, general idea

 The general idea of α-β pruning is this:
 • if \(m\) is better than \(n\) for Player, we don’t want to pursue \(n\)
 • so, once we know enough about \(n\) we can prune it
 • sometimes it’s enough to examine just one of \(n\)’s descendants

 α-β pruning keeps track of the possible range of values for every node it visits;
the parent range is updated when the child has been visited.
The \(\alpha{-}\beta\) algorithm
 function AlphaBetaSearch(state):
     v := MaxValue(state, \(-\infty\), \(+\infty\))
     return the action in Actions(state) that has value v

 function MaxValue(state, α, β):
     if TerminalTest(state) then return Utility(state)
     v := \(-\infty\)
     for each action in Actions(state):
         v := max(v, MinValue(Result(state, action), α, β))
         if v ≥ β then return v
         α := max(α, v)
     return v

 function MinValue(state, α, β):
     same as MaxValue but reverse the roles of α/β and min/max and \(-\infty/{+}\infty\)
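The sketch below is one possible Python rendering of MaxValue and MinValue, folded into a single function. It assumes an explicit game tree (inner nodes are lists, leaves are utilities) and adds a hypothetical `visited` list that records which leaves were actually evaluated. The leaf values 4 and 6 stand in for the unknown \(x\) and \(y\) and are made up.

```python
import math


def alphabeta(node, alpha=-math.inf, beta=math.inf, is_max=True, visited=None):
    """Minimax value of `node`, skipping branches that cannot matter."""
    if not isinstance(node, list):  # leaf: return its utility
        if visited is not None:
            visited.append(node)
        return node
    if is_max:
        v = -math.inf
        for child in node:
            v = max(v, alphabeta(child, alpha, beta, False, visited))
            if v >= beta:  # MIN above will never allow this value
                return v
            alpha = max(alpha, v)
        return v
    v = math.inf
    for child in node:
        v = min(v, alphabeta(child, alpha, beta, True, visited))
        if v <= alpha:  # MAX above already has something at least as good
            return v
        beta = min(beta, v)
    return v


tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]  # 4 and 6 play the roles of x and y
visited = []
value = alphabeta(tree, visited=visited)
print(value, visited)  # 3 [3, 12, 8, 2, 14, 5, 2]
```

The leaves 4 and 6 are never evaluated: after seeing 2 in the second MIN node, its value is at most 2, which is already worse for MAX than the 3 from the first branch. How much is pruned depends on the order in which children are visited.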
How efficient is \(\alpha{-}\beta\) pruning?
Imperfect decisions (R&N 5.4–5.4.2)
Stochastic games (R&N 5.5)
Note: these two sections were presented Tuesday 25th April!