# Chapters 4–5: Non-classical and adversarial search

DIT411/TIN175, Artificial Intelligence

Peter Ljunglöf

2 February, 2018

# Repetition

## Uninformed search (R&N 3.4)

• Search problems, graphs, states, arcs, goal test, generic search algorithm, tree search, graph search, depth-first search, breadth-first search, uniform cost search, iterative deepending, bidirectional search, …

## Heuristic search (R&N 3.5–3.6)

• Greedy best-first search, A* search, heuristics, admissibility, consistency, dominating heuristics, …

## Local search (R&N 4.1)

• Hill climbing / gradient descent, random moves, random restarts, beam search, simulated annealing, …

# Non-classical search

## Partial observations (R&N 4.4)

### Nondeterministic search (R&N 4.3)

• Contingency plan / strategy
• And-or search trees (not in the written exam)

### An erratic vacuum cleaner

• The eight possible states of the vacuum world; states 7 and 8 are goal states.

• There are three actions: Left, Right, Suck.
Assume that the Suck action works as follows:

• if the square is dirty, it is cleaned but sometimes also the adjacent square is
• if the square is clean, the vacuum cleaner sometimes deposists dirt

### Nondeterministic outcomes, contingency plans

• Assume that the Suck action is nondeterministic:
• if the square is dirty, it is cleaned but sometimes also the adjacent square is
• if the square is clean, the vacuum cleaner sometimes deposists dirt
•
• Now we need a more general result function:
• instead of returning a single state, it returns a set of possible outcome states
• e.g., $$\textsf{Results}(\textsf{Suck}, 1) = \{5, 7\}$$ and  $$\textsf{Results}(\textsf{Suck}, 5) = \{1, 5\}$$
•
• We also need to generalise the notion of a solution:
• instead of a single sequence (path) from the start to the goal,
we need a strategy (or a contingency plan)
• i.e., we need if-then-else constructs
• this is a possible solution from state 1:
• [Suck, if State=5 then [Right, Suck] else []]

### How to find contingency plans

(will not be in the written examination)

• We need a new kind of nodes in the search tree:
• and nodes:
these are used whenever an action is nondeterministic
• normal nodes are called or nodes:
they are used when we have several possible actions in a state
•
• A solution for an and-or search problem is a subtree that:
• has a goal node at every leaf
• specifies exactly one action at each of its or node
• includes every branch at each of its and node

### A solution to the erratic vacuum cleaner

(will not be in the written examination)

The solution subtree is shown in bold, and corresponds to the plan:
[Suck, if State=5 then [Right, Suck] else []]

### An algorithm for finding a contingency plan

(will not be in the written examination)

This algorithm does a depth-first search in the and-or tree,
so it is not guaranteed to find the best or shortest plan:

• function AndOrGraphSearch(problem):
• return OrSearch(problem.InitialState, problem, [])
•
• function OrSearch(state, problem, path):
• if problem.GoalTest(state) then return []
• if state is on path then return failure
• for each action in problem.Actions(state):
• plan := AndSearch(problem.Results(state, action), problem, [state] ++ path)
• if plan ≠ failure then return [action] ++ plan
• return failure
•
• function AndSearch(states, problem, path):
• for each $$s_i$$ in states:
• $$plan_i$$ := OrSearch($$s_i$$, problem, path)
• if $$plan_i$$ = failure then return failure
• return [if $$s_1$$ then $$plan_1$$ else if $$s_2$$ then $$plan_2$$ elseif $$s_n$$ then $$plan_n$$]

### While loops in contingency plans

(will not be in the written examination)

• If the search graph contains cycles, if-then-else is not enough in a contingency plan:
• we need while loops instead
•
• In the slippery vacuum world above, the cleaner don’t always move when told:
• the solution above translates to [Suck, while State=5 do Right, Suck]

## Partial observations (R&N 4.4)

• Belief states: goal test, transitions, …
• Sensor-less (conformant) problems
• Partially observable problems

### Observability vs determinism

• A problem is nondeterministic if there are several possible outcomes of an action
• deterministic — nondeterministic (chance)
•
• It is partially observable if the agent cannot tell exactly which state it is in
• fully observable (perfect info.) — partially observable (imperfect info.)
•
• A problem can be either nondeterministic, or partially observable, or both:

### Belief states

• Instead of searching in a graph of states, we use belief states
• A belief state is a set of states
•
• In a sensor-less (or conformant) problem, the agent has no information at all
• The initial belief state is the set of all problem states
• e.g., for the vacuum world the initial state is {1,2,3,4,5,6,7,8}
•
• The goal test has to check that all members in the belief state is a goal
• e.g., for the vacuum world, the following are goal states: {7}, {8}, and {7,8}
•
• The result of performing an action is the union of all possible results
• i.e., $$\textsf{Predict}(b,a) = \{\textsf{Result}(s,a)$$ for each $$s\in b\}$$
• if the problem is also nondeterministic:
• $$\textsf{Predict}(b,a) = \bigcup\{\textsf{Results}(s,a)$$ for each $$s\in b\}$$

### Predicting belief states in the vacuum world

• (a) Predicting the next belief state for the sensorless vacuum world
with a deterministic action, Right.

• (b) Prediction for the same belief state and action in the nondeterministic
slippery version of the sensorless vacuum world.

### Partial observations: state transitions

• With partial observations, we can think of belief state transitions in three stages:
• Prediction, the same as for sensorless problems:
• $$b’ = \textsf{Predict}(b,a) = \{\textsf{Result}(s,a)$$ for each $$s\in b\}$$
• Observation prediction, determines the percepts that can be observed:
• $$\textsf{PossiblePercepts}(b’) = \{\textsf{Percept}(s)$$ for each $$s\in b’\}$$
• Update, filters the predicted states according to the percepts:
• $$\textsf{Update}(b’,o) = \{s$$ for each $$s\in b’$$ such that $$o = \textsf{Percept}(s)\}$$
•
• Belief state transitions:
• $$\textsf{Results}(b,a) = \{\textsf{Update}(b’,o)$$ for each $$o\in\textsf{PossiblePercepts}(b’)\}$$
where   $$b’ = \textsf{Predict}(b,a)$$

### Transitions in partially observable vacuum worlds

• The percepts return the current position and the dirtyness of that square.
•
•
• The deterministic world:
Right always succeeds.
•
•
•
•
•
•
• The slippery world:
Right sometimes fails.

### Example: Robot Localisation

The percepts return whether there is a wall in each of the directions.

• Possible initial positions of the robot, after E1 = North, South, West.

• After moving right and observing E2 = North, South,
there’s only one possible position left.

## Types of games (R&N 5.1)

• cooperative, competetive, zero-sum games
• game trees, ply/plies, utility functions

### Multiple agents

• Let’s consider problems with multiple agents, where:

• the agents select actions autonomously

• each agent has its own information state
• they can have different information (even conflicting)
• the outcome depends on the actions of all agents

• each agent has its own utility function (that depends on the total outcome)

### Types of agents

• There are two extremes of multiagent systems:

• Cooperative: The agents share the same utility function
• Example: Automatic trucks in a warehouse
• Competetive: When one agent wins all other agents lose
• A common special case is when $$\sum_{a}u_{a}(o)=0$$ for any outcome $$o$$.
This is called a zero-sum game.
• Example: Most board games
• Many multiagent systems are between these two extremes.

• Example: Long-distance bike races are usually both cooperative
(bikers form clusters where they take turns in leading a group),
and competetive (only one of them can win in the end).

### Games as search problems

• The main difference to chapters 3–4:
now we have more than one agent that have different goals.

• All possible game sequences are represented in a game tree.

• The nodes are states of the game, e.g. board positions in chess.

• Initial state (root) and terminal nodes (leaves).

• States are connected if there is a legal move/ply.
(a ply is a move by one player, i.e., one layer in the game tree)

• Utility function (payoff function). Terminal nodes have utility values
$${+}x$$ (player 1 wins), $${-}x$$ (player 2 wins) and $$0$$ (draw).

### Perfect information games: Zero-sum games

• Perfect information games are solvable in a manner similar to
fully observable single-agent systems, e.g., using forward search.

• If two agents compete, so that a positive reward for one is a negative reward
for the other agent, we have a two-agent zero-sum game.

• The value of a game zero-sum game can be characterised by a single number that one agent is trying to maximise and the other agent is trying to minimise.

• This leads to a minimax strategy:

• A node is either a MAX node (if it is controlled by the maximising agent),
• or is a MIN node (if it is controlled by the minimising agent).

## Minimax search (R&N 5.2–5.3)

• Minimax algorithm
• α-β pruning

### Minimax search for zero-sum games

• Given two players called MAX and MIN:
• MAX wants to maximise the utility value,
• MIN wants to minimise the same value.
• $$\Rightarrow$$ MAX should choose the alternative that maximises, assuming MIN minimises.
•
• Minimax gives perfect play for deterministic, perfect-information games:

• function Minimax(state):
• if TerminalTest(state) then return Utility(state)
• A := Actions(state)
• if state is a MAX node then return $$\max_{a\in A}$$ Minimax(Result(state, a))
• if state is a MIN node then return $$\min_{a\in A}$$ Minimax(Result(state, a))

### Minimax example

The Minimax algorithm gives perfect play for deterministic, perfect-information games.

### Can Minimax be wrong?

• Minimax gives perfect play, but is that always the best strategy?

• Perfect play assumes that the opponent is also a perfect player!

### 3-player minimax

(will not be in the written examination)

Minimax can also be used on multiplayer games

### $$\alpha{-}\beta\,$$ pruning

 Minimax(root) = $$\max(\min(3,12,8), \min(2,x,y), \min(14,5,2))$$ = $$\max(3, \min(2,x,y), 2)$$ = $$\max(3, z, 2)$$   where  $$z = \min(2,x,y) \leq 2$$ = $$3$$
• I.e., we don’t need to know the values of $$x$$ and $$y$$!

### $$\alpha{-}\beta\,$$ pruning, general idea

•
• The general idea of α-β pruning is this:
•   •  if $$m$$ is better than $$n$$ for Player,
•      we don’t want to pursue $$n$$
•   •  so, once we know enough about $$n$$
•      we can prune it
•   •  sometimes it’s enough to examine
•      just one of $$n$$’s descendants
•
•
•
• α-β pruning keeps track of the possible range of values for every node it visits;
the parent range is updated when the child has been visited.

### The $$\alpha{-}\beta$$ algorithm

• function AlphaBetaSearch(state):
• v := MaxValue(state, $$-\infty$$, $$+\infty$$))
• return the action in Actions(state) that has value v
•
• function MaxValue(state, α, β):
• if TerminalTest(state) then return Utility(state)
• v := $$-\infty$$
• for each action in Actions(state):
• v := max(v, MinValue(Result(state, action), α, β))
• if vβ then return v
• α := max(α, v)
• return v
•
• function MinValue(state, α, β):
• same as MaxValue but reverse the roles of α/β and min/max and $$-\infty/{+}\infty$$

### How efficient is $$\alpha{-}\beta$$ pruning?

• The amount of pruning provided by the α-β algorithm depends on the ordering of the children of each node.

• It works best if a highest-valued child of a MAX node is selected first and
if a lowest-valued child of a MIN node is selected first.

• In real games, much of the effort is made to optimise the search order.

• With a “perfect ordering”, the time complexity becomes $$O(b^{m/2})$$

• this doubles the solvable search depth
• however, $$35^{80/2}$$ (for chess) or $$250^{160/2}$$ (for go) is still quite large…

### Minimax and real games

• Most real games are too big to carry out minimax search, even with α-β pruning.

• For these games, instead of stopping at leaf nodes,
we have to use a cutoff test to decide when to stop.

• The value returned at the node where the algorithm stops
is an estimate of the value for this node.

• The function used to estimate the value is an evaluation function.

• Much work goes into finding good evaluation functions.

• There is a trade-off between the amount of computation required
to compute the evaluation function and the size of the search space
that can be explored in any given time.

## Imperfect decisions (R&N 5.4–5.4.2)

• H-minimax algorithm
• evaluation function, cutoff test
• features, weighted linear function
• quiescence search, horizon effect

### H-minimax algorithm

• The Heuristic Minimax algorithm is similar to normal Minimax
• it replaces TerminalTest with CutoffTest, and Utility with Eval
• the cutoff test needs to know the current search depth

• function H-Minimax(state, depth):
• if CutoffTest(state, depth) then return Eval(state)
• A := Actions(state)
• if state is a MAX node then return $$\max_{a\in A}$$ H-Minimax(Result(state, a), depth+1)
• if state is a MIN node then return $$\min_{a\in A}$$ H-Minimax(Result(state, a), depth+1)

### Weighted linear evaluation functions

• A very common evaluation function is to use a weighted sum of features: $Eval(s) = w_1 f_1(s) + w_2 f_2(s) + \cdots + w_n f_n(s) = \sum_{i=1}^{n} w_i f_i(s)$

• This relies on a strong assumption: all features are independent of each other
• which is usually not true, so the best programs for chess
(and other games) also use nonlinear feature combinations
•
• The weights can be calculated using machine learning algorithms,
but a human still has to come up with the features.
• using recent advances in deep machine learning,
the computer can learn the features too

### Evaluation functions

A naive weighted sum of features will not see the difference between these two states.

### Problems with cutoff tests

• Too simplistic cutoff tests and evaluation functions can be problematic:
• e.g., if the cutoff is only based on the current depth
• then it might cut off the search in unfortunate positions
(such as (b) on the previous slide)
•
• We want more sophisticated cutoff tests:
• only cut off search in quiescent positions
• i.e., in positions that are “stable”, unlikely to exhibit wild swings in value
• non-quiescent positions should be expanded further
•
• Another problem is the horizon effect:
• if a bad position is unavoidable (e.g., loss of a piece), but the system can
delay it from happening, it might push the bad position “over the horizon”
• in the end, the resulting delayed position might be even worse

### Deterministic games in practice

• Chess:
• IBM DeepBlue beats world champion Garry Kasparov, 1997.
• Google AlphaZero beats best chess program Stockfish, December 2017.
•
• Checkers/Othello/Reversi:
• Logistello beats the world champion in Othello/Reversi, 1997.
• Chinook plays checkers perfectly, 2007. It uses an endgame database
defining perfect play for all 8-piece positions on the board,
(a total of 443,748,401,247 positions).
•
• Go:
• First Go programs to reach low dan-levels, 2009.
• Google AlphaGo beats the world’s best Go player, Ke Jie, May 2017.
• Google AlphaZero beats AlphaGo, December 2017.
• AlphaZero learns board game strategies by playing itself, it does not use a database of previous matches, opening books or endgame tables.

## Stochastic games (R&N 5.5)

Note: this section will be presented Tuesday 6th February!