Artificial Intelligence Unit III: Adversarial Search and Games
UNIT III
Adversarial Search and
Games
Prepared By
Mrs. Jayshri Dhere
Assistant Professor
Artificial Intelligence and Data Science
DYPIEMR, Pune
Syllabus
• Game Theory,
• Optimal Decisions in Games,
• Heuristic Alpha–Beta Tree Search,
• Monte Carlo Tree Search,
• Stochastic Games,
• Partially Observable Games,
• Limitations of Game Search Algorithms,
• Constraint Satisfaction Problems (CSP),
• Constraint Propagation: Inference in CSPs,
Backtracking Search for CSPs.
History of Game Theory
• Game theory has a rich history in artificial
intelligence (AI) and intersects deeply with various
aspects of decision-making, strategy, and
interaction.
3. Number of activities:
The number of activities may be finite or infinite, and the number of alternatives available to each player in a particular activity may also be finite or infinite. A finite game has a finite number of activities, each involving a finite number of alternatives; otherwise the game is said to be infinite.
4. Information: The information available to the players about the past activities of the other player may be completely available, partly available, or not available at all.
5. Payoff:
A quantitative measure of the satisfaction (reward) a player gets at the end of each play is called a payoff. It is a real-valued function of the variables in the game.
• Perfect information: A game with perfect information is one in which the agents can see the complete board. The agents have all the information about the game, and they can also see each other's moves. Examples: Chess, Checkers, Go, etc.
• Imperfect information: If the agents in a game do not have all the information about the game and are not aware of what is going on (the state is not fully observable), such games are called games with imperfect information, e.g., Battleship, blind tic-tac-toe, Bridge, etc.
Adversarial Search
• Adversarial search is a search in which we examine the problem that arises when we try to plan ahead in a world where other agents are planning against us.
Search Under Adversarial
Circumstances
• Game-playing programs have been developed by AI researchers since the beginning of the modern AI era (chess and checkers in the 1950s)
• Game search:
• Sequences of decisions by the player we control
• Decisions by other player(s) that we do not control
Types of games
• Deterministic games:
Deterministic games have no element of chance or luck; every move has a fixed, predictable outcome.
• Example: Chess, Checkers, Go, tic-tac-toe, etc.
• Non-deterministic games:
Non-deterministic games are those games which have various unpredictable events and a factor of chance or luck. This factor of chance or luck is introduced by either dice or cards. These games are random, and each action's outcome is not fixed. Such games are also called stochastic games.
• Example: Backgammon, Monopoly, Poker, etc.
Types of games

Type                         Exact (perfect) information       Inexact (imperfect) information
Deterministic                Chess, Checkers, Go;              Battleship, blind tic-tac-toe
                             assembly sequencing
Non-deterministic (chance)   Backgammon, Monopoly              Bridge, Poker
Chess programming
• How to find the max-min move?
• Evaluate all possible scenarios
• For chess, the number of such possibilities is enormous
– Beyond the reach of computers
• How to even systematically track all such moves?
– Game tree
• How to evaluate a move?
– Are two pawns better than a knight?
• Heuristics
– Approximate but reasonable answers
– Too deep an analysis may lead to defeat
– Chess comes under the deterministic and exact (perfect) information category.
– It is a two-person, zero-sum game.
Game Setup
• Consider games with two players, whom we call MAX and MIN
• MAX moves first, and then they take turns moving until the game is over.
• At the end of the game,
• points are awarded to the winning player and
• penalties are given to the loser.
• A game can be formally defined as a kind of search problem with the following
elements:
• S0: The initial state, which specifies how the game is set up at the start.
• PLAYER(s): Defines which player has the move in a state.
• ACTIONS(s): Returns the set of legal moves in a state.
• RESULT(s, a): The transition model, which defines the result of a move.
• TERMINAL-TEST(s): A terminal test,
• which is true when the game is over and
• false otherwise.
• States where the game has ended are called terminal states.
• UTILITY(s, p): A utility function defines the final numeric value for a game
that ends in terminal state s for a player p.
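These elements map directly onto a small program interface. A minimal Python sketch (class and method names are illustrative, not from the slides):

```python
class Game:
    """Abstract two-player game, mirroring the formal elements above."""

    def __init__(self, initial_state):
        self.initial_state = initial_state   # S0: how the game is set up

    def player(self, state):
        """PLAYER(s): which player has the move in this state."""
        raise NotImplementedError

    def actions(self, state):
        """ACTIONS(s): the set of legal moves in this state."""
        raise NotImplementedError

    def result(self, state, action):
        """RESULT(s, a): the transition model."""
        raise NotImplementedError

    def terminal_test(self, state):
        """TERMINAL-TEST(s): True when the game is over."""
        raise NotImplementedError

    def utility(self, state, player=None):
        """UTILITY(s, p): final numeric value of terminal state s for p
        (by convention here, MAX's value when p is omitted)."""
        raise NotImplementedError
```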
Game Setup
• In chess, the outcome is a win, loss, or draw, with values +1, 0, or ½
– Chess is zero-sum (all-or-nothing, win-or-lose) because every game has a total payoff of either 0 + 1, 1 + 0, or ½ + ½
• The initial state, ACTIONS function, and RESULT function define the game
tree for the game
• a tree where
• the nodes are game states and
• the edges are moves.
• Figure shows part of the game tree for tic-tac-toe
• From the initial state, MAX has nine possible moves.
• Play alternates between MAX’s placing an X and MIN’s placing an O until we
reach leaf nodes corresponding to terminal states such that one player has
three in a row or all the squares are filled.
• The number on each leaf node indicates
• the utility value of the terminal state from the point of view of MAX;
• high values are assumed to be good for MAX and bad for MIN
Game Setup
[Figure: part of the game tree for tic-tac-toe.]
Basic definitions
1. Competitive games:
A competitive situation is called a competitive game if it has the following four properties:
a. There is a finite number n of competitors, called players.
b. Each player has a list of a finite number of possible activities.
c. A play is said to occur when each player chooses one of his activities.
d. Every combination of activities determines an outcome.
2. Zero-sum and non-zero-sum games
• Competitive games are classified according to the number of players involved, i.e., as a two-person game, three-person game, etc.
• If the players make payments only to each other, that is, the loss of one is the gain of the others and nothing comes from outside, the competitive game is said to be zero-sum.
• A game which is not zero-sum is called a non-zero-sum game.
3. Strategy
• A strategy for a given player is a set of rules that specifies which of the available courses of action he should take at each play. A strategy may be of two kinds:
a. Pure strategy
– If a player knows exactly what the other player is going to do, a deterministic situation is obtained, and the objective function is to maximize the gain.
– Therefore a pure strategy is a decision rule always to select a particular course of action.
b. Mixed strategy
– If a player is guessing which activity will be selected by the other on any particular occasion, a probabilistic situation is obtained, and the objective function is to maximize the expected gain.
– Thus a mixed strategy is a selection among pure strategies with fixed probabilities.
Two-person zero-sum (rectangular) games
• A game with only two players, player A and player B, is called a two-person zero-sum game if the losses of one player are equivalent to the gains of the other, so that the sum of their net gains is zero.
• Two-person zero-sum games are also called rectangular games, as they are usually represented by a payoff matrix in rectangular form.
Payoff matrix
• Suppose player A has m activities and player B has n activities.
• Then a payoff matrix can be formed by adopting the following rules:
• Row designations for the matrix are the activities available to player A.
• Column designations for the matrix are the activities available to player B.
• The cell entry v_ij is the payment to player A in A's payoff matrix when A chooses activity i and B chooses activity j.
• In a zero-sum two-person game, the cell entry in player B's payoff matrix will be the negative of the corresponding cell entry v_ij in player A's payoff matrix, so that the sum of the payoff matrices for player A and player B is ultimately zero.
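As a quick illustration (the matrix values here are hypothetical, not from the slides), the zero-sum relationship between the two payoff matrices can be checked in a few lines of Python:

```python
import numpy as np

# Player A's payoff matrix: rows are A's activities, columns are B's.
# The entry v_ij is the payment to player A when A chooses activity i
# and B chooses activity j. (Illustrative values.)
A = np.array([[ 4, -1],
              [-2,  3]])

B = -A                    # zero-sum: B's payoff matrix is the negative of A's
print((A + B).tolist())   # [[0, 0], [0, 0]] -- the matrices sum to zero
```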
Optimal Decisions In Games
• In a normal search problem
• The optimal solution would be a sequence of actions leading to a goal state, i.e., a terminal state that is a win.
• In adversarial search
• MIN has something to say about it.
• MAX therefore must find a contingent strategy, which specifies MAX’s
move in the initial state,
• then MAX’s moves in the states resulting from every possible response
by MIN,
• then MAX’s moves in the states resulting from every possible response
by MIN to those moves, and so on.
• This is exactly analogous to the AND–OR search algorithm with
• MAX playing the role of OR and
• MIN equivalent to AND.
• Roughly speaking, an optimal strategy leads to outcomes at least as
good as any other strategy when one is playing an infallible opponent.
• We begin by showing how to find this optimal strategy.
Optimal Decisions In Games
• Consider a simple two-ply game:
• possible moves for MAX at
the root node are
labeled a1, a2, and a3.
• The possible replies to a1
for MIN are b1, b2, b3, and
so on.
• This particular game
ends after one move
each by MAX and MIN.
• The utilities of the
terminal states in this
game range from 2 to 14.
Optimal Decisions In Games
• Given a game tree, the optimal strategy can
be determined from the minimax value of
each node, which we write as MINIMAX(n).
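The minimax value has a standard recursive definition (textbook formulation, written here in LaTeX):

```latex
\mathrm{MINIMAX}(s) =
\begin{cases}
  \mathrm{UTILITY}(s) & \text{if } \mathrm{TERMINAL\text{-}TEST}(s) \\
  \max_{a \in \mathrm{ACTIONS}(s)} \mathrm{MINIMAX}(\mathrm{RESULT}(s, a)) & \text{if } \mathrm{PLAYER}(s) = \mathrm{MAX} \\
  \min_{a \in \mathrm{ACTIONS}(s)} \mathrm{MINIMAX}(\mathrm{RESULT}(s, a)) & \text{if } \mathrm{PLAYER}(s) = \mathrm{MIN}
\end{cases}
```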
Optimal Decisions In Games
• The terminal nodes on the bottom level get their utility
values from the game’s UTILITY function.
• The first MIN node, labeled B, has three successor
states with values 3, 12, and 8, so its minimax value is
3.
• Similarly, the other two MIN nodes have minimax value
2.
• The root node is a MAX node; its successor states have
minimax values 3, 2, and 2; so it has a minimax value of
3.
• We can also identify the minimax decision at the root:
action a1 is the optimal choice for MAX because it
leads to the state with the highest minimax value.
Minimax Algorithm
• The minimax algorithm computes the
minimax decision from the current state.
• Backtracking Algorithm
• Best Move Strategy
• Max will try to maximize its utility (best move)
• Min will try to minimize Max's utility (worst move for Max)
Minimax Algorithm
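A minimal Python sketch of the minimax procedure, assuming the Game interface defined earlier (here game.utility(s) is taken from MAX's point of view):

```python
def minimax_decision(state, game):
    """Return the action for MAX with the highest minimax value."""
    return max(game.actions(state),
               key=lambda a: min_value(game.result(state, a), game))

def max_value(state, game):
    if game.terminal_test(state):
        return game.utility(state)   # utility from MAX's point of view
    return max(min_value(game.result(state, a), game)
               for a in game.actions(state))

def min_value(state, game):
    if game.terminal_test(state):
        return game.utility(state)
    return min(max_value(game.result(state, a), game)
               for a in game.actions(state))
```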
Minimax Algorithm
• For example, in Fig.,
• the algorithm first
recurses down to the
three bottom left nodes
and uses the UTILITY
function on them to
discover that their values
are 3, 12, and 8,
respectively.
• Then it takes the
minimum of these values,
3, and returns it as the
backed-up value of node
B.
• A similar process gives the
backed-up values of 2 for
C and 2 for D.
• Finally, we take the
maximum of 3, 2, and 2 to
get the backed-up value
of 3 for the root node.
Minimax Algorithm
• The minimax algorithm performs a complete depth-first
exploration of the game tree.
• If the maximum depth of the tree is m and there are b legal moves at each point:
• Complete: The minimax algorithm is complete; it will definitely find a solution (if one exists) in a finite search tree.
• Optimal: The minimax algorithm is optimal if both opponents are playing optimally.
• Time complexity: As it performs DFS over the game tree, the time complexity of the minimax algorithm is O(b^m), where b is the branching factor of the game tree and m is the maximum depth of the tree.
• Space complexity: The space complexity of the minimax algorithm is also similar to DFS, which is O(bm).
Limitation of the minimax Algorithm:
• The main drawback of the minimax algorithm is
that it gets really slow for complex games such as
Chess.
• These types of games have a huge branching factor, and the player has lots of choices to decide among.
• This limitation of the minimax algorithm can be improved by alpha-beta pruning.
Solve (exercise)
[Figure: a game tree to be evaluated using the minimax algorithm.]
Alpha–Beta Pruning
• The problem with minimax search is that the number of
game states it has to examine is exponential in the
depth of the tree.
• We can't eliminate the exponent, but it turns out we can effectively cut it in half.
• The trick is that it is possible to compute the correct minimax decision without looking at every node in the game tree.
• The idea of pruning is to eliminate large parts of the tree from consideration.
• This involves two threshold parameters, alpha and beta, for future expansion, so it is called alpha-beta pruning. It is also called the alpha-beta algorithm.
• When applied to a standard minimax tree,
• it returns the same move as minimax would, but prunes
away branches that cannot possibly influence the final
decision.
• Alpha-beta pruning can be applied at any depth of a tree, and sometimes it prunes not only tree leaves but also entire subtrees.
Key points about alpha-beta pruning:
• The Max player will only update the value of alpha.
• The Min player will only update the value of beta.
• While backtracking the tree, the node values (not the alpha and beta values) are passed to upper nodes.
• Alpha and beta values are passed only to child nodes.
• Pruning is triggered at a node as soon as α ≥ β.
Working of Alpha-Beta Pruning:
• Step 1: In the first step, the Max player starts the first move from node A, where α = -∞ and β = +∞. These values of alpha and beta are passed down to node B, where again α = -∞ and β = +∞, and node B passes the same values to its child D.
• Step 2: At node D, the value of α will be calculated, as it is Max's turn. The value of α is compared first with 2 and then with 3; max(2, 3) = 3 becomes the value of α at node D, and the node value will also be 3.
• Step 3: The algorithm now backtracks to node B, where the value of β will change, as this is Min's turn. β = +∞ is compared with the available successor node value, i.e., min(∞, 3) = 3; hence at node B now α = -∞ and β = 3.
• In the next step, the algorithm traverses the next successor of node B, which is node E, and the values α = -∞ and β = 3 are passed down as well.
• Step 4: At node E, Max takes its turn, and the value of alpha changes. The current value of alpha is compared with 5, so max(-∞, 5) = 5; hence at node E, α = 5 and β = 3. Since α ≥ β, the right successor of E is pruned and the algorithm will not traverse it; the value at node E will be 5.
• Step 5: In the next step, the algorithm again backtracks the tree, from node B to node A. At node A, the value of alpha is changed; the maximum available value is 3, as max(-∞, 3) = 3, and β = +∞. These two values are now passed to the right successor of A, which is node C.
• At node C, α = 3 and β = +∞, and the same values are passed on to node F.
• Step 6: At node F, the value of α is again compared, first with the left child, which is 0 (max(3, 0) = 3), and then with the right child, which is 1 (max(3, 1) = 3). α remains 3, but the node value of F becomes 1.
Step 7: Node F returns the node value 1 to node C. At C, α = 3 and β = +∞; here the value of beta will be changed: it is compared with 1, so min(∞, 1) = 1. Now at C, α = 3 and β = 1, and again the condition α ≥ β is satisfied, so the next child of C, which is G, is pruned, and the algorithm does not compute the entire subtree rooted at G.
Step 8: C now returns the value 1 to A. Here the best value for A is max(3, 1) = 3. The final game tree shows the nodes that were computed and the nodes that were never computed. Hence the optimal value for the maximizer is 3 in this example.
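The whole procedure above can be written as a short Python sketch, reusing the assumed Game interface from earlier; the two pruning tests are exactly the α ≥ β condition from steps 4 and 7:

```python
import math

def alphabeta_decision(state, game):
    """Return MAX's best action using alpha-beta pruning."""
    best_action, best_value = None, -math.inf
    alpha, beta = -math.inf, math.inf
    for a in game.actions(state):
        v = ab_min_value(game.result(state, a), game, alpha, beta)
        if v > best_value:
            best_action, best_value = a, v
        alpha = max(alpha, best_value)
    return best_action

def ab_max_value(state, game, alpha, beta):
    if game.terminal_test(state):
        return game.utility(state)
    v = -math.inf
    for a in game.actions(state):
        v = max(v, ab_min_value(game.result(state, a), game, alpha, beta))
        if v >= beta:              # prune: MIN above will never allow this path
            return v
        alpha = max(alpha, v)
    return v

def ab_min_value(state, game, alpha, beta):
    if game.terminal_test(state):
        return game.utility(state)
    v = math.inf
    for a in game.actions(state):
        v = min(v, ab_max_value(game.result(state, a), game, alpha, beta))
        if v <= alpha:             # prune: MAX above will never allow this path
            return v
        beta = min(beta, v)
    return v
```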
Move Ordering in Alpha-Beta pruning:
• Worst ordering: In some cases, the alpha-beta pruning algorithm does not prune any of the leaves of the tree and works exactly like the minimax algorithm. It then also consumes extra time on the alpha-beta bookkeeping; such move ordering is called worst ordering. In this case, the best move occurs on the right side of the tree, and the time complexity for such an order is O(b^m).
• Ideal ordering: When the best moves are examined first, alpha-beta pruning examines only about O(b^(m/2)) nodes, effectively cutting the exponent in half as noted earlier.
Monte Carlo Tree Search (MCTS)
• MCTS can effectively handle complex and strategic games with massive search spaces, in which traditional algorithms may struggle due to the vast number of feasible moves.
Selection: In this process,
the MCTS algorithm
traverses the current tree
from the root node using a
specific strategy.
The strategy uses an
evaluation function to
optimally select nodes with
the highest estimated
value.
MCTS uses the Upper
Confidence Bound (UCB)
formula applied to trees as
the strategy in the selection
process to traverse the
tree.
• Selecting 👆 | This process is used to select the node on the tree that has the highest possibility of winning.
• For example, consider the moves with winning possibilities 2/3, 0/1, and 1/2 after the first move 4/6; the node 2/3 has the highest possibility of winning.
• The selected node is searched for from the current state of the tree, and it is located at the end of a branch. Since the selected node has the highest possibility of winning, that path is also most likely to reach the solution faster than other paths in the tree.
MCTS selects nodes using the UCB1 formula (standard form):

    S_i = x_i + C √(ln t / n_i)

where:
S_i = value of node i
x_i = empirical mean of node i
C = a constant
t = total number of simulations
n_i = number of simulations through node i
• Expanding — After selecting the right node, expansion is used to increase the options further in the game by expanding the selected node and creating child nodes. We use only one child node in this case. These child nodes are the future moves that can be played in the game.
• The nodes that are not expanded further for the time being are known as leaves.
• Simulation: In this process, a simulation is performed by choosing moves or strategies until a result or a predefined state is achieved.
• A simulation is run from every child node, and each child node is then assigned its individual reward.
• Simulating | Exploring 🚀 — Nobody knows which node is the best child/leaf of the group, i.e., which move will perform best and lead to the correct answer down the tree.
• How do we find the best child that will lead us to the correct solution?
• We use reinforcement learning to make random decisions in the game further down from every child node. Then a reward is given to every child node, calculated from how close the output of its random decisions came to the final output that we need to win the game.
• For example, in the game of Tic-Tac-Toe: does the random decision to place a cross (X) next to the previous cross (X) result in the three consecutive crosses (X-X-X) needed to win the game?
• Backpropagation: After determining the value of the newly added node, the remaining tree must be updated. So the backpropagation process is performed, in which the algorithm backpropagates from the new node to the root node. During this process, the number of simulations stored in each node is incremented; also, if the new node's simulation results in a win, the number of wins is incremented.
• Let's say the simulation of the node gives optimistic results for its future and gets a positive score of 1/1.
• Updating | Back-propagation — Because of the new nodes and their positive or negative scores, the total scores of their parent nodes must be updated by going back up the tree one by one.
• The newly updated scores change the state of the tree and may also change which node future selection steps choose.
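The four phases can be put together in a compact Python sketch. This version assumes the same hypothetical game interface as earlier and a single-perspective reward; a real two-player implementation would also flip the reward between plies:

```python
import math, random

C = math.sqrt(2)  # exploration constant in UCB1

class Node:
    def __init__(self, state, parent=None, move=None):
        self.state, self.parent, self.move = state, parent, move
        self.children, self.wins, self.visits = [], 0.0, 0

def ucb1(node, t):
    """UCB1 score of a child; unvisited nodes are taken first."""
    if node.visits == 0:
        return math.inf
    return node.wins / node.visits + C * math.sqrt(math.log(t) / node.visits)

def mcts(root_state, game, iterations=1000):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded nodes via UCB1.
        while node.children and len(node.children) == len(game.actions(node.state)):
            node = max(node.children, key=lambda c: ucb1(c, node.visits))
        # 2. Expansion: add one untried child if the game is not over.
        if not game.terminal_test(node.state):
            tried = {c.move for c in node.children}
            move = random.choice([a for a in game.actions(node.state)
                                  if a not in tried])
            child = Node(game.result(node.state, move), parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Simulation: random playout until a terminal state.
        state = node.state
        while not game.terminal_test(state):
            state = game.result(state, random.choice(game.actions(state)))
        reward = game.utility(state)
        # 4. Backpropagation: update counts from the new node up to the root.
        while node is not None:
            node.visits += 1
            node.wins += reward
            node = node.parent
    return max(root.children, key=lambda c: c.visits).move  # most-visited move
```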
Advantages of Monte Carlo Tree Search:
• MCTS is a simple algorithm to implement.
• Monte Carlo Tree Search is a heuristic algorithm. MCTS can operate effectively without any knowledge of the particular domain, apart from the rules and end conditions, and can find its own moves and learn from them by playing random playouts.
• The MCTS tree can be saved at any intermediate state, and that state can be reused in future use cases whenever required.
• MCTS supports asymmetric expansion of the search tree based on the circumstances in which it is operating.
Disadvantages of Monte Carlo Tree Search:
• As the tree grows rapidly after a few iterations, the algorithm requires a huge amount of memory.
• There is a bit of a reliability issue with Monte Carlo Tree Search: in certain scenarios there might be a single branch or path that leads to a loss against the opponent in turn-based games. This is mainly due to the vast number of move combinations; each node might not be visited enough times to understand its outcome in the long run.
• The MCTS algorithm needs a huge number of iterations to be able to decide the most efficient path effectively, so there is also a bit of a speed issue.
Stochastic Games
[Figure: a game tree for a stochastic game such as backgammon, with chance nodes between the players' moves. The branches leading from each chance node denote the possible dice rolls; each branch is labeled with the roll and its probability.]
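For such trees, the minimax value generalizes to the expectiminimax value, which adds an averaging case for chance nodes (standard textbook definition; r ranges over the dice rolls with probability P(r)):

```latex
\mathrm{EXPECTIMINIMAX}(s) =
\begin{cases}
  \mathrm{UTILITY}(s) & \text{if } \mathrm{TERMINAL\text{-}TEST}(s) \\
  \max_{a} \mathrm{EXPECTIMINIMAX}(\mathrm{RESULT}(s, a)) & \text{if } \mathrm{PLAYER}(s) = \mathrm{MAX} \\
  \min_{a} \mathrm{EXPECTIMINIMAX}(\mathrm{RESULT}(s, a)) & \text{if } \mathrm{PLAYER}(s) = \mathrm{MIN} \\
  \sum_{r} P(r) \cdot \mathrm{EXPECTIMINIMAX}(\mathrm{RESULT}(s, r)) & \text{if } \mathrm{PLAYER}(s) = \mathrm{CHANCE}
\end{cases}
```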
Constraint satisfaction problems (CSPs)
• A constraint is a logical relation among variables.
• E.g., the sum of the three angles of a triangle is 180 degrees.
Constraint satisfaction
problems (CSPs)
Example: Map-Coloring
• Variables WA, NT, Q,
NSW, V, SA, T
• Domains Di = {red, green,
blue}
• Constraints: adjacent
regions must have
different colors
• e.g., WA ≠ NT, or
(WA,NT) in {(red, green),
(red, blue), (green, red),
(green, blue), (blue,
red), (blue, green)}
Constraint satisfaction
problems (CSPs)
• Example: Map-Coloring
• Solutions are complete
and consistent
assignments
• e.g.,
• WA = red,
• NT = green,
• Q = red,
• NSW = green,
• V = red,
• SA = blue,
• T = green
Constraint satisfaction problems
(CSPs)
• Constraint graph:
• The map-coloring
problem represented
as a constraint graph
• Constraint graph:
nodes are variables,
arcs are constraints
Varieties of CSPs
• Based on various combinations of different types of variables and domains, we have varieties of CSPs.
• Discrete variables:
• Finite domains:
» With n variables and domain size d, there are O(d^n) complete assignments
• e.g., Boolean CSPs, incl. Boolean satisfiability (NP-complete) [SAT is the
problem of deciding (requires a yes/no answer) if there is an assignment
to the variables of a Boolean formula such that the formula is satisfied ]
• infinite domains:
• integers, strings, etc.
• e.g., job scheduling, variables are start/end days for each job
• need a constraint language, e.g., StartJob1 + 5 ≤ StartJob3
• Continuous variables:
– e.g., start/end times for Hubble Space Telescope observations[Scientists have
used Hubble to observe the most distant stars and galaxies as well as the
planets in our solar system. ]
• Linear constraints are solvable in polynomial time by linear programming.
Constraint satisfaction problems
(CSPs)
• Varieties of constraints:
• Unary constraints involve a single variable,
• e.g., SA ≠ green
• Binary constraints involve pairs of variables,
• e.g., SA ≠ WA
• Higher-order constraints involve 3 or more
variables,
• e.g., cryptarithmetic column constraints
Constraint satisfaction problems (CSPs)
Crypto-arithmetic:
• It is a mathematical puzzle where numbers are represented by letters or symbols.
• The goal is to find the digits such that the mathematical equation is verified.
• There is no specific algorithm to solve cryptarithmetic; it is solved by trial and error using a backtracking strategy.
• Constraints:
1. Variables can take values from 0-9.
2. No two variables should take the same value.
3. Values are selected in such a way that they comply with arithmetic properties.
• There should be a unique digit for each unique letter.
1. S E N D
 + M O R E
 -----------
 M O N E Y
 (solution includes O = 0, R = 8, Y = 2)

2. B A S E
 + B A L L
 -----------
 G A M E S
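Puzzle 1 can be solved mechanically with a brute-force Python sketch; this simply tries digit permutations, i.e., the trial-and-error idea above without the pruning that a real backtracking solver would add:

```python
from itertools import permutations

def solve_send_more_money():
    letters = "SENDMORY"                          # the 8 distinct letters
    for digits in permutations(range(10), len(letters)):
        assign = dict(zip(letters, digits))
        if assign["S"] == 0 or assign["M"] == 0:  # no leading zeros
            continue
        def num(word):
            return int("".join(str(assign[ch]) for ch in word))
        if num("SEND") + num("MORE") == num("MONEY"):
            return assign
    return None

print(solve_send_more_money())
# S=9, E=5, N=6, D=7, M=1, O=0, R=8, Y=2 -- note O=0, R=8, Y=2 as above
```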
Real-world constraint satisfaction problems (CSPs)
• Assignment problem: who teaches which class?
• Timetable problem: which class is offered when and where?
• Transportation scheduling
• Factory scheduling
Real-world Constraint Satisfaction
Problems (CSP):
• Scheduling: A fundamental CSP problem is how to efficiently
and effectively schedule resources like personnel, equipment,
and facilities. The constraints in this domain specify the
availability and capacity of each resource, whereas the
variables indicate the time slots or resources.
Backtracking search
• Variable assignments are
commutative, i.e.,
[ WA = red then NT = green ] same
as [ NT = green then WA = red ]
• Only need to consider
assignments to a single
variable at each node
• Depth-first search for CSPs
with single-variable
assignments is called
backtracking search
• We can solve n-queens for n
≈ 25
Backtracking example
[Figures: backtracking search on the map-coloring problem, assigning one variable per step.]
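A minimal Python sketch of backtracking search for a binary CSP, applied to the map-coloring problem from the earlier slides (the helper names are illustrative):

```python
def backtracking_search(variables, domains, neighbors, constraint):
    """Depth-first search that assigns one variable per level."""
    def backtrack(assignment):
        if len(assignment) == len(variables):
            return assignment
        var = next(v for v in variables if v not in assignment)  # static order
        for value in domains[var]:
            # Check the new value against all already-assigned neighbors.
            if all(constraint(var, value, n, assignment[n])
                   for n in neighbors[var] if n in assignment):
                assignment[var] = value
                result = backtrack(assignment)
                if result is not None:
                    return result
                del assignment[var]          # undo the assignment: backtrack
        return None
    return backtrack({})

# Map-coloring instance from the slides:
variables = ["WA", "NT", "SA", "Q", "NSW", "V", "T"]
domains = {v: ["red", "green", "blue"] for v in variables}
neighbors = {"WA": ["NT", "SA"], "NT": ["WA", "SA", "Q"],
             "SA": ["WA", "NT", "Q", "NSW", "V"], "Q": ["NT", "SA", "NSW"],
             "NSW": ["Q", "SA", "V"], "V": ["SA", "NSW"], "T": []}
different = lambda v1, x1, v2, x2: x1 != x2   # adjacent regions must differ
print(backtracking_search(variables, domains, neighbors, different))
```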
Improving backtracking efficiency
• General-purpose methods can give huge gains in speed:
• Which variable should be assigned next? The answer to this leads to two simple strategies for selecting the next variable for assignment, called most constrained variable and most constraining variable.
Least constraining value
• Given a variable to assign, choose the
least constraining value:
• the one that rules out the fewest values
in the remaining variables
Forward checking
• Idea:
• Keep track of remaining legal values for unassigned variables
• Terminate search when any variable has no legal values
[Figures: forward checking on the map-coloring problem, step by step.]
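A sketch of the forward-checking step, intended to be called inside the backtracking loop right after a tentative assignment; it reuses the domains/neighbors/constraint representation from the backtracking sketch above:

```python
def forward_check(var, value, domains, neighbors, constraint, assignment):
    """Prune neighbor domains after tentatively assigning var = value.
    Returns the list of pruned (variable, value) pairs so they can be
    restored on backtracking, or None if some domain becomes empty."""
    pruned = []
    for n in neighbors[var]:
        if n in assignment:
            continue
        for y in list(domains[n]):
            if not constraint(var, value, n, y):
                domains[n].remove(y)
                pruned.append((n, y))
        if not domains[n]:                  # dead end detected early
            for n2, y2 in pruned:           # restore before failing
                domains[n2].append(y2)
            return None
    return pruned
```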
Constraint propagation
• Forward checking propagates information from assigned to
unassigned variables, but doesn't provide early detection for all
failures:
Arc consistency
• Simplest form of propagation makes each arc consistent
• X → Y is consistent iff for every value x of X there is some allowed y
[Figures: arc consistency applied to the map-coloring problem.]
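The standard AC-3 algorithm applies this revision repeatedly until no domain changes; a minimal Python sketch using the same CSP representation as before:

```python
from collections import deque

def ac3(variables, domains, neighbors, constraint):
    """Make every arc consistent; returns False if a domain empties."""
    queue = deque((xi, xj) for xi in variables for xj in neighbors[xi])
    while queue:
        xi, xj = queue.popleft()
        if revise(domains, xi, xj, constraint):
            if not domains[xi]:
                return False                 # inconsistency detected
            for xk in neighbors[xi]:         # re-check arcs into xi
                if xk != xj:
                    queue.append((xk, xi))
    return True

def revise(domains, xi, xj, constraint):
    """Delete values of xi with no allowed counterpart in xj."""
    revised = False
    for x in list(domains[xi]):
        if not any(constraint(xi, x, xj, y) for y in domains[xj]):
            domains[xi].remove(x)
            revised = True
    return revised
```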
Thank You