Minimax
A game can be thought of as a tree of possible future game states. The current state of the game is the root of the tree (drawn at the top). In general this node has several children, representing all of the possible moves that we could make. Each of those nodes has children representing the game state after each of the opponent’s moves. These nodes have children corresponding to the possible second moves of the current player, and so on. The leaves of the tree are final states of the game: states where no further move can be made because one player has won, or perhaps the game is a tie.
Search
Suppose that we assign a value of positive infinity to a leaf state in which we win, negative infinity to states in which the opponent wins, and zero to tie states.
Usually expanding the entire game tree is infeasible because there are so many possible states. The solution is to only search the tree to a specified depth. Define and evaluate
function (the static evaluator) returns a value between
If we can traverse the entire game tree, we can figure out whether the game is a win for the current player assuming perfect play: we assign a value to the current game state by we recursively walking the tree. At leaf nodes we return the appropriate values (either declare win evaluate
). At nodes where we get to move, we take the max of the child values because we want to pick the best move; at nodes where the opponent moves we take the min of child values because we assume perfect play on the part of the opponent and they’ll pick the smallest number. This gives us the following schematic evalution of a game tree
and the following pseudo-code procedure for minimax evaluation of a game tree (remember the player is trying to drive the score positive and the opponent is trying to drive the score negative):
def minimax(n: node, depth: int, max_depth: int) -> int:
if leaf(n) or depth == max_depth: return evaluate(n)
if is_max_node(n):
value = # initialization for max
for each child of n:
value = max(value, minimax(child, depth+1, max_depth))
elif is_min_node(n):
value = # initialization for min
for each child of n:
value = min(value, minimax(child, depth+1, max_depth))
return value
Alpha-beta pruning
The full minimax search explores some parts of the tree it doesn’t have to: for example in this game tree
Keep in mind that the opponent chooses minimum amongst children and player choose maximum amongst children.
In the right most subtree it’s known that so far (as far as the subtree has been explored) the best that the opponent can do is a 5
. As the player (not the opponent) is evaluating moves/game-outcomes (below gray square) they discover they can make a move that’s worth 8
. It’s at this point the player can stop exploring because the opponents choice has already been made, seeing as has how the opponent is trying to minimize amongst children (therefore will never pick the 8
over the 5
) and seeing as how the player themselves is trying to maximize amongst the children (therefore will pick at least that 8
).
Therefore the heuristic to prune by calls for keeping track of
That leads to this pseudo-code for minimax with alpha-beta pruning:
def minimax_ab(n: node, depth: int, max_depth: int, : int, : int) -> int:
if leaf(n) or depth == max_depth: return evaluate(n)
if is_max_node(n):
value = # initialization for max
for each child of n:
value = max(value, minimax_ab(child, depth+1, max_depth, , ))
if value : break # will never get picked by opponent
= max( , value)
elif is_min_node(n):
value = # initialization for min
for each child of n:
value = min(value, minimax_ab(child, depth+1, max_depth, , ))
if value: break # will never get picked player
= min( , value)
return value
Alternative (slicker) implementation
def minimax_ab(node, depth, max_player, , ):
if depth == 0: return evaluate(node)
if max_player:
value =
for each child of node:
value = max(value, minimax_ab(child, depth-1, max_player, value, ))
if value : break
else:
value =
for each child of node:
value = min(value, minimax_ab(child, depth-1, max_player, , value))
if value: break
return value
Max-min inequality
Imagine a continuous zero-sum game with two players: if the first player chooses
Suppose that player 1 makes the first choice
which results in the payoff
from player 1 to player 2. Now suppose the order of play is reversed: player 2 must choose
The max-min inequality states the (intuitively obvious) fact that it is better for a player to go second, or more precisely, for a player to know their opponent’s choice before choosing. In other words, the payoff to player 2 will be larger if player one must choose first.
Minimax theorem
Let
Then we have that
i.e. neither player has any advantage in terms of going first or second.