28

Introduction to Fictitious Play

 4 years ago
source link: https://towardsdatascience.com/introduction-to-fictitious-play-12a8bc4ed1bb?gi=f8514c095b00
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

u2Mveqz.jpg!web

Photo by Mpho Mojapelo on Unsplash

Fictitious play is a game theory concept. It consists of analyzing the game to figure out what is the best strategy to adopt when facing an opponent in a zero sum game.

This is usually a heavy subject so we will start by some important definitions, then we will explain the Fictitious play algorithm.

Zero Sum Game

A zero sum game is a game where the points gained by one player is the loss of the other(s). In such a way that the sum of all points attributed to the players is equal to zero.

Game Value

The Game Value (V) is the number of points, money, credits, etc… that a player can expect to win (or lose) on the average, after playing a sufficiently large number of times. If V is positive we consider it in favor of the player I (so player II must pay), if V is negative we consider it favor of Payer II (so player I must pay).

Nash Equilibrium

Nash equilibrium is a state where no player has any interest in changing his strategy because any change will be countered by others. Nash Equilibrium does not mean optimal equilibrium, there could be a strategy for one or more players that is more favorable to them, but the reason they can’t adopt it is because opponents (assumed smart enough) will counter them and the end result will become unfavorable. You can think of it as a deadlock, but it also can be beneficial for all.

A simple example is to imagine 2 robbers splitting their robbery in half. If one of them denounce the other to the police he can get the full spoils, but he has no interest in doing so because the other one will denounce him as well and they will end up both in jail. So splitting in half is the best solution for both of them.

Why would we seek Nash Equilibrium in AI ?

Theoretically, Nash Equilibrium will guarantee, on average, no loss. Meaning over a considerable number of games, on average the AI will draw or will win.

However, in practice, this is more optimistic. When playing against humans there is a high probability that the human player will make an error at one point, which the AI will exploit to win.

Another important question is why the AI instead of seeking Nash Equilibrium does not study the human strategy and exploit it to win.

The risk of this approach is that humans can learn to trick the AI, by giving it the impression that they are using some strategy, then switch to another.

For example suppose that in the game Rock Scissor Paper game, the human gives 3 Scissor in a row, which lead the AI to assume that this is the strategy of the human. In the next move, AI will counter by a Rock, but the human (who laid the trap) will use Paper.

So the best strategy in this game is to stick to the Nash Equilibrium and use stochastic strategy (randomly choosing the item).

Fictitious play

Fictitious play is a method defined by George W. Brown in 1951, it consists of zero sum game and each player plays the best response to his opponent strategy. The aim of the method is to find the Game Value in an iterative way.

As usual iterative methods are easier to compute than analytical methods when the problem gets complicated.

Fictitious method is proven to converge to the theoretical game value (V). It is also proven that the Fictitious Play converges to Nash equilibrium in two-player zero-sum game .

Playing a Fictitious Game

Consider the following Odd or Even game:

Two players I and II each can draw either numbers “1” or “2”, if the sum of the drawn numbers are even player I pays player II the sum, represented in the above matrix by (-2 and -4), if the sum is odd, then player II pays player I that sum represented by (+3 and +3).

RnuQZzU.png!web

This problem can be solved analytically, and the advantage goes to player I who on the average will win 1/12 if he plays “1” with probability 7/12, and “2” with probability 5/12 (details of this method are not important here).

The image below details how the iterations unfold:

2U7fQvU.png!web

Iteration #1 Player I counters a move of player II who chooses “1”, the aim of Player I is to maximize his payoff so he chooses “2” which has a value of max(-2, 3) = 3. The chosen value is marked by cyan color.

On Iteration #2, Player II has to counter Player I who chose “2”, his aim is to minimise Player I gain, so he chooses “2” which results in min(3, -4) = -4. The chosen value is marked by yellow.

Iteration #3, Player I counters Player II by choosing to draw “1” or “2”, the values will be taken from the second column (remember that player II chose “2” in the previous iteration).The values of this column will be added to the expected values of player I meaning (-2 + 3 =1; 3 -4 = -1). So from these expected values, Player I must choose the best for him which +1, so he draws the corresponding number “1” (PS “1” is not number to draw while +1 is the payoff).

Iteration #4, Player II will have to choose from the first row (because Player I played “1”). So he adds those values to the ones he already has (3–2=1; -4+3=-1) so he draws “2”, and so on…

Doing this enough number of times will result in value too close to the game value.

Here are the explicit steps:

  1. Select a column and write it at the right of the grid.
  2. Select the maximum value and write its ROW at the bottom.
  3. Select the minimum value of the ROW, ADD its column to the values on the right, write the SUM.
  4. Select the maximum value of the COLUMN, ADD its row to the values at the bottom, write the SUM.
  5. Repeat steps 3 and 4 as long as needed.
  6. Compute lower (L) and upper (U) limits by taking the last selected values and divide them by number of iterations. This will give range for the Game Value (V) such as L ≤ V ≤ U

The following code will help you figure out how the algorithm works:

Javascript implementation of Odd/Even game Fictitious Play. It can be easily tested using any online JS editor

Running the above code with MAX_ITER = 1000 and MAX_ITER = 10000 yields the results below:

e2aEVr2.png!web

Remember that the theoretical Game Value is 1/12 = 0.083333… which clearly is within the lower and upper limits of the iterative method.

Conclusion

This article explained in a simple way the Fictitious Play algorithm, but does not mention anything related to Deep Learning or Neural Networks which will be the subject of future articles.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK