## Game theory & the (not quite) Mexican standoff

Somebody on the internet asked about the game theory of a Mexican standoff (he meant two people pointing guns at each other)… and why both players don’t just shoot immediately. I got a bit carried away with my response, which turned into this 4000-word mini-thesis.

================

If shooting first were guaranteed to kill the other person, and that is something that you want to do, then you would be right. But most gunshot victims don’t die… so let’s re-think the game with a hypothetical payoff matrix. Assume two players {Robert; Dermot} with two strategies {shoot; smile}.

We’ll make the simplifying assumption (for now) that both have the same preferences and skills. If both smile then they walk away from the conflict, go home and party. Their payoff is (5, 5). If they both shoot, then both get hurt (sad -8) but they both get to hurt the other person (happy +10)… so the net payoff is (2, 2). If one person shoots, then the shooter smites his enemy and gets a benefit of 10… well done that man. The poor schmuck who was shot gets hurt (sad -8) which is the worst of all worlds for him, and it leaves him a broken man. Hollow. Dejected. Just an empty hole where his soul once existed.

The normal form game matrix looks something like this, with Robert on the rows and Dermot as the columns.

| Robert \ Dermot | Shoot | Smile |
| --- | --- | --- |
| **Shoot** | (2, 2) | (10, -8) |
| **Smile** | (-8, 10) | (5, 5) |

The outcome is a standard prisoners’ dilemma. No matter what the other person does, you have an incentive to shoot first. The Nash equilibrium answer is (shoot, shoot) with outcome (2, 2), just as Robert implied in his original post. Pack of uncivilized brutes.
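
To make the “no matter what the other person does” claim concrete, here is a minimal Python sketch that checks strict dominance using the payoffs from the matrix above. The names `PAYOFF` and `strictly_dominates` are made up for illustration.

```python
# Row player's payoffs from the matrix above: PAYOFF[my_move][their_move].
# The game is symmetric, so the same table applies to either player.
PAYOFF = {
    "shoot": {"shoot": 2, "smile": 10},
    "smile": {"shoot": -8, "smile": 5},
}

def strictly_dominates(mine, other):
    """True if `mine` pays strictly more than `other` against every opposing move."""
    return all(PAYOFF[mine][their] > PAYOFF[other][their] for their in PAYOFF)

print(strictly_dominates("shoot", "smile"))  # True
print(strictly_dominates("smile", "shoot"))  # False
```

Because shooting strictly dominates smiling for both players, (shoot, shoot) is the only equilibrium of the one-off game.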

**BUT…**

That assumes this is a one-off game. If the first shot doesn’t kill, then this game will be repeated. Robert shoots Dermot… he gets a flesh wound and cries for a while… one minute later (let’s assume they make decisions once per minute to make the hypothetical easier to understand) they are both still standing there with guns drawn, deciding what to do in the 2nd round of the game: shoot or smile.

If the game is infinitely repeated, then it is simple enough to show that the “trigger strategy” (starting with a smile, but threatening to always shoot in the future if the other person shoots) is the best option, as long as neither player discounts the future too heavily (i.e. their discount factor is high enough). I’m not going to give the proof here; you’ll just have to trust me, or pay me, or find your own textbook.

Given that this game is being repeated once a minute… then we can reasonably assume that both players care a lot about those future outcomes too. More formally, the time value discount between each game is low, so the discount factor approaches one, and it can safely be ignored. So the new equilibrium is (smile, smile) with a threat that if the other bastard starts trouble, you’ll fight back.

**BUT…**

The assumption of “infinitely repeated games” seems a bit silly. And we know that if the game is finitely repeated (say, 10 times) then both players know the other will shoot them on the last game… and with backward induction they can work out that they should shoot now to beat them to it. Unfortunately, the equilibrium is back to (shoot, shoot) and Robert & Dermot are responsible for everything wrong in the world.

**BUT…**

Fear not, the talk of “infinite games” is just typical economic hyperbole. The important point is that neither player knows if or when the standoff will stop. It is the lack of the all-important “final game” that means the above sad “finite game” outcome is avoided and the happy “infinite game” comes back with its friendly equilibrium of (smile, smile). Peace is restored.

**BUT…**

If we assume that both players are fickle or frail (or something similar), then we can consider the possibility that they might not be able to stay in the game for long. Let’s say they could get hit by a passing bus, or pass out drunk, or might have to go home for dinner at any time. All of this means that there is a real probability (between zero and one) that any game could be the last game. If there is a high enough chance that the game is going to end soon, then we’re back to the “finite game” solution of (shoot, shoot). Damn punks. I blame your parents.

**BUT…**

The above (shoot, shoot) scenario only happens if one (or both) of the players think there is a good chance that the game will end soon. It might not. Basically, we have re-introduced the issue of a discount factor. Earlier we assumed it away because the short time period (one minute) meant that both players cared a lot about their near future. But the discount factor can also incorporate each player’s assumption about the probability that the game is about to end.

Now, whether the players stick with the trigger strategy (smile, smile) or the Nash equilibrium (shoot, shoot) depends on the value of the discount factor, which falls somewhere between zero and one. If it’s high enough, there’s peace… if it’s low enough, there’s war. We don’t know their discount factors so now we can just shrug and say that we don’t know what they’re going to do. All this bloody theory just so that we can admit we don’t know anything.

**BUT…**

We can work out what their discount factors would need to be to encourage them to choose (shoot, shoot) or (smile, smile) and think about whether those are realistic.

The marginal discount factor = (a-x)/(a-y)

Where:

a = benefit from smiting your enemy

x = benefit from cooperating (smile, smile)

y = benefit from Nash (shoot, shoot)

Since this is a prisoners’ dilemma, we know that a > x > y … so the marginal discount factor will be between 0 and 1. This special number tells us the cut-off point for whether the game is treated as “infinite” or “finite”. If the players’ actual discount factor (here, the probability of another round) is higher than the marginal discount factor, then the game is effectively infinite.
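
For anyone who wants to see where that ratio comes from, here is a sketch of the standard trigger-strategy argument. It assumes both players share a single discount factor δ (a symbol introduced here just for the derivation) and that one shot is punished with (shoot, shoot) forever after.

$$
\begin{aligned}
V_{\text{smile}} &= x + \delta x + \delta^2 x + \dots = \frac{x}{1-\delta} \\
V_{\text{shoot}} &= a + \delta y + \delta^2 y + \dots = a + \frac{\delta y}{1-\delta} \\
V_{\text{smile}} \ge V_{\text{shoot}} &\iff x \ge (1-\delta)\,a + \delta y \iff \delta \ge \frac{a-x}{a-y}
\end{aligned}
$$

With the payoffs from the matrix (a = 10, x = 5, y = 2) the right-hand side is 5/8 = 0.625.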

Based on the payoffs from above, the magic number for the discount factor rounds to 0.63… so if the players think there is more than a 63% chance that they will both still be playing the game in the next minute, then the game reverts to the “infinity” solution, which is the peaceful strategy of (smile, smile). However, if the players think there is more than a 37% chance that this is the final round (maybe they are being called for dinner), then they should start shooting.

Basically, while two people are running around town pointing guns at each other… they will have an incentive not to shoot. But if, for some reason, it looks like their merry little game is about to come to an end… then they’ll start shooting. It is the “not knowing when the game ends” part of the puzzle that helps keep the peace.

I imagine that in most Mexican standoffs, there is more than a 63% chance that both players will still be playing the game one minute later, which would explain why they might choose (smile, smile) instead of (shoot, shoot).

**BUT…**

The above assumed that the chance of a game ending was equal irrespective of strategy. That is what we assume in most games, but most games don’t involve potentially killing each other. The bus accident, alcohol poisoning, and going home for dinner have nothing to do with the game — they are exogenous risks of the game ending. Those types of risk exist in any game.

However, in this game one of the strategies might change the probability of the game ending, which gives a different discount factor to players depending on which strategy they choose. This can take us to a whole new world of complicated, but let’s try and do it the easy way.

Assume that fighting (shoot, shoot) slightly increases your chance of dying. I guess we should expand that to include being exiled to a desert island or becoming a helpless cripple or any other “game ending” scenario… but for simplicity, I will just refer to “dying” as a catch all term to cover the specific ways that (shoot, shoot) can artificially end the game.

Increasing the chance of the game ending lowers the discount factor (i.e. the probability the game will continue), so we can say that the discount factor for (shoot, shoot) will be lower than the discount factor for (smile, smile). Basically, the (smile, smile) game will go on for longer, so in a smiling equilibrium the players will continue to get the benefits for a longer period of time… but the (shoot, shoot) game is likely to end quicker, so the benefits from playing will end quicker.

This observation actually strengthens our last prediction. Given the payoffs in this game, under normal (not killing each other) assumptions, the players should probably cooperate… and if we factor in the additional risk of death that stops the game (and stops the benefit from playing the game) then they have even more incentive to cooperate.
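
A rough Python sketch of that argument follows. It rests on assumptions that are mine, not part of the original setup: peace continues each round with probability `delta_smile`, and once somebody shoots, every later round happens with probability `delta_shoot`. The function names are hypothetical.

```python
def smile_value(x, delta_smile):
    """Discounted value of getting x every round while peace lasts."""
    return x / (1 - delta_smile)

def shoot_value(a, y, delta_shoot):
    """Value of shooting first: a now, then y in every round the fight survives."""
    return a + delta_shoot * y / (1 - delta_shoot)

def prefers_peace(a, x, y, delta_smile, delta_shoot):
    """True if sticking with the trigger strategy is worth at least as much as shooting."""
    return smile_value(x, delta_smile) >= shoot_value(a, y, delta_shoot)

# Payoffs from the original matrix: a = 10, x = 5, y = 2.
# Equal discount factors of 0.6 sit just below the 0.625 cut-off, so shooting wins.
print(prefers_peace(10, 5, 2, delta_smile=0.6, delta_shoot=0.6))  # False
# Assumed: fighting is deadlier, so its continuation probability drops to 0.5.
print(prefers_peace(10, 5, 2, delta_smile=0.6, delta_shoot=0.5))  # True
```

In this toy version, making the fight more likely to end the game is enough to tip a borderline shooter back towards smiling.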

**BUT…**

Of course, the above analysis all depended on the payoffs. If a player’s benefit from shooting the other guy and getting away with it increases from 10 to 100 (with everything else unchanged)… that changes their incentives. The marginal discount factor goes up to about 0.97, so if either player thinks there is at least a 3% chance that one (or both) players will stop the game in the next minute, then they should shoot now.

Another change that would make violence more likely is if we make it so that our players don’t really care about peace. Given that (shoot, shoot) has a payoff of (2, 2)… if Robert and Dermot have a pretty shitty life, then perhaps the strategy (smile, smile) only gives them a payoff (3, 3), instead of the (5, 5) we assumed before. Sad. In that situation, the all important marginal discount factor goes up to 0.88 and so if either player thinks there is at least a 12% chance that the game will stop in the next minute, they should shoot now.

In both of the above hypotheticals (more benefit from shooting, less benefit from peace), the incentives changed so that the players are more likely to shoot. But personally, I still doubt that most people in a Mexican standoff have a 3% (or 12%) chance of stopping the game in any one minute. So I’m still betting on (smile, smile) as being a more sensible outcome.
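
The three cut-offs discussed so far can be checked with a few lines of Python. The function name `cutoff` and the scenario labels are mine; the second scenario keeps the (shoot, shoot) payoff at 2, which is what the 3% figure implies.

```python
def cutoff(a, x, y):
    """Marginal discount factor (a - x) / (a - y) for a prisoners' dilemma with a > x > y."""
    return (a - x) / (a - y)

scenarios = {
    "original payoffs": (10, 5, 2),
    "bigger prize for shooting": (100, 5, 2),
    "less joy from peace": (10, 3, 2),
}

for name, (a, x, y) in scenarios.items():
    delta = cutoff(a, x, y)
    # 1 - delta is the chance of the game ending that tips the players into shooting.
    print(f"{name}: cut-off {delta:.3f}, shoot if chance of ending exceeds {1 - delta:.1%}")
```

The cut-offs come out at 0.625, roughly 0.969 and 0.875, matching the 37%, 3% and 12% figures in the text.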

**BUT…**

There is another weird and complicated way that different payoffs can change the best strategy. The above example has positive numbers for (shoot, shoot) as well as (smile, smile). While the different strategies give different payoffs, all payoffs are greater than zero. But if we create a new hypothetical situation where the Nash equilibrium (shoot, shoot) has a negative payoff, then the incentives change in an interesting way.

A negative payoff from (shoot, shoot) is easy to imagine — for example if you don’t get much benefit from hurting the other person, or you’re a super-sensitive petal who gets very sad from fighting. Indeed, the original prisoners’ dilemma had negative values for everything. The problem with negative numbers is that you have to ask “negative compared to what”… which raises questions such as “if you can avoid negative numbers by suicide, is that a good thing”? But that’s a philosophical debate for another day. Let’s skip that drama, and just imagine negative numbers in the (shoot, shoot) Nash equilibrium.

Now we have to go back to the situation where different strategies had different discount factors. Remember, the (shoot, shoot) approach lowers the discount factor because the game is more likely to end… and in the story above that was considered a bad thing because it stopped you getting future benefits. But in our new hypothetical, the players will be suffering costs in the future. So if the (shoot, shoot) option makes the game more likely to end, then that will decrease your future accumulation of losses. Yay for you. From this perspective, one of the benefits of fighting is that one of you may die (and yes, it might be you) and that death will stop the game so that you no longer have to suffer the ongoing costs of fighting.

For the moment, please tolerate the strange assumption that you prefer death to a perpetual standoff. The important point here is that the deadliness of the game has now become a virtue because it decreases your lifetime costs and so increases your lifetime net benefit.

A quick example. Imagine you enjoyed punching somebody (+20) but because of that first punch, now you must battle that person every day, which you hate (-3 per day)… if the game is to go on for seven days then you have lost out and you really shouldn’t have punched them. But if you can somehow stop the game after only four days (through suicide, homicide, or maybe something more mundane like apologizing) then you are left with a net benefit of (+8). That means the original punch might have been worth it.
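
The arithmetic of the punch example, as a tiny Python sketch (no discounting, and `net_benefit` is just an illustrative name):

```python
def net_benefit(punch_joy, daily_cost, days):
    """One-off joy from the punch minus the cost of every day spent fighting."""
    return punch_joy - daily_cost * days

print(net_benefit(20, 3, 7))  # -1
print(net_benefit(20, 3, 4))  # 8
```

With these numbers the punch pays off as long as the fighting stops within six days.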

The point from this very long section is that when the (shoot, shoot) strategy has a negative future payoff, then decreasing the probability of future games existing makes (shoot, shoot) relatively more attractive. Remember that the original prediction from above was that under normal (not killing each other) assumptions you should probably cooperate… in this new hypothetical, when we factor in the extra risk that killing might stop the game, then you should be less willing to cooperate.
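
Here is a small Python illustration of why the sign of the (shoot, shoot) payoff matters. The per-round payoff of -1 and the continuation probabilities 0.9 and 0.5 are invented for the example, and `fight_stream_value` is a hypothetical helper.

```python
def fight_stream_value(y, delta):
    """Expected total of getting y per round when each round is followed by another with probability delta."""
    return y / (1 - delta)

# Per-round (shoot, shoot) payoffs: +2 as in the original matrix, -1 for the unhappy version.
for y in (2, -1):
    slow = fight_stream_value(y, 0.9)   # the fight drags on
    quick = fight_stream_value(y, 0.5)  # the fight tends to end sooner
    print(y, round(slow, 2), round(quick, 2))
```

With y = 2 the long fight is worth more than the short one (about 20 versus 4), but with y = -1 the long fight is worse (about -10 versus -2), so a deadlier game cuts your losses.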

Battle becomes slightly more likely. How much more likely depends on the probability of death. If you know that you will kill your enemy at the first shot, then the probability of future games drops to zero, and we have returned to the original static Nash equilibrium. But that’s a pretty extreme position… let’s just say that the easier it is for you to kill somebody (or die trying) then the more incentive you have to shoot. As long as you are indifferent to dying yourself.

**BUT…**

The situation above says you would be equally happy killing or dying, so long as the bloody game would just stop. Obviously that’s not realistic. The problem is that the above game doesn’t factor in the other benefits that the players get from their lives, loves, and laughs… and so the game doesn’t notice when they lose them.

One way to fix this is to only use positive numbers, compared to a baseline of killing yourself — which we assume is the “ultimate bad”. Let’s leave discussion of rational suicide for another day, and have a “no suicide” rule in this game. Just work with me here people. The payoff for (shoot, shoot) includes the weighted average of all possible consequences, including your death… but even with the possibility of your death included, the weighted average payoff for (shoot, shoot) will be positive because it is better than guaranteed death. This is true irrespective of whether death is 1% or 99% likely… and this is true irrespective of whether you are 1% or 99% likely to be the victim. For the moment we’ll stick with our assumption that our players have equal skills, so their chance of dying or killing is 50%… but those percentages do matter as we’ll see later.

With positive payoffs, this takes us back to our original answer, where cooperation (smile, smile) is likely, and the risks from killing each other just make cooperation more likely, because death stops the game and stops the future benefits from the game. That means we had a long detour for nothing. Sorry.

There is another way to fix the “not wanting to die” problem. However, it is basically the same as above, but dressed up in different clothes. Skip it if you want and go to the next “but”… or read on if you love your game theory machinations.

In this discussion, we allow negative payoffs for the Nash (shoot, shoot) game. We consider two possible worlds — one is the “I might kill” world and the other is the “I might die” world. In the “I might die” world, you add a large negative number to any strategy that has a non-zero probability of death (or conversely you add a large positive number to any strategy that has no probability of death)… so that the “non death” strategy becomes relatively more desirable. Since we are allowing negative numbers, it is important that the “death penalty” (cost of death when probability of death = 100%) should be a bigger negative than anything else that can happen in the game. As mentioned above, we’re ignoring the option of rational suicide.

If you knew which world you were in, the game would be easy:

- If you knew you couldn’t die, then you follow the incentives in the section directly above, where you (smile, smile) if the killing takes a long time (high discount factor) and you (shoot, shoot) if the killing is quick (low discount factor). Remember that there is still some incentive to be nice… but if you can kill the other player quickly in this game, then shooting becomes relatively more attractive. Let’s say this marginal improvement in the payoff from the shooting strategy is “k” for kill.
- In contrast, if you knew that any death would be your death, then the big number we added to factor in your “cost of death” will radically bias your choices away from the deadly strategies. Remember that even without this change, there is still some incentive to be nice… but a quick death in the game makes shooting even less attractive, because the death is yours. Let’s say this marginal loss in the payoff from the shooting strategy is “d” for dying.

Since we have said that the cost of death must be larger than any other cost (the “no suicide” rule) then we know that “d” must be larger than “k”. If we stick with the assumption that Robert and Dermot have equal skills (as above), then their chance of being in either world (dying or killing) is 50%. Given that “d” > “k” then in the weighted average outcome, the “d” effect (wanting to avoid death) trumps the “k” effect (wanting the game to end) and so an increase in the probability of killing will increase the incentive to cooperate, once again giving us (smile, smile) as the result.
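
Written as a single line, with Δ as shorthand (introduced here) for the net change in the payoff from shooting once both worlds are weighted equally:

$$
\Delta = \tfrac{1}{2}k - \tfrac{1}{2}d = \tfrac{1}{2}(k - d) < 0 \quad \text{because } d > k
$$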

I told you it was the same as the “always positive payoffs” approach above. Now wasn’t that a waste of your time and mine. You should have just believed me in the first place.

**BUT…**

The kicker to all of this is that we have assumed that the two players both really hate being shot more than anything else in the game. If we change some of their preferences then they might end up with different strategies. Above we considered what would happen if the payoff for fighting (shoot, shoot) was negative. But at least the payoff was still better than just being shot while smiling. But that doesn’t have to be the case.

For example, if one (or both) of the players decides that the consequences of a full on fight (shoot, shoot) are really horrible (after the two-way fight your opponent also kills your pet dog which creates -10 worth of sadness), then you could end up with an outcome where it is optimal for one person to shoot, but the other decides to cry a little and then just go home. At least they still have their dog.

In other words, the new payoffs have turned the “prisoners’ dilemma” into a “hawk-dove” game. These games are notable for the fact that the player(s) really want to avoid a two-way fight, but they still have an incentive to be somewhat of a bastard. In a single (or fixed finite) situation, the shooter gets the gravy and the victim runs away without a fight.
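
The switch from one type of game to the other can be seen by listing the pure-strategy equilibria before and after the change. This Python sketch assumes the two-way fight ends up worth -10 to each player, a figure picked so that it is strictly worse than the -8 from being shot while smiling (if the fight only ties with -8, the equilibria are weak ones). The helper names are hypothetical.

```python
from itertools import product

MOVES = ("shoot", "smile")

def payoffs(fight):
    """(Robert, Dermot) payoffs; `fight` is what each gets from a two-way shootout."""
    return {
        ("shoot", "shoot"): (fight, fight),
        ("shoot", "smile"): (10, -8),
        ("smile", "shoot"): (-8, 10),
        ("smile", "smile"): (5, 5),
    }

def pure_nash(table):
    """Profiles where neither player can do better by switching moves alone."""
    found = []
    for r, d in product(MOVES, MOVES):
        robert_ok = all(table[(r, d)][0] >= table[(alt, d)][0] for alt in MOVES)
        dermot_ok = all(table[(r, d)][1] >= table[(r, alt)][1] for alt in MOVES)
        if robert_ok and dermot_ok:
            found.append((r, d))
    return found

print(pure_nash(payoffs(2)))    # [('shoot', 'shoot')]
print(pure_nash(payoffs(-10)))  # [('shoot', 'smile'), ('smile', 'shoot')]
```

In the original game both shoot; in the dog-killing version exactly one player shoots, and the game itself does not say which one.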

**BUT…**

The new “hawk-dove” game might also look different in an infinitely repeated scenario. The trick here is reputation — to convince the other player that you’re a little bit crazy, and if they start a fight (and kill your dog) then you’ll continue the fight (and kill their dog and burn their house down) even if it means mutual destruction. If you can both convince each other that you’re willing to let the whole world burn and damn the consequences, then once again you might find yourself back at the happy (smile, smile) equilibrium. This depends on the discount factor again, but I’m not going through the maths anymore. Let’s just assume it adds up and peace has been restored.

**BUT…**

The second kicker is that we have assumed that our players have similar preferences and abilities. If they have different preferences (attitude to each other, pain tolerance, reason for living) and/or different abilities (to shoot straight or dodge/recover) then we could get nearly anything. If Dermot loves shooting Robert and Robert loves getting shot by Dermot (but not vice versa) then we have a pretty obvious equilibrium of (smile, shoot) with a big payoff for everybody. Nice.

More realistically, if we assume that one of them is better at attacking, then their aggression might be unstoppable. Earlier we assumed that in a battle to the death, both guys had a 50-50 chance of winning. But if we change that percentage, then their incentives change.

I explained the “kill or die” situation two ways above, so I’ll do it again here. The first way was to use only positive payoffs for the fighting scenario (against a baseline of suicide which isn’t allowed)… if Robert happens to be the better killer (80% chance of victory) then he has reduced his probability of death by 30 percentage points (from 50% to 20%), and so he has increased his benefit from (shoot, shoot) by the “death penalty” * 0.3… which makes the shooting strategy relatively more attractive. This is basically the same as the earlier example when we considered what would happen if the players got more benefit from shooting people. Unsurprisingly, if you get more benefit from doing something, then your incentive to do it increases. Meh.

The other way I described the “kill or die” situation was with two different worlds, each with different incentives. We previously assumed that each player had a 50-50 chance of being in either world. Now we can look at what happens if Robert has an 80% chance of being in the “can’t die” world and only 20% chance of being in the “might die” world. Given that the “can’t die” world had a higher chance of fighting… then increasing the weighting of that world will make the weighted average slightly more likely to fight. As above. Pretty obvious eh?
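
A quick Python sketch of the two-world weighting. The sizes of “k” and “d” below are made up (the only constraint from the text is that d is bigger than k), and `net_shoot_incentive` is a hypothetical name.

```python
def net_shoot_incentive(p_kill_world, k, d):
    """Weighted change in the payoff from shooting across the two worlds."""
    return p_kill_world * k - (1 - p_kill_world) * d

# Assumed magnitudes for illustration only, chosen so that d > k.
k, d = 3, 5
for p in (0.5, 0.8):
    print(p, net_shoot_incentive(p, k, d))
```

At 50-50 the result is negative (the fear of dying wins), and shifting the weight to 80-20 pushes it upwards. Whether it actually turns positive depends on the invented sizes of k and d.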

**BUT…**

The final kicker to all of this is that we have assumed that we actually know the payoffs for each player. If we don’t know what Robert and Dermot actually want in the world, then we can’t really work out what they’re going to do. The best we can do is guess about what sort of world we think we are in (what type of game, what size payoffs) and then take the best action based on that guess. If the above payoffs given in this article are anywhere near right, then you probably want to stick with (smile, smile). Give peace a chance.

Of course, our guess about the world might be wrong. It might be that Robert gets 1000 worth of benefit from shooting Dermot and he doesn’t mind a shooting war and he’s likely to stop the game soon because it’s almost dinner time anyway… so we thought we were in a (smile, smile) world but Robert gets trigger happy and proves our assumption wrong, forcing us to update our assumptions about the different payoffs and discount rates.

And that’s where Bayesian updating comes into the equation. Which is a topic for another day.

===========================

Some students are paying a lot of money to hear me teach this at UQ on Wednesday. I might just point them to this post instead and take the day off. 🙂