The Monty Hall Problem
the Principle of Indifference

University of Minnesota, Twin Cities     School of Statistics     Charlie's Home Page

This is a letter to my brother-in-law squared, Steve Houlgate, forwarded via my sister, Ruth Shaw.

Dear Steve,

I have been reading The Curious Incident of the Dog in the Night-Time because Ruth and Frank told me that it was the talk of Cedar Lodge, and they also told me of your problem with the infamous Monty Hall problem.

This morning, while I was taking a shower, a very simple answer to your question (as it was explained to me) occurred to me, and I would like to share it.

It is true by the principle of indifference (which really has no formal justification and is not part of mathematical probability theory) that if you have two doors to choose between and have no information that allows you to distinguish them, then your subjective probabilities ought to be 1/2, 1/2. But here you can distinguish. One door is the one you originally chose; the other isn't. Now it is also true that if you have no idea how to use this information to adjust your probabilities, then your personal subjective probabilities are still 1/2, 1/2. But they did, after all, explain how to use this information to adjust your probabilities. Hence, it is easy to see that the principle of indifference is wrongly applied.

I myself detest this puzzle and never use it in class, because there are many loose ends in the usual telling of the story (for example, the Ask Marilyn version quoted in the novel). Depending on how creative one is in playing with those loose ends, one can justify almost any answer. But if one actually nails down every loose end, the story is too long and confusing to be of any interest.

By the way, I prefer what I think is the simplest explanation of all. Suppose that at the beginning of the game you have no information about the placement of the car (note that this is one of those loose ends not nailed down by the Ask Marilyn version -- perhaps there is some giveaway about the location of the car, its door sags or whatever). Then the principle of indifference is correctly applied at the beginning of the game and the probability that your original choice is correct is 1/3 (no matter what strategy you use to choose). Time passes and more stuff happens (the details are irrelevant). But the location of the car does not change (another one of those loose ends: what if they move the car in response to your choice?). Now what is the probability that your original choice is correct? Still 1/3. The association of car and door has not changed. You are right now if and only if you were right originally. Hence the probability of winning if you stick with your original choice is 1/3 -- no case splitting and no calculation necessary. Hence the probability of winning if you switch is 2/3, because probabilities add to one and there is no other option.
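For anyone who would rather check the 1/3 versus 2/3 claim by brute force, here is a minimal sketch that enumerates every case exactly. It nails down the loose ends in one particular way, and these are my assumptions for the illustration, not part of the story as usually told: the car is equally likely to be behind each of three doors, the car never moves, the host always opens a door that is neither your pick nor the car, and when he has two such doors he picks between them at random. The function name win_probabilities is made up for this example.

```python
from fractions import Fraction
from itertools import product

DOORS = (0, 1, 2)

def win_probabilities():
    """Return (P(win if you stick), P(win if you switch)) by exact enumeration."""
    stick = Fraction(0)
    switch = Fraction(0)
    # Assumption: the car is uniform over the three doors. The first pick is
    # also taken as uniform here, purely for concreteness; the argument in the
    # letter says the picking strategy does not matter.
    for car, pick in product(DOORS, DOORS):
        weight = Fraction(1, 9)
        # Assumption: the host opens a door that is neither the pick nor the car.
        openable = [d for d in DOORS if d != pick and d != car]
        for opened in openable:
            # Assumption: the host chooses uniformly when he has two options.
            w = weight / len(openable)
            # Switching means taking the one door neither picked nor opened.
            other = next(d for d in DOORS if d != pick and d != opened)
            if car == pick:
                stick += w
            if car == other:
                switch += w
    return stick, switch

stick, switch = win_probabilities()
print(stick, switch)  # prints: 1/3 2/3
```

Notice that the host's behavior only decides which door gets opened. Whether sticking wins is settled by the first two loop variables alone, which is the whole point of the argument above.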

The other reason I both detest and enjoy this problem is that it says something deep about human evolutionary psychology. At least, the emotional heat attached to the puzzle does. This puzzle reveals a major defect in an apparently hard-wired (biological) human decision-making heuristic:

If you have made a decision and as time passes further information acquired does not contraindicate your decision, then stick with it.

I call this the "you dance with the gal what brung ya" heuristic, after what some Texas football coach once said (about whether to switch quarterbacks in the second half of a key game). The Monty Hall Problem shows this is just wrong. Even though incoming information may favor your prior decision, it may favor alternatives even more! But this heuristic was good enough for australopithecines wandering around on the savanna, so it is hard-wired into us (so I conjecture, no proof!). And people get very upset to have their thinking, especially thinking that occurs at an unconscious level and so seems magical or God-given, questioned. But there is no doubt the heuristic is wrong, as is obvious from any clear statement of the issue. Moreover, the Monty Hall Problem shows that the heuristic is not just a little bit wrong. Circumstances can be constructed to make it as wrong as you please.
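One way to make "as wrong as you please" concrete is the familiar many-doors variant. This is my own choice of construction for illustration, and the letter does not say which one is meant. Assume n doors, one car placed uniformly at random, and a host who opens every door except your pick and one other, never revealing the car. The hypothetical function stick_and_switch below enumerates the cases exactly.

```python
from fractions import Fraction

def stick_and_switch(n_doors):
    """Exact win probabilities (stick, switch) for the assumed n-door variant."""
    stick = Fraction(0)
    switch = Fraction(0)
    for car in range(n_doors):
        for pick in range(n_doors):
            # Assumption: car and first pick are each uniform and independent.
            w = Fraction(1, n_doors * n_doors)
            if pick == car:
                # The one other door left closed must hide a goat.
                stick += w
            else:
                # The host may not reveal the car, so the one other door
                # left closed must be the car's door.
                switch += w
    return stick, switch

for n in (3, 10, 100):
    print(n, *stick_and_switch(n))
# prints:
# 3 1/3 2/3
# 10 1/10 9/10
# 100 1/100 99/100
```

Under these assumptions the stick-with-it heuristic wins with probability 1/n, which can be pushed as close to zero as desired by adding doors.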

I can't resist adding one more comment about the principle of indifference. There is a reason why it isn't part of the mathematical theory of probability. It is very slippery. It is impossible to know what it says even in fairly simple situations. Here is one from the core of elementary probability theory (covered in the first week of an undergraduate course). Suppose I toss two coins and count the number of heads. The possible outcomes are 0, 1, and 2. Applying the principle of indifference, I get probabilities 1/3, 1/3, and 1/3. But suppose I distinguish the two coins and now see four outcomes: TT, HT, TH, and HH. Applying the principle of indifference, I get probabilities 1/4, 1/4, 1/4, and 1/4. But this conflicts with the earlier analysis, because now the addition rule says the probabilities of 0, 1, and 2 heads are 1/4, 1/4 + 1/4 = 1/2, and 1/4. So the principle of indifference leads to two conflicting analyses. Which is correct? The principle doesn't say. That's why I don't even mention the principle of indifference when I teach probability.
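The two conflicting analyses of the coin toss can be laid side by side in a few lines. This is only a sketch of the arithmetic above; the helper name uniform and the variable names are made up for the example.

```python
from fractions import Fraction
from itertools import product

def uniform(outcomes):
    """Apply the principle of indifference: equal probability for each outcome."""
    outcomes = list(outcomes)
    return {o: Fraction(1, len(outcomes)) for o in outcomes}

# Analysis 1: be indifferent among the head counts 0, 1, 2.
by_count = uniform([0, 1, 2])

# Analysis 2: be indifferent among the ordered pairs TT, TH, HT, HH,
# then use the addition rule to get the probability of each head count.
by_pair = uniform(product("TH", repeat=2))
from_pairs = {}
for pair, p in by_pair.items():
    heads = pair.count("H")
    from_pairs[heads] = from_pairs.get(heads, Fraction(0)) + p

for k in (0, 1, 2):
    print(k, by_count[k], from_pairs[k])
# prints:
# 0 1/3 1/4
# 1 1/3 1/2
# 2 1/3 1/4
```

Both columns come from the same principle applied to the same experiment, and they disagree.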

The formal issue is this. Suppose X is a random variable having a discrete uniform distribution (all values equally probable -- the probability theoretic notion that corresponds to the principle of indifference). And suppose Y = g(X) is another random variable (where g is a function). Then Y is generally not uniformly distributed, because

$$
\Pr(Y = y) = \Pr\{g(X) = y\} = \sum_{x \,:\, g(x) = y} \Pr(X = x)
$$

so unless g is a one-to-one function, there is no reason for Y to be uniformly distributed. So do we apply the so-called principle of indifference to X or to Y? How can one tell? A serious mathematico-philosophical issue directly related to this is the bogosity of so-called noninformative priors in Bayesian inference. If we are noninformative about θ, then we will be informative about g(θ) for any function g that does not have constant Jacobian. It's the same issue: a mapping (g) can map uniform random variables to non-uniform random variables, and on which coordinate system are you to be uniform (apply the principle of indifference)? The principle doesn't say, and you are left in a quandary.
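A small worked example of the continuous version may help. The particular choices here are mine, made only for illustration: θ uniform on the interval (0, 1), and g(θ) = θ², whose derivative 2θ is not constant. I write φ for g(θ).

```latex
\theta \sim \mathrm{Uniform}(0, 1), \qquad \phi = g(\theta) = \theta^2

f_\phi(\phi)
  = f_\theta\!\left(\sqrt{\phi}\right) \left| \frac{d}{d\phi} \sqrt{\phi} \right|
  = \frac{1}{2\sqrt{\phi}}, \qquad 0 < \phi < 1

\Pr(\phi \le 1/4) = \Pr(\theta \le 1/2) = 1/2
```

So being "noninformative" (uniform) about θ puts half the probability on the lowest quarter of the range of φ, which is hardly noninformative about φ.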

Your obedient servant and brother-in-law squared,


P. S. The generally accepted answer to the question about coin flips is 1/4, 1/2, 1/4, what probability theorists call the binomial distribution. But the reasoning for it actually applies a much stronger principle, the principle of independence (sometimes called the multiplication rule in elementary texts). We apply the principle of indifference to each coin separately, getting 1/2, 1/2 for the probabilities of T, H. And then we multiply 1/2 * 1/2 = 1/4 to get the probabilities for TT, HT, etc., because the coin flips are independent (the outcome of one cannot influence the outcome of the other in any way). The principle of indifference is, by itself, just too weak to say anything about even this simple a problem.
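The reasoning in this postscript can be sketched as follows: indifference is applied to a single coin, and independence (the multiplication rule) does the rest. The function name head_count_distribution is hypothetical, and the comparison against the binomial formula uses math.comb from the Python standard library.

```python
from fractions import Fraction
from itertools import product
from math import comb

# Principle of indifference applied to ONE coin only.
coin = {"T": Fraction(1, 2), "H": Fraction(1, 2)}

def head_count_distribution(n_coins):
    """Distribution of the number of heads in n independent tosses."""
    dist = {}
    for flips in product(coin, repeat=n_coins):
        p = Fraction(1)
        for face in flips:
            p *= coin[face]  # multiplication rule: the tosses are independent
        k = flips.count("H")
        dist[k] = dist.get(k, Fraction(0)) + p  # addition rule
    return dist

two = head_count_distribution(2)
print(two[0], two[1], two[2])  # prints: 1/4 1/2 1/4

# Agrees with the binomial formula C(2, k) * (1/2)^2 for each k.
assert all(two[k] == comb(2, k) * Fraction(1, 4) for k in range(3))
```

The independence assumption is what fixes the probabilities of TT, HT, TH, and HH; indifference alone did not.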