versus

the Principle of Indifference

University of Minnesota, Twin Cities School of Statistics Charlie's Home Page

This is a letter to my brother-in-law squared Steve Houlgate forwarded via my sister Ruth Shaw.

Dear Steve,

I have been reading The Curious Incident of the Dog in the Night-Time
because Ruth and Frank told me it was the talk of Cedar Lodge, and told me
of your problem with the infamous Monty Hall problem.

This morning, while I was taking a shower, a very simple answer to your question (as it was explained to me) occurred to me, and I would like to share it.

It is true by the principle of indifference (which really has no
formal justification and is not part of mathematical probability theory)
that if you have two doors to choose between and have *no information*
that allows you to distinguish them, then your subjective probabilities ought
to be 1/2, 1/2. But here you can distinguish. One door is the one you
originally chose, one isn't. Now it is also true that if you have no
idea how to use this information to adjust your probabilities, then
your *personal subjective* probabilities are still 1/2, 1/2. But they
did, after all, explain how to use this information to adjust your
probabilities. Hence, it is easy to see that the principle of indifference
is wrongly applied here.

I myself detest this puzzle and never use it in class, because there
are many loose ends in the usual telling of the story (for example, the
Ask Marilyn version quoted in the novel). Depending on how creative
one is in playing with those loose ends, one can justify almost any answer.
But if one actually nails down every loose end, the story is too long and
confusing to be of any interest.

By the way, I prefer what I think is the simplest explanation of all.
Suppose that at the beginning of the game you have no information about
the placement of the car (note that this is one of those loose ends *not*
nailed down by the Ask Marilyn version -- perhaps there is some giveaway
about the location of the car, its door sags or whatever). Then the
principle of indifference is correctly applied at the beginning of the
game and the probability that your original choice is correct is 1/3
(no matter what strategy you use to choose). Time passes and more stuff
happens (the details are irrelevant). But the location of the car
*does not change* (another one of those loose ends: what if they move the
car in response to your choice?). Now what is the probability that your
original choice is correct? Still 1/3. The association of car and door
has not changed. You are right now if and only if you were right originally.
Hence the probability of winning if you stick with your original choice
is 1/3 -- no case splitting and no calculation necessary.
Hence the probability of winning if you switch is 2/3 because probabilities
add to one, and there is no other option.
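The 1/3 versus 2/3 claim can be checked by exact enumeration. The following is a minimal sketch, assuming the loose ends are nailed down in one particular way: the car is equally likely to be behind each door, the host knows where it is and always opens an unchosen door hiding a goat, and the car never moves. The function name `win_probabilities` and the rule that the host picks at random when two doors are available are assumptions made for illustration.

```python
from fractions import Fraction

DOORS = (0, 1, 2)

def win_probabilities(first_pick=0):
    """Return (P(win if stick), P(win if switch)) by exact enumeration.

    Assumptions for this sketch: the car is equally likely to be behind
    each door, the host knows where it is, always opens an unchosen door
    hiding a goat, and the car never moves.
    """
    stick = Fraction(0)
    switch = Fraction(0)
    for car in DOORS:
        p_car = Fraction(1, len(DOORS))  # indifference applied at the start
        # Doors the host may open: not the player's pick, not the car.
        openable = [d for d in DOORS if d != first_pick and d != car]
        for opened in openable:
            p = p_car / len(openable)  # host chooses among them at random
            # The only door left to switch to.
            other = next(d for d in DOORS if d not in (first_pick, opened))
            if first_pick == car:
                stick += p
            if other == car:
                switch += p
    return stick, switch

stick, switch = win_probabilities()
print(stick, switch)  # prints: 1/3 2/3
```

Under these assumptions the result does not depend on which door is picked first, which matches the remark that the strategy used for the original choice is irrelevant.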

The other reason I both detest and enjoy this problem is that it says
something deep about human evolutionary psychology. At least, the emotional
heat attached to the puzzle does. This puzzle reveals a *major defect* in
an apparently hard-wired (biological) human decision-making heuristic:

If you have made a decision and as time passes further information acquired does not contraindicate your decision, then stick with it.

I call this the "you dance with the gal what brung ya" heuristic, using
what some Texas football coach once said (about whether to switch
quarterbacks in the second half of a key game). The Monty Hall Problem
shows this is just wrong. Even though incoming information may favor
your prior decision, it may favor alternatives *even more!* But this
heuristic was good enough for australopithecines wandering around on
the savanna, so it is hard-wired into us (so I conjecture, no proof!).
And people get *very upset* to have their thinking, especially thinking
that occurs at an unconscious level and so seems magical or God-given,
questioned. But there is no doubt the heuristic is wrong, as is obvious
from any clear statement of the issue. Moreover the Monty Hall Problem
shows that the heuristic is not just a little bit wrong. Circumstances
can be constructed to make it as wrong as you please.
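One way to see "as wrong as you please" is a hypothetical many-door variant, which is an assumption of this sketch and not a construction given in the letter: there are n doors, the car is equally likely to be behind each, and the host, who knows where the car is, opens every door except your pick and one other without ever revealing the car. The same "right now if and only if right originally" argument then gives the numbers below. The function name `stick_and_switch` is made up for illustration.

```python
from fractions import Fraction

def stick_and_switch(n_doors):
    """Win probabilities in a hypothetical n-door variant.

    Assumed rules (not from the letter): the car is equally likely to be
    behind each of n_doors doors, and the host, who knows where the car
    is, opens every door except the player's pick and one other, never
    revealing the car.
    """
    stick = Fraction(1, n_doors)  # right now iff right originally
    switch = 1 - stick            # the single remaining door gets the rest
    return stick, switch

for n in (3, 10, 100):
    stick, switch = stick_and_switch(n)
    print(n, stick, switch)
# prints:
# 3 1/3 2/3
# 10 1/10 9/10
# 100 1/100 99/100
```

With enough doors, sticking with the original choice is almost always the losing move, even though nothing the host reveals contradicts that choice.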

I can't resist adding one more comment about the principle of
indifference. There is a reason why it isn't part of the mathematical
theory of probability. It is very slippery. It is impossible to know
what it says even in fairly simple situations. Here is one from the
core of elementary probability theory (covered in the first week of an
undergraduate course). Suppose I toss two coins and count the number
of heads. The possible outcomes are 0, 1, and 2. Applying the principle
of indifference, the probabilities are 1/3, 1/3, and 1/3. But suppose
I distinguish the two coins and now see four outcomes: TT, HT, TH, and HH.
Applying the principle of indifference, the probabilities are 1/4, 1/4, 1/4,
and 1/4. But this conflicts with the earlier analysis, because now the
addition rule says the probabilities of 0, 1, and 2 heads are 1/4,
1/4 + 1/4 = 1/2, and 1/4. So the principle of indifference leads to
two conflicting analyses. Which is correct? The principle doesn't say.
That's why I don't even mention the principle of indifference when I
teach probability.
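The two conflicting analyses of the coin example can be laid side by side in a few lines. This is only a sketch; the variable names `by_count` and `by_outcome` are chosen for illustration.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Analysis 1: indifference over the number of heads (0, 1, 2).
by_count = {k: Fraction(1, 3) for k in (0, 1, 2)}

# Analysis 2: indifference over the four ordered outcomes (HH, HT, TH, TT),
# followed by the addition rule to get the number of heads.
outcomes = ["".join(pair) for pair in product("HT", repeat=2)]
by_outcome = defaultdict(Fraction)
for outcome in outcomes:
    by_outcome[outcome.count("H")] += Fraction(1, len(outcomes))

for k in (0, 1, 2):
    print(k, by_count[k], by_outcome[k])
# prints:
# 0 1/3 1/4
# 1 1/3 1/2
# 2 1/3 1/4
```

Both columns come from "applying the principle of indifference"; they differ only in which list of outcomes was treated as the equally likely one.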

The formal issue is this. Suppose `X` is a random variable having a
discrete uniform distribution (all values equally probable -- the probability
theoretic notion that corresponds to the principle of indifference).
And suppose `Y` = `g`(`X`) is another random
variable (where `g` is a function).
Then `Y` is generally *not* uniformly distributed, because

$$
\Pr(Y = y) = \Pr\{g(X) = y\} = \sum_{x \,:\, g(x) = y} \Pr(X = x)
$$

so unless `g` is a one-to-one function, there is no reason for `Y` to be
uniformly distributed. So do we apply the so-called principle of indifference
to `X` or to `Y`? How can one tell? A serious mathematico-philosophical issue
directly related to this is the bogosity of so-called noninformative priors
in Bayesian inference. If we are noninformative about θ, then we
will be informative about `g`(θ) for any function `g` that does not have
constant Jacobian. It's the same issue: a mapping (`g`) can map uniform random
variables to non-uniform random variables, and on which coordinate system
are you to be uniform (apply the principle of indifference)? The principle
doesn't say, and you are left in a quandary.
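The formula for Pr(`Y` = `y`) above can be computed directly for a small example. The following is a minimal sketch: the helper name `pushforward`, the set of values for `X`, and the two choices of `g` are assumptions made for illustration.

```python
from collections import defaultdict
from fractions import Fraction

def pushforward(values, g):
    """Distribution of Y = g(X) when X is uniform on `values`.

    Implements Pr(Y = y) = sum of Pr(X = x) over all x with g(x) = y.
    """
    p_x = Fraction(1, len(values))  # discrete uniform: all x equally probable
    dist = defaultdict(Fraction)
    for x in values:
        dist[g(x)] += p_x
    return dict(dist)

xs = [-2, -1, 0, 1, 2]
squared = pushforward(xs, lambda x: x * x)   # g is not one-to-one
shifted = pushforward(xs, lambda x: x + 10)  # g is one-to-one

print(squared)
print(shifted)
```

Here `squared` puts probability 1/5 on 0 and 2/5 on each of 1 and 4, so `Y` is not uniform, while `shifted` puts 1/5 on each of its five values. The continuous analogue behaves the same way: if θ is uniform on (0, 1), then θ² has density 1/(2√φ) on (0, 1), which is far from flat.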

Your obedient servant and brother-in-law squared,

Charlie

P. S. The generally accepted answer to the question about coin flips
is 1/4, 1/2, 1/4, what probability theorists call the Binomial Distribution.
But the reasoning for it actually applies a much stronger principle,
the principle of independence (sometimes in elementary texts called the
multiplication rule). We apply the principle of indifference to
each coin separately, getting 1/2, 1/2 for the probabilities of T, H.
And then we multiply 1/2 * 1/2 = 1/4 to get the probabilities for TT, HT, etc.,
because the coin flips are *independent* (the outcome of one cannot influence
the outcome of the other in any way). The principle of indifference is,
by itself, just too weak to say anything about even this simple a problem.
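The reasoning in this postscript can be sketched in code: give each coin probabilities 1/2, 1/2, multiply across coins because the flips are independent, and add up the outcomes that share a head count. The function name `heads_distribution` and the extension from two coins to `n_coins` are assumptions made for illustration; the comparison uses the standard binomial formula.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product
from math import comb

def heads_distribution(n_coins, p_heads=Fraction(1, 2)):
    """Distribution of the number of heads in n_coins independent flips."""
    p = {"H": p_heads, "T": 1 - p_heads}
    dist = defaultdict(Fraction)
    for flips in product("HT", repeat=n_coins):
        prob = Fraction(1)
        for flip in flips:
            prob *= p[flip]               # multiplication rule (independence)
        dist[flips.count("H")] += prob    # addition rule over outcomes
    return dict(dist)

two = heads_distribution(2)
print(two[0], two[1], two[2])  # prints: 1/4 1/2 1/4

# Compare with the binomial formula for a few more coins.
n = 5
binomial = {k: comb(n, k) * Fraction(1, 2) ** n for k in range(n + 1)}
print(heads_distribution(n) == binomial)  # prints: True
```

Note that indifference enters only through the per-coin value 1/2; everything else is done by independence and the addition rule.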