I’ve seen three examples of the same strange thing happening over the past few months, and I wanted to compile them here. The basic set-up is this: in an optimal stopping problem, we usually have some kind of stochastic process, say on a state space , and for each state , we take to be the payoff from stopping at and to be the expected payoff from continuing from . Then, , the “value” of state , should be . Between stopping and continuing, we pick whichever is better. If they’re the same, it doesn’t matter which one we choose.
This formalism is really just a manifestation of the way I think most people think about the kinds of optimal stopping problems they meet in the real world: am I better off stopping or not? But the weird thing I’ve seen is that this strategy can actually give completely counterintuitive and incorrect results. The first time I saw this, I brushed it off as a weird edge case, but then I saw two more such examples.
This is weirdly unsettling, because expected value is (maybe implicitly) held up as this great tool for thinking about how to play little games like this and make decisions, but it clearly has some serious drawbacks that I hadn’t been fully aware of until seeing these.
1. Three Examples
Example 1. Consider the secretary problem (this is actually what’s called the cardinal payoff variant of the problem), where we see some sequence of applicants for a position, each applicant has an associated value measuring their quality, and our payoff is this quality value. We must decide whether to accept/reject an applicant immediately after seeing them, and cannot call them back.
If the qualities of the applicants are drawn from a uniform distribution, then it can be shown that the best strategy is to go through candidates, and then pick the next best one. But if the qualities are drawn from the negative half of the Cauchy distribution, that is, with PDF
then strange things happen. In particular, we can check that
so the expected quality of any applicant is negative infinity! What this means is that when we see the first applicant, the expected payoff from continuing is negative infinity, so we always accept the first candidate we see. But the payoff from using this strategy is the expected quality of the first candidate, which is still negative infinity!
Example 2. Consider a simple random walk on starting at some with an absorbing boundary i.e. once you reach zero, you stop walking. If we have the payoff function for stopping at , then
so the expected payoff from continuing is always greater than the payoff from stopping. But the walk is recurrent, so if we continue walking for long enough, we are guaranteed to reach 0 and win nothing.
Example 3. Consider a series of double-or-nothing bets on the results of a flip of a biased coin that comes up heads with probability . The expected value of taking the bet when you have dollars is
so we should always take the bet. But the coin will eventually come up tails with probability 1, and we will lose everything.
(Credits: I saw the first example in a Facebook post by Kevin Dong, the second on a problem set for my stochastic processes class, and the third in a conversation with a friend.)
What’s going on? How did this seemingly foolproof plan of just doing the thing that maximized our expectation go so awry?
2. The Secretary Problem
For the secretary problem, the apparent issue is that having infinite expectations just gives meaningless results. It’s not too bad when you’re trying to compare an infinite and finite expectation, because then you can at least say something like “we should do anything to avoid something with infinitely negative expected value,” but the example here is trying to compare two infinite expectations (the expectation from stopping after one candidate and the expectation from stopping after two candidates), and it all falls apart. The only consolation is that this expected value will never be attained – the first candidate can only be finitely bad, after all.
3. The Random Walk
For the random walk, I think a small computation is helpful. Let’s say we adopt a strategy where we a set two markers, say at , and stop whenever we reach them. To compute the expected payoff of such a strategy, let be i.i.d. Bernoulli random variables with parameter . Then our simple random walk is given by the recurrence with .
We can check that if is the -algebra generated by , then is measurable with respect to , and furthermore,
so is a martingale with respect to . Now, if we let
the optimal stopping theorem gives us . Then,
With this in mind, our expected payoff is
The issue is a little clearer now: is linear in both , so it’s extrema are at the endpoints – that is, we can always do better by making smaller or larger. This corresponds exactly to the “we’re always better off taking another step” mentality that the one step expectation calculation gave, except that now we see that this problem isn’t just an artifact of only looking one step ahead.
I think what’s going wrong here is that the expectation isn’t accounting for the opportunity cost involved. When we take , the fact that state 0 has zero payoff doesn’t sufficiently capture the fact that we’ll also lose the opportunity to continue playing. This can also be thought of as a local/global type issue, where the thing that’s ruining everything is the recurrence of the walk, which is a global property, but our expected value computations are all local.
4. The Coin Game
It makes sense to do a computation similar to the previous one, but the computation is much easier here: if we decide that we’re going to stop after flips, the expected payoff is
This has the same qualitative behavior as the payoff function from the random walk, where it increases as we play for longer, and we can explain it in the same way, by saying that the expected value doesn’t realize that the game ends when we flip tails. But this behavior is arising from a different kind of payoff function, so there’s an underlying structural difference between the two. I’d be curious to know if there are other examples of games that have such payoff functions, or if these are in some sense the only two ways this can happen.