Hindsight bias is not rational

But maybe outcome bias is.

\(\newcommand{\avg}{\mathbb{E}}\)Do you believe in magic? Here’s a trick that’s guaranteed to make you think you’re smarter than you actually are. The best part: it’s rational.

I’m going to ask you to give your confidence (as a 0–100% probability) in the truth of a bunch of true/false statements, like “Trump will be impeached in the next three months” and “There are more than 200 countries in the world.” A year later, I’ll ask you to recall these probability estimates to the best of your ability, and you’ll be rewarded for how accurate you are in your memory of what you said. But before you try to recall the numbers you wrote down, I’ll tell you the answers to all the statements. You’ll know whether Trump was impeached, whether there are more than 200 countries in the world, and so on.

Based on experimental data of “hindsight bias,” I can predict how your new knowledge of the truth of these statements will influence your memory of your original stated probabilities: your original estimates will shift, on average, towards 0% if the statements were false and 100% if the statements were true. As a result, your remembered confidences will appear more prescient than your actual confidences.

This distortion of your memory may sound like a textbook example of self-serving irrationality. But let me try to persuade you that it’s actually reasonable. First, we assume that you’re at least somewhat decent at anticipating what’s true or false. That is, if a statement is true, you’re more likely to give a probability estimate like 70% than 30%, and vice versa if the statement is false. Second, when I ask you to recall your original judgments, you’ve forgotten exactly what you said and therefore acknowledge that there are multiple possible numbers you could have written down. The uncertainty that’s introduced into your memory is more or less random around your stated estimate, so if you wrote down a number like 32%, you could just as well ‘remember’ that as 25% or 40%. But suppose you now learn that the statement you were estimating is true (e.g., Trump did get impeached in those three months). This gives you useful information to resolve some of your uncertainty about the probability you initially gave. As we just assumed, you’re generally more likely to give high confidences than low confidences when statements are true. Thus, if your memory can’t distinguish between low numbers like 25% and high numbers like 40%, it’s reasonable to expect that you gave a number on the higher end, since that’s more consistent with what you would typically do. Indeed, Bayes’ rule — the formula for rational updating — tells you to combine your noisy memory with your background knowledge in this way. The end result is that your answers tend to be ‘biased’ towards the true extremes of 0% and 100%.

Convinced? Well, I lied. This isn’t rational. But this style of argument has persuaded some philosophers that hindsight bias is reasonable. While I don’t categorically disagree, we need to be more careful about what we mean by “hindsight bias” and related phenomena. This starts with the classic memory paradigm described above, which produces a pattern of responses that I can show you is unreasonable. Then we can return to cases in which the philosophers have a point.

Your current belief is the future you expect

Here’s the simplest way to see why you can’t get rational hindsight bias in the memory experiment. Right after you give your original confidence judgments, you have some expectation of what your average error will be. We could define “error” in a number of ways, and it doesn’t matter for the sake of the argument. But suppose we define it as the absolute distance between your probability estimate and 0% when the statement is false or 100% when the statement is true. So if you predict a 32% chance that Trump will be impeached in the next three months, your error is 32% if he isn’t impeached and 68% if he is impeached. Your “expected” error for any given question is a weighted average of these two possibilities. You believe there’s a 68% chance that you’ll have an error of 32% and a 32% chance that you’ll have an error of 68%. For a given question, this reduces to \(2p(1-p)\), where \(p\) is your estimated probability of the statement’s truth. And your expected error for the task is the average of expected errors across possible questions.
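Here’s that error definition as a quick Python sketch (the naming is mine, not the original post’s code):

```python
def expected_error(p):
    """Expected absolute error for a stated probability p, assuming you're
    calibrated (i.e., the statement really is true with probability p)."""
    # With probability p the error is (1 - p); with probability (1 - p) it's p.
    return p * (1 - p) + (1 - p) * p  # simplifies to 2 * p * (1 - p)

print(expected_error(0.32))  # 0.4352, i.e., roughly 44%
```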

It may be difficult for you to come up with an exact number that you ‘expect’ to be your error. But even if you’re not sure what the exact number is, it must obey a property known as the law of iterated expectations. In words, this law implies that the error you expect on the task right after you complete your initial predictions should equal the average of the errors you anticipate expecting a year later, after you’ve been given the answers and forgotten your initial estimates.

This is a mouthful, so let’s get more precise. Right after you do the task, you have only one expectation for your error. Maybe you think your probabilities are off by around 30%, on average. But, of course, your belief about your average error may change in the future if you get new information. For example, suppose I tell you later that this was a trick quiz, and the answer to all the questions was “true.” Presumably, you will come to believe that your average error is larger than what you initially thought, even if you’ve forgotten exactly what numbers you wrote down. Likewise, you could learn that all the answers are “false.” Or, more likely, you could learn that some particular answers are “true,” and others are “false” in a way that somewhat correlates with your confidence judgments. In each of these possible future worlds, you would have some belief about your average error on the original task. The law of iterated expectations tells us that the average error you currently believe you have must be a weighted average of these future possible beliefs, where the weights are given by the probabilities you assign to each possible future.
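In symbols, writing \(\mathrm{Err}\) for your error on the task and \(F\) for whatever you learn in the future (the answers, your degraded memory of your estimates), the law states:

\[ \avg[\mathrm{Err}] = \avg\big[\avg[\mathrm{Err} \mid F]\big], \]

where the outer expectation on the right averages over the possible futures \(F\), weighted by how likely you currently think each one is.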

This is simplest to understand if we imagine that the prediction phase of the experiment is just one question. For example, suppose the task is just to assign a probability to the statement that there are more than 200 countries in the world. You write down 32%. Using the formula we defined above, this probability implies that you expect, right after writing down this probability, an error of 44%.1 By the law of iterated expectations, though, you should also believe that your weighted future expected errors will average out to 44%. I tell you that a year from now, you’re going to be asked to recall what probability you wrote down. You probably won’t remember the exact number, and I’ll also tell you whether there are more than 200 countries in the world. You predict a 32% chance of learning that there are more than 200 countries, and a 68% chance that there are not. What probabilities do you predict you’ll recall a year from now? These remembered estimates could differ from your true estimate of 32%, but the errors you expect to observe in each possible future world must average out to 44%.

Now let’s see why rational hindsight bias is impossible in this setup. Although it sounds reasonable to suppose that learning the truth value of the statements for which you had estimated probabilities should bias your recalled estimates towards 0% or 100% (depending on whether a statement was false or true, respectively) when your memory is imperfect, this bias would contradict the law of iterated expectations. Right after you give your 32% estimate, you could reason as follows: “A year from now, I will forget exactly what number I wrote down and only remember that my estimate was somewhere vaguely below 50%. If I’m told that there are less than 200 countries in the world before giving my recollection, I’ll come to believe that the number I wrote down was something lower than 32% (e.g., 25%). If I’m told the opposite, I’ll come to believe that the number I wrote down was something higher than 32% (e.g., 40%). So I think there’s a 68% chance that I’ll remember saying 25% and a 32% chance that I’ll remember saying 40%. In the former case, my error would be 25%, and in the latter case, my error would be 60%. Hence, my expected error at this future point would be 36%. But 36% is smaller than the 44% I expect now. Therefore, I’ve violated the law of iterated expectations.”
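Here is that arithmetic spelled out (a quick check using the numbers from the example):

```python
p = 0.32                     # original stated probability

# Expected error right now:
error_now = 2 * p * (1 - p)  # 0.4352, about 44%

# Expected error anticipated for the biased future recall: a 68% chance of
# recalling 25% (error 25%) and a 32% chance of recalling 40% (error 60%).
error_later = 0.68 * 0.25 + 0.32 * (1 - 0.40)  # 0.362, about 36%

print(error_now, error_later)  # 44% now vs. 36% later: the law is violated
```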

This proves that Bayesian updating in a fully ‘rational’ model can’t produce hindsight bias in this paradigm. You may be feeling puzzled as to why this is irrational, though. Let’s zoom in further to explore how Bayesian updating should work.

Bayesian memory reconstruction

We can divide up the memory task into three stages of inference. First, you give your confidences (probability estimates) for a set of binary propositions, which establishes a baseline expectation of your error on the task. Second, you are asked to recall the probability estimates you gave for the specific questions after a long period of time, during which noise has been introduced into your memory. Lastly, you are asked to do the same memory task while knowing the binary truth values of the statements you had evaluated. The law of iterated expectations entails that your expected error on the original task should remain constant at each stage.

Let’s build a toy model to simulate what happens at each of these stages. We’ll need to pick a distribution for your expected errors across questions. For simplicity, we’ll use a beta distribution. Specifically, because you assume that you have some skill in the estimation task, we can choose Beta(1, 2) for the error distribution, which implies that the probability of a given confidence falls off linearly with its distance from the correct answer (0% for false statements, 100% for true ones):

That is, if the answer to the question is “true,” you’re about twice as likely to give a confidence around 100% as to give a confidence of 50%, and in turn, you’re twice as likely to give a confidence of 50% as a confidence of 25%, and so on. The distribution of confidences is mirrored when the answer to the question is “false.” (Note that the beta distribution precludes confidences of exactly 0% or 100%.)

With errors distributed in this way, your expected error for a generic question is slightly greater than 33% (i.e., 1/3), which I will refer to henceforth as “33%” for simplicity. This is the expected error that should remain stable even as your memory of your original answers degrades or you acquire knowledge of the truth value of the statements you were estimating. Moreover, if you reasonably assume that any given statement in the task is equally likely to be true or false, this error distribution implies that your confidence ratings should follow a uniform distribution across questions. That is, if we looked at your probability estimates across a huge number of questions, we would see that you are equally likely to give 32% as to give 52% as to give 99%, and so on. The uniform distribution is simply the result of averaging two sloping lines: the downward-sloping line when statements are false and the upward-sloping line when statements are true.
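Here’s a small simulation sketch of this setup (my own illustration, not the code from the footnote):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Each statement is equally likely to be true or false.
truth = rng.random(n) < 0.5

# Absolute errors follow Beta(1, 2): smaller errors are linearly more likely.
errors = rng.beta(1, 2, size=n)

# Confidence is 1 - error when the statement is true, and equal to the error when false.
confidence = np.where(truth, 1 - errors, errors)

print(errors.mean())  # ~0.333: the expected error of 1/3
print(np.histogram(confidence, bins=10, range=(0, 1))[0] / n)  # ~0.1 per bin: uniform
```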

We need to further mathematically specify how noise gets introduced into your memory for your estimates. But let’s put that on hold for now and ask what would happen if you completely forgot everything about the details of what you said when you are asked to recall your original estimates. This limiting case will help establish why the law of iterated expectations should hold.

What if you forget everything?

To recap, you gave probability estimates to a bunch of true/false questions a year ago. Now you are asked to recall the estimates you gave, first without knowing whether each of the statements you rated was true or false. Indeed, suppose that your original estimates were indexed 1, 2, 3, etc., so you aren’t even given a reminder of what quantities you were predicting, which would allow you to use contextual clues to figure out what you may have thought at the time. Your task is to guess what estimate you gave for “Question #1” and so on.

While you don’t remember the specific answers you gave, you do remember (or are reminded of) the Beta(1, 2) distribution of errors you expect across the population of questions and your prior belief that any given statement was equally likely to be true or false. With this knowledge, what should you ‘remember’ as your estimate for any given question? As explained above, the error distribution combined with the assumption that statements are equally likely to be true or false implies that your estimates follow a uniform distribution — they could be anything with equal probability. If we assume that you want to report the expectation (i.e., average) of your uncertain belief, the uniform distribution implies a best guess of 50% for what you said on all questions. Of course, this doesn’t mean that you think you actually guessed 50% — you know that you’re equally likely to have said anything else. This just happens to be the center of mass of your flat distribution over probabilities.

If the estimate you ‘expect’ for all questions is 50%, it might seem like your expected error should now be 50% rather than the 33% you initially expected. But this is incorrect for a crucial reason: the error you expect is different from the error of your expectation. To calculate your expected error, you need to average your possible errors over all possible probability estimates you could have given. Since you think any estimate is as likely as any other, we just take the average of the error function, \(2p(1-p)\), over all probability values. This results in the same 33% expected error you started with.
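Concretely, averaging the error function over a uniform distribution of possible estimates gives:

\[ \int_0^1 2p(1-p)\,dp = 2\left(\tfrac{1}{2} - \tfrac{1}{3}\right) = \tfrac{1}{3} \approx 33\%. \]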

Now for the more interesting part. We suppose that you’re asked to do the same recollection task after you’re told the true/false answers to the statements you evaluated. For example, you know that Question #1 was “true,” Question #2 was “false,” and so on. This changes the way you think about error. If you said “30%” on Question #1, your expected error is no longer an average of two possibilities; you know that the statement is true, and therefore your error is 70% (i.e., the distance between 30% and 100%). Likewise, if you said “30%” on Question #2, your expected error is simply 30%.

Your uniform uncertainty over probabilities now entails an error greater than 33%. If a statement is false, the average distance from 0% of a uniform distribution is 50% — the same as the expectation of your distribution. For true statements, we can just flip your probability estimate (e.g., change 30% to 70% and so on) and treat the statement as false. By symmetry, the uniform distribution in this case also implies an error of 50%. The law of iterated expectations has been violated!

This is the critical step where we must invoke Bayes’ rule: knowing whether a statement is true or false should alter your beliefs about your original probability estimates. If you know that errors follow a linear relationship from the beta distribution and you know whether the statement you were estimating is true or false, you can make a more educated guess about what you initially said. That is, your probability estimates should follow Beta(1, 2) when the statement is false and Beta(2, 1) when the statement is true. (The latter is just the mirror image of the former, with the line sloping up and reaching a maximum just before 100%.) With this information, you should expect a probability estimate of 33% for false statements and 67% for true statements. If we use the trick of flipping the probability estimates for true statements to calculate error, we can see that the expected error is 33% in both cases.
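A quick numerical check of this step (again, my own sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Estimates follow Beta(1, 2) for false statements and Beta(2, 1) for true ones.
est_false = rng.beta(1, 2, size=n)
est_true = rng.beta(2, 1, size=n)

print(est_false.mean(), est_true.mean())        # ~0.33 and ~0.67: the expected estimates

# Errors: distance from 0% for false statements, distance from 100% for true ones.
print(est_false.mean(), (1 - est_true).mean())  # both ~0.33: expected error is preserved
```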

In other words, you need to adjust your memory in the direction of your knowledge in order to preserve the expected error you originally had. And yet, you do not show hindsight bias. Your memories haven’t moved your estimates closer to the boundaries, which would lower your average error. The variance in your estimates has merely decreased. Whereas you originally gave probabilities ranging uniformly from just above 0% to just below 100%, your recollections take on only the two possible values: 33% (for false statements) and 67% (for true statements).

An imperfect memory

Now let’s consider the slightly more complex set of cases in which you remember some details of your estimations, but you are uncertain of the exact probabilities you wrote down. (This section is a bit more technical and can be skipped if the logic from above has convinced you.) In other words, when trying to recover your original judgment in hindsight, you rely on two pieces of information: (i) your knowledge of the truth or falsity of the statement you were rating and (ii) a noisy memory signal of what you said. This noisy memory signal could take various forms, but for simplicity, let’s imagine it’s just a number that pops into your head that is a function of the actual value you wrote down. For example, if you actually indicated 32% confidence, perhaps the number that pops into your head — prior to applying Bayes’ rule or considering your knowledge of whether the statement is true or false — is 38%. Critically, this is not necessarily the number that you’ll indicate as your best guess of your original estimate. It’s just a starting point that will get passed to Bayes’ rule.

Because of the bounded nature of probabilities, we model your memory noise in a slightly convoluted way. We take a standard approach of assuming this noise is normally distributed, with a variance of \(\sigma^2\), in log-odds space. As explained in a previous post, log-odds are a way to transform probabilities, which are bounded between 0% and 100%, to unbounded values, thereby avoiding the possibility that you could get a memory signal less than 0% or greater than 100%. We need to further assume — perhaps somewhat unrealistically — that you have knowledge of this noise distribution in your head. For example, you ‘know’ to some approximation that the 38% number that popped into your head was a sample from a normal distribution whose mean in log-odds was equal to the number you actually wrote down. (To give a sense of magnitude, 38% is about 0.25 away from 32% in log-odds space.)
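In code, this noise model might look something like the following (a sketch under the stated assumptions; the 0.25 variance matches the simulations below):

```python
import numpy as np

def logit(p):
    """Map a probability to log-odds (unbounded)."""
    return np.log(p / (1 - p))

def inv_logit(x):
    """Map log-odds back to a probability."""
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(0)
sigma = np.sqrt(0.25)  # memory-noise standard deviation in log-odds space

original = 0.32
# The memory signal is the original estimate corrupted by Gaussian noise in
# log-odds space, then mapped back to a probability (so it stays in (0, 1)).
signal = inv_logit(logit(original) + rng.normal(0, sigma))
print(signal)  # a perturbed 'memory' of 0.32, e.g. something like the 38% above
```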

Suppose you don’t yet have knowledge of the truth of the statement you estimated. The noisy memory signal is the ‘data’ \(D\) that gets fed into Bayes’ rule. Bayes’ rule says that your posterior belief in a given confidence estimate \(C\) is proportional to the likelihood of receiving this signal if \(C\) were what you wrote down, multiplied by your prior probability of having given that estimate:

\[ P(C \mid D) \propto P(D \mid C)P(C). \]

Here, the likelihood \(P(D \mid C)\) is given by the normal distribution in log-odds space, and your prior is a uniform distribution over probabilities, as explained in the previous section. Your best guess (i.e., your mathematical expectation) of the number you wrote down is a weighted average over all possible probability estimates \(C\), with the weights given by the posterior \(P(C \mid D)\).

Through simulation, let’s explore how this computation shapes the distribution of guesses you might make about your initial estimates. (Again, we’re assuming you haven’t yet learned whether the statement was true or false, and therefore, that information hasn’t yet been incorporated.) For these simulations, we’ll take \(\sigma^2\) to be 0.25.

The plot below shows how the average best guess (blue line) relates to your original estimate (x-axis). Critically, this best guess is not a simple straight line. Original estimates that are near the extremes get pulled towards the center, implying that an original guess like 32% is more likely to be estimated as a slightly higher value. This reflects the regularization baked into Bayes’ rule: when your memory is noisy, it’s reasonable to bias your memory towards your prior belief of 50% (the mean of the uniform distribution).

To compute your expected (absolute) error in this context, we again need to distinguish the error of your expectation from your expected error. Each simulation of a noisy memory signal is associated with a full posterior distribution over possible estimates you could have given. Each possible value \(C\) implies an expected absolute error of \(2C(1-C)\). To compute your expected error for a single simulation, therefore, we take a weighted average of these errors across all possible \(C\) values, with the weights given by the posterior \(P(C \mid D)\).

Finally, to approximate your expected error prior to receiving a memory signal, we zoom out one step further by averaging the expected errors across all simulations. Lo and behold, this average is equal to that 33% number we started with (with a tiny bit of simulation error). In other words, the law of iterated expectations is confirmed: you can expect that your expected error will remain 33% after you’ve forgotten exactly what number you wrote down and are trying to recover that value using your imperfect memory.
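Here is one way this simulation could be set up (my own sketch, using a grid approximation; the original code may differ):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
sigma = 0.5  # memory-noise SD in log-odds space (variance of 0.25)

def logit(p):
    return np.log(p / (1 - p))

# Grid of candidate original estimates C, with a uniform prior over them.
C = np.linspace(0.001, 0.999, 999)

best_guesses, expected_errors = [], []
for _ in range(20_000):
    true_est = rng.choice(C)                         # the estimate you actually gave
    signal = logit(true_est) + rng.normal(0, sigma)  # noisy memory signal (in log-odds)

    # Posterior over C: Gaussian likelihood of the signal times the (flat) prior.
    post = norm.pdf(signal, loc=logit(C), scale=sigma)
    post /= post.sum()

    best_guesses.append((post * C).sum())                   # what you'd report remembering
    expected_errors.append((post * 2 * C * (1 - C)).sum())  # posterior-weighted error

print(np.mean(best_guesses))     # ~0.5: the mean of your original estimates is preserved
print(np.mean(expected_errors))  # ~1/3: the expected error is unchanged, as the law requires
```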

Now let’s introduce the hindsight component, where you’re further told whether the statements you were rating were true or false. Bayes’ rule must combine your noisy memory signal with this knowledge of the statements’ truth values. We can think of this knowledge \(K\) as just one more piece of ‘data’ that gets incorporated into the formula. The posterior we computed above, \(P(C \mid D)\), becomes the new prior distribution, which gets multiplied by a likelihood term for \(K\) to yield a new posterior \(P(C \mid K, D)\).

What is this new likelihood term? Strictly speaking, it’s \(P(K \mid C)\), but because your prior over estimates is uniform, it’s proportional to \(P(C \mid K)\): simply the linear Beta(1, 2) or Beta(2, 1) distribution, depending on whether the statement is false or true, respectively. That is, learning that the statement is false should lead you to believe that you’re more likely to have given lower estimates than higher ones, and vice versa if you learn that the statement is true.

Let’s see what the incorporation of this knowledge does to your original estimates. To simplify things, we can imagine that the knowledge you receive is always that the statement you were estimating is false. For simulations in which the statement was true, we can just flip your probability estimates as we did earlier (e.g., 70% becomes 30%).

Again, your average guess of what you initially said is not a simple linear function of your original estimate; original values above 33% get shrunk towards the truth (0%), while original values below 33% get inflated away from the truth. Although your guesses do exhibit a hindsight shift towards 0% relative to the guesses you would have made without the knowledge \(K\), your expected error — which, because the statements are false, is just the average retrospective guess in this plot — remains at 33%. Thus, there is no hindsight bias relative to your original judgments. Your ‘remembered’ estimates clump around that 33% number more than your actual estimates did, but the average remains the same.
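And here is a sketch of the hindsight step, continuing the same toy setup (statements assumed false, as above; again my own illustration):

```python
import numpy as np
from scipy.stats import norm, beta

rng = np.random.default_rng(0)
sigma = 0.5  # memory-noise SD in log-odds space

def logit(p):
    return np.log(p / (1 - p))

C = np.linspace(0.001, 0.999, 999)  # candidate original estimates
know_false = beta.pdf(C, 1, 2)      # how estimates are distributed when the statement is false

guesses, originals = [], []
for _ in range(20_000):
    # Draw what you actually said from Beta(1, 2), since the statement is false.
    true_est = np.clip(rng.beta(1, 2), 0.001, 0.999)
    signal = logit(true_est) + rng.normal(0, sigma)  # noisy memory signal

    # Posterior combines the memory likelihood with the knowledge that the statement is false.
    post = norm.pdf(signal, loc=logit(C), scale=sigma) * know_false
    post /= post.sum()

    guesses.append((post * C).sum())  # retrospective guess; also your error, since the truth is 0%
    originals.append(true_est)

guesses, originals = np.array(guesses), np.array(originals)
print(guesses.mean())                                                      # ~1/3: expected error unchanged
print(originals[originals > 0.6].mean(), guesses[originals > 0.6].mean())  # high originals get pulled down
print(originals[originals < 0.1].mean(), guesses[originals < 0.1].mean())  # low originals get pulled up
```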

Summary

Even if some of the mathematical details in this section were opaque or glossed over quickly (see my code in the footnote if interested in a more thorough exploration), I hope I’ve convinced you that Bayes’ rule is not magic. When information is lost, key properties of the original distribution of responses should be preserved. A so-called ‘rational’ model preserves the mean of the distribution and its average error, which precludes a predictable hindsight shift in the direction of future knowledge.

Of course, it’s reasonable to question whether the kind of idealized model I’ve summarized here is cognitively plausible. I have focused on an ideally rational model because this is often the focus of arguments attesting to the rationality of hindsight bias. In the next section, I’ll consider an adjacent phenomenon that could more plausibly be viewed as rational.

Hindsight bias vs. outcome bias

I suspect that much of the confusion over the rationality of hindsight bias can be traced to some ambiguity in what the phrase means. I have taken this bias to be about a person’s reconstruction of one of their past estimates upon learning the truth of what was being estimated. But the term is often conflated with “outcome bias” — a more general phenomenon whereby learning the outcome of a decision can influence one’s belief about the rationality of that decision.

These concepts are closely related, but also importantly different. Imagine I’m watching the Patriots take the risky decision of going for a touchdown on fourth down rather than kicking an easy field goal for three points. Before they run the play, I have a slight hunch that this is a foolish decision, but am mostly unsure whether it’s wise, not knowing much about the analytics that the team is using to assess the probability of success. Then I watch the play happen. The Patriots try to run the ball down the middle of the field and get stopped well behind the goal line, forcing a turnover. Frustrated, I think to myself, “Wow, what a stupid play! I knew that was a terrible decision!”

My reaction exhibits both hindsight bias and outcome bias. The second statement — “I knew that was a terrible decision” — is an instance of hindsight bias. Although I was actually uncertain about whether this was a good decision, I am now convinced that my past self “knew all along” that the fourth-down attempt was foolish. In contrast, the first statement — “what a stupid play” — is an instance of outcome bias, as it targets the quality of the decision to run the play, not my beliefs about it.

As we have seen, hindsight bias can’t be rational in the strong sense we’ve been considering. But outcome bias can be. The critical difference lies in my original uncertainty about the quantity of interest. Before the Patriots run their play, there is no ambiguity about what I believe about the strength of the decision. (Or maybe there is; hold that thought.) If I were in a hindsight bias experiment, I would write down a confidence value like 55%, which indicates my “slight hunch” that the Patriots would be better off kicking the field goal. If I then remember this number as much closer to 100%, we can point to a clear divergence between my true past belief and my erroneous reconstruction of it.

In contrast, prior to the play, I don’t know whether going for the touchdown is a wise move because I don’t have introspective access to the ideal decision model that would indicate the expected payoffs from kicking a field goal versus going for the touchdown. Indeed, my stated 55% belief indicates that I am unsure what this model would say. Given my initial uncertainty about whether this is a smart choice, Bayes’ rule implies that the outcome of the decision (the Patriots’ failing to score the touchdown) should lead me to update my beliefs about what the ideal model would recommend. Granted, even a wise decision leads to failure plenty of the time, so this evidence may be extremely weak, and I should not overweight it. But some updating is rational whenever there is uncertainty.
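To make that concrete, here is a toy version of the update; every number except my 55% hunch is made up purely for illustration:

```python
# Hypothesis: going for it on fourth down was the wrong call.
prior_wrong = 0.55      # my slight hunch before the play

# Hypothetical likelihoods: a failed conversion is only slightly more probable
# if going for it really was the wrong call.
p_fail_if_wrong = 0.60
p_fail_if_right = 0.50

posterior_wrong = (prior_wrong * p_fail_if_wrong) / (
    prior_wrong * p_fail_if_wrong + (1 - prior_wrong) * p_fail_if_right
)
print(posterior_wrong)  # ~0.59: a modest, rational update after watching the play fail
```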

Hindsight bias for your inner crowd

In light of this contrast between hindsight bias and outcome bias, we might consider a kind of bias that lies between these extremes. Rational hindsight bias for your stated opinion isn’t possible because that opinion was recorded without ambiguity before you forgot what you wrote down. But, as Kevin Dorst notes, there could be a kind of ‘hindsight bias’ for a latent belief state you have that’s not fully accessible to conscious introspection.

The idea is that when you indicate that you’re 32% confident that there are more than 200 countries in the world, you’re uncertain about whether that’s your ‘true’ confidence. For example, perhaps you actually are 30% confident or 34% confident, but you can’t perfectly tap into this state.

I must admit that I find Dorst’s idea of a ‘true’ belief somewhat metaphysically dubious. But there’s a version of the idea that seems more palatable and even has some empirical support. Perhaps when you give your confidence of 32%, you’re actually sampling a number from a noisy probability distribution over confidences you might have. So if I asked you to guess again, you might ‘sample’ a slightly different confidence. Indeed, this has been tested, and it turns out that the average of these two guesses is (on average) more accurate than either guess on its own. In other words, there’s a “crowd within” your own mind that’s a bit smarter than the version of yourself that appears on paper.

Although this is a small effect and your different sampled guesses are unlikely to vary much, it’s nevertheless possible for you to be uncertain about what your inner crowd thinks even right after you give a confidence like 32%, as this inner crowd’s ‘belief’ is an average over a large number of hypothetical confidences you might give for this question. To the extent that you are unsure what this average would be, there can be rational ‘hindsight bias.’ However, the effect should be very small given the limited variance in samples. Moreover, it’s not really much of a flex. Should anyone care that there’s a version of you that’s secretly smarter than the version of you that others observe based on your stated responses? This version of hindsight bias is essentially just an acknowledgement that your estimates would be a bit more accurate with more time to think and, therefore, more time to sample a variety of possible confidence values. But we live in the real world. The accuracy that usually matters is the one you exhibit in real life, and that is also the accuracy that is reasonable for you to benchmark against others.

Reasonable or not, maybe people do think of themselves in this idealized way when committing hindsight bias. They believe they have a ‘true’ self that’s a little bit better than the self that is visible to others — a self that could implicitly see how stupid it was to go for it on fourth down rather than take the easy three points from a field goal. This kind of reasoning seems to come naturally to us, but I’m not convinced it’s worth rationalizing.2


  1. Remember that we’re using the mathematical sense of “expect” here, which is simply an average over errors given the two possible answers to the question. You know that your error will either be 32% or 68%, so in a colloquial sense, you don’t expect 44% — in fact, you know that your error must be a different number.↩︎

  2. Code available here.↩︎

Adam Bear
Research/Data Scientist