Don't turn the LSAT into a lottery

Imagine you have a friend who boasts about his remarkable success in predicting the outcome of baseball games. He brags that he correctly predicted all but two of the winners of last month’s Yankees games. What he neglects to tell you is that he had much less success the month before, and an intermediate amount of success the month before that.

No doubt, your friend’s predictions last month are impressive and demonstrate real skill. But it’s also clear that, by selectively reporting only his best month of predictions, he exaggerated his predictive ability and underemphasized his luck. If he’s asked to predict next month’s ballgames, you’d be surprised if he replicated this exceptional performance.

Like sports prediction, standardized testing is subject to measurement error. If a person takes a standardized test multiple times, she will sometimes do better or worse just due to random chance, without a change in underlying ability. Moreover, this test taker will have an incentive to stop testing once she gets a flattering score. But, as in the example above, this score is probably not the best representation of the tester’s underlying ability.

Don’t take the best

Surprisingly, as I find myself preparing to apply to law school by taking the Law School Admission Test (LSAT), I often hear from blogs and social media that law schools only care about your best score. While I suspect this is an oversimplification — some schools explicitly state otherwise on their websites — schools do seem to have an incentive to focus on applicants’ max scores, as this is the score that ultimately matters for law school rankings.

This is unfortunate. The LSAT is already a noisy test, especially for scores that are far from the population average. Indeed, the 95% confidence interval for a single score spans over 10 points on a 120 to 180 scale. So, even if one’s ‘true’ LSAT ability were a perfect measure of preparedness for law school (which nobody believes), one or two observed scores are likely to deviate at least a few points from the test taker’s expected long-run performance, and it wouldn’t be surprising for them to deviate even more than that. To the extent anything about the LSAT matters, it’s the expectation (i.e., average) of an applicant’s distribution of scores, not a lucky score in the right tail of the distribution. Indeed, on their website, the LSAC (the council that administers the LSAT) “advises law schools that a candidate’s average LSAT score is the best predictor of their ability.”

Of course, an applicant’s best score will be correlated with her average, but not in a straightforward way. Of most concern, the maximum of a probability distribution will typically increase with more observations. Based on LSAC’s previously estimated within-person variation in test scores (assuming no change in true ability), simulations reveal that taking the test three times is expected to increase an applicant’s best test score by over two points, and taking the test five times is expected to increase it by over three points.1 And these are just average increases in reported best scores. Individual lucky applicants could benefit much more than this, all without improving their underlying test ability at all.
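This inflation of the maximum is easy to check with a minimal Monte Carlo sketch (my own code, not LSAC’s). It assumes scores are normally distributed around true ability with the 2.6-point standard error of measurement cited in the footnote, and it ignores the discreteness and 120–180 bounds of real scores:

```python
import random
import statistics

SEM = 2.6  # LSAC's previously cited standard error of measurement

def expected_best_score_gain(n_tests, n_sims=200_000, seed=0):
    """Estimate how far an applicant's *best* observed score is expected to
    sit above her true ability after n_tests attempts, assuming each score
    is a normal draw centered on true ability with sd = SEM."""
    rng = random.Random(seed)
    gains = [max(rng.gauss(0, SEM) for _ in range(n_tests))
             for _ in range(n_sims)]
    return statistics.mean(gains)

print(round(expected_best_score_gain(3), 2))  # about 2.2 points
print(round(expected_best_score_gain(5), 2))  # about 3.0 points
```

The outputs line up with the figures in the text: roughly two extra points for three attempts and three for five, with no change in underlying ability.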

While test-prep companies and tutors may benefit from this system, it’s unfair and wasteful. Wealthier, more privileged applicants will be better positioned to pay for multiple attempts and have extra opportunities to study. Moreover, the additional time investment of retaking the test over the course of months is wasteful for everyone. Even if there’s some benefit to studying for the LSAT to improve logic and reading skills needed for law school, there’s no benefit to compelling applicants to keep studying at the point of diminishing returns, solely in hopes of getting a lucky score.

For all of these reasons, if law schools want to look at only the applicant’s best score, they ought to compensate for the effects of luck by penalizing multiple attempts (e.g., by subtracting two points from the max score for every extra test). This will increase the odds that a score improvement represents an actual change in ability. But I believe there’s an even better approach.

A modest update

An obvious alternative is to look at the applicant’s average score, as the LSAC recommends. Unfortunately, while this approach is better than taking the maximum, the observed average is also likely to be an overestimate of the applicant’s underlying ability. If applicants are allowed to take the test many times over, they can stop retaking it as soon as they hit the score they want. What looks like improvement in ability over time could, therefore, just be selection bias.
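To see the size of this selection effect, consider a small illustrative simulation (the specific numbers are made up for the example): an applicant whose true average is 155 keeps retaking until she hits 157, up to five attempts, and we track the average of her recorded scores:

```python
import random
import statistics

SEM = 2.6  # standard error of measurement

def reported_average(true_ability, target, max_attempts=5, rng=None):
    """Average of all recorded scores under a 'stop once I hit my target'
    rule (continuous scores, for simplicity)."""
    rng = rng or random.Random()
    scores = []
    for _ in range(max_attempts):
        scores.append(rng.gauss(true_ability, SEM))
        if scores[-1] >= target:
            break  # satisfied: stop retaking
    return statistics.mean(scores)

rng = random.Random(1)
avgs = [reported_average(155, target=157, rng=rng) for _ in range(100_000)]
print(round(statistics.mean(avgs), 2))  # noticeably above the true 155
```

Even though every individual score is an unbiased draw, the stopping rule alone pushes the average of reported scores above the applicant’s true 155.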

Ideally, then, applicants wouldn’t have discretion to choose the number of times they take the test (at least within a single application cycle). If everyone took the test only once per cycle, everyone’s test scores would be an unbiased estimate of their underlying test average.

At the same time, it’s understandable that people have bad days, and one wildly unlucky test score shouldn’t destroy one’s hopes of getting into law school. Let’s find a compromise.

I propose that law schools advise applicants to take the LSAT only once, unless this first test is far below — say, at least four or five points — what the applicant expected to get. Except in extreme circumstances, no applicant should take the test more than twice in a given year, and fee waivers should be granted for this second attempt, as well, so this option is available to everyone. If the applicant does retake the test, the second score should be given more weight than the first, since the applicant only chose to retake the test because her first test was unrepresentative of her ability.

A stable equilibrium

Is the above proposal stable, or could it be gamed by applicants? To answer this question, we need to consider a little game theory. Granted, I’m not suggesting that law schools plug scores into Bayes’ theorem or try to solve for a Nash equilibrium. We can take a more qualitative approach to show why applicants wouldn’t typically have an incentive to retake the test more than specified.

Here’s the critical idea: applicants have private knowledge about their LSAT ability that law schools don’t have. Most applicants will have taken several practice tests, and if these tests were taken under realistic conditions, applicants should be able to estimate their true average ability to within a few points by the time of the actual exam. By giving applicants the option (but not a requirement) to retake the test, law schools can encourage applicants to implicitly reveal this private information, thereby mitigating effects of measurement error.

Consider the asymmetric incentives. Applicants want law schools to believe they have the highest ability possible. Law schools want to estimate applicants’ testing ability as accurately as possible. Law schools know that the first test may be a lucky or unlucky draw from a noisy distribution centered on the applicant’s true mean. But because law schools lack applicants’ private knowledge about their true ability, they don’t know what counts as (un)lucky for any given applicant. They do know, however, that applicants who believe they were unlucky because they got an unexpectedly low score will be more incentivized to retest. For such applicants, their second score is likely to regress back to their true, higher average score. By the same logic, those applicants who got a higher-than-expected first score will not want to retest, as their second score would probably fall, and law schools would take note of the drop.

What about applicants who first score near, but a bit below, their true (estimated) average? If retaking the test were costless, applicants might as well take the test again, as their second score would likely boost them at least one or two points. But, under my proposal, law schools are aware that the test has substantial measurement error. Indeed, the LSAC cautions that law schools should not “place excessive significance” on small score differences between applicants, which could mean differences of as much as three or four points. If schools take this perspective, applicants should only want to suffer the pain of retaking the test if they believe their first score is more than a few points below their true average.

Hence, to a first approximation, law schools and applicants would be in equilibrium. Law schools tell applicants to only retake the LSAT if they scored well below their average on their first test, and applicants don’t have an incentive to deviate from this arrangement because the second score will only help them if they expect to score substantially higher the next time. Moreover, law schools accurately acknowledge the role of luck in any given score and don’t put too much weight on differences of a few points, which discourages applicants from wasting time trying to squeeze out one or two extra points.

(For more mathematically inclined readers, I provide further analysis of this approximate equilibrium in the appendix.)

Conclusion

To the extent the LSAT should matter for admissions at all, law schools ought to use scores in a way the test designers intended. Applicants’ maximum scores will be highly sensitive to luck, leading schools to make admissions or scholarship decisions on the basis of randomness rather than skill. Moreover, more advantaged applicants can game the system by taking the test multiple times. Ideally, then, schools should discourage retakes by trying to infer applicants’ true average performance. Retakes should be rare and should be interpreted strategically, as a signal that indicates an applicant believes she can perform much better a second time.

On its face, it may sound more generous for law schools to focus on applicants’ best LSAT scores. Nervous test takers can go into the exam knowing that they can retake the test if they have a bad day. They might even take the test before they feel ready, treating it as just a ‘warm-up’.

But this policy isn’t actually forgiving to anyone. By using applicants’ best LSAT scores, schools must raise their standards for what constitutes a competitive score. On my proposal, applicants go into the test knowing that they only have to score within a few points of their practice average — something that should feel attainable. Isn’t that preferable to, and more honest than, repeatedly playing a noisy lottery in hopes of achieving an unexpected and unrepresentative high score?

Appendix

\(\newcommand{\avg}{\mathbb{E}}\)In this appendix, I walk through some of the math of the toy game-theoretic model I propose and show how there can be an approximate Nash equilibrium in which applicants only retake the LSAT if their first score falls roughly three or more points below their estimated average ability.

We’ll assume that scores follow a discrete normal distribution, censored at 120 and 180, with a standard error of measurement of 2.6, which has been cited by the LSAC in the past. In practice, this standard error will be larger for extreme scores, but we’ll focus on more typical score ranges. (Needless to say, someone scoring around 120 should probably retake the test, and someone scoring around 180 won’t need to retake.) Shown below are a couple of examples of the likelihood of getting different scores given underlying ability (what I’ll call \(\theta\)) of 148 vs. 162.

We assume that an applicant’s likelihood of retaking the test will decrease as the first score rises above the applicant’s estimated true ability. That is, an applicant will be more likely to retake the test if the score seems unlucky. Since probabilities must be bound at 0 and 1, the logistic function is a natural choice to represent this probability. Given all the complications involved in the decision to retest, along with the applicant’s own uncertainty about her true ability, we want the probability to change gracefully over the span of a few points, which is controlled by the logistic function’s temperature parameter. For the sake of my example, I somewhat arbitrarily choose a temperature of 4, but results shouldn’t vary too much for similar values. The probability of retaking the test looks like this for an applicant with an indifference point at 157:
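These two modeling ingredients can be sketched in Python. The function names are mine, and I handle the censoring by lumping the tail mass at the 120 and 180 endpoints (one reasonable reading of the setup):

```python
import math

SEM = 2.6          # standard error of measurement
TEMPERATURE = 4.0  # how gradually the retake probability changes

def _norm_cdf(x, mu, sd):
    return 0.5 * (1 + math.erf((x - mu) / (sd * math.sqrt(2))))

def score_pmf(score, theta):
    """P(observed score | true ability theta): discrete normal,
    censored at 120 and 180 (tail mass piles up at the endpoints)."""
    if score == 120:
        return _norm_cdf(120.5, theta, SEM)
    if score == 180:
        return 1 - _norm_cdf(179.5, theta, SEM)
    return _norm_cdf(score + 0.5, theta, SEM) - _norm_cdf(score - 0.5, theta, SEM)

def p_retake(first_score, indifference_point, temp=TEMPERATURE):
    """Logistic retake probability: 0.5 at the indifference point,
    rising smoothly as the first score falls below it."""
    return 1 / (1 + math.exp((first_score - indifference_point) / temp))

print(round(p_retake(157, 157), 2))  # 0.5 at the indifference point
print(round(p_retake(150, 157), 2))  # much more likely to retake
```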

One final simplifying assumption: we’ll stipulate that all abilities (\(\theta\)) are equally likely ex ante. This isn’t true in real life, as LSAT scores across test takers are known to follow a normal distribution centered around 150 with a standard deviation of around 10. But applicants to law schools are not random samples from the population of LSAT test takers. Harvard Law School, for example, will receive a lot more applications from high scorers than average scorers. Given that these selection effects will somewhat cancel out the base rate effects, we’ll stick with the uniformity assumption for the sake of this example.

Testing the proposed equilibrium

We now consider whether there is an approximate Nash equilibrium in which

  • Law schools expect applicants to be indifferent to retaking the LSAT when their first score is around three points below their estimated true ability \(\theta\), with the odds of a retest increasing gradually (according to the logistic function above) as the first score falls below this threshold.

  • Given law schools’ assumptions of applicant behavior, applicants whose first score is exactly three points below their \(\theta\) are, in fact, roughly indifferent between retesting and not retesting.

We assume that law schools’ goal is to accurately estimate applicants’ underlying \(\theta\). Thus, their ‘payoff’ is maximized if their model of applicants’ decisions is correct. To explore whether the above description is an equilibrium, we can imagine that applicants assume law schools believe their retake threshold is three points below their \(\theta\), and then we can assess whether applicants would have an incentive to deviate from this indifference point, given law schools’ assumed beliefs.

In particular, we assume law schools update their beliefs about an applicant’s \(\theta\) according to Bayes’ rule. Suppose an applicant’s first score is 157. (The same logic would apply for a different score, so long as the relative differences are the same.) If this applicant decides not to retake the test, the law schools’ posterior belief would be proportional to the product of two probabilities:

\[ P(\theta \mid s_1 = 157,s_2= \_) \propto P(s_1 = 157 \mid \theta)P(s_2=\_ \mid \theta, s_1). \]

That is, the probability of the applicant having any particular \(\theta\) value is proportional to the product of (a) the probability of observing a 157 given that particular \(\theta\) and (b) the probability of deciding not to retake the test, which we denote as \(s_2 = \_\), given \(\theta\) and the first score. This latter probability is given by the logistic function. Law schools assume that the probability of retesting would be exactly 50% for someone whose \(\theta\) is 160 (i.e., three points above the first score). Under these assumptions, the posterior probabilities of \(\theta\) given a first score of 157 would look like this:

The applicant who is deciding whether to retest after a 157 score can infer what the law schools would come to estimate as her \(\theta\) by computing an expected value:

\[ \avg[\theta \mid s_1=157, s_2=\_]= \sum_{x=120}^{180}P(\theta=x\mid s_1=157, s_2 =\_)\cdot x. \]

In this example, the schools would infer \(\theta \approx 156.5\). Note that this is slightly below the test score of 157 because of the information contained in the applicant’s decision not to retest. An applicant with a \(\theta\) much higher than 157 would be motivated to retake, and so the fact that the applicant didn’t retake is evidence against \(\theta\) values much larger than 157.
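This inference is straightforward to reproduce under the stated assumptions (uniform prior, a 2.6 standard error, temperature 4, and a 3-point indifference gap). The following is my own translation of the setup into code, with the censoring handled by lumping tail mass at the endpoints:

```python
import math

# Model constants: measurement error, logistic temperature, indifference gap.
SEM, TEMP, GAP = 2.6, 4.0, 3
GRID = range(120, 181)

def _norm_cdf(x, mu, sd):
    return 0.5 * (1 + math.erf((x - mu) / (sd * math.sqrt(2))))

def score_pmf(s, theta):
    # Discrete normal censored at 120 and 180.
    if s == 120:
        return _norm_cdf(120.5, theta, SEM)
    if s == 180:
        return 1 - _norm_cdf(179.5, theta, SEM)
    return _norm_cdf(s + 0.5, theta, SEM) - _norm_cdf(s - 0.5, theta, SEM)

def p_retake(theta, s1):
    # Law schools' assumption: retake odds hit 50% when theta = s1 + GAP.
    return 1 / (1 + math.exp(-(theta - (s1 + GAP)) / TEMP))

def posterior_mean_no_retake(s1):
    # Uniform prior over GRID, so the posterior is proportional to
    # P(s1 | theta) * P(no retake | theta, s1).
    w = {th: score_pmf(s1, th) * (1 - p_retake(th, s1)) for th in GRID}
    z = sum(w.values())
    return sum(th * wt for th, wt in w.items()) / z

print(round(posterior_mean_no_retake(157), 1))  # ~156.5
```

The posterior mean lands just below the observed 157, reflecting the information carried by the decision not to retest.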

Is this a good or bad outcome for the applicant? It depends, of course, on her private knowledge about her ability \(\theta\). If she went into the test believing \(\theta = 152\), the first score of 157 is a positive surprise. If \(\theta = 160\), it’s a bit worse than expected. Indeed, this is precisely the \(\theta\) at which we assume the applicant is indifferent to retesting.

To test for indifference, the applicant needs to forecast what the law schools would infer after her second test without knowing what this second score would be. That is, the law schools’ posterior about \(\theta\) following a retake would be proportional to the product of three terms, the last of which is unknown at the time of deciding whether to retest:

\[ P(\theta \mid s_1 = 157,s_2\neq \_) \propto P(s_1 = 157 \mid \theta)P(s_2 \neq \_ \mid \theta, s_1)P(s_2 = \text{?}\mid \theta) . \]

The applicant can use her private estimate of \(\theta\) (which we’re imagining is 160) to estimate the probability of different values of \(s_2\). She uses these probabilities to take a weighted average over all possible expectations that the law schools could have for \(\theta\) after the second score is revealed:

\[ \avg[\theta \mid s_1 = 157,s_2\neq \_]=\sum_{x=120}^{180}P(s_2=x \mid \theta=160)\cdot\avg[\theta \mid s_1 = 157,s_2= x]. \]

Following this procedure, we find that the law schools’ expected belief about \(\theta\) is now just below 159 — a couple of points better than the 156.5 from not retaking.
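This forecast can be reproduced with the same model ingredients (again, my own coding of the setup, with censoring approximated by lumping tail mass at the endpoints):

```python
import math

# Model constants: measurement error, logistic temperature, indifference gap.
SEM, TEMP, GAP = 2.6, 4.0, 3
GRID = range(120, 181)

def _norm_cdf(x, mu, sd):
    return 0.5 * (1 + math.erf((x - mu) / (sd * math.sqrt(2))))

def score_pmf(s, theta):
    # Discrete normal censored at 120 and 180.
    if s == 120:
        return _norm_cdf(120.5, theta, SEM)
    if s == 180:
        return 1 - _norm_cdf(179.5, theta, SEM)
    return _norm_cdf(s + 0.5, theta, SEM) - _norm_cdf(s - 0.5, theta, SEM)

def p_retake(theta, s1):
    # Law schools' assumption: retake odds hit 50% when theta = s1 + GAP.
    return 1 / (1 + math.exp(-(theta - (s1 + GAP)) / TEMP))

def posterior_mean_after_retake(s1, s2):
    # Uniform prior: posterior ~ P(s1|theta) * P(retake|theta, s1) * P(s2|theta).
    w = {th: score_pmf(s1, th) * p_retake(th, s1) * score_pmf(s2, th)
         for th in GRID}
    z = sum(w.values())
    return sum(th * wt for th, wt in w.items()) / z

def expected_inference_if_retake(s1, private_theta):
    # Average the schools' inference over the applicant's own forecast of s2,
    # which she bases on her private estimate of theta.
    return sum(score_pmf(s2, private_theta) * posterior_mean_after_retake(s1, s2)
               for s2 in GRID)

print(round(expected_inference_if_retake(157, private_theta=160), 1))
```

The result comes out close to 159, a couple of points above the 156.5 inference from not retaking, matching the calculation in the text.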

If retaking the test did not require time, money, or stress, then this situation is not an equilibrium: our applicant has an incentive to retake the test if she scored three points lower than what she believed to be her true ability (160). But retaking the test does have costs, which could easily be ‘worth’ two or three expected points. Indeed, given the large standard error of measurement, we’re assuming that law schools shouldn’t make a big deal of a two or three point difference in scores in the first place. In this world, the cost to retake is probably equivalent to at least three points. Thus, our applicant would be roughly indifferent to retesting after a 157 score.


  1. This is assuming a 2.6 standard error of measurement, cited by LSAC in the past. This standard error is even larger for exceptionally high or low scores, so these estimates are conservative.↩︎

Adam Bear
Research/Data Scientist