While O. J. Simpson was on trial for murdering his ex-wife, his lawyer Alan Dershowitz argued that Simpson’s history of domestic abuse against her was irrelevant because only around 1 in 2500 domestic abusers go on to murder a spouse. Given this small probability, Dershowitz claimed, Simpson’s record of abuse provided scant evidence that Simpson was guilty of murder.
But there was a key piece of background information missing from this argument: Simpson’s ex-wife had been murdered! Even without the knowledge of Simpson’s history of abuse, this background information would raise the base rate of guilt to around 1 in 4. And once the domestic violence is taken into account, the appropriate probability that the abusive ex-husband is the murderer is around 4 in 5. Not quite “beyond reasonable doubt,” but more than enough to establish Simpson as a prime suspect even without additional forensic evidence.
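In odds form, this update is a one-liner. Here is a minimal sketch in Python; the roughly 1-in-4 base rate and 4-in-5 posterior are the figures quoted above, while the likelihood ratio of 12 for the abuse history is simply backed out to connect them, not an estimate from real crime data.

```python
# Odds-form Bayes update. The likelihood ratio of 12 is backed out from
# the figures in the text, not taken from real crime statistics.
prior_odds = 1 / 3        # P(guilty | ex-wife murdered) ~ 1/4, i.e. odds 1:3
likelihood_ratio = 12     # assumed evidential strength of the abuse history
posterior_odds = prior_odds * likelihood_ratio          # 4.0, i.e. odds 4:1
posterior_prob = posterior_odds / (1 + posterior_odds)  # 0.8, i.e. 4 in 5
print(f"posterior probability of guilt: {posterior_prob:.0%}")
```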
It’s easy to forget that statistics from the general population often do not apply in the courtroom. Criminal trials are statistically unusual events, and naive appeals to population base rates ignore critical information. Indeed, in a blistering critique of Bayesian approaches to legal process, Laurence Tribe highlighted numerous ways in which statistical evidence can be misapplied at trial because of such confusions. For example, if jurors reasonably assume at the beginning of a trial that the defendant is more likely than a random person off the street to be guilty of the crime of which they are accused, this presumption should weaken the persuasiveness of certain evidence presented during the trial. A simplistic use of base rates, however, could lead jurors to double-count evidence and thereby overestimate the defendant’s probability of guilt.1
Here, I want to talk about an even more extreme example of background information influencing the strength of new evidence. In a courtroom, demographic information about a suspect that would normally point in one direction could actually point in the opposite direction — an example of Simpson’s paradox.2
Paradoxical selection effects
Demographic groups sometimes vary on traits that are relevant for assessing guilt or culpability. For example, adolescents are known to be, on average, more impulsive and susceptible to peer influence than adults. Given that a lack of developmental maturity is outside of a person’s control, legal scholars have argued that age should factor into judges’ sentencing decisions, all else equal. The callow adolescent who commits murder may be less deserving of a life sentence than an adult presumed to be more capable of slow, rational judgment.
This line of reasoning depends on an association between age and cognitive maturity in the general population. Can we assume that this association holds within the population of people on trial? While I’m not familiar with any data sets that explore this question, there’s a subtle statistical reason to be cautious.
Can smoking save a baby’s life?
Consider an analogy from epidemiology. It’s well-known that pregnant women should not smoke, as smoking can harm the fetus and increase the likelihood of health problems in the child. And yet, at some point, epidemiologists found data that apparently told a different story. Low birth-weight babies in a particular hospital were more likely to survive if their mothers smoked. That is, among this group of babies, having a smoking mother was positively associated with a baby’s survival.
But this data doesn’t show that smoking helps save a baby’s life. This positive association between maternal smoking and survival has a non-causal explanation, which depends on critical contextual information about the population of babies studied.
Low birth-weight babies are different from typical babies in that they are unhealthier than average and, therefore, have a heightened risk of death. Because smoking is harmful to the fetus, babies from smoking mothers will be overrepresented in the subsample of low birth-weight babies, relative to the general population. Meanwhile, the underrepresented set of low birth-weight babies who come from non-smoking mothers will also be unhealthy, but for other reasons: genetic abnormalities, complications from pregnancy, or other non-smoking causes.
It turns out that the non-smoking factors that contribute to low birth-weight tend to be even more harmful than smoking. A baby who ends up in the neonatal intensive care unit despite not having a smoking mother is therefore likely to be at even higher risk of death than a low birth-weight baby with a smoking mother. But the babies with smoking mothers still have a higher risk of death than they would have had if their mothers had not smoked.
In other words, the following two things can both be true: (a) maternal smoking increases babies’ probability of death; yet (b) learning that a baby from a high-risk population has a mother who is a smoker is a positive signal for the baby’s chance of survival. This latter ‘paradox’ is due to selection bias from sampling only high-risk babies. In the general population, the association would reverse; that is, having a smoking mother is a negative signal for the baby’s chance of survival.
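This selection effect is easy to reproduce in a toy simulation. In the sketch below, every parameter (smoking rate, defect rate, risk levels) is invented purely for illustration; the point is only the qualitative pattern, in which smoking raises mortality in the full population while the association flips among low birth-weight babies.

```python
# Toy simulation of the low birth-weight 'paradox' (selection bias).
# All parameter values are made up for illustration, not real estimates.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

smoker = rng.random(n) < 0.3   # 30% of mothers smoke (assumed)
# A rarer but more dangerous non-smoking cause of low birth weight.
defect = rng.random(n) < 0.05
# Low birth weight is caused by the defect, by smoking, or (rarely) neither.
low_bw = rng.random(n) < np.where(defect, 0.8, np.where(smoker, 0.4, 0.05))
# Smoking raises the risk of death modestly; the defect raises it a lot.
death = rng.random(n) < 0.01 + 0.02 * smoker + 0.15 * defect

def death_rate(mask):
    return death[mask].mean()

# Full population: babies of smokers die more often.
print(death_rate(smoker), death_rate(~smoker))
# Low birth-weight babies only: the association reverses.
print(death_rate(low_bw & smoker), death_rate(low_bw & ~smoker))
```

Among the low birth-weight babies, the non-smokers’ babies are disproportionately the ones carrying the more dangerous defect, which is what produces the reversal.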
Reversed demographic associations for criminals
Now let’s consider a statistically similar situation in the legal system. Just as maternal smoking generally increases the odds that a baby will have low birth-weight and die, being young is generally associated with cognitive immaturity and a higher crime rate, which is partly due to this immaturity. But what happens if we restrict our sample to low birth-weight babies or, analogously, crime-prone individuals? In the smoking case, the association between smoking mothers and dying babies reverses: smoking now has a seemingly protective effect on the baby’s health. Could the baseline association between youthfulness and cognitive immaturity also reverse among criminal defendants?
To see how this is possible, we need to consider that there are several causal pathways by which youthfulness contributes to crime. Not only do younger people have less developed brains, which can make them more impulsive and prone to peer pressure, but they are also situated in environments that facilitate crime. For example, compared to adults, children and teenagers tend to have more free time and fewer responsibilities, such as work or children of their own. They are also surrounded by young peers who commit crime at higher rates, which will increase the probability that any given adolescent gets involved in crime through social contagion and group hangouts, rather than through internal features of their psychology. This effect will be exacerbated in high-crime neighborhoods, where even an unusually mature teenager might get swept up in criminal activity (or activity that the police believe is criminal) because his friend group is involved in crime.
In short, although cognitive immaturity is clearly a major reason that adolescents commit crime at higher rates than adults, it is far from the only factor. Therefore, among the young, the correlation between cognitive immaturity and crime may be relatively weak, as this relationship is swamped by environmental factors. The typical teenager who is charged with a crime may be cognitively similar to the typical teenager from the general population, just as the typical baby from a smoking mother who has low birth-weight may have a similar health profile to the general population of babies from smoking mothers.
In contrast, the typical adult is less prone to committing crimes for all of the reasons mentioned above. They tend to have busy jobs, children to look after, fewer criminal peers, and so on. They also tend to be more cognitively mature and less impulsive. This tells us something about the subset of adults who nevertheless do commit crimes: they are very likely to be unusual for their demographic group, just as low birth-weight babies from non-smoking mothers must be unusually unhealthy compared to the set of all babies from non-smoking mothers. Specifically, such an adult is likely to be far less cognitively mature than a typical adult.
If criminal teenagers are cognitively similar to average teenagers, but criminal adults are cognitively dissimilar to average adults, we can end up in the situation depicted below. Cognitive maturity in the general population is represented by the two bell curves. Because adolescents are, on average, less cognitively mature than adults, the red curve is slightly offset from the blue curve. However, there is substantial overlap in maturity between these two populations. An adolescent with superb grades and SAT scores may be more cognitively mature than a typical adult, and some adults may be far more impaired than a typical teenager. The average adult who commits a crime will tend to be substantially impaired, as indicated by the dotted blue line to the left of the center of the blue curve. On the other hand, the average adolescent charged with the same crime will be only slightly impaired relative to a typical adolescent, as indicated by the dotted red line. Thus, even though the blue curve sits to the right of the red curve, the blue dotted line falls to the left of the red dotted line, indicating less cognitive maturity for the criminal adult.
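Since I can only describe the figure in words here, a quick numerical stand-in may help. In the sketch below, all numbers are invented: maturity is normally distributed within each group, adolescents sit half a standard deviation below adults on average, and the chance of offending depends only weakly on maturity for adolescents but strongly for adults.

```python
# Numerical stand-in for the bell-curve figure; all parameters are invented.
import numpy as np

rng = np.random.default_rng(1)
n = 2_000_000

teen = rng.normal(-0.5, 1.0, n)   # red curve: slightly offset to the left
adult = rng.normal(0.5, 1.0, n)   # blue curve

def offend_prob(maturity, sharpness, cutoff, base=0.05):
    # Offending becomes less likely as maturity rises; 'sharpness' sets how
    # strongly crime selects on maturity, and 'cutoff' sets where it bites.
    return base / (1 + np.exp(sharpness * (maturity - cutoff)))

# Teens: offending is mostly environmental, so selection on maturity is weak.
teen_offends = rng.random(n) < offend_prob(teen, sharpness=0.2, cutoff=0.0)
# Adults: crime strongly selects for unusually low maturity.
adult_offends = rng.random(n) < offend_prob(adult, sharpness=4.0, cutoff=-0.5)

print(teen[teen_offends].mean())    # dotted red line: roughly -0.6
print(adult[adult_offends].mean())  # dotted blue line: roughly -0.8
```

With these made-up settings, the average adult offender ends up less cognitively mature than the average adolescent offender, even though adults are more mature in the general population.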
Of course, without data, we cannot confirm that the association between cognitive maturity and adolescence reverses in this way. But, as the real-world low birth-weight example illustrates, it is a plausible state of affairs. We cannot assume that demographic relationships from the general population mimic those in the highly selective population of criminals.
The burden of proof
In the paper alluded to earlier, Faigman and Geiser suggest that group-level differences in cognitive maturity between adolescents and adults should shift the burden of proof that judges use to sentence different age groups for the same crime. These authors argue that, because developmental psychology and neuroscience have shown a general trend toward greater cognitive maturity with age, judges should presume that any given adolescent found guilty of a crime is less likely to be deserving of harsh punishment than an adult found guilty of that same crime — even though individual factors about specific defendants could undermine these presumptions.
Their proposal is summarized in the table below. Based on statistical facts about general age-related cognitive shifts, the authors propose that the state should have to prove that defendants under 24 are developmentally mature, whereas the defense should have to prove that defendants 24 and over are not developmentally mature. Moreover, the required standard of proof is most demanding at the extremes of age.
But why should the burden of proof be set by an association found in the general population rather than the population of criminal defendants (or, perhaps, criminal defendants found guilty of the specific crime in question)? As we have seen, these associations need not be similar; in fact, they may reverse. The average adult found guilty of the crime in question could be more cognitively immature than the average adolescent. If scientific studies confirmed this fact, shouldn’t the burden of proof depend on the most specific group-level information available?
Importantly, I am not trying to intimate that adolescents deserve harsher punishment or a stricter burden of proof than Faigman and Geiser propose. After all, as the smoking analogy suggests, the typical adolescent found guilty of a crime will be even less cognitively mature than the typical adolescent from the general population, who is himself prone to all the developmental shortcomings that have been scientifically established. Moreover, as the authors convincingly note, there can be other justifications to err on the side of leniency for youth defendants, such as their greater potential for rehabilitation or our greater uncertainty about their developmental trajectory.
Instead, maybe we need to shift the burden of proof for adult defendants to a standard similar to the one proposed for adolescents. The typical adult defendant is probably not developmentally mature for his age group. His judgment and impulsivity may more closely mirror those of a noncriminal adolescent than those of a noncriminal adult. And this cognitive immaturity likely played a role in the decision to commit a crime, which could provide a basis for less retributive punishment.
Granted, just as there are independent reasons to be more lenient with young defendants, there are potentially independent reasons to be harsher with older defendants in spite of their cognitive deficits. For one thing, an adult’s impairment may be more likely to have been caused by poor decisions earlier in life, rather than genetic or early environmental factors that would typically be viewed as outside of one’s control and, therefore, mitigating. That is, in early adulthood, the adult defendant may have been mature enough to choose reckless behaviors like abusing drugs or driving dangerously, which led to brain damage later on. If the cognitive impairments that contributed to crime were in part a reflection of brain injury that was caused by the adult’s prior ‘free will,’ the court might reasonably judge this impairment differently than that of an adolescent who simply hasn’t aged into maturity. Likewise, an adult in an impaired state is less likely than an adolescent to become more cognitively mature in the future, thereby undermining arguments for potential rehabilitation based on brain plasticity.
Conclusion
I am a big supporter of using scientific and statistical approaches to improve the justice system. Unfortunately, as Tribe noted over 50 years ago, the challenges of doing so are often more subtle than they initially appear. The issues with selection bias and age that I have highlighted here are not unique. In general, the more unusual it is for a group to commit a crime, the stronger the inference that the suspect is fundamentally different from the typical individual in that group on traits like cognitive maturity. Moreover, cognitive maturity itself is a multifaceted trait, which relates to legally relevant concepts like intentionality and culpability in complex ways. The issues I’ve raised here are just hypothetical, but I hope they encourage more empirical exploration into the unique statistical regularities that arise in the courtroom.
Here’s an example inspired by Tribe’s paper. Suppose there’s been a hit-and-run accident involving a bus driver. Based on camera footage of the driver, the police have narrowed down the suspects to four possible drivers. However, only one of the four drivers works for the Blue Bus company, which operates 90% of the buses on the route where the accident occurred.
Suppose that, before considering the information about the Blue Bus company, the police assigned each driver a 25% probability of guilt (1:3 odds). Learning that one of the suspects drives a Blue Bus should then raise that driver’s 1:3 prior odds by an approximate likelihood ratio of 9, because Blue Bus drivers are nine times more likely than non-Blue Bus drivers to have driven on the route of the accident. This results in posterior odds of 3:1 (75%) in favor of guilt. Suppose that, on the basis of this 75% posterior belief, prosecutors choose to charge the Blue Bus driver.
Now take the jurors’ perspective. Even before encountering any evidence, jurors might presume that the suspect has a relatively high chance of guilt simply because the suspect is being prosecuted. For example, suppose jurors start with a 50% prior belief (1:1 odds) that the suspect is guilty, on the grounds that prosecutors usually bring charges only against people for whom there is reasonable evidence of guilt. During the trial, the prosecution introduces the evidence that 90% of buses along the route of the accident are operated by the Blue Bus company. If jurors treated this evidence as independent of their prior beliefs, they would multiply their prior odds by an approximate factor of 9 and end up with posterior odds of 9:1 (90% probability) in favor of guilt — considerably higher than the prosecutors’ assessment.
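For concreteness, here is the pair of updates as a tiny Python sketch, with the numbers exactly as in this footnote:

```python
# The same factor-of-9 likelihood ratio applied from two different priors.
def posterior_prob(prior_odds, likelihood_ratio):
    odds = prior_odds * likelihood_ratio
    return odds / (1 + odds)

# Prosecutors: 1:3 prior over four suspects, then the Blue Bus evidence.
print(posterior_prob(1 / 3, 9))  # 0.75
# Jurors: a 1:1 prior that already reflects the decision to prosecute,
# then the same Blue Bus evidence applied a second time.
print(posterior_prob(1, 9))      # 0.9
```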
The problem is that the jurors’ prior belief already implicitly incorporated the evidence that led prosecutors to bring charges. Thus, the jurors cannot treat new statistical evidence presented in court as if it were completely novel. Unfortunately, as Tribe points out, it would be quite complicated to try to correct for this bias, even if jurors were trained statisticians.↩︎
No relation to O. J.↩︎