I recently read a book about how Big Data can help you “get what you really want in life” and avoid over-relying on your gut instincts when making important choices like whom to marry or what hobbies to take up. As the author suggests in an interview, the goal of the book is not to tell people to completely ignore their intuitions, but to get them to adopt more of a Moneyball approach to life decisions based on the latest rigorous Big Data analyses.
At a big-picture level, I enthusiastically support this approach. I am a burgeoning data scientist myself and am optimistic about using cutting-edge machine learning techniques on Big Data to improve medical diagnosis, event forecasting, and so on. Moreover, I am not one of those Panglossian psychologists who believe that people’s gut decisions are consistently rational or as close to optimal as possible. I have no doubt that people make systematic reasoning errors that careful scientific study can help uncover and improve upon.
Nevertheless, some of the central claims of this self-described “self-help [book] for data geeks” made me bristle. The scientific work that inspired these claims seemed perfectly rigorous; the self-help advice that the author dispensed did not. Specifically, the author wanted to draw strong causal claims from what was ultimately observational data — no matter how big this data was or how sophisticated the statistical analyses that were run on it.
Most scientifically minded people have heard the adage “correlation does not imply causation,” yet are not aware of the myriad ways that even the most rigorous-seeming Big Data techniques, when applied to non-experimental data, can fail to deliver causal truths or even mislead us into drawing the exact opposite of the conclusions we should draw. Indeed, there is a voluminous academic literature on how to properly draw causal inferences from observational data, using arcane mathematical tools like “directed acyclic graphs.” These tools are great for hardcore researchers, but not so great for typical consumers of self-help books. Fortunately, I believe it is possible to appreciate the crux of many of these issues without relying on scientific jargon or fancy math.
For the sake of this discussion, I want to focus on a particular fallacious style of inference that is most pertinent to the book I read and, more generally, to consumers of self-help advice. When trying to optimize a personal decision, we seek out answers to questions like:
What features of the options I’m currently considering are most impactful for my happiness, satisfaction, or whatever else I might be optimizing?
What options would other people in my current situation typically benefit the most from choosing?
In short, we want to focus on features of our decision that are most relevant to whatever we’re optimizing and pursue options that other people (or our past selves) would find the best.
Here’s the problem, though. It seems like we might be able to answer the first question by looking just at the options people have already chosen and assessing how the various features of these options predict whatever outcome we care about. It also seems like we might be able to answer the second question by simply asking people how satisfied they are after making their choices. But neither of these strategies effectively answers the causal questions we actually care about. In most cases, we cannot prospectively evaluate options or features of those options by retrospectively considering only those options that have been chosen.
Let’s explore the issues with each of these strategies by considering two topics discussed at length in the book: choosing whom to date and choosing what to do for fun. For the sake of brevity, I simplify these examples a little bit (e.g., focusing specifically on height as a dimension of interest for dating), but the core problems generalize more broadly.
Picking a romantic partner (based on height)
Many people care about the height of their (potential) romantic partner. On average, heterosexual women tend to prefer taller men.1 In fact, on dating apps, some women filter out men who fall below a stringent threshold, such as 6 feet. This is quite a costly behavior: only around 15% of men are above this threshold. Presumably the people who factor height into their decisions of whom to date do so because they believe that they will be more satisfied in a relationship with a taller partner than with a shorter one, all else equal. If not, these hapless singles would be shrinking the available dating pool to a massive degree for no benefit.
But what if it turned out that women in heterosexual relationships would be just as satisfied with their male partners if these men were to lose a few inches? The initial fixation that many of these women had with their partner’s height would seem silly. They threw away so many potential matches on the basis of a feature that doesn’t influence their happiness at all!
How might we find out whether partner height affects happiness? In a magical world, we might run an experiment where some men in heterosexual relationships lose or gain a few inches, and then we observe how this influences their partners' ratings of relationship satisfaction, relative to a control group of men whose height is unmanipulated. We can’t do this. Instead, perhaps we can look at whether the height of one’s partner predicts relationship satisfaction. Is a woman with a 6'2" male partner more satisfied, on average, than a woman with a 5'8" partner? A Big Data analysis along these lines has been conducted, and — surprise! — height doesn’t seem to matter much at all.2
Should all the single women who are currently fixated on their future partner’s height second-guess their decisions? Is there a massive arbitrage opportunity in the dating market for short men who have been getting passed over by many women for no good reason?
Maybe. But the kind of analysis described above shouldn’t convince us that this is the case. To see why, we need to consider several possible explanations for the variation in the height of women’s male partners. Why do some (heterosexual) women have taller partners than others?
The answer is surely quite complicated, so let’s see how we immediately run into trouble even if we imagine a far simpler world of dating in which women follow strict rules about which men they are willing to entertain as partners. Specifically, suppose that each woman on the market sets some minimum height threshold that her partner must meet in order to even be considered. Different women can set different thresholds, and they can also optimize for other traits among the subset of men who are tall enough.
In this fictional world, we could only measure the relationship satisfaction of women whose partners exceed their personal height threshold. Remember, these (made-up) women are never willing to date men below this threshold, so it is impossible to ask about relationship satisfaction for relationships that couldn’t exist. This poses a problem for any observational analysis of our data that naively looks at the statistical connection between partner height and relationship satisfaction. At most, we might find that height doesn’t seem to affect relationship satisfaction among women in relationships, who have already filtered out possible partners who do not meet their height threshold. This is far from implying that height doesn’t matter to relationship satisfaction at all! It is akin to saying that the skill of professional pilots doesn’t predict the likelihood they will get in a plane crash. If we found such a pattern among professional pilots, we wouldn’t draw the inference that flying skill is irrelevant to safety and that we therefore ought to allow anyone to fly a plane. Rather, there’s just a stepwise relationship between flying ability and crash risk: once a pilot reaches a certain level of skill, which is carefully selected to maximize passenger safety, skill is irrelevant. Partner height could plausibly work similarly: perhaps it would influence relationship satisfaction for the worse if women removed their stringent height requirements and entertained relationships with men who fell below their threshold. Our data simply can’t tell us what would happen if these women stopped filtering partners by height.3
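To make the pilot analogy concrete, here is a small simulation. Every number in it is a made-up assumption rather than an estimate from any study. Each woman's minimum height is drawn uniformly between 66 and 72 inches, and she only dates men at or above it. A partner below her threshold would cost her 3 points of satisfaction on a hypothetical 10-point scale, and above the threshold height has no effect at all.

```python
import random
import statistics

random.seed(0)


def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def satisfaction(height, threshold):
    """Hypothetical step model: a 3-point penalty below her threshold, none above."""
    penalty = 3.0 if height < threshold else 0.0
    return 7.0 - penalty + random.gauss(0, 1)


observed_h, observed_s, below_s = [], [], []
for _ in range(50_000):
    threshold = random.uniform(66, 72)       # her minimum acceptable height (inches)
    partner = random.uniform(threshold, 78)  # she only ever dates men above it...
    observed_h.append(partner)
    observed_s.append(satisfaction(partner, threshold))
    # ...so this counterfactual shorter partner never shows up in real data.
    shorter = threshold - random.uniform(1, 4)
    below_s.append(satisfaction(shorter, threshold))

print(f"slope among observed couples: {slope(observed_h, observed_s):.3f}")
print(f"mean satisfaction, observed partners: {statistics.fmean(observed_s):.2f}")
print(f"mean satisfaction, below threshold:   {statistics.fmean(below_s):.2f}")
```

In the observed couples the height-satisfaction slope comes out near zero. Yet in this model, a partner below the threshold costs about 3 points. The second quantity is the one that answers the question a single woman is asking, and it is exactly the quantity the observed data can never contain.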
OK, but wouldn’t our data at least show us that height doesn’t matter above the stringent height thresholds? That is, if we took every woman’s current partner — who was already selected to be above a certain height — and added a few inches to him, can we be confident that this manipulation would not increase average relationship satisfaction? (Imagine these women woke up with amnesia and forgot their partner’s old height after this intervention occurs, so they aren’t distressed by what has just happened.) No, we cannot even be sure of this.
Let’s think about what a lack of a statistical correlation between partner height and relationship satisfaction (for women) is really telling us. In brief, we have learned that the women in our sample with shorter partners are, on average, just as happy in their relationships as the women in the sample with taller partners. This might seem to imply that if we moved one of the women with a shorter partner into the tall-partner camp (while keeping everything else about her partner the same), this wouldn’t make her more satisfied. But here’s the issue: we have no guarantee that the women in our sample with shorter partners are similar to those with taller partners, nor can we be confident that the shorter men themselves are similar to the taller men across all dimensions besides height. If there are systematic differences between women with tall partners and women with short partners — or between the tall men and short men they are dating — one of these differences could be covertly affecting relationship satisfaction in such a way that masks any positive influence of partner height on satisfaction.
There are several reasons to believe that there could be important differences between the two types of women in our sample and, separately, between the two types of men they are dating. All of them result from the fact that people don’t enter relationships randomly; they only enter them with people whom they find desirable, and they are accepted into relationships only by people who find them desirable.
Let’s consider some possible troublesome differences between the women with short partners and the women with tall partners. For starters, not everyone cares about the height of their partner the same amount. Wouldn’t we expect someone who cares less about height to be less bothered about having a short partner, all else equal? This implies that the women in our sample with shorter partners are, on average, less likely to care about the height of their partner. If they care less, then they are probably more capable of being happy with a shorter partner, as compared to someone with a taller partner who cares a lot about their partner’s height. In other words, a difference in how much people care about the height of their partners could be masking a genuine effect of height on relationship satisfaction. Women — especially the ones with already tall partners — would generally be happier if their partner grew an inch or two, all else equal. But in our sample, all else isn’t equal: the people with shorter partners are probably more tolerant of shorter heights and are, therefore, perhaps capable of being just as happy as those with taller partners.
This is just one possible confounding story, though. It should be easy to think of several others. For example, related to the possibility above, it could be that the women with shorter partners care just as much about height as those with taller partners, but are less desirable to men, which forces them to be less choosy. These less desirable women with lower standards may be happy with what they can get and, in turn, just as happy as the more desirable women with their taller partners.
Another possibility: perhaps some of the women with shorter partners are just as choosy and just as desirable as those with taller partners, but these shorter men are more likely to have other traits — humor, intelligence, exceptional moral character, or whatnot — that compensate for their height deficit. After all, if the women with shorter partners do care just as much about height as those with taller partners and are just as capable of attracting taller men, what other systematic explanation could there be for the differences in partner height? Remember, relationships are selective. Almost by definition, the differences in partner height cannot be random if this is a trait people are selecting for. And it is random variation that we require in order to answer the question we care about, namely, what would happen to average relationship satisfaction if we made every woman’s partner taller, holding everything else constant?
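The compensation story can be checked with a toy simulation. The following assumptions are for illustration only:

- partner height `h` and a composite of everything else `q` (humor, intelligence, character) are independent standardized traits;
- satisfaction rises one-for-one with both;
- women of similar desirability end up with men whose overall appeal `h + q` falls in the same narrow band.

Under that last rule, a short man is accepted only when he is strong on other traits.

```python
import random
import statistics

random.seed(1)


def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))


heights, satisfactions = [], []
while len(heights) < 20_000:
    h = random.gauss(0, 1)  # standardized partner height
    q = random.gauss(0, 1)  # standardized "everything else" (hypothetical composite)
    # Hypothetical matching rule: a man is accepted only if his overall appeal
    # lands in a narrow band, so short men who make it in compensate elsewhere.
    if 1.0 <= h + q <= 1.2:
        heights.append(h)
        # Causal model: one extra unit of height adds one full point of satisfaction.
        satisfactions.append(h + q + random.gauss(0, 1))

print("causal effect of height built into the model: 1.0")
print(f"slope among selected couples: {slope(heights, satisfactions):.3f}")
```

In this model every unit of height helps, yet the observed slope is essentially zero. The reason is that, among couples who actually formed, selection has made height and the other traits trade off against each other.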
At this point, you might be thinking, “Come on, our data provides at least some prima facie evidence that the folk wisdom about partner height influencing happiness is wrong! You’re just coming up with annoying alternative explanations, and you haven’t provided any evidence for these explanations yourself.” We can quibble about whether the null correlational relationship between partner height and relationship satisfaction provides any evidence for a lack of a causal effect. I am skeptical. Suppose I granted that it provided some evidence for an absence of cause, though. Still, we didn’t ask this question under a backdrop of pure ignorance. We live in a world in which many single women make extremely costly sacrifices to find tall partners, presumably because they believe it will make them happier. It is an extraordinary claim to argue that none of this matters and that these women are making a grievous mistake. Extraordinary claims require extraordinary evidence. If our data are basically just as consistent with the folk view as they are with a radical new view promoted by a self-help book, you might hesitate a bit before throwing away your strong gut intuitions on such shaky grounds.
But wait, now you might be thinking the following: can’t scientists just handle these issues by controlling for a bunch of variables in their analysis? Couldn’t we ask a question like, “Holding constant the desirability of the women answering these questions, do women with taller partners report higher relationship satisfaction than those with shorter partners?” A lot of researchers will try such an approach, but in this case, I believe it is a fool’s errand. Immediately, you might ask, how do we even accurately measure a construct as complex as “desirability”? We could attempt to control for proxies like income, education, perceived physical attractiveness (as rated by third parties), and so on. But what about wit and charm and all those other tricky-to-measure qualities that affect a person’s desirability? And we haven’t even tried to control for partner attributes, which (as you will recall) are likely to be imbalanced between shorter and taller partners. Sooner or later, you will realize that this task of controlling for every possible imbalance in such a multifaceted data set is a hopeless game of whack-a-mole. And, by the time you are done controlling for a million variables, you have poor statistical power and many other reasons to doubt the results of your convoluted analysis.
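The measurement problem bites even with a single confounder. The sketch below uses invented coefficients:

- a woman's desirability `D` raises her partner's height;
- `D` also raises her standards, which lowers her satisfaction;
- height itself adds one point of satisfaction.

The simulation compares three estimates of the height effect: no adjustment, adjustment for a noisy proxy `P` (think of a third-party attractiveness rating), and adjustment for the true `D`. The adjusted estimates use the Frisch-Waugh-Lovell result: residualize both height and satisfaction on the control, then regress one residual on the other.

```python
import random
import statistics

random.seed(2)


def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))


def residualize(ys, xs):
    """Residuals of ys after a simple regression (with intercept) on xs."""
    b = slope(xs, ys)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return [y - my - b * (x - mx) for x, y in zip(xs, ys)]


def adjusted_slope(x, y, control):
    """Coefficient on x when regressing y on x and control (Frisch-Waugh-Lovell)."""
    return slope(residualize(x, control), residualize(y, control))


n = 100_000
D = [random.gauss(0, 1) for _ in range(n)]  # desirability (hard to measure)
h = [d + random.gauss(0, 1) for d in D]     # more desirable women get taller partners
P = [d + random.gauss(0, 1) for d in D]     # noisy proxy for desirability
# Hypothetical outcome: height adds +1, higher standards subtract 2 per unit of D.
S = [hi - 2 * d + random.gauss(0, 1) for hi, d in zip(h, D)]

print("causal effect of height built into the model: 1.0")
print(f"no adjustment:             {slope(h, S):.2f}")
print(f"adjusting for the proxy P: {adjusted_slope(h, S, P):.2f}")
print(f"adjusting for the true D:  {adjusted_slope(h, S, D):.2f}")
```

The three estimates come out as follows:

- the naive slope is about zero;
- adjusting for the proxy recovers only about a third of the true effect;
- only adjusting for the unobservable `D` itself gets close to 1.

Real dating data would contain many such imperfectly measured confounders at once.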
In short, no matter how much data we have, we’re not going to be able to get a satisfactory answer to a question like “Does your partner’s height influence your relationship satisfaction?” by looking at the traits and ratings of people who are in relationships. There is no easy way to assess the impact of partner height on satisfaction when all else is held constant because of the selective nature of dating.
With all of that behind us, let’s re-evaluate this advice from chapter 1 of Don’t Trust Your Gut:
Beauty … is the single most valued trait in the dating market; Hitsch, Hortaçsu, and Ariely found in their study of tens of thousands of single people on an online dating site that who receives messages and who has their messages responded to can, to a large degree, be explained by how conventionally attractive they are. But Joel and her coauthors found, in their study of more than 11,000 long-term couples, that the conventional attractiveness of one’s partner does not predict romantic happiness. Similarly, tall men, men with sexy occupations, people of certain races, and people who remind others of themselves are valued tremendously in the dating market. … But ask thousands of long-term couples and there is no evidence that people who succeeded in pairing off with mates with these desired traits are any happier in their relationship.
If I had to sum up, in one sentence, the most important finding in the field of relationship science, thanks to these Big Data studies, it would be something like as follows (call it the First Law of Love): “In the dating market, people compete ferociously for mates with qualities that do not increase one’s chances of romantic happiness.”
I can see how it is appealing to believe in a world where the “First Law of Love” is gospel. It is certainly possible that people overemphasize superficial traits when searching for a long-term partner. Then again, the opposite is also possible. In the absence of good evidence one way or another, we may as well trust intuition.
What else can Big Data supposedly do for us? Perhaps by measuring many people’s moment-to-moment happiness as they go about their days, we can figure out what kinds of activities are likely to make us the happiest.
In contrast to the relationship example just discussed, I don’t mean to suggest that collecting such data is useless as a self-help tool. Nevertheless, this kind of data poses similar challenges to causal inference — it isn’t going to unlock the secrets to a happy life. To see why, we again need to think carefully about what question we want to answer and the relationship between this question and the nature of the data that’s being collected.
Let’s get concrete about the problem we’re trying to solve here. We go about our day engaging in various activities, such as eating breakfast, going to work, and watching television. Some of these activities are more controllable than others — we might choose to read a book on the couch after work as opposed to going rock climbing, but we are typically obligated to go to work. If we’re miserable at work, we can’t do much about that. In contrast, if we learn that we’re more likely to be happy rock climbing than reading on the couch, we can change our routine to maximize our happiness (if that’s what we want to optimize).
More generally, at any moment in which we’re engaged in a controllable leisure activity, we can ask: Relative to the default leisure activity I’m currently engaged in, how happy would I be doing some alternative activity? Just as in the dating example, this is an impossible counterfactual question; we can’t rewind time and try a different activity in the very same moment. But we can get an approximate answer. For example, we could try mixing up our evening routine every other day. We might discover that we’re, on average, happier in the climbing gym than on the couch, which provides some evidence that we would tend to be happier rock climbing on those days that we’re reading on the couch.
We can pursue a few potential hobbies by implementing this strategy, but it is infeasible to try out many things while also being confident in our impressions of them. After all, if we try climbing just once or twice and rate it as 3 out of 10 on our happiness scale, we might have just found it frustrating at first, but it could become our favorite activity if we stick with it. Yet we can’t continue to try rock climbing for too long if we also want to try cycling, gardening, video games, and so on.
Can Big Data help us? We now have an incredible database of tens of thousands of people giving multiple happiness ratings a day while reporting the activities they’re engaged in. Of course, these people aren’t going to be exactly like us, but — assuming we don’t have super weird preferences — a typical person in such a study is likely to find many of the same activities fun or boring. It seems natural, then, to just break this data up by activity (rock climbing, gardening, cooking, etc.), average the happiness ratings4, and rank activities on the basis of these ratings. If we find that rock climbing has the highest average of, say, 8.4, we might select that as the first activity to try, with the expectation that we, too, will be likely to find rock climbing to be among the most fun leisure activities.
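The ranking recipe itself is easy to sketch. The records below are invented toy data in the form (person, activity, rating), and the function names are hypothetical. The `person_first` flag averages within each person before averaging across people, as footnote 4 describes. The `center` flag subtracts each person's overall mean rating first, as footnote 5 describes. Neither correction addresses who chooses to try an activity in the first place.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical experience-sampling records: (person, activity, happiness 0-10).
ratings = [
    ("ana", "climbing", 9), ("ana", "climbing", 9), ("ana", "climbing", 9),
    ("ana", "tv", 7),
    ("ben", "climbing", 4), ("ben", "tv", 6), ("ben", "tv", 6),
    ("ben", "reading", 5),
]


def activity_means(records, person_first=False, center=False):
    """Average happiness per activity.

    person_first: average within each person, then across people, so that
        frequent participants are not overweighted.
    center: subtract each person's overall mean rating before averaging, so
        people who rate everything highly do not inflate their activities.
    """
    if center:
        by_person = defaultdict(list)
        for person, _, r in records:
            by_person[person].append(r)
        person_mean = {p: fmean(rs) for p, rs in by_person.items()}
        records = [(p, a, r - person_mean[p]) for p, a, r in records]

    if not person_first:
        pooled = defaultdict(list)
        for _, activity, r in records:
            pooled[activity].append(r)
        return {a: fmean(rs) for a, rs in pooled.items()}

    nested = defaultdict(lambda: defaultdict(list))  # activity -> person -> ratings
    for person, activity, r in records:
        nested[activity][person].append(r)
    return {a: fmean([fmean(rs) for rs in people.values()])
            for a, people in nested.items()}


def rank(means):
    """Activities sorted from highest to lowest average rating."""
    return sorted(means, key=means.get, reverse=True)


print(rank(activity_means(ratings)))
print(activity_means(ratings))
print(activity_means(ratings, person_first=True))
```

On this toy data, the pooled average puts climbing well ahead (7.75) only because one enthusiast logged it three times. Averaging per person first brings climbing and TV level at 6.5.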
This approach faces roadblocks, though. As with dating, leisure activities are selected, not randomly assigned. Hence, the kind of people who choose rock climbing as an activity are likely to be different from the kind of people who don’t choose it as an activity. Specifically, they are likely to be the kind of people who enjoy rock climbing, just as women who choose especially tall male partners are likely to care more than average about having a tall partner. As a result, we should be suspicious of this 8.4 average happiness rating — it is probably an overestimate of the happiness we should expect to receive if we decide to adopt the hobby simply on the basis of what we see in the data.
The extent to which this is a problematic overestimate will depend on the extent to which people vary in how much they enjoy this hobby. We can take this fallacious logic to an absurd conclusion. Suppose one of the activities in our data set is the specific type of rock climbing known as “free soloing” (i.e., climbing at dangerous heights without any protective gear). Very few people in our data set will have dared to free solo. Maybe it’s just Alex Honnold and that crazy alpinist guy who died in an avalanche. Both of them consistently rate free soloing as a 10 out of 10 — it is their absolute favorite activity by far. And since they are the only two people in our data set who have free soloed, this activity has the highest average happiness rating out of any activity: a perfect 10. Yet I suspect that most people with a normal fear of heights and regard for their lives would not race to take up this practice. The small number of people who choose to free solo are very unusual and are poor guides for normal people who want to find fun hobbies.5
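A quick simulation can show how the size of the overestimate depends on how much tastes vary. The model is an assumption for illustration only. Each person's enjoyment of a hobby is normally distributed around the same population mean of 5. People take up the hobby, and so appear in the data, only if their personal enjoyment exceeds 6. Ratings are not clipped to the 0 to 10 scale.

```python
import random
import statistics

random.seed(3)


def participants_mean(pop_mean, pop_sd, cutoff=6.0, n=100_000):
    """Average enjoyment among people who chose the hobby, and their share.

    Hypothetical model: enjoyment ~ Normal(pop_mean, pop_sd), and only people
    whose enjoyment exceeds `cutoff` take up the hobby.
    """
    enjoyment = [random.gauss(pop_mean, pop_sd) for _ in range(n)]
    chosen = [e for e in enjoyment if e > cutoff]
    return statistics.fmean(chosen), len(chosen) / n


for label, sd in [("tastes vary little", 1.0), ("tastes vary a lot", 3.0)]:
    avg, share = participants_mean(pop_mean=5.0, pop_sd=sd)
    print(f"{label}: participants average {avg:.2f} "
          f"(population average 5.0, {share:.0%} participate)")
```

Both hobbies offer a randomly chosen newcomer the same expected 5.0. The participants' average overstates that by roughly 1.5 points in the first case and roughly 3 points in the second. Free soloing is the extreme version of this pattern: a tiny, highly unusual group of participants.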
While this is a silly example, it demonstrates an important challenge in treating average happiness ratings of activities as estimates of how happy we would be doing those activities. We won’t think twice about free soloing, but what about taking up a hobby like gardening, which is rated as one of the most happiness-inducing activities according to Don’t Trust Your Gut (Ch. 8)? I’m not sure I’ve ever gardened in my life even though I’ve had plenty of opportunities to and am exposed to the idea every time I leave my house and see my neighbors cultivating their community garden. Am I close-minded for thinking I wouldn’t enjoy it? The author of our book gives straightforward advice about what to do in this situation:
[E]very time I am considering an activity, I … look at the back of my phone, see how much happiness I can expect from said activity, and make a data-driven decision as to whether to partake.
So is it settled, then, that I should take up gardening in place of just about any other hobby? While it probably can’t hurt, I suspect that there are many better hobbies to try first, even though they rank far lower on this master happiness list. The reason is heterogeneity of preferences: the kind of people who like gardening probably feel extremely satisfied doing it. But we don’t have data from all the people like me, who may have reasonably good gut intuitions that gardening is not for them. Besides, running and exercise are even higher up on the list. Good luck convincing ordinary people to try that for fun!
What about activities that just about everyone partakes in, such as watching television? If we find that people rate their happiness at 5.4, on average, while engaged in this activity, should we expect the same for ourselves if we override our default behavior of reading on the couch to watch television instead? The answer is still no. Indeed, the answer would still be no if we looked solely at our own past ratings of happiness when we were watching television! Remember, we chose to watch television in those past moments. We don’t make choices randomly; we are responding to an internal signal in our head that is trying to estimate what would be the most desirable activity at that moment. This signal changes from moment to moment. Perhaps on stressful work days, crashing on the couch to watch a funny TV show is much more relaxing than reading a book or venturing out to the climbing gym. On lighter work days, we are more likely to want to do something adventurous.
There are actually two distinct issues here. First, our average happiness may just be lower on those days that we choose to watch television for reasons that have nothing to do with the activity itself (e.g., we were already in a bad mood from work, so we rate everything low). If this is the case, we will end up underestimating the happiness we will feel from watching television on one of those happier days that we would have read a book by default. Second, even if our average happiness is the same on stressful work days and easy work days, the specific activities that we find most enjoyable could change. So, perhaps on stressful work days, watching television would typically be rated a 5 or 6 out of 10 whereas reading a book would typically be rated a 4 or a 5; on less stressful days, these ratings reverse. So on a day in which we would typically read a book, we cannot use our average happiness rating on days we watched television as a forecast of how happy we would be if we were to choose television right now. (Not to mention, we could easily get sick of watching TV multiple days in a row if we made that a daily routine.) In sum, there are major difficulties with applying a “Big Data” strategy even to just our own observational data.
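The second issue can be simulated using a single person's own log. The mean ratings below are assumptions chosen to match the example in the text. TV beats reading on stressful days (5.5 vs. 4.5), the ranking flips on light days, and the gut always picks the better option. Averaging over the days we actually watched TV then tells us little about watching TV on a day we would otherwise have read.

```python
import random
import statistics

random.seed(4)

# Hypothetical mean ratings by (type of day, activity), following the text's example.
MEAN_RATING = {
    ("stressful", "tv"): 5.5, ("stressful", "book"): 4.5,
    ("light", "tv"): 4.5, ("light", "book"): 5.5,
}


def rate(day, activity):
    """One noisy happiness rating for doing `activity` on a `day`-type day."""
    return MEAN_RATING[(day, activity)] + random.gauss(0, 1)


log = []  # (day type, chosen activity, rating): all our own data ever contains
for _ in range(10_000):
    day = random.choice(["stressful", "light"])
    choice = "tv" if day == "stressful" else "book"  # the gut picks the better option
    log.append((day, choice, rate(day, choice)))

tv_logged = statistics.fmean([r for _, a, r in log if a == "tv"])
# The counterfactual the log never records: TV on the days we actually read.
tv_on_reading_days = statistics.fmean([rate(d, "tv") for d, a, _ in log if a == "book"])

print(f"average TV rating in our log:       {tv_logged:.2f}")
print(f"TV rating on days we chose to read: {tv_on_reading_days:.2f}")
```

The log suggests TV is worth about 5.5. In this model, however, swapping TV in for a reading evening would yield only about 4.5.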
Again, I am not suggesting that your current intuitive mode of choosing what to do is ideal. Especially if you’re the kind of person who mindlessly selects activities or does the same thing day after day out of habit, you could probably benefit from trying something new. And, with the exception of free soloing, you probably won’t hurt yourself by picking an activity off the giant happiness list. Just don’t expect too much: for all of the reasons we have already discussed, these average ratings are likely to substantially exceed the happiness you can expect for yourself. Nonetheless, this kind of exploration is wise not because you necessarily expect the new activity that you try to be more enjoyable than what you typically do, but because you are highly uncertain about how much you will like it. If you don’t like it, there is only a small cost to pay for trying; if you do like it, it could become your new favorite hobby for years to come. In this sense, using “Big Data” to inspire you to step outside of your comfort zone could be marginally beneficial. Then again, I suspect you’d be better off just asking a friend for some ideas.
Big Data has great potential to improve our lives, and it is already being used for a number of exciting applications. I’m optimistic that sophisticated machine learning and artificial intelligence that can take advantage of data at this scale may one day help us overcome many of our natural foibles and biases. However, I hope this post has convinced you that we ought not celebrate prematurely. Be wary of extraordinary claims like “the physical attractiveness of your partner is irrelevant to your relationship satisfaction” or “you would be much happier gardening than watching television right now.” Without careful attention to the causal questions that we actually want to answer, Big Data analyses can distract or even mislead.
Indeed, there is a bitter irony we must grapple with here. If people chose their romantic partners or leisure activities (or anything else) randomly, it would be much easier to learn from the data about the features and choices that make us happy. But it is precisely because our gut decisions are already quite good that it is difficult to assess whether and how we can improve on them. Maybe it’s not such a mystery that even the most advanced machine learning algorithms struggle to discover a simple formula for relationship satisfaction among people who are already in long-term relationships of their own choosing. Most of the relationships that are predictably doomed are likely to have never begun or to have failed before entering this more serious phase, in part because people can instinctively sense incompatibility.
I’m not suggesting we just give up, though. Rather, we should rely on the tools that we’ve always relied on to make good causal inferences — first and foremost, randomized experiments. Okay, maybe we can’t ethically randomize people into long-term relationships (though the randomness of speed dating presents some opportunities to study romantic attraction). But for many everyday decisions, such as what leisure activities to try, it wouldn’t be difficult to design interventions that arbitrarily suggest different types of activities to different groups of people at different times and then measure people’s happiness (or anything else) afterwards. While this approach is not without its own challenges, it would be a big advance on the observational data we have now. Moreover, economists have been devising increasingly clever ways to identify quasi-random variation from observational data sets, which license causal inference if certain assumptions are met. Maybe one of these clever economists will one day find such variation in the world of long-term relationships.
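A toy comparison shows why randomized suggestions help. The model makes two hypothetical assumptions:

- switching an evening from the default activity to gardening changes each person's happiness by an individual amount that averages +0.5 but varies widely;
- everyone follows the random suggestion (real studies would have to deal with people who ignore it).

Self-selection mostly captures gardening enthusiasts, whereas a coin flip compares like with like.

```python
import random
import statistics

random.seed(5)

n = 100_000
# Hypothetical individual gains from gardening instead of the default activity.
gain = [random.gauss(0.5, 2.0) for _ in range(n)]
baseline = [random.gauss(5.0, 1.0) for _ in range(n)]  # happiness on the default activity

# Observational: people garden only if they (correctly) sense they will like it.
self_selected_gain = statistics.fmean([g for g in gain if g > 0])

# Randomized: a coin flip decides who is told to garden; assumes full compliance.
told_to_garden = [random.random() < 0.5 for _ in range(n)]
treated = [b + g for b, g, t in zip(baseline, gain, told_to_garden) if t]
control = [b for b, t in zip(baseline, told_to_garden) if not t]
randomized_estimate = statistics.fmean(treated) - statistics.fmean(control)

print(f"true average gain:            {statistics.fmean(gain):.2f}")
print(f"average gain among gardeners: {self_selected_gain:.2f}")
print(f"randomized estimate:          {randomized_estimate:.2f}")
```

Here the self-selected gardeners overstate the typical gain more than threefold. The randomized comparison, by contrast, lands close to the true average.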
As an unabashed technophile, I look forward to the day that I can outsource many of my most important decisions to machine learning algorithms and AI companions that know me better than I know myself. But, for now, I’m sticking with my gut.
Stephens-Davidowitz, S. (2022). Don’t Trust Your Gut: Using Data Instead of Instinct to Make Better Choices. Bloomsbury Publishing.
This fact is referenced in chapter 1 of the book that this post is about. But it goes without saying that nothing in this post should be read as an endorsement or rejection of any dating strategy. ↩︎
The analysis that was actually done is substantially more complicated than I am describing here. Moreover, as far as I can tell, the original paper authors do not make any strong causal claims about how height or similar features directly affect relationship satisfaction. ↩︎
Of course, in the real world, most women don’t set strict height thresholds like this. But the basic statistical point still holds: women who care about height will generally be filtering out men who are unsatisfactorily short, so we can’t assess whether having an unsatisfactorily short partner influences relationship satisfaction. We observe only a restricted range of data above the threshold. ↩︎
Some people may rock climb much more often than other people, which means that their data would be overweighted if we took a simple average of all observations. We can fix this problem by first averaging at the person level, and then averaging all these person-level averages. Of course, we can only do this for people who have tried rock climbing at least once, which leads to a problem I discuss below. ↩︎
To be clear, the worry here is that these people find free soloing particularly enjoyable, in contrast to other activities they engage in (voluntarily or not). There’s a different worry that these people are just sanguine freaks who get insane pleasure out of life in general and rate every activity close to 10 out of 10. This latter issue is easy to correct for, however, since we can calculate each person’s average happiness score across all activities and subtract it out. ↩︎