Psychology's Replication Crisis

A chalk sketch of a man (1575).
(Santi di Tito / Metropolitan Museum of Art)

BOB GARFIELD: This is On the Media. I’m Bob Garfield.

BROOKE GLADSTONE: And I'm Brooke Gladstone. The relentless pursuit of truth is the guiding spirit of the scientific method. That's why a series of failures to replicate the findings of several classic studies in behavioral science led to a crisis of confidence, and of conscience.

Uli Schimmack, professor of psychology at the University of Toronto, is a leading watchdog in the effort to identify and discourage unethical research practices. He began scrutinizing research behind psychology’s landmark theories after a shakeup in 2011.

ULI SCHIMMACK: In 2011, a very controversial article was published in a top journal by a famous social psychologist, Daryl Bem, who made his name in the 1970s. And this article claimed that people have the ability to foresee future events that haven't happened yet, and it demonstrated this in ten studies, nine of which successfully showed this amazing ability. In one study, people were shown erotic pictures and had to guess where the picture would appear. And, according to the results, people could foresee, with above-chance accuracy, where the picture would appear before the computer had even randomly generated its location.

A lot of psychologists experienced what we call dissonance, a kind of conflict: either I have to believe in this phenomenon, given all this evidence, or I have to question the scientific method that led to these findings. And that led to the uncomfortable conclusion that maybe many other findings that, you know, were presented with similarly strong evidence might also be questionable.

BROOKE GLADSTONE: And when some researchers tried to replicate Bem's findings and failed to support his claims, people became very concerned.

ULI SCHIMMACK: Right. His evidence seemed very strong, but others couldn’t replicate it, so we have to explain that. The problem is that people use what we call questionable research practices: they use statistical methods in a liberal, flexible way, and that increases the chance of getting these successes. For a long time it was known that those methods were being used, but it was considered like speeding only 10 miles over the speed limit. You know, people do this once in a while, a little bit.

But since 2011, it has become apparent that, really, people are speeding at 50 miles over the speed limit in reckless ways, and many of those findings are not replicable and are basically polluting the scientific record and the theories that we are trying to build on to understand human behavior.

BROOKE GLADSTONE: And so, ultimately, your frustration with Bem’s paper led you to develop the R-Index, the Replication Index, which you’ve called a “doping test” for science. If a study gets the highest score, 1, it means the study can be replicated easily; it was well designed and the data were reported clearly and honestly. A low R-Index of, say, .1 or .35 means a researcher inflated his or her results and the study will be hard to replicate. How does it work?

ULI SCHIMMACK: The basic principle is that we can actually predict the success rate a researcher should have, using the exact numbers reported in the article that claims all those successes, and then we compare that with the actual success rate. And what we typically find is that the actual success rate is much higher than the expected success rate.

BROOKE GLADSTONE: You’re finding that the published success rate is higher than –

[BOTH SPEAK/OVERLAP]

ULI SCHIMMACK: Yeah, than what we would expect.
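The comparison Schimmack describes, an expected success rate derived from the reported numbers set against the actual success rate, can be sketched in a few lines. This is a minimal illustration, not his published tool: it assumes each study reports a two-sided p-value, converts that p-value into the "observed power" it implies using a normal approximation, and then applies what we assume is the R-Index formula (median observed power minus the inflation rate, where inflation is the reported success rate minus median observed power). The function names `observed_power` and `r_index` are hypothetical.

```python
from statistics import NormalDist, median
from typing import Sequence

Z = NormalDist()  # standard normal distribution

def observed_power(p_value: float, alpha: float = 0.05) -> float:
    """Power implied by a reported two-sided p-value (normal approximation).

    Treats the observed z-score as if it were the true effect and ignores the
    negligible chance of a significant result in the opposite direction.
    """
    z_obs = Z.inv_cdf(1 - p_value / 2)   # p-value -> z-score
    z_crit = Z.inv_cdf(1 - alpha / 2)    # 1.96 when alpha = .05
    return Z.cdf(z_obs - z_crit)

def r_index(p_values: Sequence[float], alpha: float = 0.05) -> float:
    """Assumed formulation: median observed power minus the inflation rate."""
    mop = median(observed_power(p, alpha) for p in p_values)
    success_rate = sum(p < alpha for p in p_values) / len(p_values)
    inflation = success_rate - mop       # reported successes beyond what power predicts
    return mop - inflation
```

With four made-up p-values just under .05 (0.045, 0.04, 0.03, 0.02), each result implies only about 50 to 65 percent power, yet all four "succeeded", so this sketch scores the set at roughly .1. Four results with p-values around one in a million score close to 1.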

BROOKE GLADSTONE: Mm-hmm. So how do you know how a study is supposed to turn out?

ULI SCHIMMACK: Well, yeah, without going too much into the statistical details, really I –

BROOKE GLADSTONE: Oh, you can. I have a PhD from MIT. It’s okay.

ULI SCHIMMACK: Okay.

BROOKE GLADSTONE: No, I don’t. I’m totally lying about that.

[LAUGHTER]

ULI SCHIMMACK: But basically, the chance of getting a successful result in a study depends on two main things: how strong the effect is, and how large the sample is. To notice, for example, that men are taller than women, you don't need a big sample. You know, you can see that pretty quickly. But if you have a small effect, you need samples of 1,000 or 2,000 people. So given that researchers actually publish information about sample sizes and effect sizes, it's possible to estimate what the success rate should be, and then we can compare that with the success rate we're actually observing in the journals, which is over 95%.

BROOKE GLADSTONE: Wow.

ULI SCHIMMACK: To get accepted into a top journal, you have to present only successful studies. Typically, the studies don't have the effect sizes or the sample sizes to warrant these high success rates.
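The two ingredients Schimmack names, effect size and sample size, determine a study's statistical power, and the average power across a set of studies is the success rate one should expect. The sketch below assumes a simple two-group comparison with equal group sizes, a standardized effect size (Cohen's d), and a normal approximation to the test statistic; the function names are hypothetical and the effect sizes in the note afterward are illustrative, not measured values.

```python
from math import sqrt
from statistics import NormalDist
from typing import Iterable, Tuple

Z = NormalDist()

def power_two_groups(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided test comparing two equal-sized groups.

    d is the standardized effect size (Cohen's d). Uses a normal approximation
    and ignores significant results in the wrong direction.
    """
    z_crit = Z.inv_cdf(1 - alpha / 2)
    expected_z = d * sqrt(n_per_group / 2)   # mean of the test statistic
    return Z.cdf(expected_z - z_crit)

def expected_success_rate(studies: Iterable[Tuple[float, int]], alpha: float = 0.05) -> float:
    """Average power over (d, n_per_group) pairs: the share of studies expected to succeed."""
    powers = [power_two_groups(d, n, alpha) for d, n in studies]
    return sum(powers) / len(powers)
```

Under these assumptions, a large effect (d = 1.5) reaches about 99% power with only 20 people per group, while a small effect (d = 0.2) has under 10% power at that size and needs on the order of 1,000 people per group to approach certainty. A literature made up mostly of small effects studied in small samples should therefore not show a success rate above 95%.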

BROOKE GLADSTONE: Let's say a researcher’s data support their hypothesis one out of three times. They might just focus on the outcomes that fit their hypothesis, or they might change their hypothesis to fit the data.

ULI SCHIMMACK: Right.

BROOKE GLADSTONE: How do you know that your Index works?

ULI SCHIMMACK: [LAUGHS] Nowadays, a lot of researchers use simulations. We simulate scenarios, and we have demonstrated in simulations that the Index performs well under the typical scenarios you would encounter in the literature.
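The kind of simulation Schimmack mentions can be illustrated with a toy "file drawer" model. This is not his validation code; it is a minimal sketch under stated assumptions: every study tests the same small true effect, each study's z-score is drawn from a normal distribution around its expected value, significant results are always published, and `publish_null_prob` (a hypothetical parameter) sets the chance that a non-significant study is published anyway. The R-Index line uses the same assumed formula as above.

```python
import random
from math import sqrt
from statistics import NormalDist, median

Z = NormalDist()
Z_CRIT = Z.inv_cdf(0.975)  # critical z for two-sided alpha = .05

def simulate_literature(d=0.2, n_per_group=20, n_studies=10_000,
                        publish_null_prob=0.0, seed=1):
    """Simulate studies of one true effect, then keep only what gets 'published'.

    publish_null_prob is a hypothetical knob: the chance that a non-significant
    study is published anyway (0.0 = every failure stays in the file drawer).
    """
    rng = random.Random(seed)
    true_z = d * sqrt(n_per_group / 2)          # expected z-score of each study
    published = []                              # z-scores that reach the journals
    for _ in range(n_studies):
        z = rng.gauss(true_z, 1.0)              # sampling error around the true effect
        if z > Z_CRIT or rng.random() < publish_null_prob:
            published.append(z)
    success_rate = sum(z > Z_CRIT for z in published) / len(published)
    mop = median(Z.cdf(z - Z_CRIT) for z in published)   # median observed power
    return {
        "true_power": Z.cdf(true_z - Z_CRIT),
        "success_rate": success_rate,
        "median_observed_power": mop,
        "r_index": mop - (success_rate - mop),  # assumed R-Index formula
    }
```

With the defaults, every published study is significant even though the true power is below 10%, and the published studies' median observed power is far above the true power. Setting `publish_null_prob=1.0` (publish everything) brings the success rate back in line with the true power, which is the contrast a check like the R-Index is meant to expose.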

BROOKE GLADSTONE: There are all kinds of psychological theories that have been thrown into question lately. Here's one that you might hear from a science writer or in a TED Talk.

[CLIP]:

WOMAN: You’ve probably heard that, that when you smile it can make you feel happy.

STEPHEN COLBERT: Can it really, smiling make me happy?

WOMAN: It can also make you seem creepy.

[AUDIENCE LAUGHTER]

STEPHEN COLBERT: That’s true. [LAUGHS] That’s the part that makes me happy.

[AUDIENCE LAUGHTER/END CLIP]

BROOKE GLADSTONE: So if I smile, I’ll feel happier?

ULI SCHIMMACK: This is a classic theory, and in the 1970s and 1980s some experimental studies, you know, manipulated people's facial expressions, and the studies suggested that this made people rate cartoons as funnier and feel more amused.

BROOKE GLADSTONE: Mm-hmm.

ULI SCHIMMACK: And this became a textbook finding, until last year, when researchers made a big effort to replicate the study in close to 20 different laboratories. None of those laboratories could actually reproduce the finding. What we did then was go back to the literature, and we found massive evidence of selection bias: there must have been many studies that didn't work out and weren't published, and so on.

BROOKE GLADSTONE: Your most famous R-Index check involved the 2011 bestseller by Nobel laureate Danny Kahneman, Thinking, Fast and Slow. Describe what priming is. How do psychologists feel about those landmark studies now?

ULI SCHIMMACK: Priming is the idea that our mind works by association, and so, even little things in our environment might suddenly trigger an association and alter our behavior. And this might even happen outside of awareness. Studies claimed that just seeing the word “professor” would make people perform better on some intelligence tests.

BROOKE GLADSTONE: A few words might change, for a while, your response on issues of race, on issues of finance, basically anything. It's been applied across the board. It's really quite an important theory.

ULI SCHIMMACK: Right, so we went back to the actual research articles on which the chapter was based, and out of, I think, the 30 studies there, 29 individually showed a red flag with the R-Index. Basically, there is no credible evidence that substantiates, you know, the broader theoretical claims about priming effects.

BROOKE GLADSTONE: Why is it so important that we mitigate this replication crisis?

ULI SCHIMMACK: Well, I became a psychologist because I wanted to understand human behavior and thought and feelings. But if we're not using the scientific method appropriately, and in the end we just pick hypotheses that we like or that are popular among our peers and then find confirming evidence for them, then we’re not doing the service that psychology should be doing. What the field is looking to do right now is figure out how we can improve things.

BROOKE GLADSTONE: Like making all the data available, publishing what your hypotheses are, to begin with, so people can compare them with the conclusions that you draw.

ULI SCHIMMACK: The top journal in psychology has started awarding badges for sharing your data or for preregistration. That signals that the research is more credible and more trustworthy, and it creates some incentive to do so. The crisis has created a sense of doubt because, you know, now we doubt a lot of what happened in the past. But I think all the new initiatives will reduce doubt, because we will be able to trust what is being published.

[MUSIC UP & UNDER]

BROOKE GLADSTONE: Uli, thank you very much.

ULI SCHIMMACK: Okay, great, bye.

BROOKE GLADSTONE: Uli Schimmack is a professor of psychology at the University of Toronto and the creator of the Replication Index blog.

Hosted by Brooke Gladstone

Produced by WNYC Studios