Why is the replication crisis centered on social psychology? In a recent post, Andrew Gelman offered a list of possible reasons. Although I don’t agree with every one of his answers (I don’t think data-sharing is common in social psych for example), it is an interesting list of ideas and an interesting question.
I want to riff on one of those answers, because it is something I’ve been musing about for a while. Gelman suggests that in social psychology, experiments are comparatively easy and cheap to replicate. Let’s stipulate that this is true of at least some parts of social psych. (Not necessarily all of them – I’ll come back to that.) What would easy and cheap replications do for a field? I’d suggest they have two, somewhat opposing effects.
On the one hand, running replications is the most straightforward way to obtain evidence about whether an effect is replicable.1 So the easier it is to run a replication, the easier it will be to discover if a result is a fluke. Broaden that out, and if a field has lots of replicability problems and replications are generally easy to run, it should be easier to diagnose the field.
But on the other hand, in a field or area where it is easy to run replications, that should facilitate a scientific ecosystem where unreplicable work can get weeded out. So over time, you might expect a field to settle into an equilibrium where by routinely running those easy and cheap replications, it is keeping unreplicable work at a comfortably low rate.2 No underlying replication problem, therefore no replication crisis.
The idea that I have been musing on for a while is that “replications are easy and cheap” is a relatively new development in social psychology, and I think that may have some interesting implications. I tweeted about it a while back but I thought I’d flesh it out.
Consider that until around a decade ago, almost all social psychology studies were run in person. You might be able to do a self-report questionnaire study in mass testing sessions, but a lot of experimental protocols could only be run a few subjects at a time. For example, any protocol that involved interaction or conditional logic (i.e., couldn’t just be printed on paper for subjects to read) required live RAs to run the show. A fair amount of measurement was analog and required data entry. And computerized presentation or assessment was rate-limited by the number of computers and cubicles a lab owned. All of this meant that even a lot of relatively simple experiments required a nontrivial investment of labor and maybe money. And a lot of those costs were per-subject costs, so they did not scale up well.
All of this changed only fairly recently, with the explosion of internet experimentation. In the early days of the dotcom revolution you had to code websites yourself,3 but eventually companies like Qualtrics sprang up with pretty sophisticated and usable software for running interactive experiments. That meant that subjects could complete many kinds of experiments at home without any RA labor to run the study. And even for in-lab studies, a lot of data entry – which had been a labor-intensive part of running even a simple self-report study – was cut out. (Or if you were already using experiment software, you went from having to buy a site license for every subject-running computer to being able to run it on any device with a browser, even a tablet or phone.) And Mechanical Turk meant that you could recruit cheap subjects online in large numbers and they would be available virtually instantly.
Taken together, what this means is that for some kinds of experiments in some areas of psychology, replications have undergone a relatively recent and sizeable price drop. Some kinds of protocols pretty quickly went from something that might need a semester and a team of RAs to something you could set up and run in an afternoon.4 And since you weren’t booking up your finite lab space or spending a limited subject-pool allocation, the opportunity costs got lower too.
Notably, the growth of all the technologies that facilitated the price drop accelerated right around the time the replication crisis was taking off. Bem, Stapel, and False-Positive Psychology were all in 2011. That’s the same year that Buhrmester et al. published their guide to running experiments on Mechanical Turk, and just a year later Qualtrics got a big venture capital infusion and started expanding rapidly.
So my conjecture is that the sudden price drop helped shift social psychology out of a replications-are-rare equilibrium and move it toward a new one. In pretty short order, experiments that previously would have been costly to replicate (in time, labor, money, or opportunity) got a lot cheaper. This meant that there was a gap between the two effects of cheap replications I described earlier: all of a sudden it was easy to detect flukes, but there was a buildup of unreplicable effects in the literature from the old equilibrium. This might explain why a lot of the replications in the early twenty-teens were of social priming5 studies and similar paradigms that lend themselves well to online experimentation.
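To make that conjecture a little more concrete, here is a toy simulation of the two equilibria and the transition between them. It is purely illustrative: the model and every number in it (how many findings a field produces, what fraction are flukes, how often replications catch them) are made-up assumptions, not estimates of anything real.

```python
# Toy model: each year the field publishes new findings, a fixed fraction of
# which would not replicate. Replication attempts catch some fraction of the
# not-yet-caught unreplicable findings each year, and that fraction jumps
# when replications get cheap. All numbers are made-up illustrations.

def simulate(years=60, new_per_year=100, frac_unreplicable=0.4,
             catch_rate_expensive=0.02, catch_rate_cheap=0.30,
             price_drop_year=40):
    backlog = 0.0  # unreplicable findings in the literature, not yet caught
    rows = []
    for year in range(years):
        backlog += new_per_year * frac_unreplicable  # new flukes enter the literature
        rate = catch_rate_cheap if year >= price_drop_year else catch_rate_expensive
        caught = backlog * rate                      # failed replications published this year
        backlog -= caught
        rows.append((year, round(backlog), round(caught)))
    return rows

for year, backlog, caught in simulate():
    if year in (10, 39, 40, 42, 45, 59):
        print(f"year {year:2d}: undetected backlog ~{backlog:4d}, "
              f"failed replications this year ~{caught:3d}")
```

Nothing in the toy model changes the rate at which unreplicable findings are produced. The wave of failed replications right after the price drop is just the backlog from the old equilibrium finally getting detected, after which things settle into the new, lower-backlog equilibrium.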
To be sure, I don’t think this could by any means be a complete theory. It’s more of a facilitating change along with other factors. Even if replications are easy and cheap, researchers still need to be motivated to go and run them. Social psychology had a pretty strong impetus to do that in 2011, with Bem, Stapel, and False-Positive Psychology all breaking in short order. And as researchers in social psychology started finding cause for concern in those newly cheap studies, they were motivated to widen their scope to replicating other studies that had been designed, analyzed, and reported in similar ways but that hadn’t had so much of a price drop.
To date, that broadening out from the easy and cheap studies hasn’t spread nearly as much to other subfields like clinical psychology or developmental psychology. Perhaps there is a bit of an ingroup/outgroup dynamic – it is easier to say “that’s their problem over there” than to recognize commonalities. And those fields don’t have a bunch of cheap-but-influential studies of their own to get them started internally.6
An optimistic spin on all this is that social psychology could be on its way to a new equilibrium where running replications becomes more of a normal thing. But there will need to be an accompanying culture shift where researchers get used to seeing replications as part of mainstream scientific work.
Another implication is that the price drop and the resulting shift in equilibrium have created a kind of natural experiment, in which the weeding-out process has lagged behind the field’s ability to run cheap replications. A boom in metascience research has taken advantage of this lag to generate insights into what does7 and doesn’t8 make published findings less likely to be replicated. Rather than saying “oh, that’s those people over there,” fields and areas where experiments are difficult and expensive could and should be saying: wow, we could have a problem and not even know it – but we can learn some lessons from seeing how “those people over there” discovered they had a problem and what they learned about it.
1. Hi, my name is Captain Obvious.
2. Conversely, it is possible that a field where replications are hard and expensive might reach an equilibrium where unreplicable findings could sit around uncorrected.
3. RIP the relevance of my perl skills.
4. Or let’s say a week + an afternoon if you factor in getting your IRB exemption.
6. Though admirably, some researchers in those fields are now trying anyway, costs be damned.
7. Selective reporting of underpowered results.
8. Hidden moderators.
Really interesting observations and thoughts. Sure it is speculative, but time (and replications) will tell us whether you are right.
There’s another factor, though, that might be worth considering. In subfields of psychology (or any other science) where running experiments is, or becomes, cheap, there is an incentive to run lots of studies. Not just replications, but primary studies designed to test hypotheses. In fact, the cheaper the studies are to run, the more incentive there is to test just about any speculative hypothesis out there. And that creates a big problem in terms of false discoveries: lots of studies (problem 1) testing hypotheses with a low a priori probability (problem 2). If publication bias remains as it is, then that will create a lot of published false positives.
In such an environment, the ease of replications might not be sufficient to control the field-wide false positive rate (FW-FPR), because there will be lots of replications of lots of (false positive) primary studies, yielding a large number of false positive replications.
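To put rough numbers on that worry: suppose studies test at alpha = .05 with 50% power, and suppose publication bias means only significant results get published. The share of published positives that are false then depends heavily on the prior probability that a tested hypothesis is true. All of the numbers below are made-up assumptions for illustration, not estimates of any real field:

```python
# Illustrative sketch of the field-wide false positive worry under
# publication bias (only "positive", i.e. significant, results published).
# Alpha, power, and the priors below are assumptions, not real estimates.

def share_of_published_positives_that_are_false(prior_true, alpha=0.05, power=0.50):
    true_pos = prior_true * power           # true hypotheses that come out significant
    false_pos = (1 - prior_true) * alpha    # false hypotheses that come out significant
    return false_pos / (true_pos + false_pos)

# A cautious field where 1 in 10 tested hypotheses is true:
print(f"{share_of_published_positives_that_are_false(0.10):.0%}")  # ~47%

# Cheap studies tempt longer shots, say 1 in 50 tested hypotheses is true:
print(f"{share_of_published_positives_that_are_false(0.02):.0%}")  # ~83%
```

On those made-up numbers, a shift toward cheaper, more speculative studies pushes the published false-positive share up sharply, which is the FW-FPR concern in miniature.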
The only solution I can see to this situation is pre-registration of all hypothesis-testing studies. That’s the only way to make sure that all results are published, both “negative” and “positive”, which is the only way to control the FW-FPR.
Many replication studies in social psychology are easy and cheap because the original studies were easy and cheap. Think, for example, of small convenience samples of undergraduates, a couple of funny cartoons, a pen, a chair, a quiet room, an instruction to “hold the pen with the nondominant hand, with the teeth, or with the lips” (Strack, Martin, & Stepper, 1988, p. 771), and a rating on a scale from 0 (not at all funny) to 9 (very funny). Social psychologists didn’t really need a sudden drop in the cost of research. They didn’t really need the cheap and easy to get cheaper and easier. It wasn’t fiscal. No, attitudes changed, social psychologists started doing direct replications, and journals started publishing them.
One minor item: the capital infusion Qualtrics received was to expand into the corporate world, not academia. The reason is that they could charge much higher prices to private companies.