Self-selection into online or face-to-face studies

A new paper by Edward Witt, Brent Donnellan, and Matthew Orlando looks at self-selection biases in subject pools:

Just over 500 Michigan State University undergrads (75 per cent were female) had the option, at a time of their choosing during the Spring 2010 semester, to volunteer either for an on-line personality study, or a face-to-face version…

Just 30 per cent of the sample opted for the face-to-face version. Predictably enough, these folk tended to score more highly on extraversion. The effect size was small (d=-.26) but statistically significant. Regards more specific personality traits, the students who chose the face-to-face version were also more altruistic and less cautious.

What about choice of semester week? As you might expect, it was the more conscientious students who opted for dates earlier in the semester (r=-.20). What’s more, men were far more likely to volunteer later in the semester, even after controlling for average personality difference between the sexes. For example, 18 per cent of week one participants were male compared with 52 per cent in the final, 13th week.

Self-selection in subject pools is not a new topic — I’ve heard plenty of people talk about an early-participant conscientiousness effect (though I don’t know if that’s been documented or if it’s just lab-lore). But the analyses of personality differences in who takes online versus in-person studies are new, as far as I know — and they definitely add a new wrinkle.

My lab’s experience has been that we get a lot more students responding to postings for online studies than face-to-face, but it seems like we sometimes get better data from the face-to-face studies. Personality measures don’t seem to be much different in quality (in terms of reliabilities, factor structures, etc.), but with experiments where we need subjects’ focused attention for some task, the data are a lot less noisy when they come from the lab. That could be part of the selection effect (altruistic students might be “better” subjects to help the researchers), though I bet a lot of it has to do with old-fashioned experimental control of the testing environment.

What could be done? When I was an undergrad taking intro to psych, each student was given a list of studies to participate in. All you knew was the codenames of the studies and some contact information, and it was your responsibility to arrange with the experimenter to take the experiment. It was a pain on all sides, but it was a good way to avoid these kinds of self-selection biases.

Of course, some people would argue that the use of undergraduate subject pools itself is a bigger problem. But given that they aren’t going away, this is definitely something to pay attention to.

A very encouraging reply

Who knew letter-writing could actually make a difference?

In response to the letter I sent yesterday to the CITI program, I got a prompt and very responsive reply from someone involved in running the program. She explained that the module had originally been written just for biomedical researchers. When it was adapted for social/behavioral researchers, the writers simply inserted new cases without really thinking about them. Most importantly, she said that she agreed with me and will revise the module.


UPDATE (7/6/2011): Not cool. Despite their promises, they didn’t change a thing.

Milgram is not Tuskegee

My IRB requires me to take a course on human subjects research every couple of years. The course, offered by the Collaborative Institutional Training Initiative (CITI), mostly deals with details of federal research regulations covering human subjects research.

However, the first module is titled “History and Ethics” and purports to give an overview and background of why such regulations exist. It contains several historical inaccuracies and distortions, including attempts to equate the Milgram obedience studies with Nazi medical experiments and the Tuskegee syphilis study. I just sent the following letter to the CITI co-founders in the hopes that they will correct their presentation:

* * *

Dear Dr. Braunschweiger and Ms. Hansen:

I just completed the CITI course, which is mandated by my IRB. I am writing to strongly object to the way the research of Stanley Milgram and others was presented in the “History and Ethics” module.

The module begins by stating that modern regulations “were driven by scandals in both biomedical and social/behavioral research.” It goes on to list events whose “aftermath” led to the formation of the modern IRB system. The subsection for biomedical research lists Nazi medical experiments and the PHS Tuskegee Syphilis study. The subsection for social/behavioral research lists what it calls “similar events,” including the Milgram obedience experiments, the Zimbardo/Stanford prison experiment, and several others.

The course makes no attempt to distinguish among the reasons why the various studies are relevant. They are all called “scandals,” described as “similar,” and presented in parallel. This is severely misleading.

Clearly, the Nazi experiments are morally abhorrent on their face. The Tuskegee study was also deeply unethical by modern standards and, most would argue, even by the standards of its day: it involved no informed consent, and after the discovery that penicillin was an effective treatment for syphilis, continuation of the experiment meant withholding a life-saving medical treatment.

But Milgram’s studies of obedience to authority are a much different case. His research predated the establishment of modern IRBs, but even by modern standards it was an ethical experiment, as the societal benefits from knowledge gained are a strong justification for the use of deception. Indeed, just this year a replication of Milgram’s study was published in the American Psychologist, the flagship journal of the American Psychological Association. The researcher, Jerry M. Burger of Santa Clara University, received permission from his IRB to conduct the replication. He made some adjustments to add further safeguards beyond what Milgram did — but these adjustments were only possible by knowing, in hindsight, the outcome of Milgram’s original experiments. (See:

Thus, Tuskegee and Milgram are both relevant to modern thinking about research ethics, but for completely different reasons. Tuskegee is an example of a deeply flawed study that violated numerous ethical principles. By contrast, Milgram was an ethically sound study whose relevance to modern researchers is in the substance of its findings — to wit, that research subjects are more vulnerable than we might think to the influence of scientific and institutional authority. Yet in spite of these clear differences, the CITI course calls them all “scandals” and presents them in parallel, and alongside other ethically questionable studies, implying that they are all relevant in the same way.

(The parallelism implied with other studies on the list is problematic as well. Take for example the Stanford prison experiment. It would arguably not be approved by a modern IRB. But an important part of its modern relevance is that the researchers discontinued the study when they realized it was harming subjects — anticipating a central tenet of modern research ethics. This is in stark contrast to Tuskegee, where even after an effective treatment for syphilis was discovered, the researchers continued the study and never intervened on behalf of the subjects.)

In conclusion, I strongly urge you to revise your course. It appears that the module is trying to get across the point that biomedical research and social/behavioral research both require ethical standards and regulation — which is certainly true. But the histories, relevant issues, and ramifications are not the same. The attempt to create some sort of parallelism in the presentation (Tuskegee = Milgram? Nazis = Zimbardo?) is inaccurate and misguided, and does a disservice to the legacy of important social/behavioral research.

Sanjay Srivastava

UPDATE: I got a response a day after I sent the letter. See this post: A very encouraging reply.

UPDATE 7/6/2011: Scratch that. Two years later, they haven’t changed a thing.

The perverse incentive structure of IRBs

As a researcher at a university, all of my human subjects research has to go through my university’s IRB. I believe that IRBs have an important role in research. However, in practice I sometimes find dealing with an IRB to be frustrating.

Pretty much all of the research that I do is very low risk. Yet I have to go through a review system that was invented as a response to Nazi medical experiments and other horrific incidents half a century ago. You might think that should make my behavioral research easier to get approved — I could just say, “hey, guess what, I’m not secretly giving people syphilis or anything” and get the thumbs-up. Sadly, though, it doesn’t work like that. Even when I have a study that is eligible for expedited review, there is a heck of a lot of paperwork to fill out, and time to wait, and often pointless revisions to make — all in order to do something as simple as asking people a few questions about what kind of day they had yesterday.

So why are university IRBs so inefficient? There are a number of reasons, but I believe that one of the core problems is that the system is built on a foundation of perverse incentives for the IRB.

The IRB’s task can be thought of like a signal detection problem. Simplifying a little bit, you can think of the protocols that researchers submit as being either worthy or unworthy. For any given protocol, the IRB has to decide to approve or reject. So there are two kinds of correct decisions (approve a worthy protocol or reject an unworthy one) and two kinds of mistaken decisions (reject a worthy protocol or approve an unworthy one). And the big problem is that the IRB’s potential costs associated with the two different kinds of mistakes are severely imbalanced.

If the IRB mistakenly rejects a worthy protocol, what is the worst thing that could happen? The investigator might make a phone call and resubmit the application, taking up some extra staff time, but the IRB will not get into any serious trouble. And the costs of this mistake are chiefly borne by the researcher, not the IRB. Furthermore, within a university, there is no appeals process or oversight authority empowered to act on a rejected protocol.

By contrast, if the IRB mistakenly approves an unworthy protocol, all kinds of bad things could happen. Even if no subjects are harmed, an audit could turn up the mistake and the IRB could get in trouble. And in more serious cases — if subjects do get exposed to inappropriate risks, or actually get harmed — things can get much, much worse. The IRB could get shut down (halting all research at the university), the professional IRB staff could get fired, and the university could get sued by the harmed subjects.

These asymmetric incentives mean that IRBs have a very strong incentive to err on the side of rejecting too much research. So it’s no wonder that the process is so slow and clunky, and even simple low-risk protocols are routinely sent back for revisions. The staff at my IRB are good people who want to help researchers when they can. But the actual review board members are often people with no personal stake in seeing that research gets done efficiently, and some have no formal science training at all (which can lead them to imagine harmful effects of research that have no basis in reality). And for both the paid staff and the board members, even those with the best intentions work within an incentive structure that is completely out of whack.
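To make the signal detection framing a bit more concrete, here is a minimal sketch of how lopsided error costs move a decision threshold. It assumes the reviewer approves whenever approving has the lower expected cost to the reviewer, given some probability that the protocol is worthy. The function name and the cost numbers are hypothetical, chosen only for illustration; they are not estimates of any real IRB's costs.

```python
def approval_threshold(cost_bad_approval: float, cost_bad_rejection: float) -> float:
    """Return the minimum probability that a protocol is worthy at which
    approving has a lower expected cost (to the reviewer) than rejecting.

    With p = P(protocol is worthy):
      expected cost of approving = (1 - p) * cost_bad_approval
      expected cost of rejecting = p * cost_bad_rejection
    Approving is cheaper when p > cost_bad_approval / (cost_bad_approval + cost_bad_rejection).
    """
    if cost_bad_approval < 0 or cost_bad_rejection < 0:
        raise ValueError("costs must be non-negative")
    total = cost_bad_approval + cost_bad_rejection
    if total == 0:
        raise ValueError("at least one cost must be positive")
    return cost_bad_approval / total


# Hypothetical costs, in arbitrary units, as felt by the reviewer.
balanced = approval_threshold(cost_bad_approval=1.0, cost_bad_rejection=1.0)
lopsided = approval_threshold(cost_bad_approval=100.0, cost_bad_rejection=1.0)

print(f"balanced costs: approve when P(worthy) > {balanced:.3f}")
print(f"lopsided costs: approve when P(worthy) > {lopsided:.3f}")
```

Under these made-up numbers, a reviewer who feels both mistakes equally approves anything more likely worthy than not, while a reviewer who feels a bad approval 100 times more than a bad rejection wants to be about 99 percent sure before approving. The point is only the direction of the shift, not the particular values.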

So a big part of me was outraged (and a tiny, naughty part of me jealous) to learn that in commercial medical settings, the IRB incentives are out of whack too, but in the opposite direction. If you are a researcher at a private, for-profit research company, you get approval for your research by paying a commercial IRB to review it. It doesn’t take a genius to look at this setup and figure out that a commercial IRB that approves lots of research is going to be popular with its customer base. So it was probably just a matter of time before a scandal erupted. And now one has.

In a test of the commercial IRB system, the Government Accountability Office submitted a fake protocol to 3 different commercial IRBs. The protocol was rigged to be full of unsafe, high-risk elements. And apparently one of the companies, Coast IRB, fell for the sting, deeming the protocol safe and low-risk and giving it approval. Upon further investigation from the GAO, it turns out that Coast has not rejected a single proposal in the last 5 years, and it made over $9 million last year. Hmmm…

In the aftermath of this incident, it is very likely that attention is again going to get focused where it always gets focused: on the possibility that IRBs might be approving bad, unsafe research. But such a focus may be misguided. The case of Coast IRB shows that even commercial IRBs face very serious costs when they get caught approving bad research. The company has just seen its entire $9-mil-a-year business evaporate while it undergoes an audit. Employees may lose their jobs. Owners may lose profits and see their shares lose value. The entire company could go out of business.

Instead, the problem with both university and commercial IRBs is on the approval side: the system does not present the right level of incentives for approving worthy research. In the university IRB case, the incentive is too low. And in the commercial IRB case, it’s too high. Hypothetically speaking, even if somebody at a Coast IRB kind of place knew the potential costs of getting caught approving bad research, in a rational cost-benefit analysis those potential costs would have been balanced against a multimillion-dollar revenue stream that depended on them approving lots of protocols, good and bad.

So what will happen next? If you are a member of Congress and you want to fix commercial IRBs, you could alter the cost-benefit balance on either side. That is, you could either diminish the profit motive associated with approving research, or you could make it even more costly for a company to mistakenly approve bad research. The problem is that any new regulatory policy designed to fix commercial IRBs could very well affect university IRBs as well, since both kinds of IRBs fall under many of the same regulations. And if you raise the costs and punishments associated with approving bad research (or institute even more intrusive regulations and oversight to try to prevent such approvals from happening), you will make the perverse incentives at universities even more perverse.

Personally, I think it’s at least a little bit weird that IRBs, institutions designed to safeguard the interests of research subjects, can be run as for-profit businesses whose very financial existence depends upon those they are supposed to watch. If Congress wants to fix the system in the commercial medical industry, they need to look at the fundamental question of whether that is a sustainable model, and narrowly tailor any changes to apply to commercial IRBs. The answer is most definitely not to create more intrusive oversight or threaten punishments across the board. Let’s hope that is not the direction they choose to go.