Why does an IRB need an analysis plan?

My IRB has updated its forms since the last time I submitted an application, and I just saw this section, which I think is new (emphasis added by me):

Analysis: Explain how the data will be analyzed or studied (i.e. quantitatively or qualitatively and what statistical tests you plan on using). Explain how the interpretation will address the research questions. (Attach a copy of the data collection instruments).

What statistical tests I plan on using?

My first thought was “mission creep,” but I want to keep an open mind. Are there some statistical tests that are more likely to do harm to the human subjects who provided the data? Has anybody ever been given syphilis by a chi-square test? If I do a median split, am I damaging anything more than my own credibility? (“What if there are an odd number of subjects? Are you going to have to saw a subject in half?”)

Seriously though, is there something I’m missing?

Fun with Google Correlate

A new tool called Google Correlate lets you input a search term and then creates a state-by-state map of how many people search for it. It then shows you what other search terms have similar state-by-state patterns.
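(For the statistically inclined, here’s a back-of-the-envelope sketch of what the tool seems to be doing under the hood, using completely made-up per-state search volumes rather than real Google data: take the target term’s state-by-state pattern and rank other terms by how strongly their patterns correlate with it.)

```python
# Rough sketch of the idea behind Google Correlate: rank candidate search
# terms by how closely their state-by-state search volumes track the target
# term's regional pattern. All numbers here are made up for illustration.
import numpy as np

# Hypothetical normalized search volume per state for the target term
# (positions correspond to OR, WA, CA, TX, NE, KS).
target = np.array([9.0, 4.0, 3.5, 1.0, 0.5, 0.4])

# Hypothetical candidate terms with their own per-state volumes.
candidates = {
    "term A": np.array([8.5, 4.2, 3.0, 1.2, 0.6, 0.5]),
    "term B": np.array([0.5, 1.0, 2.0, 6.0, 7.5, 8.0]),
}

def pearson_r(x, y):
    """Pearson correlation between two equal-length vectors."""
    return float(np.corrcoef(x, y)[0, 1])

# Rank candidates by similarity of their regional pattern to the target's.
ranked = sorted(candidates.items(), key=lambda kv: pearson_r(target, kv[1]), reverse=True)
for term, volumes in ranked:
    print(f"{term}: r = {pearson_r(target, volumes):.2f}")
```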

A search for my name (what else would I have plugged in first?) shows the most searches coming from my home state of Oregon, and a notable lack of interest stemming from the Great Plains. Of note: interest in McBain: The Movie follows a very similar regional pattern:

[Figure: Google Correlate results for “sanjay srivastava” and “mcbain: the movie”]

I’m trying to think of a good scientific use for this tool, but I keep getting stuck on the fact that the top regional correlate of personality is “nipple stimulation.”

This weekend give Leon some love

This weekend, on the outside chance that the rapture does not occur (I know, I know, just humor me), please remember to give some love to Leon Festinger. Everything that happens next, he called it.

Excerpt from When Prophecy Fails

Wikipedia entries on When Prophecy Fails, cognitive dissonance theory, and Leon Festinger

Modern examples: the vaccine-autism myth and birtherism

So we all agree Satoshi Kanazawa is a bad scientist; what next?

Yesterday morning, Psychology Today published a blog post by Satoshi Kanazawa titled Why are Black Women Less Physically Attractive than Other Women. The link has gone dead, but if you’re curious, somebody mirrored the article here. News and reactions here and here and here and here (and plenty elsewhere). It has been causing a bit of a kerfuffle online. Plenty of people are duly taking Kanazawa to task over what he wrote. I’d like to reflect on some related issues.

1. Psychology Today needs to answer for itself. Psychology Today apparently pulled the article, but so far they have offered no explanation. Marianne Kirby says that’s not enough, and I agree. With no explanation, Kanazawa or his supporters can position him as the noble truth-teller being censored in the name of political correctness. Psychology Today needs to head off that argument by directly refuting the substance of what Kanazawa wrote, not just disappearing his blog post. They need to show that their decision to spike the article was based on an evaluation of the science. And the statement needs to come from the editorial staff – it isn’t enough just to let this be a back-and-forth with other Psych Today bloggers.

2. Let’s stop bothering to read anything that Satoshi Kanazawa writes. A few years ago statistician Andrew Gelman spent some of his valuable time writing a critique of another of Kanazawa’s claims. Gelman took Kanazawa seriously and was evenhanded, but concluded that Kanazawa had committed some serious statistical errors. Kanazawa (or his partisans) placed a rebuttal on Wikipedia, and went on to write a popular book about the disputed research. The lesson I took from that incident was that hunting down all of Kanazawa’s errors is a thankless job. Based on his track record, it is probably safer just to assume that he’s always wrong and move on.

3. Don’t blame Add Health. I’ve seen some bloggers attacking Add Health, the longitudinal study of adolescent and adult health from which Kanazawa got his data. That’s misguided. Kanazawa had nothing to do with planning or running Add Health. Add Health is a publicly funded study that makes some of its data available to the general public and other, more sensitive data available to researchers who enter into security arrangements. They provide a valuable resource and should not be held responsible for how their data gets used or misused.

4. You don’t need cultural determinism to refute him. Physical attractiveness is a reputational construct, meaning it is irreducibly defined by how people perceive one another. And plenty of studies have shown that judgments of physical beauty vary by culture, by historical era, by who the perceiver is and their relationship to the target, and so on. But there is a world of difference between saying “culture matters” and saying “culture is the only thing that matters.” The former is indisputably true, and that gives you everything you need to indict Kanazawa. By not considering cultural explanations or perceiver-side biases, Kanazawa committed an enormous error in reaching his conclusions. You don’t need to go the next step and claim that there are absolutely no universals in how humans make judgments of attractiveness. For one thing, that kind of blank-slate cultural determinism is much more difficult to defend; for another, no serious universalist theories say anything about race differences. Don’t do Kanazawa the favor of including him in a sophisticated scientific discussion about the bases of attractiveness judgments; his mistakes are far dumber than that.

Is it still a bad idea for psychology majors to rent their intro textbook?

Inside Higher Ed reports that the number of students who rent textbooks is increasing. Interestingly, e-books have not caught on — most students are still using printed textbooks (though iPads might change that).

When I teach intro, I have always suggested to my students that if they are going to major in psychology, it is a good idea to purchase and keep their intro textbook. My argument has been that it will be a good reference for their upper-division classes, which might assume that they already know certain concepts. For example, when I teach an upper-division class in motivation and emotion, I assume that my students understand classical and operant conditioning (and I tell them in the syllabus that they should go back to their intro textbook and review the relevant sections).

A downside of this advice is that textbooks are very expensive. Renting a book, or selling one on the used market after the term ends, is a way for students to reduce costs.

Anyway, what this got me wondering is whether it’s still helpful or necessary for students to keep their intro textbooks. Is there enough good info on the internet now that they could just google whatever topics they need to review? A few years ago I looked around on the web for a well-written, introductory-level account of classical conditioning and wasn’t impressed with what I found. I still don’t think I’d assign the current Wikipedia entry on classical conditioning as a review. But with the APS Wikipedia project, for example, maybe things will get better soon.

I remember finding my intro textbook especially helpful when I studied for the psychology GRE, but not many undergrads will go on to do that. Next time I teach an upper-division class I’ll probably ask my students how much use they’ve gotten out of their intro text afterward.

Jennifer Lerner is not on Twitter

Just a quick heads up – somebody set up a fake Twitter account posing as Jennifer Lerner. Don’t be fooled. If you want to follow what the real Jenn Lerner is doing, you can read her journal articles (which seem to come out about as frequently as my tweets).

How should journals handle replication studies?

Recently Ben Goldacre wrote about a group of researchers (Stuart Ritchie, Chris French, and Richard Wiseman) whose null replication of 3 experiments from the infamous Bem ESP paper was rejected by JPSP – the same journal that published Bem’s paper.

JPSP is the flagship journal in my field, and I’ve published in it and I’ve reviewed for it, so I’m reasonably familiar with how it ordinarily works. It strives to publish work that is theory-advancing. I haven’t seen the manuscript, but my understanding is that the Ritchie et al. experiments were exact replications (not “replicate and extend” studies). In the usual course of things, I wouldn’t expect JPSP to accept a paper that only reported exact replication studies, even if their results conflicted with the original study.

However, the Bem paper was extraordinary in several ways. I had two slightly different lines of thinking about JPSP’s rejection.

My first thought was that given the extraordinary nature of the Bem paper, maybe JPSP has a special obligation to go outside of its usual policy. Many scientists think that Bem’s effects are impossible, which created the big controversy around the paper. So in this instance, a null replication has a special significance that it usually would not. That would be especially true if the results reported by Ritchie et al. fell outside of the Bem studies’ replication interval (i.e., if they statistically conflicted; I don’t know whether or not that is the case).
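(To make “statistically conflicted” a bit more concrete: one simple way to check is to ask whether the replication estimate falls outside a prediction interval built from the original estimate and the standard errors of both studies. The sketch below uses hypothetical effect sizes and standard errors, not the actual Bem or Ritchie et al. numbers.)

```python
# Check whether a replication estimate falls outside a 95% prediction
# interval derived from the original estimate and both standard errors.
# All numbers are hypothetical illustrations.
import math

def prediction_interval(orig_est, orig_se, rep_se, z=1.96):
    """Approximate 95% prediction interval for a replication effect."""
    half_width = z * math.sqrt(orig_se**2 + rep_se**2)
    return orig_est - half_width, orig_est + half_width

orig_d, orig_se = 0.25, 0.08   # hypothetical original effect size and SE
rep_d, rep_se = 0.02, 0.07     # hypothetical replication effect size and SE

lo, hi = prediction_interval(orig_d, orig_se, rep_se)
conflicts = not (lo <= rep_d <= hi)
print(f"95% prediction interval: [{lo:.2f}, {hi:.2f}]")
print(f"Replication estimate {rep_d:.2f} conflicts with the original: {conflicts}")
```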

My second line of thinking was slightly different. Some people have suggested that the Bem paper shines a light on shortcomings of our usual criteria for what constitutes good methodology. Tal Yarkoni made this argument very well. In short: the Bem paper was judged by the same standard that other papers are judged by. So the fact that an effect that most of us consider impossible was able to pass that standard should cause us to question the standard, rather than just attacking the paper.

So by that same line of thinking, maybe the rejection of the Ritchie et al. null replication should make us rethink the usual standards for how journals treat replications. Prior to electronic publication — in an age when journal pages were scarce and expensive — the JPSP policy made sense for a flagship journal that strived to be “theory advancing.” But a consequence of that kind of policy is that exact replication studies are undervalued. Since researchers know from the outset that the more prestigious journals won’t publish exact replications, we have little incentive to invest time and energy running them. Replications still get run, but often only if a researcher can think of some novel extension, like a moderator variable or a new condition to compare the old ones to. And then the results might only get published if the extension yields a novel and statistically significant result.

But nowadays, in the era of electronic publication, why couldn’t a journal also publish an online supplement of replication studies? Call it “JPSP: Replication Reports.” It would be a home for all replication attempts of studies originally published in the journal. This would have benefits for individual investigators, for journals, and for the science as a whole.

For individual investigators, it would be an incentive to run and report exact replication studies simply to see if a published effect can be reproduced. The market – that is, hiring and tenure committees – would sort out how much credit to give people for publishing such papers, in relation to the more usual kind. Hopefully it would be greater than zero.

For journals, it would be additional content and added value to users of their online services. Imagine if every time you viewed the full text of a paper, there was a link to a catalog of all replication attempts. In addition to publishing and hosting replication reports, journals could link to replicate-and-extend studies published elsewhere (e.g., as a subset of a “cited by” index). That would be a terrific service to their customers.

For the science, it would be valuable to encourage and document replications better than we currently do. When a researcher looked up an article, they could immediately and easily see how well the effect had survived replication attempts. It would also help us organize information better for meta-analyses and the like. It would help us keep labs and journals honest by tracking phenomena like the notorious decline effect and publication bias. In the short term that might be bad for some journals (I’d guess that journals that focus on novel and groundbreaking research are going to show stronger decline curves). But in the long run, it would be another index (alongside impact factors and the like) of the quality of a journal — which the better journals should welcome if they really think they’re doing things right. It might even lead to improvements in some of the problems that Tal discussed. If researchers, editors, and publishers knew that failed replications would be tied around the neck of published papers, there would be an incentive to improve quality and close some methodological holes.

Are there downsides that I’m not thinking of? Probably. Would there be barriers to adopting this? Almost certainly. (At a minimum, nobody likes change.) Is this a good idea? A terrible idea? Tell me in the comments.

Postscript: After I drafted this entry and was getting ready to post it, I came across this article in New Scientist about the rejection. It looks like Richard Wiseman already had a similar idea:

“My feeling is that the whole system is out of date and comes from a time when journal space was limited.” He argues that journals could publish only abstracts of replication studies in print, and provide the full manuscript online.

Dental Pain Management Theory

Abstract from a paper just submitted:

Going to the dentist is a culturally mandated yet deeply unpleasant experience. Dental Pain Management Theory (DPMT) postulates that the pain associated with dental procedures induces a state of cognitive dissonance (conflict between cultural norms and an individual desire to avoid pain), which in turn creates anxiety. The Dental Salience (DS) hypothesis states that people try to buffer the potential for dental anxiety by rejecting cultural worldviews and diminishing their sense of self-importance. A meta-analysis compiled several hundred independent effect sizes from DS experiments that compared the effects of thinking about dental pain versus thinking about something that is nearly as unpleasant but inevitable regardless of either cultural mandates or individual action (i.e., death). On average DS yielded a medium effect size across a range of subject populations and settings. Moderators included self-esteem and whether the dentist had a socially nonconforming haircut. Results are discussed in relation to alternative explanations of DPMT.

(Inspiration here and then here.)

Adventures in Wikipedia editing

The Association for Psychological Science is on a quest to get psychologists to start contributing to Wikipedia. When I first heard about it, I started to write up the story about the one time I decided to wade into Wikipedia a few years ago. It wasn’t pretty: it involved an edit war over the spelling of the word “extraversion,” and although I ultimately prevailed (woohoo!), the effort it required has kept me from going back. But Zick Rubin’s got me beat by a mile:

When I Googled myself last month, I was alarmed to find the following item, from a Wikia.com site on psychology, ranked fourth among the results:

“Zick Rubin (1944-1997) was an American social psychologist.”

This was a little disconcerting. I really was born in 1944 and I really was an American social psychologist. Before I entered law school in midlife, I was a professor of psychology at Harvard and Brandeis and had written books in the field. But, to the very best of my knowledge, I wasn’t dead.

I knew that the report of my death could be bad for business, so I logged into Wikia.com and removed the “1997.” But when I checked a while later, I found the post had reverted to its prior form. I changed it again; again someone changed it back. Apparently the site had its doubts about some lawyer in Boston tinkering with the facts about American psychologists.

In spite of these kinds of episodes, I think it’s probably worth it for us academic psychologists to spend more time on Wikipedia. My impression has been that psychology is not nearly as well represented as more technical disciplines, but given the popularity of the topics we study I bet there are lots of people looking up our stuff. Maybe it’ll even help us recognize when some of our lazier (or less wealthy) students are plagiarizing.

Plus, hey, it’ll keep us from going the way of Abe Vigoda.

Everybody knows that grad school admission interviews don’t tell us anything useful, right? Right?

From time to time I have heard people in my field challenge the usefulness of interviews in grad student selection. The challenge is usually delivered with the weary tone of the evidence-based curmudgeon. (I should note that as an admirer of Paul Meehl and a grand-advisee of Lew Goldberg, who once wrote an article called “Human Mind Versus Regression Equation” in which the regression equation wins, I am often disposed toward such curmudgeonry myself.)

The argument usually goes something like this: “All the evidence from personnel selection studies says that interviews don’t predict anything. We are wasting people’s time and money by interviewing grad students, and we are possibly making our decisions worse by substituting bad information for good.”

I have been hearing more or less the same thing for years, starting when I was in grad school myself. In fact, I have heard it often enough that, not being familiar with the literature, I accepted what people were saying at face value. But I finally got curious about what the literature actually says, so I looked it up.

And given what I’d come to believe over the years, I was a little surprised at what I found.

A little Google Scholaring for terms like “employment interviews” and “incremental validity” led me to a bunch of meta-analyses that concluded that in fact interviews can and do provide useful information above and beyond other valid sources of information (like cognitive ability tests, work sample tests, conscientiousness, etc.). One of the most heavily cited is a 1998 Psych Bulletin paper by Schmidt and Hunter (link is a pdf; it’s also discussed in this blog post). Another is this paper by Cortina et al., which makes finer distinctions among different kinds of interviews. The meta-analyses generally seem to agree that (a) interviews correlate with job performance assessments and other criterion measures, (b) interviews aren’t as strong predictors as cognitive ability, (c) but they do provide incremental (non-overlapping) information, and (d) in those meta-analyses that make distinctions between different kinds of interviews, structured interviews are better than unstructured interviews.
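To make point (c) concrete, here is a quick simulation (not data from any of those meta-analyses) of what incremental validity looks like: the gain in variance explained when interview ratings are added to a regression that already includes a cognitive ability test.

```python
# Simulated illustration of incremental validity: how much R^2 improves when
# interview ratings are added to a model that already contains a cognitive
# ability test. The data are simulated, not drawn from the cited meta-analyses.
import numpy as np

rng = np.random.default_rng(0)
n = 500

ability = rng.normal(size=n)
interview = 0.4 * ability + rng.normal(size=n)                      # overlaps with ability
performance = 0.5 * ability + 0.3 * interview + rng.normal(size=n)  # but adds unique signal

def r_squared(predictors, outcome):
    """R^2 from an ordinary least squares fit with an intercept."""
    X = np.column_stack([np.ones(len(outcome)), predictors])
    beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    residuals = outcome - X @ beta
    return 1 - residuals.var() / outcome.var()

r2_ability = r_squared(ability.reshape(-1, 1), performance)
r2_both = r_squared(np.column_stack([ability, interview]), performance)

print(f"R^2, ability only:           {r2_ability:.3f}")
print(f"R^2, ability + interview:    {r2_both:.3f}")
print(f"Incremental R^2 (interview): {r2_both - r2_ability:.3f}")
```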

If you look at point “b” above and think that maybe interviews add too little variance to be worth the trouble, my response is: live by the coefficients, die by the coefficients. You’d also have to conclude that we shouldn’t be asking applicants to write about their background or interests in a personal statement, and we shouldn’t be obtaining letters of recommendation. According to Schmidt and Hunter (Table 1), biography, interests, and references all have weaker predictive power than structured interviews. (You might want to justify those things over interviews on a cost-benefit basis, though I’d suggest that they aren’t necessarily cheap either. A personal statement plus 3 reference letters adds up to a lot of person-hours of labor.)

A bigger problem is that if you are going to take an evidence-based approach, your evidence needs to be relevant. Graduate training shares some features with conventional employment, but they are certainly not the same. So I think it is fair to question how well personnel studies can generalize to doctoral admissions. For example, one justification for interviews that I’ve commonly heard is that Ph.D. programs require a lot of close mentoring and productive collaboration. Interviews might help the prospective advisor and advisee evaluate the potential for rapport and shared interests and goals. Even if an applicant is generally well qualified to earn a Ph.D., they might not be a good fit for a particular advisor/lab/program.

That, of course, is a testable question. So if you are an evidence-based curmudgeon, you should probably want some relevant data. I was not able to find any studies that specifically addressed the importance of rapport and interest-matching as predictors of later performance in a doctoral program. (Indeed, validity studies of graduate admissions are few and far between, and the ones I could find were mostly for medical school and MBA programs, which are very different from research-oriented Ph.D. programs.) It would be worth doing such studies, but not easy.

Anyway, if I’m misunderstanding the literature or missing important studies, I hope someone will tell me in the comments. (Personnel selection is not my wheelhouse, but since this is a blog I’m happy to plow forward anyway.) However, based on what I’ve been able to find in the literature, I’m certainly not ready to conclude that admissions interviews are useless.