Recently I posted a comment on a PLOS ONE article for the first time. As someone who had a decent chunk of his career before post-publication peer review came along — and has an even larger chunk of his career left with it around — it was an interesting experience.
It started when a colleague posted an article to his Facebook wall. I followed the link out of curiosity about the subject matter, but what immediately jumped out at me was that it was a 4-study sequence with pretty small samples. (See Uli Schimmack’s excellent article The ironic effect of significant results on the credibility of multiple-study articles [pdf] for why that’s noteworthy.) That got me curious about effect sizes and power, so I looked a little bit more closely and noticed some odd things. Like that different N’s were reported in the abstract and the method section. And when I calculated effect sizes from the reported means and SDs, some of them were enormous. Like Cohen’s d > 3.0 level of enormous. (If all this sounds a little hazy, it’s because my goal in this post is to talk about my experience of engaging in post-publication review — not to rehash the details. You can follow the links to the article and comments for those.)
In the old days of publishing, it wouldn’t have been clear what to do next. In principle many psych journals will publish letters and comments, but in practice they’re exceedingly rare. Another alternative would have been to contact the authors and ask them to write a correction. But that relies on the authors agreeing that there’s a mistake, which authors don’t always do. And even if authors agree and write up a correction, it might be months before it appears in print.
But this article was published in PLOS ONE, which lets readers post comments on articles as a form of post-publication peer-review (PPPR). These comments aren’t just like comments on some random website or blog — they become part of the published scientific record, linked from the primary journal article. I’m all in favor of that kind of system. But it brought up a few interesting issues for how to navigate the new world of scientific publishing and commentary.
1. Professional etiquette. Here and there in my professional development I’ve caught bits and pieces of a set of gentleman’s rules about scientific discourse (and yes, I am using the gendered expression advisedly). A big one is, don’t make a fellow scientist look bad. Unless you want to go to war (and then there are rules for that too). So the old-fashioned thing to do — “the way I was raised” — would be to contact the authors quietly and petition them to make a correction themselves, so it could look like it originated with them. And if they do nothing, probably limit my comments to grumbling at the hotel bar at the next conference.
But for PPPR to work, the etiquette of “anything public is war” has to go out the window. Scientists commenting on each other’s work needs to be a routine and unremarkable part of scientific discourse. So does an understanding that even good scientists can make mistakes. And to live by the old norms is to affirm them. (Plus, the authors chose to submit to a journal that allows public comments, so caveat author.) So I elected to post a comment and then email the authors to let them know, so they would have a chance to respond quickly if they weren’t monitoring the comments. As a result, the authors posted several comments over the next couple of days correcting aspects of the article and explaining how the errors happened. And they were very responsive and cordial over email the entire time. Score one for the new etiquette.
2. A failure of pre-publication peer review? Some of the issues I raised in my comment were indisputable factual inconsistencies — like that the sample sizes were reported differently in different parts of the paper. Others were more inferential — like that a string of significant results in these 4 studies was significantly improbable, even under a reasonable expectation of an effect size consistent with the authors’ own hypothesis. A reviewer might disagree about that (maybe they think the true effect really is gigantic). Other issues, like the too-small SDs, would have been somewhere in the middle, though they turned out to be errors after all.
Is this a mark against pre-publication peer review? Obviously it’s hard to say from one case, but I don’t think it speaks well of PLOS ONE that these errors got through. Especially because PLOS ONE is supposed to emphasize “a high technical standard” and reporting of “sufficient detail” (the reason I noticed the issue with the SDs was because the article did not report effect sizes).
But this doesn’t necessarily make PLOS ONE worse than traditional journals like Psychological Science or JPSP, where similar errors get through all the time and then become almost impossible to correct. [UPDATE: Please see my followup post about pre-publication review at PLOS ONE and other journals.]
3. The inconsistency of post-publication peer review. I don’t think post-publication peer review is a cure-all. This whole episode depended in somebody (in this case, me) noticing the anomalies and being motivated to post a comment about them. If we got rid of pre-publication peer review and if the review process remained that unsystematic, it would be a recipe for a very biased system. This article’s conclusions are flattering to most scientists’ prejudices, and press coverage of the article has gotten a lot of mentions and “hell yeah”s on Twitter from pro-science folks. I don’t think it’s hard to imagine that that contributed to it getting a pass, and that if the opposite were true the article would have gotten a lot more scrutiny both pre- and post-publication. In my mind, the fix would be to make sure that all articles get a decent pre-publication review — not to scrap it altogether. Post-publication review is an important new development but should be an addition, not a replacement.
4. Where to stop? Finally, one issue I faced was how much to say in my initial comment, and how much to follow up. In particular, my original comment made a point about the low power and thus the improbability of a string of 4 studies with a rejected null. I based that on some hypotheticals and assumptions rather than formally calculating Schimmack’s incredibility index for the paper, in part because other errors in the initial draft made that impossible. The authors never responded to that particular point, but their corrections would have made it possible to calculate an IC index. So I could have come back and tried to goad them into a response. But I decided to let it go. I don’t have an axe to grind, and my initial comment is now part of the record. And one nice thing about PPPR is that readers can evaluate the arguments for themselves. (I do wish I had cited Schimmack’s paper though, because more people should know about it.)