Does psilocybin cause changes in personality? Maybe, but not so fast

This morning I came across a news article about a new study claiming that psilocybin (the active ingredient in hallucinogenic mushrooms) causes lasting changes in personality, specifically the Big Five factor of openness to experience.

It was hard to make out methodological details from the press report, so I looked up the journal article (gated). The study, by Katherine MacLean, Matthew Johnson, and Roland Griffiths, was published in the Journal of Psychopharmacology. When I read the abstract I got excited. Double blind! Experimentally manipulated! Damn, I thought, this looks a lot better than I thought it was going to be.

The results section was a little bit of a letdown.

Here’s the short version: Everybody came in for 2 to 5 sessions. In session 1 some people got psilocybin and some got a placebo (the placebo was methylphenidate, a.k.a., Ritalin; they also counted as “placebos” some people who got a very low dose of psilocybin in their first session). What the authors report is a significant increase in NEO Openness from pretest to after the last session. That analysis is based on the entire sample of N=52 (everybody got an active dose of psilocybin at least once before the study was over). In a separate analysis they report no significant change from pretest to after session 1 for the n=32 people who got the placebo first. So they are basing a causal inference on the difference between significant and not significant. D’oh!

To make it (even) worse, the “control” analysis had fewer subjects, hence less power, than the “treatment” analysis. So it’s possible that openness increased as much or even more in the placebo contrast as it did in the psilocybin contrast. (My hunch is that’s not what happened, but it’s not ruled out. They didn’t report the means.)

None of this means there is definitely no effect of psilocybin on Openness; it just means that the published paper doesn’t report an analysis that would answer that question. I hope the authors, or somebody else, come back with a better analysis. (A simple one would be a 2×2 ANOVA comparing pretest versus post-session-1 for the placebo-first versus psilocybin-first subjects. A slightly more involved analysis might involve a multilevel model that could take advantage of the fact that some subjects had multiple post-psilocybin measurements.)

Aside from the statistics, I had a few observations.

One thing you’d worry about with this kind of study – where the main DV is self-reported – is demand or expectancy effects on the part of subjects. I know it was double-blind, but they might have a good idea about whether they got psilocybin. My guess is that they have some pretty strong expectations about how shrooms are supposed to affect them. And these are people who volunteered to get dosed with psilocybin, so they probably had pretty positive expectations. I wouldn’t call the self-report issue a dealbreaker, but in a followup I’d love to see some corroborating data (like peer reports, ecological momentary assessments, or a structured behavioral observation of some kind).

On the other hand, they didn’t find changes in other personality traits. If the subjects had a broad expectation that psilocybin would make them better people, you would expect to see changes across the board. If their expectations were focused around Openness-related traits, that’s less relevant.

If you accept the validity of the measures, it’s also noteworthy that they didn’t get higher in neuroticism — which is not consistent with what the government tells you will happen if you take shrooms.

One of the most striking numbers in the paper is the baseline sample mean on NEO Openness — about 64. That is a T-score (normed [such as it is] to have a mean = 50, SD = 10). So that means that in comparison to the NEO norming sample, the average person in this sample was about 1.4 SDs above the mean — which is above the 90th percentile — in Openness. I find that to be a fascinating peek into who volunteers for a psilocybin study. (It does raise questions about generalizability though.)

Finally, because psilocybin was manipulated within subjects, the long-term (one year-ish) followup analysis did not have a control group. Everybody had been dosed. They predicted Openness at one year out based on the kinds of trip people reported (people who had a “complete mystical experience” also had the sustained increase in openness). For a much stronger inference, of course, you’d want to manipulate psilocybin between subjects.

Do not use what I am about to teach you

I am gearing up to teach Structural Equation Modeling this fall term. (We are on quarters, so we start late — our first day of classes is next Monday.)

Here’s the syllabus. (pdf)

I’ve taught this course a bunch of times now, and each time I teach it I add more and more material on causal inference. In part it’s a reaction to my own ongoing education and evolving thinking about causation, and in part it’s from seeing a lot of empirical work that makes what I think are poorly supported causal inferences. (Not just articles that use SEM either.)

Last time I taught SEM, I wondered if I was heaping on so many warnings and caveats that the message started to veer into, “Don’t use SEM.” I hope that is not the case. SEM is a powerful tool when used well. I actually want the discussion of causal inference to help my students think critically about all kinds of designs and analyses. Even people who only run randomized experiments could benefit from a little more depth than the sophomore-year slogan that seems to be all some researchers (AHEM, Reviewer B) have been taught about causation.

A dubious textbook marketing proposal

I got an email the other day:

*****

Dear Professor Srivastava,

My name is [NAME] and I am a consultant working with the [PUBLISHING COMPANY THAT YOU HAVE ALMOST CERTAINLY HEARD OF] team on the new textbook, [TEXTBOOK], by [AUTHOR]. I am emailing to see if you would be interested in class testing a chapter from this new textbook.  In exchange for your class test, [PUBLISHER] will give you a one year membership to the APS as a stipend for your help. This is a $194 value.

If you teach the [COURSE THAT I DON’T ACTUALLY TEACH] course, please read on.

[PUBLISHER] is looking for instructors to class test either of the following chapters:

    Chapter 3: [SOMETHING ABOUT THE BRAIN]
    Chapter 8: [SOMETHING ABOUT THE MIND]

You can integrate the chapter you select into your course as you see fit – we will ask you and your students to fill out a very brief online survey after the class test.

[AUTHOR] is [IMPRESSIVE-SOUNDING LIST OF AWARDS AND CREDENTIALS]

If you would like to be considered for this class test, please click the following link and sign up for the project: [LINK]

This is a terrific way for you to learn about an exciting new textbook for the [COURSE THAT I DON’T ACTUALLY TEACH] course and see if it is a good fit for you and your students.  

I look forward to hearing from you.

[NAME]

Consultant for [PUBLISHER]

*****

This sounds ethically problematic to me, for at least two reasons:

1. It is a conflict of interest. My students are paying tuition money to my employer, and my employer is paying a salary to me, to provide a high-quality education. If I choose course materials based on outside financial compensation rather than what I think is the best for their education, that is a conflict of interest.

2. My students would be forced to participate in a marketing study without their consent. In response to my query, the consultant said the students would not be paid. But compensation or no, I can see no practical way to incorporate these materials into the course and still allow students to fully opt out. Even if students choose not to fill out the survey, it is still shaping the content of their course.

I suppose I could make the test readings optional, spend no classroom time on them, base no assignments or test questions on them, and fully disclose the arrangement to my students. But my experience of college students and non-required reading assignments tells me that exactly nobody would do the reading or fill out the survey, unless they thought it would curry favor with me (so maybe the disclosure is a bad idea). I don’t imagine that is what the consultant has in mind.

It is possible that I have misconstrued an important part of this invitation. So I have offered the emailer to write a response, and if he does I will post it. I’ve also decided to redact the identifying details. I realize that lowers the probability of getting a response, but my purpose is to make it known that this kind of thing goes on  — not to embarrass the specific parties involved.

The usability of statistics; or, what happens when you think that (p=.05) != (p=.06)

The difference between significant and not significant is not itself significant.

That is the title of a 2006 paper by statisticians Andrew Gelman and Hal Stern. It is also the theme of a new review article in Nature Neuroscience by Sander Nieuwenhuis, Birte U Forstmann, and Eric-Jan Wagenmakers (via Gelman’s blog). The review examined several hundred papers in behavioral, systems, and cognitive neuroscience. Of all the papers that tried to compare two effects, about half of them made this error instead of properly testing for an interaction.

I don’t know how often the error makes it through to published papers in social and personality psychology, but I see it pretty regularly as a reviewer. I call it when I see it; sometimes other reviewers call it out too, sometimes they don’t.

I can also remember making this error as a grad student – and my advisor correcting me on it. But the funny thing is, it’s not something I was taught. I’m quite sure that nowhere along the way did any of my teachers say you can compare two effects by seeing if one is significant and the other isn’t. I just started doing it on my own. (And now I sometimes channel my old advisor and correct my own students on the same error, and I’m sure nobody’s teaching it to them either.)

If I wasn’t taught to make this error, where was I getting it from? When we talk about whether researchers have biases, usually we think of hot-button issues like political bias. But I think this reflects a more straightforward kind of bias — old habits of thinking that we carry with us into our professional work. To someone without scientific training, it seems like you should be able to ask “Does X cause Y, yes or no?” and expect a straightforward answer. Scientific training teaches us a couple of things. First, the question is too simple: it’s not a yes or no question; the answer is always going to come with some uncertainty; etc. Second, the logic behind the tool that most of us use – null hypothesis significance testing (NHST) – does not even approximate the form of the question. (Roughly: “In a world where X has zero effect on Y, would we see a result this far from the truth less than 5% of the time?”)

So I think what happens is that when we are taught the abstract logic of what we are doing, it doesn’t really pervade our thinking until it’s been ground into us through repetition. For a period of time – maybe in some cases forever – we carry out the mechanics of what we have been taught to do (run an ANOVA) but we map it onto our old habits of thinking (“Does X cause Y, yes or no?”). And then we elaborate and extrapolate from them in ways that are entirely sensible by their own internal logic (“One ANOVA was significant and the other wasn’t, so X causes Y more than it causes Z, right?”).

One of the arguments you sometimes hear against NHST is that it doesn’t reflect the way researchers think. It’s a sort of usability argument: NHST is the butterfly ballot of statistical methods. In principle, I don’t think that argument carries the day on its own (if we need to use methods and models that don’t track our intuitions, we should). But it should be part of the discussion. And importantly, the Nieuwenhuis et al. review shows us how using unintuitive methods can have real consequences.