Two obviously wrong statements about personality and political ideology

On the heels of yesterday’s post about the link between religiosity and conservatism, I came across a New York Magazine article discussing recent research on personality, genetics, and political ideology. The article summarizes a lot of really interesting work by John Jost on ideology, Jonathan Haidt on moral foundations, David Pizarro on emotional responses and politics, and others. But when it says things like…

Over the past few years, researchers haven’t just tied basic character traits to liberalism and conservatism, they’ve begun to finger specific genes they say hard-wire those ideologies.

… I just cringe. Research on personality and genetics does not support the conclusion that ideology is hard-wired, any more than our work on how political discourse ties religiosity to politics shows that ideology is a blank-slate social artifact.

Any attempt to understand the role of personality and genetics in political attitudes and ideology will have to avoid endorsing two obviously wrong conclusions:

1. Ideology and political attitudes have nothing to do with personality or genes.

2. Genes code for ideology and political attitudes in a clear, unconditional way.

Maybe in some distal and complex way our genes code for variations in how different psychological response systems work — under what conditions they are more and less active, how sensitive they are to various inputs, how strongly they produce their various responses, etc. In situ, these individual differences are going to interact with things like how messages are framed, how they are presented in conjunction with other information and stimuli, who is presenting the information, what we think the leaders and fellow members of our important social groups think and feel, etc.

What this interactivity means for doing science is that if you hold one thing constant (whether by experimental control or by averaging over differences) and let the other one vary, you will find an effect of the one you let vary. For example, if you look at how different people respond to the same set of sociopolitical issues, you are going to get reliable patterns of different responses that reflect people’s personalities. And if you frame and present the same issue in several different ways, and measure the average effect of the different framings, you are going to get different average responses that reflect message effects. Both are interesting experimental results, but both are testing only pieces of a plausible theoretical model.
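To make this concrete, here is a minimal simulation sketch (Python, with entirely made-up numbers) of a person-by-message interaction: whichever factor you let vary shows up as an "effect," even though both are partial views of the same underlying model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy model (made-up coefficients): responses depend on a personality
# disposition, a message framing, and their interaction.
n = 1000
disposition = rng.normal(size=n)            # individual-difference input
framing = rng.choice([0.0, 1.0], size=n)    # two ways of framing the same issue
response = (0.4 * disposition + 0.4 * framing
            + 0.6 * disposition * framing + rng.normal(size=n))

# Hold the framing constant and let people vary: a "personality effect" appears.
r_person = np.corrcoef(disposition[framing == 1], response[framing == 1])[0, 1]

# Average over people and let the framing vary: a "message effect" appears.
framing_effect = response[framing == 1].mean() - response[framing == 0].mean()

print(round(r_person, 2), round(framing_effect, 2))  # both real, both partial
```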

Most researchers know this, I think. For example, from the NYMag article:

Fowler laughs at the idea that he had isolated a single gene responsible for liberalism—an idea circulated in much of the chatter about the study. “There are hundreds if not thousands of genes that are all interacting to affect complex social behaviors,” Fowler says, and scientists have only a rough sense of that process. “There’s a really long, complex causal chain at work here,” says UC-Berkeley political scientist Laura Stoker, “and we won’t get any real understanding without hundreds and hundreds of years’ more research.”

Let’s stay away from lazy and boring concepts like hard-wired. The real answers are going to be a lot more interesting.

Where does the link between religiosity and conservatism come from?

My collaborator Ari Malka has an op-ed titled “Are religious Americans always conservative?”

Why, then, does religiosity relate to conservatism at all? One possibility is that there is some type of organic connection between being a religious person and being a conservative person. Perhaps the traits, moral standards and ways of thinking that characterize religious people also naturally lead them to prefer conservative social outcomes and policies. Another possibility, however, is that this relation really has to do with the messages from political and religious discourse, and how some people respond to these messages.

Two pieces of evidence support this latter explanation…

The evidence comes from a new paper we have out in Political Psychology. Here’s the abstract:

Some argue that there is an organic connection between being religious and being politically conservative. We evaluate an alternative thesis that the relation between religiosity and political conservatism largely results from engagement with political discourse that indicates that these characteristics go together. In a combined sample of national survey respondents from 1996-2008, religiosity was associated with conservative positions on a wide range of attitudes and values among the highly politically engaged, but this association was generally weaker or nonexistent among those less engaged with politics. The specific political characteristics for which this pattern existed varied across ethno-religious groups. These results suggest that whether religiosity translates into political conservatism depends to an important degree on level of engagement with political discourse.

Malka, A., Lelkes, Y., Srivastava, S., Cohen, A. B., & Miller, D. T. (2012). The association of religiosity and political conservatism: The role of political engagement. Political Psychology, 33, 275-299.

Some reflections on the Bargh-Doyen elderly walking priming brouhaha

Recently a controversy broke out over the replicability of a study John Bargh et al. published in 1996. The study reported that unconsciously priming a stereotype of elderly people caused subjects to walk more slowly. A recent replication attempt by Stephane Doyen et al., published in PLoS ONE, was unable to reproduce the results. (Less publicized, but surely relevant, is another non-replication by Hal Pashler et al.) Ed Yong wrote up an article about it in Discover, which last week drew a sharp response from Bargh.

The broader context is that there has been a large and ongoing discussion about replication in psychology (i.e., that there isn’t enough of it). I don’t have much to say about whether the elderly-walking effect is real. But this controversy has raised a number of issues about scientific discourse online as well as about how we think about replication.

The discussion has been unnecessarily inflammatory – on all sides. Bargh has drawn a lot of criticism for his response, which among other things included factual errors about PLoS ONE, suggestions that Doyen et al. were “incompetent or ill-informed,” and a claim that Yong was practicing irresponsible journalism. The PLoS ONE editors posted a strongly worded but civil response in the comments, and Yong has written a rebuttal. As for the scientific issue — is the elderly-priming effect real? — Daniel Simons has written an excellent post on the many, many reasons why an effect might fail to replicate. A failure to replicate does not need to impeach the honesty or scientific skills of either the original researcher or the replicator. It does not even mean the effect is not real. In an ideal world, Bargh should have treated the difference between his results and those of Doyen et al. as a puzzle to be worked out, not as a personal attack to be responded to in kind.

But… it’s not as though Bargh went bananas over a dispassionate report of a non-replication. Doyen et al. strongly suggested that Bargh et al.’s procedure had been contaminated by expectancy effects. Since expectancy effects are widely known in behavioral science (raise your hand if you have heard the phrase “double-blind”), the implication was that Bargh had been careless. And Ed Yong ran with that interpretation by leading off his original piece with the tale of Clever Hans. I don’t know whether Doyen or Yong meant to be inflammatory: I know nothing about Doyen; and in Yong’s case, based on his journalistic record, I doubt it (and he apparently gave Bargh plenty of opportunity to weigh in before his original post went live). But wherever you place the blame, a scientifically unfortunate result is that all of the other reasonable possibilities that Simons lists have been mostly ignored by the principals in this discussion.

Are priming effects hard to produce or easy? A number of priming researchers have suggested that priming effects are hard to get reliably. This doesn’t mean they aren’t important — experiments require isolation of the effect of interest, and the ease of isolating a phenomenon is not the same thing as its importance. (Those Higgs bosons are so hard to detect — so even if they exist they must not matter, right?) Bargh makes this point in his response too, suggesting that if Doyen et al. accidentally called subjects’ conscious attention to the elderly stereotype, that could wash out the effect (because conscious attention can easily interfere with automatic processes).

That being said… the effects in the original Bargh et al. report were big. Really big, by psychology standards. In experiment 2a, Bargh et al. report t(28) = 2.86, which corresponds to an effect size of d = 1.08. And in their replication, experiment 2b, they report t(28) = 2.16, which translates to d = 0.82. So even if we account for some shrinkage, under the right conditions it should not be hard for somebody to reproduce the elderly-walking priming effect in a new study.
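For anyone who wants to check that arithmetic, a common approximation for converting a between-subjects t to Cohen's d, d ≈ 2t/√df, reproduces the numbers above. A quick sketch:

```python
import math

def t_to_d(t, df):
    """Approximate Cohen's d from a between-subjects t test: d = 2t / sqrt(df)."""
    return 2 * t / math.sqrt(df)

print(round(t_to_d(2.86, 28), 2))  # Experiment 2a: 1.08
print(round(t_to_d(2.16, 28), 2))  # Experiment 2b: 0.82
```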

The expectancy effects study is rhetorically powerful but proves little. In their Experiment 1, Doyen et al. tested the same hypothesis about priming stereotypes that Bargh tested. But in Experiment 2, Doyen et al. tested a hypothesis about experimenter expectancies. That is a completely different hypothesis. The second study tells us that experimenter expectancies can affect walking speed. But walking speed surely can be affected by more than one thing. So Experiment 2 does not tell us to what extent, if any at all, differences in walking speed were caused by experimenter expectancies in Bargh’s experiment (or for that matter, anywhere else in the natural world outside of Doyen’s lab). This is the inferential error of confusing causes of effects with effects of causes. Imagine that Doyen et al. had clubbed the subjects in the elderly-prime condition in the knee; most likely that would have slowed them down. But would we take that as evidence that Bargh et al. had done the same?

The inclusion of Experiment 2 served a strong rhetorical function, by planting in the audience’s mind the idea that the difference between Bargh et al. and Doyen et al.’s Experiment 1 was due to expectancy effects (and Ed Yong picked up and ran with this suggestion by referring to Clever Hans). But scientifically, all it shows is that expectancy effects can influence the dependent variable in the Bargh experiment. That’s not nothing, but anybody who already believes that experiments need to be double-blind should have seen that coming. If we had documentary evidence that in the actual 1996 studies Bargh et al. did not actually eliminate expectancy effects, that would be relevant. (We likely never will have such evidence; see next point.) But Experiment 2 does not shed nearly as much light as it appears to.

We need more openness with methods and materials. When I started off in psychology, someone once told me that a scientific journal article should contain everything you need to reproduce the experiment (either directly or via references to other published materials). That, of course, is almost never true and maybe is unrealistic. Especially when you factor in things like lab skills, many of which are taught via direct apprenticeship rather than in writing, and which matter just as much in behavioral experiments as they do in more technology-heavy areas of science.

But with all that being said, I think we could do a lot better. A big part of the confusion in this controversy is over the details of methods — what exactly did Bargh et al. do in the original study, and how closely did Doyen et al. reproduce the procedure? The original Bargh et al. article followed the standards of its day in how much methodological detail it reported. Bargh later wrote a methods chapter that described more details of the priming technique (and which he claims Doyen et al. did not follow). But in this era of unlimited online supplements, there is no reason why in future studies, all of the stimuli, instructions, etc. could not be posted. That would enormously aid replication attempts.

What makes for a “failed” replication? This turns out to be a small point in the present context but an important one in a more general sense, so I couldn’t help but make it. We should be very careful about the language of “successful” and “failed” replications when it is based on the difference between p<.05 and p>.05. That is, just because the original study could reject the null and the replication could not, that doesn’t mean that the replication is significantly different from the original study. If you are going to say you failed to replicate the original result, you should conduct a test of that difference.

As far as I can tell, neither Doyen et al. nor Pashler et al. did that. So I did. I converted each study’s effect to an r effect size and then compared the studies with a z test of the difference between independent rs, and indeed Doyen et al. and Pashler et al. each differed significantly from Bargh’s original experiments. So this doesn’t alter the present discussion. But as good practice, the replication reports should have reported such tests.
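For anyone who wants to run a similar check on other pairs of studies, here is a rough sketch of that kind of comparison. The Bargh numbers are the ones quoted above; the replication numbers below are placeholders, not the actual Doyen et al. or Pashler et al. statistics.

```python
import numpy as np
from scipy import stats

def t_to_r(t, df):
    """Convert an independent-samples t statistic to an r effect size."""
    return np.sqrt(t**2 / (t**2 + df))

def compare_independent_rs(r1, n1, r2, n2):
    """z test of the difference between two independent correlations,
    using the Fisher r-to-z transformation."""
    z1, z2 = np.arctanh(r1), np.arctanh(r2)
    se = np.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / se
    p = 2 * stats.norm.sf(abs(z))
    return z, p

# Bargh et al. Experiment 2a: t(28) = 2.86, N = 30 (from the post above)
r_orig, n_orig = t_to_r(2.86, 28), 30

# Placeholder replication values -- substitute the real t, df, and N
r_rep, n_rep = t_to_r(0.50, 58), 60

print(compare_independent_rs(r_orig, n_orig, r_rep, n_rep))
```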

Secular trends in publication bias

Abstract of Negative results are disappearing from most disciplines and countries (PDF) by Daniele Fanelli, Scientometrics, 2012 (thanks to Brent Roberts for forwarding it):

Concerns that the growing competition for funding and citations might distort science are frequently discussed, but have not been verified directly. Of the hypothesized problems, perhaps the most worrying is a worsening of positive-outcome bias. A system that disfavours negative results not only distorts the scientific literature directly, but might also discourage high-risk projects and pressure scientists to fabricate and falsify their data. This study analysed over 4,600 papers published in all disciplines between 1990 and 2007, measuring the frequency of papers that, having declared to have “tested” a hypothesis, reported a positive support for it. The overall frequency of positive supports has grown by over 22% between 1990 and 2007, with significant differences between disciplines and countries. The increase was stronger in the social and some biomedical disciplines. The United States had published, over the years, significantly fewer positive results than Asian countries (and particularly Japan) but more than European countries (and in particular the United Kingdom). Methodological artefacts cannot explain away these patterns, which support the hypotheses that research is becoming less pioneering and/or that the objectivity with which results are produced and published is decreasing.

My reactions…

Sarcastic: together with the Flynn Effect this is clearly a sign that we’re getting smarter.

Not: There is no single solution to this problem, but my proposal is something you could call the Pottery Barn Rule for journals. Once a journal publishes a study, it should be obliged to publish any and all exact or near-exact replication attempts in an online supplement, and link to such attempts from the original article. That would provide a guaranteed outlet for people to run exact replication attempts, something we do not do nearly enough of. And it would create an incentive for authors, editors, and publishers to be rigorous since non-replications would be hung around the original article’s neck. (And if nobody bothers to try to replicate the study, that would probably be a sign of something too.)

Personality as a target for interventions and public policy

A friend just passed along an article, Is Personality Fixed? Personality Changes as Much as “Variable” Economic Factors and More Strongly Predicts Changes to Life Satisfaction, by Christopher J. Boyce, Alex M. Wood, and Nattavudh Powdthavee, in Social Indicators Research:

Personality is the strongest and most consistent cross-sectional predictor of high subjective well-being. Less predictive economic factors, such as higher income or improved job status, are often the focus of applied subjective well-being research due to a perception that they can change whereas personality cannot. As such there has been limited investigation into personality change and how such changes might bring about higher well-being. In a longitudinal analysis of 8625 individuals we examine Big Five personality measures at two time points to determine whether an individual’s personality changes and also the extent to which such changes in personality can predict changes in life satisfaction. We find that personality changes at least as much as economic factors and relates much more strongly to changes in life satisfaction. Our results therefore suggest that personality can change and that such change is important and meaningful. Our findings may help inform policy debate over how best to help individuals and nations improve their well-being.

I once saw a talk by a marital-interventions researcher whose work showed strong and stable individual differences in marital quality growth curves, and very little that could predict the slopes of those curves (including marital therapy!). Yet when asked whether this suggested that he should be looking in greater depth at personality, he shied away from it. He said it’s not that he thought personality doesn’t matter, but he wanted to study things he could intervene with. This attitude is not unusual in my experience, especially among some social and clinical psychologists.

From my perspective, first of all, even if that were right, wouldn’t it be important to know the boundaries of what an intervention could do? And second of all, that’s a preconception about personality rather than an empirical finding. More and more research (including Jim Heckman’s work on early interventions) is calling that preconception into question.

We still have a lot to learn about changing personality. But growing evidence is raising the possibility that by identifying personality antecedents of important life outcomes, you can learn more about what you should try to change, rather than what you cannot change. As long as intervention and policy researchers stick to the view that personality is unchangeable, though, that could remain a missed opportunity.

Does your p-curve weigh as much as a duck?

Over at Psych Your Mind, Michael Kraus bravely reports the results of a p-curve analysis of his own publications.

p-curves were discussed by Uri Simonsohn at an SPSP symposium on false-positive findings (which I missed but got to read up about thanks to Kraus; many of the authors of the false-positive psychology paper were involved). Simonsohn has a paper forthcoming with details of the method. But the basic idea is that you should be able to tell if somebody is mining their data for significant findings by examining the distribution of p-values in their published work. A big spike of .049s and not enough <.01s could be the result of cherry-picking.
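Simonsohn's paper wasn't out yet when this was written, so the sketch below is not his method, just a rough illustration of the basic idea: tabulate the significant p-values in a set of papers and see whether they pile up just under .05.

```python
from collections import Counter

def p_curve_bins(p_values, width=0.01, alpha=0.05):
    """Bin the significant p-values (.00-.01, ..., .04-.05). A pile-up in the
    top bin relative to the lower bins is the pattern described above as a
    possible sign of cherry-picking."""
    sig = [p for p in p_values if p < alpha]
    counts = Counter(int(p // width) for p in sig)
    return {f"{i * width:.2f}-{(i + 1) * width:.2f}": counts.get(i, 0)
            for i in range(int(alpha / width))}

# Hypothetical p-values from a set of papers, for illustration only
example = [0.003, 0.012, 0.049, 0.047, 0.041, 0.20, 0.048, 0.008]
print(p_curve_bins(example))
```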

In a thoughtful but sometimes-heated discussion on the SPSP email list between Norbert Schwarz and the symposium participants, Schwarz argues — and I agree — that although p-curve analyses could be a useful tool, they will need to be interpreted cautiously. For example, Schwarz thinks that at this stage it would be inappropriate to base hiring decisions on candidates’ p-curves, something that Simonsohn apparently suggested in his talk.

A big part of the interpretive task is going to be that, as with any metric, users will have to accumulate data and build up some practical wisdom in figuring out how to interpret and apply it. Or to get a little jargony, we’ll have to do some construct validation. In particular, I think it will be crucial to remember that even though you could calculate a p-curve on a single researcher, the curve is not a property of the researcher. Rather, it will reflect the interaction of the researcher with history and context. Even setting aside measurement and sampling error, substantive factors will shape researchers’ p-curves: the incentives and practices set by publishers, granting agencies, and other powerful institutions; the differing standards of different fields and subfields (e.g., in their use of NHST, and in what people honestly believe and teach as acceptable practices); who the researcher was trained by and has collaborated with; and so on. Individual researchers are an important part of the picture, of course, but it would be a mistake to apply an overly simplistic model of where p-curves come from. (And of course p-curves don’t have to be applied to individuals at all — they could be applied to literatures, to subfields, to journals, or really any way of categorizing publications.)

One thing that both Schwarz and Simonsohn seem to agree on is that everybody has probably committed some or many of these errors, and we won’t make much progress unless people are willing to subject themselves to perhaps-painful soul searching. Schwarz in particular fears for a “witch hunt” atmosphere that could make people defensive and ultimately be counterproductive.

So hats off to Kraus for putting himself on the line. I’ll let you read his account and draw your own conclusions, but I think he’s impressively frank, especially for someone that early in his career. Speaking for myself, I’m waiting for Simonsohn’s paper so I can learn a little more about the method before trying it on my own vita. In the meantime I’m glad at least one of my papers has this little bit of p-curve kryptonite:

The p-values associated with the tests of the polynomial models are generally quite small, some so small as to exceed the computational limits of our data analysis software (SPSS 10.0.7, which ran out of decimal places at p < 10e–22).

Whew!

A collection of links to commentary on the Diederik Stapel fraud

I was asked to talk to a research ethics seminar about the Diederik Stapel fraud case. I pulled together a few links to circulate to the class, but I thought I’d put up a blog post and see if anyone has any other suggestions. I am especially interested in commentary and recommendations (rather than straight news coverage).

Here is what I have so far:

Jennifer Crocker ponders the slippery slope from corner-cutting to outright fraud (see also this short version)

Jelte Wicherts proposes that mandatory data sharing could prevent fraud. (And see his empirical study of data-sharing and research quality.)

Brent Roberts offers a long list of problematic practices in psychology, including undervaluing replication (see my thoughts on replication here), selective reporting, HARKing, and more.

Interviews with various psychologists (esp. Eric-Jan Wagenmakers) about problems with NHST and valuing surprising/counterintuitive findings

Andrew Gelman compares Stapel to other cheaters

Yours truly on whether our top journals have incoherent missions that produce perverse incentives.

Maybe this is all because psychology isn’t a real science. Benedict Carey of the New York Times, editorializing in a news story, suggests that psychology badly needs an overhaul. Hank Campbell of Science 2.0 thinks social psychology is too fuzzy. Andrew Ferguson of the Weekly Standard detects a tendency of journalists and psychologists to like gimmicky-but-meaningless findings, which he calls The Chump Effect.

Did the liberal / progressive message of Stapel’s work help it escape scrutiny? Retraction Watch suggests the possibility; Rush Limbaugh has no doubt.

What other commentary and recommendations are floating around?

Update 8/15/2012: Add this commentary by Jenny Crocker and Lynne Cooper to the list: Crocker, J., & Cooper, M. L. (2011). Addressing scientific fraud. Science, 334, 1182.

An editorial board discusses fMRI analysis and “false-positive psychology”

Update 1/3/2012: I have seen a few incoming links describing the Psych Science email discussion as “leaked” or “made public.” For the record, the discussion was forwarded to me from someone who got it from a professional listserv, so it was already out in the open and circulating before I posted it here. Considering that it was carefully redacted and compiled for circulation by the incoming editor-in-chief, I don’t think “leaked” is a correct term at all (and “made public” happened before I got it).

***

I recently got my hands on an email discussion among the Psychological Science editorial board. The discussion is about whether or how to implement recommendations by Poldrack et al. (2008) and Simmons, Nelson, and Simonsohn (2011) for research methods and reporting. The discussion is well worth reading and appears to be in circulation already, so I am posting it here for a wider audience. (All names except those of the senior editor, John Jonides, and Eric Eich, who compiled the discussion, were redacted by Eich; commenters are instead numbered.)

The Poldrack paper proposes guidelines for reporting fMRI experiments. The Simmons paper is the much-discussed “false-positive psychology” paper that was itself published in Psych Science. The argument in the latter is that slippery research and reporting practices can produce “researcher degrees of freedom” that inflate Type I error. To reduce these problems, they make six recommendations for researchers and four for journals.

There are a lot of interesting things to come out of the discussion. Regarding the Poldrack paper, the discussion apparently got started when a student of Jonides analyzed the same fMRI dataset under several different defensible methods and assumptions and got totally different results. I can believe that — not because I have extensive experience with fMRI analysis (or any hands-on experience at all), but because that’s true with any statistical analysis where there is not strong and widespread consensus on how to do things. (See covariate adjustment versus difference scores.)

The other thing about the Poldrack discussion that caught my attention was commenter #8, who asked that more attention be given to selection and determination of ROIs. S/he wrote:

We, as psychologists, are not primarily interested in exploring the brain. Rather, we want to harness fMRI to reach a better understanding of psychological process. Thus, the choice of the various ROIs should be derived from psychological models (or at least from models that are closely related to psychological mechanisms). Such a justification might be an important editorial criterion for fMRI studies submitted to a psychological journal. Such a psychological model might also include ROIs where NO activity is expected, control regions, so to speak.

A.k.a. convergent and discriminant validity. (Once again, the psychometricians were there first.) A lot of research that is billed (in the press or in the scientific reports themselves) as reaching new conclusions about the human mind is really, when you look closely, using established psychological theories and methods as a framework to explore the brain. Which is a fine thing to do, and in fact is a necessary precursor to research that goes the other way, but shouldn’t be misrepresented.

Turning to the Simmons et al. piece, there was a lot of consensus that it had some good ideas but went too far, which is similar to what I thought when I first read the paper. Some of the Simmons recommendations were so obviously important that I wondered why they needed to be made at all, because doesn’t everybody know them already? (E.g., running analyses while you collect data and using p-values as a stopping rule for sample size — a definite no-no.) The fact that Simmons et al. thought this needed to be said makes me worried about the rigor of the average research paper. Others of their recommendations seemed rather rigid and targeted toward a pretty small subset of research designs. The n>20 rule and the “report all your measures” rule might make sense for small-and-fast randomized experiments of the type the authors probably mostly do themselves, but may not work for everything (case studies, intensive repeated-measures studies, large multivariate surveys and longitudinal studies, etc.).
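In case the stopping-rule point sounds abstract, here is a small simulation sketch (all parameters arbitrary) of why it's a no-no: if you test after every few subjects and stop as soon as p < .05, the false-positive rate climbs well above the nominal 5% even when there is no true effect.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def peeking_false_positive_rate(n_sims=2000, n_min=10, n_max=60, step=5, alpha=0.05):
    """Simulate 'test after every few subjects and stop at p < .05' when the
    true group difference is zero; returns the proportion of simulated studies
    that end up 'significant' anyway."""
    false_pos = 0
    for _ in range(n_sims):
        a = rng.normal(size=n_max)   # group 1, null effect
        b = rng.normal(size=n_max)   # group 2, null effect
        for n in range(n_min, n_max + 1, step):
            if stats.ttest_ind(a[:n], b[:n]).pvalue < alpha:
                false_pos += 1
                break                # declare 'significance' and stop collecting
    return false_pos / n_sims

print(peeking_false_positive_rate())  # well above the nominal .05
```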

Commenter #8 (again) had something interesting to say about a priori predictions:

It is always the educated reader who needs to be persuaded using convincing methodology. Therefore, I am not interested in the autobiography of the researcher. That is, I do not care whether s/he has actually held the tested hypothesis before learning about the outcomes…

Again, an interesting point. When there is not a strong enough theory that different experts in that theory would have drawn the same hypotheses independently, maybe a priori doesn’t mean much? Or put a little differently: a priori should be grounded in a publicly held and shared understanding of a theory, not in the contents of an individual mind.

Finally, a general point that many people made was that Psych Science (and for that matter, any journal nowadays) should make more use of supplemental online materials (SOM). Why shouldn’t stimuli, scripts, measures, etc. — which are necessary to conduct exact replications — be posted online for every paper? In current practice, if you want to replicate part or all of someone’s procedure, you need to email the author. Reviewers almost never have access to this material, which means they cannot evaluate it easily. I have had the experience of getting stimuli or measures for a published study and seeing stuff that made me worry about demand characteristics, content validity, etc. That has made me wonder why reviewers are not given the opportunity to closely review such crucial materials as a matter of course.

Journals can be groundbreaking or definitive, not both

I was recently invited to contribute to Personality and Social Psychology Connections, an online journal of commentary (read: fancy blog) run by SPSP. Don Forsyth is the editor, and the contributors include David Dunning, Harry Reis, Jennifer Crocker, Shige Oishi, Mark Leary, and Scott Allison. My inaugural post is titled “Groundbreaking or definitive? Journals need to pick one.” Excerpt:

Do our top journals need to rethink their missions of publishing research that is both groundbreaking and definitive? And as a part of that, do they — and we scientists — need to reconsider how we engage with the press and the public?…

In some key ways groundbreaking is the opposite of definitive. There is a lot of hard work to be done between scooping that first shovelful of dirt and completing a stable foundation. And the same goes for science (with the crucial difference that in science, you’re much more likely to discover along the way that you’ve started digging on a site that’s impossible to build on). “Definitive” means that there is a sufficient body of evidence to accept some conclusion with a high degree of confidence. And by the time that body of evidence builds up, the idea is no longer groundbreaking.

Read it here.

 

What the Heck is Research Anyway? (A guest post by Brent Roberts)

Brent Roberts recently showed me a copy of this essay he wrote to explain to family members what he does for a living. I thought it would make a neat holiday-themed entry on the blog (a link to forward in response to “it must be so nice to have almost a month off between semesters!”). So I asked him if I could put it up as a guest post, and he kindly agreed. 

Recently, I was asked for the 17th time[1] by a family member, “So, what are you going to do this summer?”  As usual, I answered, “research.”  And, as usual, I was met with that quizzical look that says, “What the heck is research anyway?”

It struck me in retrospect that I’ve done a pretty poor job of describing what research is to my family and friends.  So, I thought it might be a good idea to write an open letter that tries explaining research a little better.  You deserve an explanation.  So do other people, like parents of students and the general public.  You all pay a part of our salary, either through your taxes or the generous support of your kid’s education, and therefore should know where your money goes.

First, I should apologize if my reaction to the question “What are you going to do this summer?” has been less than positive in the past.  It is hard not to react negatively.  Because when asked this question it is hard not to interpret it as really asking “Hey, you’re a teacher, and now that you are done teaching, what the heck are you going to do with yourself?”  Since scientific research is typically the majority of the work we do in the professoriate we tend to chafe at seeing our job pigeonholed in such a way.  In fact, when we are asked “are you done for the summer?”, we typically think to ourselves “I’m going to get a boat-load of research done, like four papers, two grants, and some progress made on my book, along with starting several new projects.”  In other words, we typically think along the lines of “I’m going to work my tail off this summer because I’m finally free of those teaching and service obligations which take me away from what I love to do and for that matter what I mostly get paid to do.”

Let me expand on that latter point a little before delving into what I mean by scientific research. As a professor at a major research university I am paid to do three things: research, teaching, and service.  On the teaching side of things, we often teach what appears to be an appallingly small number of classes.  That said, much of our teaching is done in the old-fashioned artisan-apprentice fashion—one-on-one with students.  We have countless meetings throughout our week outside of the classroom working with undergraduate students, graduate students, and post-doctoral researchers, teaching them how to do research.  In terms of service, we are tasked with helping to run our department and university, and with running the guilds to which we belong.  I can expand on that later if you like.  That said, one thing you may not have known is that at major research universities teaching and service constitute less than 50% of our job description, combined.  You may expect us to take summers and winter breaks off, but our universities are smiling as we apply ourselves to what they hired us to do, research—often when they are not even paying us.  There’s nothing like free labor[2].

So what is research anyway?  Let me answer a slightly different question that my wife’s aunt asked recently as it will help frame the answer.  She asked, “What purpose does research serve?” Now there is probably less consensus on the answer to this question than I’d like, but ultimately, I think the answer is knowledge.  Research is supposed to provide knowledge that can be used by others and hopefully the broader society.  To illustrate, let me describe the number of ways in which the knowledge we generate might be used.

The most common way that the knowledge we create is used is by other researchers.  This is what you’ll hear described as “basic” research because it may or may not have a direct applied purpose.  This is about all most researchers can aspire to.  We are pretty happy if other researchers not only read our work, but also draw on it to inform their research too.  This is important because the knowledge we generate is not only read, but also built upon and extended in meaningful ways by others.  The next way our knowledge is used – and ultimately the way our research will most likely influence society – is through teaching.  Yes, our research hopefully gets incorporated into the classroom because it is summarized in textbooks or our original research articles are assigned as core reading.  In this way, our research forms the material that thousands of students learn in order to make themselves better-informed citizens who hopefully go on to be productive members of society.  Finally, our research might be used for more practical aims like shaping social policy set by State and Federal authorities, informing decisions made by employers or other organizations, or helping practitioners treat illness.  For example, recently a Nobel Prize winning economist discovered our work on the personality dimension of conscientiousness (being self-controlled, responsible, and organized).  He has conducted rigorous work demonstrating the importance of this psychological attribute to human potential, and has started lobbying congress and federal funding agencies to focus on how we can teach kids to be more conscientious.  Similarly, other scholars conduct research on how best to characterize psychopathology and, in turn, how that might affect the way we treat patients.  Not many researchers ever get this level of influence, but you see it in medical breakthroughs and engineering accomplishments on a regular basis.  So, ultimately, we do research to provide usable knowledge back to society.  At least, that’s my opinion.

So what do we do when we do this thing called research?  I can’t speak for all types of scientists, but here are what I believe to be the basic phases of the generic research project:

  1. We are posed with a problem, challenge, riddle, or question that needs to be solved or answered.  For example, Teresa might ask:  “How can an employer help workers to see work as more meaningful?”
  2. We come up with a method for answering the question.
  3. We assemble the tools and resources needed to conduct our research.
  4. We run the study intended to answer our question.
  5. We analyze the data that comes from our study.
  6. We write up our findings and send the paper off to a journal where it is reviewed by several (typically three) anonymous peers who, along with the journal editor, decide whether the way we answered the question provides an adequate answer and thus provides an incremental advancement to our knowledge.  If they think we did add something to the knowledge pool, then Hallelujah, our work gets published.

I know, that all sounds a little abstract.  So let me walk you through these steps in a little more detail.  What do I mean by a problem, challenge, riddle, or question and where do these things come from?  Well, typically, these riddles come from us knowing a lot about some particular area of knowledge.  By becoming an expert in a specific area you become aware of not only what we know, but also of what we don’t know, and more importantly, what we need to know.  To get to the point of being able to ask the right question requires a lot of time reading, going to conferences, and meeting with other experts.  That is why our graduate education took so long.  That is why we spend a lot of time with our noses stuck in books and journals.  We need to know.

Once we have a grasp of some issue, then comes the fun—and hard—part: coming up with the question and/or idea you want to test.  I think this is the fun part because it is often the most creative part of the job.  It is like solving a riddle or puzzle.  You have a bunch of disparate facts and you need to put them together in a new way.[3] This is one reason we are so often lost in thought.  It is hard to turn the thinking off, and new ideas might come to us at any time and from any source.  One time I cooked up a program of research by having a discussion with my mother-in-law about the Ten Commandments. Inspiration can strike anywhere, any time.

The way we test our ideas is often tied to what kind of researcher we are.  That said, there are only so many options available.  The key to our choice of method is that anything we do should be transparent to others, replicable (i.e., we or someone else should be able to do what we did again and get the same results), and systematic so that other researchers can duplicate our efforts.  Our methods range from simple observation and documentation—this might result in a book or case study—to surveys where we see if two things go together (e.g., age and maturity)—to experiments where we obsessively control all extraneous variables so that we can get an idea of whether something causes something else (e.g., does increasing empathy for others increase cooperation?).  Often our choices are determined by our question—for example, personality change is hard to study using experiments.  Changing someone’s mood is relatively easy and easily tested with an experiment.  Some of our choices are determined by technology—we can now take pictures of the brain in action, something not available to researchers in previous generations.  We like technology, especially if it is new.[4]

Assembling our tools ranges from the simple—like going to a library—to the complex, like ordering a great big atom smasher or a satellite to be delivered to high earth orbit.[5]  Many of us will populate a lab space with necessary equipment. Some of us will work solo on a computer in our office.  Usually, we work in teams of 2 or more people and we sometimes work with researchers from other universities.  Graduate students are often part of the team, but it can also include undergraduates, post docs, and administrative staff.  Remember when I said we spend a lot of time teaching students one-on-one?  This is a good example. For some of us, our labs become very much like a small business.  Of course, to get to that stage, we typically need grant money, but that’s a topic for a different letter.

After we come up with our idea and assemble our team to work on it, we run the study.  This can be as simple as borrowing other people’s data—economists seem to do this a lot—or more likely, we’ll run the study with people or animals, or things, in our labs.  Sometimes this goes fast.  Running a small experiment with undergraduate students can take as little as a few days.  Sometimes this goes slow.  I’ve been running several longitudinal studies now for 10 years.  I may never stop.  Some researchers become famous because at this stage they are either very creative in how they test their ideas or ingenious in how they develop their techniques.  This type of technical skill is an underappreciated aspect of the job.

After we’ve collected our data, we analyze it.  This is where that dreaded concept of statistics rears its ugly head.  To be honest, some of us get really excited at this stage.  Okay, to be really honest, I get excited at this stage.  Call me a nerd.  I’m okay with that.  This is also where we lose our audience.  You’ve probably heard us invoke statisticalese in describing our work or some other finding.  It has a universal effect on the neurobiology of human brains—it puts 99% of them to sleep[6].  Again, please accept our apologies. If we start down this path, ask us to explain it in language that normal people can understand.

Finally, we write.  This stage would be great for us, and others, if we could write like normal writers, but we can’t.  We have to write for an academic audience.  This means that most of the rhetorical techniques used by creative writers to keep readers engaged are off limits.  We must choose our words carefully, be painfully consistent with those words, and hedge most everything we say.  This doesn’t mean it is bad writing, just typically not that exciting—closer to a user’s manual than pulp fiction—and it’s full of arcane terms that only people in our field are likely to understand.

Of course, once we’ve written our research article we need to submit it to a scientific journal and have it reviewed.  This step is what makes our work different from magazine writers, pundits, or reporters.  We can’t just spout off.  Our ideas need to be vetted by other knowledgeable researchers.  More often than not, our papers get rejected, or at best rejected with an invitation to make revisions along the lines of the criticisms laid out by the reviewers.  In other words, you have anonymous people ripping your precious ideas, hard work, and painful writing to shreds.  It hurts.  You will see us at our most depressed following rejections of our work.  Considering the fact that a typical research project can often take upwards of three years from inspiration to rejection, a little depression is warranted.  Eventually, some of our work gets published. Then, hopefully, somebody uses it, somehow.

Now, multiply this process several times over and you get an idea of our research lives.  Most of us work on several projects simultaneously.  It keeps us busy and off the streets at night.  For that matter, it keeps us off the streets during the day too—thus the pasty complexion.

So, that’s research.  Sorry to be long-winded.  That’s one reason we don’t elaborate on our answer to your questions concerning our summertime activities.  Your eyes would be glazed over before we got to the second paragraph.  Keep your questions coming, though. Right now, I gotta go do some research.


[1] I’ve been doing this professor thing for 17 years now.

[2] Our employers, universities, typically pay us on a 9-month contract.  They like research because it makes our institutions famous.  The more famous the institution, the more likely students will come and the more likely granting agencies will give us money.  Teaching and service are important, but research brings in the dough.

[3] I often find ideas come to me in the bathroom, which has led to the “proximity to porcelain” hypothesis.  Being in contact or near porcelain acts like a catalyst for new ideas.  Or, alternatively, it is the only time you are left alone for long enough to think.

[4] The ideas described here would offend our colleagues who consider themselves “post modern” or “deconstructivists”.  They don’t believe in replicable knowledge or anything roughly thought of as the scientific method.  We mostly humor them as they eat their own departments and fields from the inside out and then we take their faculty lines for our own.

[5] No kidding.  Some physicists at Berkeley did this while I was there.

[6] Another fMRI study begging to be done….