An interesting study of why unstructured interviews are so alluring

A while back I wrote about whether grad school admissions interviews are effective. Following up on that, Sam Gosling recently passed along an article by Dana, Dawes, and Peterson from the latest issue of Judgment and Decision Making:

Belief in the unstructured interview: The persistence of an illusion

Unstructured interviews are a ubiquitous tool for making screening decisions despite a vast literature suggesting that they have little validity. We sought to establish reasons why people might persist in the illusion that unstructured interviews are valid and what features about them actually lead to poor predictive accuracy. In three studies, we investigated the propensity for “sensemaking” – the ability for interviewers to make sense of virtually anything the interviewee says—and “dilution” – the tendency for available but non-diagnostic information to weaken the predictive value of quality information. In Study 1, participants predicted two fellow students’ semester GPAs from valid background information like prior GPA and, for one of them, an unstructured interview. In one condition, the interview was essentially nonsense in that the interviewee was actually answering questions using a random response system. Consistent with sensemaking, participants formed interview impressions just as confidently after getting random responses as they did after real responses. Consistent with dilution, interviews actually led participants to make worse predictions. Study 2 showed that watching a random interview, rather than personally conducting it, did little to mitigate sensemaking. Study 3 showed that participants believe unstructured interviews will help accuracy, so much so that they would rather have random interviews than no interview. People form confident impressions even when interviews are defined to be invalid, like our random interview, and these impressions can interfere with the use of valid information. Our simple recommendation for those making screening decisions is not to use them.

It’s an interesting study. In my experience people’s beliefs in unstructured interviews are pretty powerful — hard to shake even when you show them empirical evidence.

I did have some comments on the design and analyses:

1. In Studies 1 and 2, each subject made a prediction about absolute GPA for one interviewee. So estimates of how good people are at predicting GPA from interviews are based entirely on between-subjects comparisons. It is very likely that a substantial chunk of the variance in predictions will be due to perceiver variance — differences between subjects in their implicit assumptions about how GPA is distributed. (E.g., Subject 1 might assume most GPAs range from 3 to 4, whereas Subject 2 assumes most GPAs range from 2.3 to 3.3. So even if they have the same subjective impression of the same target — “this person’s going to do great this term” — their numerical predictions might differ by a lot.) That perceiver variance would go into the denominator as noise variance in this study, lowering the interviewers’ predictive validity correlations.

Whether that’s a good thing or a bad thing depends on what situation you’re trying to generalize to. Perceiver variance would contribute to errors in judgment when each judge makes an absolute decision about a single target. On the other hand, in some cases perceivers make relative judgments about several targets, such as when an employer interviews several candidates and picks the best one. In that setting, perceiver variance would not matter, and a study with this design could underestimate accuracy.
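To make that concrete, here is a minimal simulation sketch (my own toy model, not the paper's data; all numbers are invented for illustration). Each simulated judge sees one target, forms an impression that imperfectly tracks the target's true standing, and then translates that impression into a numerical GPA prediction using their own idiosyncratic scale. The between-subjects validity correlation is attenuated by the scale differences, while a within-judge relative comparison is unaffected by a judge's personal offset.

```python
import numpy as np

rng = np.random.default_rng(1)
n_judges = 10_000  # one target per judge, i.e., a between-subjects design

# True (standardized) upcoming GPA of each judge's single target,
# and the judge's subjective impression of that target.
true_gpa = rng.normal(0.0, 1.0, n_judges)
impression = 0.6 * true_gpa + rng.normal(0.0, 0.8, n_judges)

# Perceiver variance: each judge translates impressions into numbers
# using their own assumed GPA scale (e.g., "most GPAs are 3 to 4" vs. "2.3 to 3.3").
own_scale_offset = rng.normal(0.0, 1.0, n_judges)
prediction = impression + own_scale_offset

print(np.corrcoef(impression, true_gpa)[0, 1])  # validity of the impressions themselves
print(np.corrcoef(prediction, true_gpa)[0, 1])  # attenuated between-subjects validity

# Relative judgments: if one judge rated several candidates, a constant
# personal offset would leave the rank order of their predictions unchanged.
one_judge = impression[:8]
print(np.array_equal(np.argsort(one_judge), np.argsort(one_judge + 0.7)))  # True
```

With these made-up numbers, the validity of the shared impressions is about .60, but the between-subjects correlation of the numerical predictions with actual GPA drops to roughly .42, purely because the judges disagree about the scale rather than about the targets.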

2. Study 1 had 76 interviewers spread across 3 conditions (n = 25 or 26 per condition), but only 7 interviewees (each of whom was rated by multiple interviewers). Based on the 73 degrees of freedom reported for the test of the “dilution” effect, it looks like they treated interviewer as the unit of analysis but did not account for the non-independence of ratings of the same interviewee. Study 2 appears to have similar issues (though in Study 2 the dilution effect was not significant).
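For what it's worth, the kind of analysis I have in mind would include a random effect for interviewee. Here is a sketch of what that could look like (not a reanalysis of their data; the data file and column names are hypothetical):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format data: one row per interviewer, with the condition
# they were assigned to, which interviewee they rated, and their prediction
# error. File and column names are made up for illustration.
df = pd.read_csv("study1_long.csv")

# A random intercept for interviewee acknowledges that the same 7 targets
# were each rated by many interviewers, so the 76 ratings are not independent.
model = smf.mixedlm("prediction_error ~ condition", data=df, groups=df["interviewee"])
print(model.fit().summary())
```

With only 7 interviewee clusters the random-effect variance would itself be poorly estimated, but at least the effective sample size would no longer be overstated.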

3. I also had concerns about power and the precision of the estimates. Any inferences about who makes better or worse predictions will depend a lot on variance among the 7 interviewees whose GPAs were being predicted (8 interviewees in Study 2). I haven’t done a formal power analysis, but my intuition is that 7 or 8 targets is a very small sample for estimating those quantities. You can see a possible sign of this in one key difference between the studies. In Study 1, the correlation between the interviewees’ prior GPA and upcoming GPA was r = .65, but in Study 2 it was r = .37. That’s a pretty big difference between two estimates of a quantity that should not be changing between studies.
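As a rough illustration of how noisy a correlation based on 7 or 8 targets is, here is a quick Fisher-z confidence interval calculation using the two reported values (a back-of-the-envelope sketch of my own, not anything from the paper):

```python
import numpy as np
from scipy import stats

def r_confint(r, n, level=0.95):
    """Approximate CI for a Pearson correlation via the Fisher z transformation."""
    z = np.arctanh(r)
    se = 1.0 / np.sqrt(n - 3)
    crit = stats.norm.ppf(0.5 + level / 2.0)
    return float(np.tanh(z - crit * se)), float(np.tanh(z + crit * se))

print(r_confint(0.65, 7))  # Study 1: prior GPA vs. upcoming GPA, 7 interviewees
print(r_confint(0.37, 8))  # Study 2: the same correlation, 8 interviewees
```

Run as written, the intervals come out to roughly (-.20, .94) for Study 1 and (-.45, .85) for Study 2, so the two estimates are easily compatible with sampling error alone.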

So it’s an interesting study but not one that can give answers I’d call definitive. If that’s well understood by readers of the study, I’m okay with that. Maybe someone will use the interesting ideas in this paper as a springboard for a larger followup. Given the ubiquity of unstructured interviews, it’s something we need to know more about.

Everybody knows that grad school admission interviews don’t tell us anything useful, right? Right?

From time to time I have heard people in my field challenge the usefulness of interviews in grad student selection. The challenge is usually delivered with the weary tone of the evidence-based curmudgeon. (I should note that as an admirer of Paul Meehl and a grand-advisee of Lew Goldberg, who once wrote an article called “Human Mind Versus Regression Equation” in which the regression equation wins, I am often disposed toward such curmudgeonry myself.)

The argument usually goes something like this: “All the evidence from personnel selection studies says that interviews don’t predict anything. We are wasting people’s time and money by interviewing grad students, and we are possibly making our decisions worse by substituting bad information for good.”

I have been hearing more or less that same thing for years, starting when I was in grad school myself. In fact, I have heard it often enough that, not being familiar with the literature myself, I accepted what people were saying at face value. But I finally got curious about what the literature actually says, so I looked it up.

And given what I’d come to believe over the years, I was a little surprised at what I found.

A little Google Scholaring for terms like “employment interviews” and “incremental validity” led me to a bunch of meta-analyses that concluded that in fact interviews can and do provide useful information above and beyond other valid sources of information (like cognitive ability tests, work sample tests, conscientiousness, etc.). One of the most heavily cited is a 1998 Psych Bulletin paper by Schmidt and Hunter (link is a pdf; it’s also discussed in this blog post). Another was this paper by Cortina et al., which makes finer distinctions among different kinds of interviews. The meta-analyses generally seem to agree that (a) interviews correlate with job performance assessments and other criterion measures, (b) interviews aren’t as strong predictors as cognitive ability, (c) but they do provide incremental (non-overlapping) information, and (d) in those meta-analyses that make distinctions between different kinds of interviews, structured interviews are better than unstructured interviews.

If you look at point “b” above and think that maybe interviews add too little variance to be worth the trouble, my response is: live by the coefficients, die by the coefficients. You’d also have to conclude that we shouldn’t be asking applicants to write about their background or interests in a personal statement, and we shouldn’t be obtaining letters of recommendation. According to Schmidt and Hunter (Table 1), biography, interests, and references all have weaker predictive power than structured interviews. (You might want to justify those things over interviews on a cost-benefit basis, though I’d suggest that they aren’t necessarily cheap either. A personal statement plus 3 reference letters adds up to a lot of person-hours of labor.)

A bigger problem is that if you are going to take an evidence-based approach, your evidence needs to be relevant. Graduate training shares some features with conventional employment, but they are certainly not the same. So I think it is fair to question how well personnel studies can generalize to doctoral admissions. For example, one justification for interviews that I’ve commonly heard is that Ph.D. programs require a lot of close mentoring and productive collaboration. Interviews might help the prospective advisor and advisee evaluate the potential for rapport and shared interests and goals. Even if an applicant is generally well qualified to earn a Ph.D., they might not be a good fit for a particular advisor/lab/program.

That, of course, is a testable question. So if you are an evidence-based curmudgeon, you should probably want some relevant data. I was not able to find any studies that specifically addressed the importance of rapport and interest-matching as predictors of later performance in a doctoral program. (Indeed, validity studies of graduate admissions are few and far between, and the ones I could find were mostly for medical school and MBA programs, which are very different from research-oriented Ph.D. programs.) It would be worth doing such studies, but not easy.

Anyway, if I’m misunderstanding the literature or missing important studies, I hope someone will tell me in the comments. (Personnel selection is not my wheelhouse, but since this is a blog I’m happy to plow forward anyway.) However, based on what I’ve been able to find in the literature, I’m certainly not ready to conclude that admissions interviews are useless.