A while back I wrote about whether grad school admissions interviews are effective. Following up on that, Sam Gosling recently passed along an article by Dana, Dawes, and Peterson from the latest issue of Judgment and Decision Making:
Unstructured interviews are a ubiquitous tool for making screening decisions despite a vast literature suggesting that they have little validity. We sought to establish reasons why people might persist in the illusion that unstructured interviews are valid and what features about them actually lead to poor predictive accuracy. In three studies, we investigated the propensity for “sensemaking” – the ability for interviewers to make sense of virtually anything the interviewee says—and “dilution” – the tendency for available but non-diagnostic information to weaken the predictive value of quality information. In Study 1, participants predicted two fellow students’ semester GPAs from valid background information like prior GPA and, for one of them, an unstructured interview. In one condition, the interview was essentially nonsense in that the interviewee was actually answering questions using a random response system. Consistent with sensemaking, participants formed interview impressions just as confidently after getting random responses as they did after real responses. Consistent with dilution, interviews actually led participants to make worse predictions. Study 2 showed that watching a random interview, rather than personally conducting it, did little to mitigate sensemaking. Study 3 showed that participants believe unstructured interviews will help accuracy, so much so that they would rather have random interviews than no interview. People form confident impressions even interviews are defined to be invalid, like our random interview, and these impressions can interfere with the use of valid information. Our simple recommendation for those making screening decisions is not to use them.
It’s an interesting study. In my experience people’s beliefs in unstructured interviews are pretty powerful — hard to shake even when you show them empirical evidence.
I did have some comments on the design and analyses:
1. In Studies 1 and 2, each subject made a prediction about absolute GPA for 1 interviewee. So estimates of how good people are at predicting GPA from interviews are based on entirely between-subjects comparisons. It is very likely that a substantial chunk of the variance in predictions will be due to perceiver variance — differences between subjects in their implicit assumptions about how GPA is distributed. (E.g., Subject 1 might assume most GPAs range from 3 to 4, whereas Subject 2 assumes most GPAs range from 2.3 to 3.3. So even if they have the same subjective impression of the same target — “this person’s going to do great this term” — their numerical predictions might differ by a lot.) That perceiver variance would go into the denominator as noise variance in this study, lowering the interviewers’ predictive validity correlations.
Whether that’s a good thing or a bad thing depends on what situation you’re trying to generalize to. Perceiver variance would contribute to errors in judgment when each judge makes an absolute decision about a single target. On the other hand, in some cases perceivers make relative judgments about several targets, such as when an employer interviews several candidates and picks the best one. In that setting, perceiver variance would not matter, and a study with this design could underestimate accuracy.
2. Study 1 had 76 interviewers spread across 3 conditions (n = 25 or 26 per condition), and only 7 interviewees (each of whom was rated by multiple interviewers). Based on 73 degrees of freedom reported for the test of the “dilution” effect, it looks like they treated interviewer as the unit of analysis but did not account for the dependency in interviewees. Study 2 looked to have similar issues (though in Study 2 the dilution effect was not significant.)
3. I also had concerns about power and precision of the estimates. Any inferences about who makes better or worse predictions will depend a lot on variance among the 7 interviewees whose GPAs were being predicted (8 interviewees in study 2). I haven’t done a formal power analysis, but my intuition is that that’s pretty small. You can see a possible sign of this in one key difference between the studies. In Study 1, the correlation between the interviewees’ prior GPA and upcoming GPA was r = .65, but in Study 2 it was r = .37. That’s a pretty big difference between estimates of a quantity that should not be changing between studies.
So it’s an interesting study but not one that can give answers I’d call definitive. If that’s well understood by readers of the study, I’m okay with that. Maybe someone will use the interesting ideas in this paper as a springboard for a larger followup. Given the ubiquity of unstructured interviews, it’s something we need to know more about.