An interesting study of why unstructured interviews are so alluring

A while back I wrote about whether grad school admissions interviews are effective. Following up on that, Sam Gosling recently passed along an article by Dana, Dawes, and Peterson from the latest issue of Judgment and Decision Making:

Belief in the unstructured interview: The persistence of an illusion

Unstructured interviews are a ubiquitous tool for making screening decisions despite a vast literature suggesting that they have little validity. We sought to establish reasons why people might persist in the illusion that unstructured interviews are valid and what features about them actually lead to poor predictive accuracy. In three studies, we investigated the propensity for “sensemaking” – the ability for interviewers to make sense of virtually anything the interviewee says – and “dilution” – the tendency for available but non-diagnostic information to weaken the predictive value of quality information. In Study 1, participants predicted two fellow students’ semester GPAs from valid background information like prior GPA and, for one of them, an unstructured interview. In one condition, the interview was essentially nonsense in that the interviewee was actually answering questions using a random response system. Consistent with sensemaking, participants formed interview impressions just as confidently after getting random responses as they did after real responses. Consistent with dilution, interviews actually led participants to make worse predictions. Study 2 showed that watching a random interview, rather than personally conducting it, did little to mitigate sensemaking. Study 3 showed that participants believe unstructured interviews will help accuracy, so much so that they would rather have random interviews than no interview. People form confident impressions even when interviews are defined to be invalid, like our random interview, and these impressions can interfere with the use of valid information. Our simple recommendation for those making screening decisions is not to use them.

It’s an interesting study. In my experience people’s beliefs in unstructured interviews are pretty powerful — hard to shake even when you show them empirical evidence.

I did have some comments on the design and analyses:

1. In Studies 1 and 2, each subject made a prediction about absolute GPA for one interviewee. So estimates of how good people are at predicting GPA from interviews are based entirely on between-subjects comparisons. It is very likely that a substantial chunk of the variance in predictions will be due to perceiver variance — differences between subjects in their implicit assumptions about how GPA is distributed. (E.g., Subject 1 might assume most GPAs range from 3 to 4, whereas Subject 2 assumes most GPAs range from 2.3 to 3.3. So even if they have the same subjective impression of the same target — “this person’s going to do great this term” — their numerical predictions might differ by a lot.) That perceiver variance would go into the denominator as noise variance in this study, lowering the interviewers’ predictive validity correlations.

Whether that’s a good thing or a bad thing depends on what situation you’re trying to generalize to. Perceiver variance would contribute to errors in judgment when each judge makes an absolute decision about a single target. On the other hand, in some cases perceivers make relative judgments about several targets, such as when an employer interviews several candidates and picks the best one. In that setting, perceiver variance would not matter, and a study with this design could underestimate accuracy.
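As a quick illustration of the perceiver-variance point, here is a toy simulation (my own made-up numbers, not anything from the paper): every judge forms an equally good impression, but judges differ in where they anchor the GPA scale, and that judge-level variance alone drags a between-subjects validity correlation down.

```python
# Toy simulation of perceiver variance (hypothetical numbers, not from the paper).
import numpy as np

rng = np.random.default_rng(0)
n = 10_000  # one judge-target pair per "subject", as in a between-subjects design

true_gpa = rng.normal(3.0, 0.4, n)             # each target's actual upcoming GPA
impression = true_gpa + rng.normal(0, 0.3, n)  # judge's impression: signal + noise

# Judge-specific anchoring of the GPA scale (e.g., "most GPAs run 3-4" vs. "2.3-3.3").
perceiver_offset = rng.normal(0, 0.5, n)

r_clean = np.corrcoef(impression, true_gpa)[0, 1]
r_noisy = np.corrcoef(impression + perceiver_offset, true_gpa)[0, 1]
print(f"validity without offsets: r = {r_clean:.2f}")
print(f"validity with offsets:    r = {r_noisy:.2f}")
```

With these particular (arbitrary) variances, the offsets knock the correlation from about .80 down to roughly the upper .50s, even though every judge's underlying impression is equally accurate.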

2. Study 1 had 76 interviewers spread across 3 conditions (n = 25 or 26 per condition), and only 7 interviewees (each of whom was rated by multiple interviewers). Based on the 73 degrees of freedom reported for the test of the “dilution” effect, it looks like they treated interviewer as the unit of analysis but did not account for the dependence created by multiple interviewers rating the same interviewee. Study 2 appeared to have similar issues (though in Study 2 the dilution effect was not significant).
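To get a rough feel for how much that dependence could matter, one can borrow the standard design-effect formula from cluster sampling, 1 + (m − 1)·ICC, where m is the average number of interviewers per interviewee and the ICC is the intraclass correlation among ratings of the same interviewee. The ICC values below are invented for illustration; this is a back-of-the-envelope analogy, not a reanalysis of the study.

```python
# Back-of-the-envelope design effect (ICC values are made up for illustration).
m = 76 / 7  # average number of interviewers per interviewee in Study 1

for icc in (0.1, 0.3, 0.5):        # hypothetical intraclass correlations
    deff = 1 + (m - 1) * icc       # variance inflation relative to independence
    print(f"ICC = {icc:.1f}: effective N ~ {76 / deff:.1f} (nominal N = 76)")
```

Even a modest ICC shrinks the effective sample well below the 76 independent interviewers that the reported degrees of freedom assume.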

3. I also had concerns about power and precision of the estimates. Any inferences about who makes better or worse predictions will depend a lot on variance among the 7 interviewees whose GPAs were being predicted (8 interviewees in Study 2). I haven’t done a formal power analysis, but my intuition is that with so few targets, power would be pretty low. You can see a possible sign of this in one key difference between the studies. In Study 1, the correlation between the interviewees’ prior GPA and upcoming GPA was r = .65, but in Study 2 it was r = .37. That’s a pretty big difference between estimates of a quantity that should not be changing between studies.
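A quick simulation (my own, not from the paper) shows just how unstable a correlation based on 7 data points is: even with the true correlation held fixed at an arbitrary .5, sample correlations at n = 7 bounce around enormously, easily enough to produce a .65 in one study and a .37 in another.

```python
# Sampling distribution of r with only 7 interviewees (hypothetical true rho).
import numpy as np

rng = np.random.default_rng(42)
rho, n, reps = 0.5, 7, 10_000  # assume a true correlation of .5 for illustration

cov = [[1.0, rho], [rho, 1.0]]
rs = np.empty(reps)
for i in range(reps):
    x, y = rng.multivariate_normal([0.0, 0.0], cov, n).T  # 7 simulated interviewees
    rs[i] = np.corrcoef(x, y)[0, 1]

lo, hi = np.percentile(rs, [2.5, 97.5])
print(f"95% of sample r's fall in [{lo:.2f}, {hi:.2f}]")
```

In my runs the middle 95% of the sample correlations spans from below zero to above .9, an interval that easily contains both of the observed estimates.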

So it’s an interesting study but not one that can give answers I’d call definitive. If that’s well understood by readers of the study, I’m okay with that. Maybe someone will use the interesting ideas in this paper as a springboard for a larger follow-up. Given the ubiquity of unstructured interviews, it’s something we need to know more about.

Perceiver effects in interpersonal perception

Hot off the presses is a paper I wrote with Steve Guglielmo and Jenni Beer on perceiver effects in the Social Relations Model. Here’s the abstract:

In interpersonal perception, “perceiver effects” are tendencies of perceivers to see other people in a particular way. Two studies of naturalistic interactions examined perceiver effects for personality traits: seeing a typical other as sympathetic or quarrelsome, responsible or careless, and so forth. Several basic questions were addressed. First, are perceiver effects organized as a global evaluative halo, or do perceptions of different traits vary in distinct ways? Second, does assumed similarity (as evidenced by self-perceiver correlations) reflect broad evaluative consistency or trait-specific content? Third, are perceiver effects a manifestation of stable beliefs about the generalized other, or do they form in specific contexts as group-specific stereotypes? Findings indicated that perceiver effects were better described by a differentiated, multidimensional structure with both trait-specific content and a higher order global evaluation factor. Assumed similarity was at least partially attributable to trait-specific content, not just to broad evaluative similarity between self and others. Perceiver effects were correlated with gender and attachment style, but in newly formed groups, they became more stable over time, suggesting that they grew dynamically as group stereotypes. Implications for the interpretation of perceiver effects and for research on personality assessment and psychopathology are discussed.
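For readers who haven’t worked with the Social Relations Model, here is a minimal toy sketch of the decomposition it rests on: each rating of target j by perceiver i is split into a grand mean, a perceiver (row) effect, a target (column) effect, and a relationship residual. The raw row/column means below are only a first approximation (actual SRM analyses use bias-corrected variance-component estimators), and everything here — the data, the group size — is made up.

```python
# Toy Social Relations Model decomposition (made-up round-robin data).
import numpy as np

rng = np.random.default_rng(1)
n = 6                                      # a small hypothetical round-robin group
ratings = rng.normal(5.0, 1.0, (n, n))     # ratings[i, j]: perceiver i rates target j
np.fill_diagonal(ratings, np.nan)          # no self-ratings in a round-robin design

grand = np.nanmean(ratings)
perceiver = np.nanmean(ratings, axis=1) - grand  # row effects: how i sees a typical other
target = np.nanmean(ratings, axis=0) - grand     # column effects: how others see j
residual = ratings - (grand + perceiver[:, None] + target[None, :])  # relationship + error
```

The perceiver effects are the quantities the paper is about: a stable tendency to rate a generic other high or low on a trait, net of who the targets actually are.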

A couple of quick comments to add:

  • This is an example of using the Big Five / Five-Factor Model not as a model of personality per se, but as a model of social perception. I very briefly mention this potential use of the Big Five in my guide to measuring the Big Five, and I’m currently working on a manuscript expanding on this idea. (BTW, I’m certainly not the first person to think of the Big Five in this way. I’m trying to carry this idea forward a bit, but it’s one of those cases where I oscillate between thinking what I’m saying about it is radically new and thinking ho-hum-we-already-thought-of-that.)
  • While we were working on this manuscript, I became aware that a group led by Dustin Wood was looking at very similar issues (but with some interesting differences in approach and areas of non-overlap). They’ve got a paper in press at JPSP.

If you want to read more you can download the PDF:

Srivastava, S., Guglielmo, S., & Beer, J. S. (2010). Perceiving others’ personalities: Examining the dimensionality, assumed similarity to the self, and stability of perceiver effects. Journal of Personality and Social Psychology, 98, 520-534.