The precisely fuzzy science of gaydar

Recently Dan Savage stirred up controversy by suggesting on his podcast that Marcus Bachmann, husband of Michele Bachmann and a therapist who tries to turn gays and lesbians into heterosexuals, is secretly gay. Since reparative therapy does not work and causes harm, Savage has ample grounds to criticize him for his therapeutic practice. However, Savage went beyond calling him a bad therapist and suggested that his voice, appearance, and mannerisms mark him as gay. In support, Savage cited research on “gaydar.” Here’s what he said:

People used to talk about gaydar and debate whether it was real or existed or not. Now there’s been all sorts of tests that actually people have really good gaydar, and they can look at a picture or listen to a clip of someone’s voice… and with eerie accuracy nail their actual sexual orientation.

I was going to write up a post on gaydar research (known in the biz as interpersonal perception of sexual orientation), but William Saletan over at Slate did a nice job of summarizing the key findings. Let me instead add some commentary.

The statistical tests in the studies show that the accuracy rates are better than a random guess. That’s a real finding — it tells us that there’s some information about sexual orientation available in thin slices of appearance and/or behavior. But what about Savage’s claims that people are “really good” and have “eerie accuracy”? That’s an effect size question. Saletan sort of bungles this: he reports correlation coefficients of around .30 (erroneously capitalizing the r) but, not knowing what else to do with them, he squares them to get variance explained. That makes it sound smallish (absent context, most people will think that 9% of something sounds small), but it’s sort of silly: if a reader doesn’t know what a correlation coefficient is, they won’t know what variance is either.

It turns out some of the articles do report accuracy rates as percentages. And even if you didn’t have those numbers, you could rough out the accuracy rates with a binomial effect size display. In a typical study, half of the targets are gay/lesbian and half are straight, so a purely random guesser (i.e., someone with no gaydar) would be around 50%. The reported accuracy rates in the articles, as well as the BESD conversion, say that people guess correctly about 65% of the time. Better than chance, but nowhere near perfect.
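The BESD conversion mentioned above is simple enough to sketch out. Under its assumptions (a 50/50 split of outcomes, as in these studies), a correlation r maps onto "success rates" of 50 − r/2 and 50 + r/2 percent in the two groups. A minimal sketch in Python, plugging in the r ≈ .30 from the studies:

```python
# Binomial effect size display (BESD): with a 50/50 outcome split,
# a correlation r corresponds to success rates of 0.50 - r/2 and
# 0.50 + r/2 in the two groups.
def besd_rates(r):
    return 0.50 - r / 2, 0.50 + r / 2

low, accuracy = besd_rates(0.30)  # r = .30, roughly what the studies report
print(round(accuracy * 100))      # about 65 percent correct, vs. 50 for chance
```

That 65% matches the accuracy percentages reported directly in the articles, which is reassuring: two routes to the same number.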

In fact, you can go a step further and get Bayesian on the problem. Let’s assume that the 65% accuracy rate is symmetric — that guessers are just as good at correctly identifying gays/lesbians as they are at identifying straight people. Let’s also assume that 5% of people are actually gay/lesbian. From those numbers, a quick calculation tells us that if your gaydar says “GAY” about a randomly selected member of the population, there is only a 9% chance that you are right. Eerily accurate? Not so much. If you rely too much on your gaydar, you are going to make a lot of dumb mistakes.
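The quick calculation is just Bayes’ rule. Under the stated assumptions — 65% accuracy in both directions and a 5% base rate — it works out like this:

```python
# Bayes' rule on the gaydar numbers. Assumed inputs (from the post,
# not measured quantities): symmetric 65% accuracy, 5% base rate.
sensitivity = 0.65  # P(guess "gay" | target is gay)
specificity = 0.65  # P(guess "straight" | target is straight)
base_rate = 0.05    # P(target is gay)

# Total probability of guessing "gay": true positives + false positives.
p_guess_gay = sensitivity * base_rate + (1 - specificity) * (1 - base_rate)

# Posterior: P(target is gay | you guessed "gay").
p_gay_given_guess = sensitivity * base_rate / p_guess_gay
print(round(p_gay_given_guess, 3))  # 0.089 -- about a 9% hit rate
```

The culprit is the low base rate: even at 65% accuracy, the false positives from the much larger straight population swamp the true positives.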

That calculation isn’t meant to be taken too seriously though, because it makes some other assumptions. For example, I’m assuming that the 65% accuracy rate in these controlled lab studies would apply to real-world guessing situations. But Saletan is spot-on in pointing out that all of the targets in the studies were out. (Depending on the study, stimuli were created from personal ads, from college undergraduates’ Facebook profiles that listed sexual orientation, or from grad students who were members of LGBT or public-service organizations.) An out, sexually active twentysomething probably wants others to know their orientation in a way that a middle-aged closeted person would not. How controllable are the signals that people send about their orientation? Researchers have been working on that question by trying to isolate the various channels of information (voice, gesture, hairstyle, facial expressions and features, etc.). Looking at the studies, I’d say the jury’s still out, but it appears that at least some of the signals are indeed controllable.

Let me conclude by saying that these are common kinds of errors we make when jumping too quickly from controlled lab studies to real-world applications. As I said earlier, the finding that accuracy was significantly better than chance is a meaningful one. It tells us that there is information there. But the interpretation has to be a narrow and careful one, because the finding raises many more questions than it answers — questions about what the signals are, who is and isn’t sending them, under what circumstances they will and won’t be present, how and why they are being sent and received, and much more. It’s the start of an inquiry, not the end of one. And all too easy to overinterpret.