Error variance and humility

I often hear researchers criticize each other for treating important phenomena as error variance. For example, situationist social psychologists criticize trait researchers for treating situations as error variance, and vice versa. (And we interactionists get peeved at both.) The implication is that if you treat something as error variance, you are dismissing it as unimportant. And that’s often how the term is used. For example, during discussions of randomized experiments, students who are learning how experiments work will often wonder whether pre-existing individual differences could have affected the outcomes. A typical response is, “Oh, that couldn’t have driven the effects because of randomization. If there are any individual differences, they go into the error variance.” And so those individual differences get excluded from the explanation of the phenomenon.

I think we’d all be better off if we remembered that the word “error” refers to an error of a model or theory. On the first day of my grad school regression course, Chick Judd wrote on the board: “DATA = MODEL + ERROR”. A short while later he wrote “ERROR = DATA – MODEL.” Error is data that your model cannot explain. Its existence is a sign of the incompleteness of your model. Its ubiquity should be a constant reminder to all scientists to stay humble and open-minded.
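Here is a minimal sketch of that point in Python, using simulated data. Everything in it is an assumption made up for illustration: the variable names, the sample size, a treatment effect of 1.0, and a trait effect of 2.0. The model is just the two group means from a randomized experiment, and the error is whatever is left over.

```python
import math
import random
import statistics

random.seed(1)

n = 2000
# Hypothetical pre-existing individual difference (simulated, not real data).
trait = [random.gauss(0, 1) for _ in range(n)]
# Random assignment: by design, treatment is unrelated to the trait.
treated = [random.random() < 0.5 for _ in range(n)]
# Assumed data-generating process: treatment effect 1.0, trait effect 2.0, plus noise.
data = [1.0 * t + 2.0 * x + random.gauss(0, 1) for t, x in zip(treated, trait)]

# MODEL: each person's predicted score is the mean of their experimental group.
mean_treated = statistics.fmean([y for y, t in zip(data, treated) if t])
mean_control = statistics.fmean([y for y, t in zip(data, treated) if not t])
model = [mean_treated if t else mean_control for t in treated]

# ERROR = DATA - MODEL
error = [y - m for y, m in zip(data, model)]


def pearson(a, b):
    """Plain Pearson correlation between two equal-length lists."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    ss_a = sum((x - ma) ** 2 for x in a)
    ss_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(ss_a * ss_b)


print("estimated treatment effect:", round(mean_treated - mean_control, 2))
print("error variance:", round(statistics.pvariance(error), 2))
print("correlation of 'error' with the trait:", round(pearson(error, trait), 2))
```

Randomization does its job here: the difference between group means should land near the assumed treatment effect. But the leftover "error" is strongly correlated with the trait. It is not noise in any deep sense, just the part of the data this particular model does not try to explain.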

When you have an interaction, which variable moderates which?

I was talking recently with a colleague about interpreting moderator effects, and the question came up: when you have a 2-way interaction between A and B, how do you decide whether to say that A moderates B versus B moderates A?

Mathematically, of course, A*B = B*A, so the underlying math is indifferent. I was schooled in the Baron and Kenny approach to moderation and mediation. I’ve never found any hard and fast rules in any of Kenny’s writing on the subject (if I’ve missed any, please let me know in the comments section). B&K talk about the moderator moderating the “focal” variable, and I’ve always taken that to be an interpretive choice by the researcher. If the researcher’s primary goal is to understand how A affects Y, and in the researcher’s mind B is some other interesting variable across which the A->Y relationship might vary, then B is the moderator. And vice versa. And to me, it’s entirely legitimate to talk about the same analysis in different ways — it’s a framing issue rather than a deep substantive issue.
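To make the symmetry concrete, here is a small Python sketch. It assumes the usual regression form Y = b0 + b1*A + b2*B + b3*A*B, and the coefficient values are hypothetical numbers picked only so the arithmetic is easy to follow.

```python
# Hypothetical coefficients for Y = b0 + b1*A + b2*B + b3*A*B (made up for illustration).
b0, b1, b2, b3 = 0.5, 1.0, -0.25, 0.75


def predict(a, b):
    """Predicted Y for given values of A and B."""
    return b0 + b1 * a + b2 * b + b3 * a * b


def slope_of_a(b):
    """Change in predicted Y per one-unit change in A, holding B fixed at b."""
    return predict(1, b) - predict(0, b)


def slope_of_b(a):
    """Change in predicted Y per one-unit change in B, holding A fixed at a."""
    return predict(a, 1) - predict(a, 0)


# Framing 1, "B moderates A": how much the A slope changes per unit of B.
change_in_a_slope = slope_of_a(1) - slope_of_a(0)
# Framing 2, "A moderates B": how much the B slope changes per unit of A.
change_in_b_slope = slope_of_b(1) - slope_of_b(0)

print(change_in_a_slope, change_in_b_slope, b3)
```

Both framings recover the same number, the interaction coefficient b3. The fitted model contains nothing that marks one variable as the moderator and the other as focal.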

However, my colleague has been trying to apply Kraemer et al.’s “MacArthur framework” and has been running into some problems. One of the MacArthur rules is that the variable you call the moderator (M) is the one that comes first, since (in their framework) the moderator always temporally precedes the treatment (T). But in my colleague’s study the ordering is not clear. (I believe that in my colleague’s study, the variables in question meet all of Kraemer’s other criteria for moderation — e.g., they’re uncorrelated — but they were measured at the same timepoint in a longitudinal study. Theoretically it’s not clear which one “would have” come first. Does it come down to which one came first in the questionnaire packet?)

I’ll admit that I’ve looked at Kraemer et al.’s writing on mediation/moderation a few times and it’s never quite resonated with me: they’re trying to make hard-and-fast rules for choosing between what, to me, seem like two legitimate alternative interpretations. (I also don’t really grok their argument that a significant interaction can sometimes be interpreted as mediation, unless it’s “mediated moderation” in Kenny-speak, but that’s a separate issue.) I’m curious how others deal with this issue…

Thinking hard

I’ve been enjoying William Cleveland’s The Elements of Graphing Data, a book I wish I’d discovered years ago. The following sentence jumped out at me:

No complete prescription can be designed to allow us to proceed mechanically and to relieve us of thinking hard. (p. 59)

The context was — well, it doesn’t matter what the context was. It’s a great encapsulation of what statistical teaching, mentoring, and consulting should be (teaching how to think hard) and cannot be (mechanical prescriptions).

Best. Poster. Ever.

In an exercise described as “rigorous mapping of ridiculous data,” Kansas State geography student Thomas Vought plotted the geographic distribution of the seven deadly sins for a poster presented at the Association of American Geographers conference.

Many of the maps aren’t very kind to the Deep South. I was somewhat disappointed to see that my county is fairly nondescript — neither sinful nor virtuous — on 6 of 7 indices. But we are apparently quite the hotspot for envy.

The ridiculousness isn’t so much the data itself as the interpretations (which I’m sure Vought wasn’t entirely serious about). Lust, for example, is indexed by STDs per capita. That doesn’t necessarily mean that you’re having more sex with more partners — just that you’re not being very careful about it.

My region’s supposed sin of choice, envy, is indexed by thefts (burglary, robbery, etc.). I doubt that most of those crimes are really about envy. My bike was stolen last fall, but odds are the thief wasn’t coveting the bike itself. They probably just fenced it for some meth.

The conference location, Las Vegas, probably helped motivate Vought’s whimsical presentation. My main conference will be in Vegas next year. Maybe I should think about a followup?

The magazine curse?

Paul Krugman writes about Robert Rubin, Alan Greenspan, and Lawrence Summers, who appeared on the cover of Time in 1999:

Two… have since succumbed to the magazine cover curse, the plunge in reputation that so often follows lionization in the media.

Umm, hey Mr. Krugman… think this might just be regression to the mean? Sports Illustrated knows what I’m talking about. So does your fellow Nobel-in-economics laureate Daniel Kahneman.
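For anyone who wants to see the "curse" appear with no curse in it, here is a quick simulation sketch in Python. The setup is an assumption of mine, not anything from Krugman's piece: each person's performance in a given year is a stable skill plus independent luck, and the "cover" goes to the top 1% of performers in year 1.

```python
import random
import statistics

random.seed(7)

n = 10_000
# Assumed model: performance = stable skill + luck that is redrawn every year.
skill = [random.gauss(0, 1) for _ in range(n)]
year1 = [s + random.gauss(0, 1) for s in skill]
year2 = [s + random.gauss(0, 1) for s in skill]

# "Magazine cover": the top 1% of performers in year 1.
cutoff = sorted(year1)[-(n // 100)]
cover = [i for i in range(n) if year1[i] >= cutoff]

cover_year1 = statistics.fmean([year1[i] for i in cover])
cover_year2 = statistics.fmean([year2[i] for i in cover])
everyone_year2 = statistics.fmean(year2)

print("cover group, year 1:", round(cover_year1, 2))
print("cover group, year 2:", round(cover_year2, 2))
print("everyone, year 2:  ", round(everyone_year2, 2))
```

Nothing happens to anyone between the two years in this simulation, yet the cover group's average drops. They were selected partly for good luck, and the luck does not carry over. They still come out ahead of the overall average, because the skill part does.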

Jared from Subway banished for extreme deviance

A new rule under consideration by the FTC would require that ads with customer testimonials show typical results, not just best-case outcomes.

Of course, following best practices in data visualization would mean you should show the central tendency and the variability (in all directions). I’m not holding my breath for density plots on the nightly news, though. A single, typical exemplar would still be an improvement over a single, cherrypicked extreme.
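As a rough illustration of how far apart those summaries can be, here is a Python sketch on simulated data. The numbers are entirely made up: 500 hypothetical customers whose weight change in pounds is drawn from a normal distribution with an assumed mean of -3 and standard deviation of 8.

```python
import random
import statistics

random.seed(3)

# Simulated (made-up) weight changes in pounds; negative values are weight lost.
changes = [random.gauss(-3, 8) for _ in range(500)]

# What a cherrypicked testimonial shows: the single most extreme loss.
best_case = min(changes)
# What a "typical result" might show: the median customer.
typical = statistics.median(changes)
# Variability in both directions: the quartiles and the share who gained weight.
q1, _, q3 = statistics.quantiles(changes, n=4)
share_gained = sum(c > 0 for c in changes) / len(changes)

print("best case:", round(best_case, 1))
print("median:", round(typical, 1))
print("middle 50% of customers:", round(q1, 1), "to", round(q3, 1))
print("share who gained weight:", round(share_gained, 2))
```

Under these assumptions the best case sits far out in the tail, and a sizable share of customers end up heavier than they started. That is the part a single exemplar, even a typical one, cannot show.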

However… Maybe I’m too jaded, but I wonder about unintended consequences. For example, will there be a flood of crappy research after this rule? If companies are required to depict “typical” results, they may churn out poorly designed studies to get the numbers they want, hoping to lend more credibility to bogus products. And if these studies are marketed as “scientific” and then easily (and publicly) disputed, that could feed into kneejerk cynicism in the public about science more broadly.

Consider that FDA clinical trials are one of the most highly regulated forms of research around, with numerous checks and balances designed to ensure integrity. The system mostly works, but there are still serious concerns about conflicts of interest. How well is the FTC going to ensure the quality of research on consumer products, herbal supplements, diet plans, and the like? Will there be independent investigators, peer review, mandatory publication of negative results, etc.?