Changing software to nudge researchers toward better data analysis practice

The tools we have available to us affect the way we interact with and even think about the world. “If all you have is a hammer” etc. Along these lines, I’ve been wondering what would happen if the makers of data analysis software like SPSS, SAS, etc. changed some of the defaults and options. Sort of in the spirit of Nudge: don’t necessarily change the list of what is ultimately possible to do, but make some things easier and other things harder through defaults and options.

Would people think about their data differently? Here’s my list of how I might change regression procedures, and what I think these changes might do:

1. Let users write common transformations of variables directly into the syntax. Things like centering, z-scoring, log-transforming, multiplying variables into interactions, etc. This is already part of some packages (it’s easy to do in R), but not others. In particular, running interactions in SPSS is a huge royal pain. For example, to do a simple 2-way interaction with centered variables, you have to write all this crap *and* cycle back and forth between the code and the output along the way:

desc x1 x2.
* Run just the above, then look at the output and see what the means are, then edit the code below.
compute x1_c = x1 - [whatever the mean was].
compute x2_c = x2 - [whatever the mean was].
compute x1x2 = x1_c*x2_c.
regression /dependent y /enter x1_c x2_c x1x2.

Why shouldn’t we be able to do it all in one line like this?

regression /dependent y /enter center(x1) center(x2) center(x1)*center(x2).

The nudge: If it were easy to write everything into a single command, maybe more people would look at interactions more often. And maybe they’d stop doing median splits and then jamming everything into an ANOVA!
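For a sense of how little machinery the one-line version needs, here is a rough Python sketch using only the standard library. The helper names (`center`, `interaction`, `ols`) and the toy data are made up for illustration; they are not part of SPSS, R, or any existing package, and a real analysis would use a proper regression routine.

```python
from statistics import mean

def center(values):
    """Subtract the sample mean from every value."""
    m = mean(values)
    return [v - m for v in values]

def interaction(a, b):
    """Element-wise product of two (already centered) predictors."""
    return [ai * bi for ai, bi in zip(a, b)]

def ols(y, *predictors):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    rows = [[1.0, *xs] for xs in zip(*predictors)]
    k = len(rows[0])
    # Build the augmented system (X'X | X'y).
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * yi for r, yi in zip(rows, y))]
           for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]

# Toy data, invented purely for illustration.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 7, 1, 8, 2, 4]
y = [3.1, 4.0, 2.2, 6.5, 3.9, 5.8]

# Centering and the interaction term are written inline, with no
# round trip through the descriptives output to look up the means.
x1_c, x2_c = center(x1), center(x2)
coefs = ols(y, x1_c, x2_c, interaction(x1_c, x2_c))
```

The point is not the solver but the last two lines: the transformations live in the same expression as the model, so there is nothing to copy by hand from one step to the next.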

2. By default, the output shows you parameter estimates and confidence intervals.

3. Either by default or with an easy-to-implement option, you can get a variety of standardized effect size estimates with their confidence intervals. And let’s not make variance-explained metrics (like R^2 or eta^2) the defaults.

The nudge: #2 and #3 are both designed to focus people on point and interval estimation, rather than NHST.

This next one is a little more radical:

4. By default the output does not show you inferential t-tests and p-values — you have to ask for them through an option. And when you ask for them, you have to state what the null hypotheses are! So if you want to test the null that some parameter equals zero (as 99.9% of research in social science does), hey, go for it — but it has to be an active request, not a passive default. And if you want to test a null hypothesis that some parameter is some nonzero value, it would be easy to do that too.

The nudge: In the way a lot of statistics is taught in psychology, NHST is the main event and effect estimation is an afterthought. This would turn it around. And by making users specify a null hypothesis, it might spur us to pause and think about how and why we are doing so, rather than just mining for asterisks to put in tables. Heck, I bet some nontrivial number of psychology researchers don’t even know that the null hypothesis doesn’t have to be the nil hypothesis. (I still remember the “aha” feeling the first time I learned that you could do that, well along into graduate school, in an elective statistics class.) If we want researchers to move toward point or range predictions with strong hypothesis testing, we should make it easier to do.
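As a rough illustration of what "you have to state the null" could look like, here is a small Python sketch. The function name `wald_test` and its signature are hypothetical, and it uses a normal approximation (an assumption that only suits large samples); a real regression routine would use the t distribution with the residual degrees of freedom.

```python
from statistics import NormalDist

def wald_test(estimate, std_error, *, null_value):
    """Test H0: parameter == null_value.

    null_value is keyword-only and has no default on purpose: the caller
    must say which null is being tested, even if it is zero.
    Returns (z, two_sided_p) using a normal approximation.
    """
    z = (estimate - null_value) / std_error
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# The usual nil hypothesis, but as an explicit request:
z_nil, p_nil = wald_test(0.50, 0.10, null_value=0.0)

# A non-nil null, e.g. "the slope is 0.40", is just as easy to ask for:
z_alt, p_alt = wald_test(0.50, 0.10, null_value=0.40)
```

With this design, calling `wald_test(0.50, 0.10)` without a null raises a `TypeError`, which is the software equivalent of asking "against what?"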

All of these things are possible to do in most or all software packages. But as my SPSS example under #1 shows, they’re not necessarily easy to implement in a user-friendly way. Even R doesn’t do all of these things in the standard lm function. As a result, they probably don’t get done as much as they could or should.

Any other nudges you’d make?

7 thoughts on “Changing software to nudge researchers toward better data analysis practice”

I like this idea a lot! I’d also recommend:
By default, all graphs and diagrams follow ‘best practice’: all axes start at 0, confidence intervals are displayed, and individual data points are overlaid on the summary statistics (i.e. you get a bar graph for the means, but you also see the raw data).

For scatter plots, it draws the regression line and displays the r^2 (not the r!) and p value of the correlation…

After you hit ‘Run’ but before you get the output, a window pops up telling you how much power you have to detect a small, medium, and large effect and asks “are you sure you don’t want to collect more data first?”.
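A back-of-the-envelope version of that pop-up might look like the Python sketch below. It assumes a two-group comparison with equal group sizes, a two-sided test, a normal approximation, and Cohen's conventional benchmarks of d = 0.2, 0.5 and 0.8 for small, medium and large effects; the function name `approx_power` is made up for illustration.

```python
from statistics import NormalDist

def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison.

    d is the standardized mean difference. Uses the normal approximation,
    so it will be slightly optimistic for small samples.
    """
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = d * (n_per_group / 2) ** 0.5  # expected value of the test statistic
    # Probability of landing in either rejection region.
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)

def power_warning(n_per_group):
    """Build the text a 'before you run this' dialog might show."""
    labels = {"small": 0.2, "medium": 0.5, "large": 0.8}
    lines = [f"{name} effect (d = {d}): power = {approx_power(d, n_per_group):.2f}"
             for name, d in labels.items()]
    lines.append("Are you sure you don't want to collect more data first?")
    return "\n".join(lines)

message = power_warning(n_per_group=30)
```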

This is an excellent idea. Seems like it is an R package waiting to be written (or a collection of packages waiting to be put together) and used as the default. My program teaches the intro stats sequence in R and I’m guessing others do too, so if this could be adopted in graduate programs as the default, I think it would be effective for changing the culture.

Also, I second NeuroSkeptic here: graphical best practices should be included as the default too. I’d also include graphs to check assumptions, like a graph for checking heteroscedasticity.

What about an additional nudge (maybe it doesn’t qualify as a nudge; I just started the book): journals requiring graphs of assumption checks to be submitted with the paper?

Hi Sanjay, I’m curious what your complaint with variance-explained metrics is.

As for nudges, for me this is an easy one. For regressions with categorical predictors (i.e., ANOVA), remove the “default” contrasts–which are very often not “contrasts” in the sum-to-zero sense at all, but rather dummies–and instead force users to think about and explicitly specify what coding scheme they wish to use for each of their factors. Do you want dummy codes? Okay, which group should be the reference? Contrast codes? Okay, polynomial or Helmert or what? By far the most common mistake that I see people make in interpreting ANOVA output is that they accepted the default contrasts without giving any thought whatever to whether they make sense *or even to what they are* (!), and then they unquestioningly interpret the resulting tests of the parameters as if they all represent straightforward main/simple effects and interactions. Maybe they are (if you got lucky), but very often, unless you thought about your codes, they are not.

One more smaller thought. Can we replace all of those t- and F-tests with bootstrapped confidence intervals already?

Sanjay Srivastava says:

November 21, 2012 at 5:05 pm

If you follow the link you’ll get a sense of some of the complaints (and the first comment on that page has a couple of good references). “Variance explained” is a statistical abstraction that people have no context to interpret. And lacking interpretive context, people draw on their intuitions about bare numbers, and percentages of variance almost always sound small; or they say dumb things like “you’re only explaining 10% of the phenomenon you’re studying.” (“Variance” != “the phenomenon.”) For example, in the linked example a 20% increase in survival rate (from 50% to 60%) corresponds to 1% “variance explained.”
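To see where a figure like 1% can come from, here is a quick arithmetic check in Python. It assumes two equal-sized groups and a binary outcome (survived or not), which may not match the linked example exactly; the function name is made up for illustration.

```python
def variance_explained_2x2(p_control, p_treated):
    """Squared phi coefficient for a binary outcome in two equal-sized groups."""
    diff = p_treated - p_control
    p_bar = (p_control + p_treated) / 2       # overall survival rate
    cov = diff / 4                            # covariance of group (0/1) and outcome (0/1)
    sd_group = 0.5                            # SD of a 50/50 group indicator
    sd_outcome = (p_bar * (1 - p_bar)) ** 0.5
    phi = cov / (sd_group * sd_outcome)       # correlation between group and outcome
    return phi ** 2

r_squared = variance_explained_2x2(0.50, 0.60)
```

Under those assumptions the correlation is about 0.10, so the squared value is about 0.01: a ten-point jump in survival shows up as roughly 1% of variance.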

Evelyn says:

December 2, 2012 at 9:27 pm

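Jake: There’s a good R package called BEST for getting interval estimates in place of t-tests. (Strictly speaking, it gives Bayesian credible intervals rather than bootstrapped confidence intervals.)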

http://www.indiana.edu/~kruschke/BEST/


“And maybe they’d stop doing median splits and then jamming everything into an ANOVA!”

You mean people still DO that?..
