Taking aim at evolutionary psychology

Sharon Begley has a doozy of an article in Newsweek taking aim at evolutionary psychology. The article is a real mixed bag and is already starting to generate vigorous rebuttals.

As background, the term “evolutionary psychology” tends to confuse outsiders because it sounds like a catchall for any approach to psychology that incorporates evolutionary theory and principles. But that’s not how it’s used by insiders. Rather, evolutionary psychology (EP) refers to one specific way (of many) of thinking about evolution and human behavior. (This article by Eric Alden Smith contrasts EP with other evolutionary approaches.) EP can be differentiated from other evolutionary approaches on at least 3 different levels. There are the core scientific propositions, assumptions, and methods that EPs use. There are the particular topics and conclusions that EP has most commonly been associated with. And there is a layer of politics and extra-scientific discourse regarding how EP is discussed and interpreted by its proponents, its critics, and the media.

Begley makes clear that EP is not the only way of applying evolutionary principles to understanding human behavior. (In particular, she contrasts it with human behavioral ecology). Thus, hopefully most readers won’t take this as a ding on evolutionary theory broadly speaking. But unfortunately, she cherrypicks her examples and conflates the controversies at different levels — something that I suspect is going to drive the EP folks nuts.

At the core scientific level, one of the fundamental debates is over modularity versus flexibility. EP posits that the ancestral environment presented our forebears with specific adaptive problems that were repeated over multiple generations, and as a result we evolved specialized cognitive modules that help us solve those problems. Leda Cosmides’s work on cheater detection is an example of this — she has proposed that humans have specialized cognitive mechanisms for detecting when somebody isn’t holding up their obligations in a social exchange. Critics of EP argue that our ancestors faced a wide and unpredictable range of adaptive problems, and as a result our minds are more flexible — for example they say that we detect cheaters by applying a general capacity for reasoning, not through specialized cheater-detecting skills. This is an important, serious scientific debate with broad implications.

Begley discusses the modularity versus flexibility debate — and if her article stuck to the deep scientific issues, it could be a great piece of science journalism. But it is telling what topics and examples she uses to flesh out her arguments. Cosmides’s work on cheater detection would have been a great topic to focus on: Cosmides has found support across multiple methods and levels of analysis, and at the same time critics like David Buller have presented serious challenges. That could have made for a thoughtful but still dramatic presentation. But Begley never mentions cheater detection. Instead, she picks examples of proposed adaptations that (a) have icky overtones, like rape or the abuse of stepchildren; and (b) do not have widespread support even among EPs. (Daly and Wilson, the researchers who originally suggested that stepchild abuse might be an adaptation, no longer believe that the evidence supports that conclusion.) Begley wants to leave readers with the impression that EP claims are falling apart left and right because of fundamental flaws in the underlying principles (as opposed to narrower instances of particular arguments or evidence falling through). To make her case, she cherrypicks the weakest and most controversial claims. She never mentions less-controversial EP research on topics like decision-making, emotions, group dynamics, etc.

Probably the ugliest part of the article is the way that Begley worms ad hominem attacks into her treatment of the science, and then accuses EPs of changing topics when they defend themselves. A major point of Begley’s is that EP is used to justify horrific behavior like infidelity, rape, and child abuse. Maybe the findings are sometimes used that way — but in my experience that is almost never done by the scientists themselves, who are well aware of the difference between “is” and “ought.” (If Begley wants to call somebody out on committing the naturalistic fallacy, she should be taking aim at mass media, not science.) Begley also seems to play a rhetorical “I’m not touching you” baiting game. Introducing EP research on jealousy she writes, “Let’s not speculate on the motives that (mostly male) evolutionary psychologists might have in asserting that their wives are programmed to not really care if they sleep around…” Then amazingly a few paragraphs later she writes, “Evolutionary psychologists have moved the battle from science, where they are on shaky ground, to ideology, where bluster and name-calling can be quite successful.” Whahuh? Who’s moving what battle now?

The whole thing is really unfortunate, because evolutionary psychology deserves serious attention by serious science journalists (which Begley can sometimes be). David Buller’s critique a few years ago raised some provocative challenges and earned equally sharp rebuttals, and the back-and-forth continues to reverberate. That makes for a potentially gripping story. And EP claims frequently get breathless coverage and oversimplified interpretations in the mass media, so a nuanced and thoughtful treatment of the science (with maybe a little media criticism thrown in) would play a needed corrective role. I’m no EP partisan — I tend to take EP on a claim-by-claim basis, and I find the evidence for some EP conclusions to be compelling and others poorly supported. I just wish the public was getting a more informative and more scientifically grounded view of the facts and controversies.

The monkey game

Sometimes when I’m in a public place, I like to amuse myself by playing what I call the “monkey game.” I look at people going by and try to observe them as if I were a primatologist. I ignore what they’re saying (and even the very fact that their verbalizations have any meaning at all) and set aside any sophisticated human psychological concepts. Instead, I try to interpret their behavior by putting it into a few basic categories that an ethologist might use when observing any social species: dominance displays, grooming, alliance-building, territoriality, various mating-related behaviors (attracting, maintaining, guarding), etc.

It’s amazing how much behavior you can make sense of with this sort of reverse anthropomorphizing. Try it next time you’re at the mall.

How a sexist environment affects women in engineering

Women in traditionally male-dominated fields like math and engineering face the extra burden that their performance, beyond reflecting on them individually, might be taken as broader confirmation of stereotypes if they perform poorly. A newly published series of experiments by Christine Logel and colleagues tested the effects of such stereotype threat among engineering students.

Standardized observations showed that male engineering students who had expressed subtle sexist attitudes on a pretest were more likely than their less sexist peers, when talking with a female engineering student about work issues, to adopt a domineering posture and to display signs of sexual interest (such as noticeably looking at the woman’s body).

In the next 2 experiments, female engineering students were randomly assigned in one experiment to interact with males who had endorsed different levels of subtle sexism, and in a second experiment with an actor who randomly either displayed or did not display the domineering/sexual nonverbal behaviors. Women performed worse on an engineering test after interacting with the randomly-assigned sexist males (or males simulating sexists’ nonverbal behavior).

In another experiment, women’s poorer performance was shown to be limited to stereotype-related tests rather than reflecting a broad cognitive deficit. In a final experiment, interacting with a domineering/sexually interested male caused women to have temporarily elevated concern about negative stereotypes, which they subsequently attempted to suppress (thought suppression being a well-known resource hog).

The results further support the idea that even subtle sexism can be toxic in workplace environments where women are traditionally targets of discrimination.

Statistics terms that would make good metal band names, Part 2

The Five-Factor Solution.

Statistics terms that would make good metal band names, Part 1

Variance decomposition.

Do review sheets help?

A lot of what I do as a college instructor draws upon the accumulated wisdom and practice of my profession, plus my personal experience. I accumulate ideas and strategies from mentors and colleagues, I read about pedagogy, I try to get a feel for what works and what doesn’t in my classes, and I ask my students what is working for them. That’s what I suspect most of us do, and it probably works pretty well.

But as stats guru and blogger Andrew Gelman pointed out not too long ago, we don’t often formally test which of our practices work. Hopefully the accumulated wisdom is valid — but if you’re a social scientist, your training might make you want something stronger than that. In that spirit, recently I ran a few numbers on a pedagogical practice that I’ve always wondered about. Do review sheets help students prepare for tests?

Background

When I first started teaching undergrad courses, I did not make review sheets for my students. I didn’t think they were particularly useful. I decided that I would rather focus my time and energy on doing things for my students that I believed would actually help them learn.

Why didn’t I think a review sheet would be useful? There are 2 ways to make a review sheet for an exam. Method #1 involves listing the important topics, terms, concepts, etc. that students should study. The review sheet isn’t something you study on its own — it’s like a guide or checklist that tells you what to study. That seemed questionable to me. It’s essentially an outline of the lectures and textbook — pull out the headings, stick in the boldface terms, and voila! Review sheet. If anything, I thought, students are better off doing that themselves. (Many resources on study skills tell students to scan and outline before they start reading.) In fact, the first time I taught my big Intro course, I put the students into groups and had them make their own review sheets. Students were not enthusiastic about that.

Method #2 involves making a document that actually contains studyable information on its own. That makes sense in a course where there are a few critical nuggets of knowledge that everybody should know — like maybe some key formulas in a math class that students need to memorize. But that doesn’t really apply to most of the courses I teach, where students need to broadly understand the lectures and readings, make connections, apply concepts, etc. (As a result, this analysis doesn’t really apply to courses that use that kind of approach.)

So in my early days of teaching, I gave out no review sheets. But boy, did I get protests. My students really, really wanted a review sheet. So a couple years ago I finally started making list-of-topics review sheets and passing them out before exams. I got a lot of positive feedback — students told me that they really helped.

Generally speaking, I trust students to tell me what works for them. But in this case, I’ve held on to some nagging doubts. So recently I decided to collect a little data. It’s not a randomized experiment, but even some correlational data might be informative.

Method

In Blackboard, the course website management system we use at my school, you can turn on tracking for items that you post. Students have to be logged in to the Blackboard system to access the course website, and if you turn on tracking, it’ll tell you when (if ever) each student clicked on a particular item. So for my latest midterm, the second one of the term, I decided to turn on tracking for the review sheet so that I could find out who downloaded it. Then I linked that data to the test scores.

I posted the review sheet on a Monday, 1 week before the exam. The major distinction I drew was between people who downloaded the sheet and those who never did. But I also tracked when students downloaded it. There were optional review sessions on Thursday and Friday. Students were told that if they came to the review session, they should come prepared. (It was a Jeopardy-style quiz.) So I divided students into several subgroups: those who first downloaded the sheet early in the week (before the review sessions), those who downloaded it on Thursday or Friday, and those who waited until the weekend before they downloaded it. I have no record of who actually attended the review sessions.
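Just to make the bookkeeping concrete, here is a minimal sketch (in Python with pandas) of how that grouping might be computed from the exported tracking data. The file name, column names, and dates below are hypothetical stand-ins, not the actual Blackboard export format.

```python
import pandas as pd

# Hypothetical stand-ins for the real course calendar
THURSDAY = pd.Timestamp("2009-04-09")   # first optional review session
SATURDAY = pd.Timestamp("2009-04-11")   # start of the weekend before the exam

def download_group(first_click):
    """Classify a student's first download of the review sheet (NaT = never downloaded)."""
    if pd.isna(first_click):
        return "never"
    if first_click < THURSDAY:
        return "early in week"
    if first_click < SATURDAY:
        return "Thu-Fri"
    return "weekend"

# One row per student, with the timestamp of their first click on the
# review sheet item (blank if they never downloaded it)
tracking = pd.read_csv("review_sheet_tracking.csv", parse_dates=["first_click"])
tracking["group"] = tracking["first_click"].apply(download_group)
print(tracking["group"].value_counts())
```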

A quick caveat: It is possible that a few students could’ve gotten the review sheet some other way, like by having a friend in the class print it for them. But it’s probably reasonable to assume that wasn’t widespread. More plausible is that some people might have downloaded the review sheet but never really used it, which I have no way of knowing about.

Results

Okay, so what did I find? First, out of N=327 students, 225 downloaded the review sheet at some point. Most of them (173) waited until the last minute and didn’t download it until the weekend before the exam. 17 downloaded it Thursday-Friday, and 35 downloaded it early in the week. So apparently most students thought the review sheet might help.

Did students who downloaded the review sheet do any better? Nope. Zip, zilch, nada. The correlation between getting the review sheet and exam scores was virtually nil, r = -.04, p = .42. Here’s a plot, further broken down into the subgroups:

[Figure “Review Sheet 1”: midterm 2 scores by review-sheet download group]
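For the curious, here is roughly what that simple analysis might look like in code, continuing from the hypothetical tracking data above and assuming a made-up gradebook file with one row per student. With a 0/1 download indicator, the Pearson correlation is the point-biserial correlation.

```python
import pandas as pd
from scipy import stats

# Merge the (hypothetical) tracking data with exam scores;
# gradebook.csv and its columns (student_id, midterm1, midterm2) are made up.
scores = pd.read_csv("gradebook.csv")
df = tracking.merge(scores, on="student_id")   # 'tracking' from the sketch above
df["downloaded"] = (df["group"] != "never").astype(int)

# Point-biserial correlation between downloading the sheet and the exam 2 score
r, p = stats.pearsonr(df["downloaded"], df["midterm2"])
print(f"r = {r:.2f}, p = {p:.2f}")

# Group means tell the same story as the plot
print(df.groupby("group")["midterm2"].mean())
```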

This correlational analysis has potential confounds. Students were not randomly assigned — they decided for themselves whether to download the review sheet. So those who downloaded it might have been systematically different from those who did not; and if they differed in some way that would affect their performance on the second midterm, that could’ve confounded the results. In particular, perhaps the students who were already doing well in the class didn’t bother to download the review sheet, but the students who were doing more poorly downloaded it, and the review sheet helped them close the gap. If that happened, you’d observe a zero correlation. (Psychometricians call this a suppressor effect.)

So to address that possibility, I ran a regression in which I controlled for scores on the first midterm. The simple correlation asks: did students who downloaded the review sheet do better than students who didn’t? The regression asks: did students who downloaded the review sheet do better than students who performed just as well on the first midterm but didn’t download the sheet? If there was a suppressor effect, controlling for prior performance should reveal the effect of the review sheet.
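Continuing the sketch from above, the regression might look something like this (statsmodels is one way to fit it; standardizing all three variables first makes the coefficient on the download indicator a standardized beta, and the residuals from a midterm-1-only model give the kind of plot described below):

```python
import statsmodels.api as sm

# Standardize so the coefficient on 'downloaded' is a standardized beta
cols = ["midterm2", "midterm1", "downloaded"]
z = (df[cols] - df[cols].mean()) / df[cols].std()

# Predict midterm 2 from the download indicator, controlling for midterm 1
model = sm.OLS(z["midterm2"], sm.add_constant(z[["downloaded", "midterm1"]])).fit()
print(model.params["downloaded"], model.pvalues["downloaded"])

# Residuals from a midterm-1-only model: how much better or worse each
# student did on midterm 2 than their midterm 1 score would predict
baseline = sm.OLS(df["midterm2"], sm.add_constant(df["midterm1"])).fit()
df["residual"] = baseline.resid
print(df.groupby("group")["residual"].mean())
```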

But that isn’t what happened. The two midterms were pretty strongly correlated, r = .63. But controlling for prior performance made no difference — the review sheet still had no effect. The standardized beta was .00, p = .90. Here’s a plot to illustrate the regression: this time, the y-axis is the residual (the difference between somebody’s actual score and the score we would have expected them to get based on the first midterm):

[Figure “Review Sheet 2”: residualized midterm 2 scores by review-sheet download group]

Limitations

This was not a highly controlled study. As I mentioned earlier, I have no way of knowing whether students who downloaded the review sheet actually used it. I also don’t know who used a review sheet for the first midterm, the one that I controlled for. (I didn’t think to turn on tracking at the start of the term.) And there could be other factors I didn’t account for.

A better way to do this would be to run a true experiment. If I were going to do this right, I’d go into a class where the instructor isn’t planning to give out review sheets, tell students that if they enroll in the experiment they’ll be randomly assigned to get different materials to help them prepare for the test, and then give a random half of them a review sheet and tell them to use it. For both ethical and practical reasons, you would probably want to tell everybody in advance that you’ll adjust scores so that if there is an effect, students who didn’t get the sheet (either because they were in the control group or because they chose not to participate) won’t be at a disadvantage. You’d also have to be careful about what you tell them about the experiment, so that you provide informed consent without creating demand characteristics. But it could probably be done.

Conclusions

In spite of these issues, I think this data is strongly suggestive. The most obvious confounding factor was prior performance, which I was able to control for. If some of the students who downloaded the review sheet didn’t use it, that would attenuate the difference, but it shouldn’t make it go away entirely. To me, the most plausible explanation left standing is that review sheets don’t make a difference.

If that’s true, why do students ask for review sheets and why do they think that they help? As a student, you only have a limited capacity to gauge what really makes a difference for you — because on any given test, you will never know how well you would have done if you had studied differently. (By “limited capacity,” I don’t mean that students are dumb — I mean that there’s a fundamental barrier.) So a lot of what students do is rely on feelings. Do I feel comfortable with the material? Do I feel like I know it? Do I feel ready for the exam? And I suspect that review sheets offer students an illusory feeling of control and mastery. “Okay, I’ve got this thing that’s gonna help me. I feel better already.” So students become convinced that review sheets make a difference, and then they insist on them.

I also suspect, by the way, that lots of other things work that way. To date, I have steadfastly refused to give out my lecture slides before the lecture. Taking notes in your own words (not rote) requires you to be intellectually engaged with the material. Following along on a printout might feel more relaxed, but I doubt it’s better for learning. Maybe I’ll test that one next time…

Students, fellow teachers, and anybody else: I’d welcome your thoughts and feedback, both pro and con, in the comments section. Thanks!

Thinking hard

I’ve been enjoying William Cleveland’s The Elements of Graphing Data, a book I wish I’d discovered years ago. The following sentence jumped out at me:

No complete prescription can be designed to allow us to proceed mechanically and to relieve us of thinking hard. (p. 59)

The context was — well, it doesn’t matter what the context was. It’s a great encapsulation of what statistical teaching, mentoring, and consulting should be (teaching how to think hard) and cannot be (mechanical prescriptions).

Vaccines, autism, and the end of the world

What do vaccines and the end of the world have in common? To some activists, it might seem like the former is going to bring about the latter. To the rest of us, there may be a more subtle connection.

A new article in the NEJM examines the characteristics of families that refuse vaccination. Chris Mooney blogs about it at The Intersection, noting that the families that refuse vaccination tend to seek medical information from a tightly interconnected community of alternative healers and anti-vaccination advocates, rather than relying on the scientific or medical establishment.

Mooney also has a piece in Discover about why the vaccination-autism controversy persists. I was particularly struck by this passage:

Meanwhile, in the face of powerful evidence against two of its strongest initial hypotheses—concerning MMR and thimerosal—the vaccine skeptic movement is morphing before our eyes. Advocates have begun moving the goalposts, now claiming, for instance, that the childhood vaccination schedule hits kids with too many vaccines at once, overwhelming their immune systems. Jenny McCarthy wants to “green our vaccines,” pointing to many other alleged toxins that they contain. “I think it’s definitely a response to the science, which has consistently shown no correlation,” says David Gorski, a cancer surgeon funded by the National Institutes of Health who in his spare time blogs at Respectful Insolence, a top medical blog known for its provaccine stance. A hardening of antivaccine attitudes, mixed with the despair experienced by families living under the strain of autism, has heightened the debate—sometimes leading to blowback against scientific researchers.

What this immediately reminded me of was Leon Festinger’s book When Prophecy Fails.

Festinger was a social psychologist who developed the theory of cognitive dissonance. WPF is an account of how Festinger and his colleagues infiltrated a cult that had predicted that the world would end on a specific day (December 20, 1954). He wanted to see what would happen when the predicted day came and went without incident.

Ordinary logic would suggest that if your theory makes an incorrect prediction, you should go back and question the theory. But what happened instead was that the disconfirming evidence actually strengthened the cultists’ beliefs. They decided that God had temporarily spared the world because of their dedication, and they became even more committed to trying to spread their views before the revised apocalypse date came around.

Drawing on cognitive dissonance theory, Festinger explained why the group’s beliefs were strengthened. Broadening beyond just cults, he outlined five conditions that will lead people to intensify their beliefs in the face of disconfirming evidence:

  • The belief is deeply held
  • The believer has taken public actions that reflect his/her commitment and that cannot be undone
  • The belief leads to specific, falsifiable predictions about something that will happen
  • The specific, falsifiable predictions are disconfirmed by objective evidence
  • After the disconfirming evidence comes to light, the believer has social support from other believers

Under these conditions, Festinger argued, it is well-nigh impossible to reverse the belief. You hold it too dearly, you’ve already committed yourself to it, and other people are telling you to hang in there. So instead, you try to figure out how you can twist or morph your belief in order to hold on to it.

Festinger’s account of these conditions and consequences does a strikingly good job of describing the arc of the vaccination-autism story. In the late 1990s, parents and activists became convinced that vaccines were causing autism. They developed two specific predictions. First, they proposed that the MMR vaccine triggered a release of toxins that had accumulated in the intestines; but the data failed to support this view, and most of the scientists who first proposed it retracted the conclusion. Activists also argued that thimerosal (a mercury-containing preservative in vaccines) was responsible for the link; but when thimerosal was removed from most vaccines, autism rates didn’t go down. So now, as Mooney describes it, activists are once again “moving the goalposts.” They are now blaming other toxins or saying that vaccination schedules are at fault. The belief has morphed, but key elements of its original form are preserved (it’s something about vaccines), allowing them to feel like they haven’t been wrong.

Unfortunately, a broader casualty of this process is parents’ confidence in science and medicine. Festinger showed that disconfirming evidence didn’t just lead believers to believe more; it led them to proselytize more too. So now we face an increasingly outspoken group of sympathetic and passionate advocates telling parents not to believe scientists and doctors. And that undermined confidence has very real and dangerous consequences.

What should be done? Frankly, I’m not entirely sure. One problem, it seems to me, is that this debate is framed as “science” or “scientists” versus the anti-vaccine activists. Science is supposed to be the rules of the game and scientists the referees, not an opponent in it. But I’m not sure there’s any way around that. If one side doesn’t like where the rules lead, who else are they going to blame?

Teaching is a social interaction

Howard Gardner suggests that the next big leap for teaching will be “personalized education,” in which people will learn from computers that adapt to their individual learning style:

Well-programmed computers—whether in the form of personal computers or hand-held devices—are becoming the vehicles of choice. They will offer many ways to master materials. Students (or their teachers, parents, or coaches) will choose the optimal ways of presenting the materials. Appropriate tools for assessment will be implemented. And best of all, computers are infinitely patient and flexible. If a promising approach does not work the first time, it can be repeated, and if it continues to fail, other options will be readily available.

My response to this is a big fat humbug. Gardner has put forward some interesting ideas about multiple intelligences and different learning styles. But the notion that computers will supplant human teachers strikes me as overreaching.

Teaching is, at its core, a social interaction between teacher and student. That is why MIT isn’t putting itself out of business by putting gobs of course materials online. Teachers do not create new information. (Or at least — if they’re at a university and also do research — not in their role as teachers.) And frankly, they don’t often package it into some novel format (“here is a bodily-kinesthetic presentation of Bayes’ Theorem”). What teachers do is convey information through a social interaction with their students. Perhaps some day we’ll know enough about how to turn computers into compelling social agents that can reproduce that experience. But until then, I’m not worried about technology supplanting human teachers.

Best. Poster. Ever.

In an exercise described as “rigorous mapping of ridiculous data,” Kansas State geography student Thomas Vought plotted the geographic distribution of the 7 deadly sins for a poster presented at the Association of American Geographers conference.

Many of the maps aren’t very kind to the Deep South. I was somewhat disappointed to see that my county is fairly nondescript — neither sinful nor virtuous — on 6 of 7 indices. But we are apparently quite the hotspot for envy.

The ridiculousness isn’t so much the data itself as the interpretations (which I’m sure Vought wasn’t entirely serious about). Lust, for example, is indexed by STDs per capita. That doesn’t necessarily mean that you’re having more sex with more partners — just that you’re not being very careful about it.

My region’s supposed sin of choice, envy, is indexed by thefts (burglary, robbery, etc.). I doubt that most of those crimes are really about envy. My bike was stolen last fall, but odds are the thief wasn’t coveting the bike itself. They probably just fenced it for some meth.

The conference location, Las Vegas, probably helped motivate Vought’s whimsical presentation. My main conference will be in Vegas next year. Maybe I should think about a followup?