Who doesn’t love hijinks? Last week, science journalist John Bohannon brought the hijinks. He wrote on io9.com about how he joined a sting operation designed to reveal the lightning-quick path from bad science – about fad diets – to big headlines. Here’s the short version. They ran a short clinical trial on 16 volunteers, collected some data, and a statistician abused the data until a statistically significant result popped out. It was juicy: Eating chocolate can accelerate weight loss! Science says so!
The finding was sensational, counterintuitive, and flashy. They’d followed what Bohannon calls a “recipe for false positives” and found a zinger. In short order, the tricksters published their findings. They whipped up a press release and let the hijinks unfold. The first few lines of Bohannon’s story tell us that the subsequent reportage was an unequivocal triumph: “It made the front page of Bild, Europe’s largest daily newspaper, just beneath their update about the Germanwings crash. From there, it ricocheted around the internet and beyond, making news in more than 20 countries and half a dozen languages.” He told NPR: “My goal was to show that scientists who do a bad job and get their work published can end up making headlines because it’s us — journalists like you and me — who are failing.”
When I first clicked on the story, I didn’t read it very closely. The article’s thrust seemed obvious: Reporters need to be more thorough when reporting on health, etc., etc. I’d heard it before; it was a good reminder. I skimmed the rest. But when I got to a section on “p-hacking,” I got kind of excited about the hoax. “P-hacking” takes its name from the p-value, a statistical tool usually reported as a measure of the strength of the evidence. The lower the p-value, the stronger the evidence is taken to be.
But the p-value can be manipulated! A researcher can lower the p-value by omitting some data. Or by looking only at selected subgroups. Or in many other ways. P-hacking is a big problem that makes it difficult for other researchers to replicate important findings. Last year, science writer Regina Nuzzo brought attention to the problem in a smart overview for Nature. In that article, Penn psychologist Uri Simonsohn, who’s largely credited with bringing us the term “p-hacking,” describes it as “trying multiple things until you get the desired result.” In other words, to paraphrase Ronald Coase, torture the data long enough, and they will confess.
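To see just how easy that is, here’s a toy simulation (my own back-of-the-envelope sketch, not anything from Bohannon’s actual analysis; the group sizes and outcome counts are made up for illustration): measure a pile of outcomes in a tiny trial where nothing real is going on, run a t-test on each, and report whichever one happens to dip below p < 0.05.

```python
# A minimal p-hacking sketch: many outcomes, tiny groups, no real effect.
# All data are simulated noise; any "significant" result is pure chance.

import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

n_per_group = 8     # tiny groups, like the 16-volunteer chocolate trial
n_outcomes = 18     # weight, cholesterol, sleep quality, well-being, ...

# Both groups are drawn from the same distribution: there is no true effect.
chocolate = rng.normal(0, 1, size=(n_outcomes, n_per_group))
control = rng.normal(0, 1, size=(n_outcomes, n_per_group))

p_values = [
    stats.ttest_ind(chocolate[i], control[i]).pvalue
    for i in range(n_outcomes)
]

best = int(np.argmin(p_values))
print(f"Smallest p-value across {n_outcomes} outcomes: {p_values[best]:.3f}")
print(f"Outcomes 'significant' at p < 0.05: {sum(p < 0.05 for p in p_values)}")

# With 18 independent shots at p < 0.05, the chance of at least one false
# positive is roughly 1 - 0.95**18, or about 60%, even with nothing going on.
```

Run it a few times with different seeds and, more often than not, some outcome will hand you a publishable-looking “effect.” Report only that one, and you’ve p-hacked.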
I initially read Bohannon’s article as a solid, shining demonstration of how p-hacking can be used intentionally to get desired results. I am a fact-checker and writer for a magazine devoted to cancer research, and it bothers me how much money and attention are lavished on small studies with small findings that, because of the limitations of the p-value and of tools like relative risk, nevertheless get hyped as major steps forward. (My wish for the future: Make relative risk verboten in every study about this or that cancer biomarker.) I’m in favor of shedding light on this problem: If more scientists are aware of p-hacking — and the limitations of using p-values — then maybe the science will get better, right?
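Since I brought up relative risk: here’s a tiny arithmetic sketch of why it can make a minuscule effect sound dramatic. The numbers are invented for illustration, not taken from any real study.

```python
# A toy illustration of relative vs. absolute risk, with made-up numbers.
# A biomarker that "doubles your risk" can still matter very little in
# absolute terms when the baseline risk is tiny.

baseline_risk = 1 / 10_000    # risk of the cancer without the biomarker
exposed_risk = 2 / 10_000     # risk with the biomarker

relative_risk = exposed_risk / baseline_risk        # 2.0 -> "doubles your risk!"
absolute_increase = exposed_risk - baseline_risk    # one extra case per 10,000 people

print(f"Relative risk: {relative_risk:.1f}x")
print(f"Absolute risk increase: {absolute_increase:.4%}")   # 0.0100%
```

The headline writes itself around the 2x; the one extra case per 10,000 rarely makes the lede.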
So there I was, happy to see Bohannon using this big chocolate science project to reveal the problem of p-hacking, thinking we could all move forward and things would get better. Alas. That’s not what the story was about. The goal was to show the shoddy state of nutrition journalism, not to bring attention to p-hacking.
Here’s where my excitement began to wane. I like to eavesdrop on smart conversations among smarter people, and during one online conversation about Bohannon’s article, the tone quickly soured. Critics of the sting rightly pointed out that the press release went out to more than 5,000 outlets – but only a dozen or so picked up the story. (Shape, it seems, was the Big Fish caught in the trap.) Bohannon told the Washington Post that, in order to keep the ruse alive, he stopped taking calls from one reporter who was asking all the right questions. That move, as Rachel Ehrenberg points out in Science News, is not unlike leaving out some data to obtain more convincing results.
The problematic aspect that really got to me, though, that finally stuck in my craw, was this: Those participants gave their blood. IRB requirements likely differ between Germany and the United States, but those participants were intentionally misled. (If this isn’t unethical, it at least seems ethically shady.) I went back and read the article more closely. This time, my thrill was gone. Or at least diminished – does thrill have a half-life? I think it was smart of Bohannon and his team to use legitimate methods to run their clinical trial, and I maintain some hope that the project will bring valuable attention to p-hacking. It may even help chip away at the primacy of the p-value as a way to measure evidence.
Ultimately, though, the whole experience gave me an unwelcome feeling of deja vu. In the fall of 2013, Bohannon led a project that purported to show the lack of peer review in some open-access journals. He cleverly concocted some papers, sent them out to open-access journals, and cried “gotcha!” when they were accepted for publication. But really, what it showed was that some predatory journals will happily take the money of researchers and publish rubbish, which we already knew. It had little to do with the open-access model.
I teach an undergraduate class on science communication, and in 2013 I brought that hoax to the attention of my students. Our class discussion followed this trajectory: initial excitement, followed by a closer examination of the methods and the experimental setup, then a re-evaluation of that excitement, and finally disillusionment. What might have illuminated something interesting about the scientific process instead showcased the potential reach – and limitation – of deception. The ultimate question was this: What was the value of the experiment? (It’s the same question I ask researcher sources for cancer stories: What are the clinical implications?) Beyond the initial splash, will it make any difference? Maybe not as the architects intended: This blog post at the American Beverage Association, the trade association for non-alcoholic beverages, holds up Bohannon’s story as a reason why consumers shouldn’t trust nutrition reporting. (H/T Robin Mejia)
The chocolate hoax still hasn’t finished its journey. If you’re in Europe, tonight you can watch the documentary that resulted from the hijinks. And if you speak German, you’ll understand this bizarre teaser for the show. Genießen Sie die Show! (Enjoy the show!)
____________
Stephen Ornes writes about physics, math and cancer research from a shed in his backyard in Nashville. Visit him online @stephenornes or at stephenornes.com.
Photos: Chocolateface: FabCafe and Ks Design Lab, CC BY 2.0; wasp: Jacopo Werther, CC BY-SA 3.0
Stephen, this is a great, even-handed look at this story. Really nice work. I would only add one thing. While a lot of these issues are common knowledge to us science writers, the average reader has never thought about them before. What Bohannon seemed to be doing was reaching out to them, and to do that, you often have to get a little creative. About half a dozen people forwarded the story to me, but not one of them was a science writer or a scientist. On the whole, and weighing both sides, his article feels like a win to me.
Great piece, Stephen. This reinforced my sense that one should be very suspicious of any work that’s drawn a conclusion before doing the grunt work. In this case, the conclusion was that the press is gullible, and Bohannon managed to fit the data to his thesis. But if his prior conclusion had been that most of the press is pretty careful and that, most of the time, only the bottom trawlers aren’t, he could have fit the data to that thesis too.