Incomplete data and sloppy data analysis can come back to bite you later on. In one of the strangest examples, an academic study lingered in the popular media for nine years before its conclusions were finally challenged — by the family behind the classic American cookbook, “The Joy of Cooking.”
Using the small insurance settlement she’d received after her husband’s suicide, Irma S. Rombauer self-published the first edition of the wildly popular cookbook back in 1931. Her daughter Marion Rombauer Becker then revised every edition for the next 40 years. The Library of Congress lists it as one of 88 books that shaped America, noting that its various editions have sold nearly 18 million copies. In an introduction to the seventh edition, published in 1997, senior editor Maria Guarnaschelli remembers that Rombauer’s grandson Ethan “guarded his birthright with rare courage and patience.” Ethan inherited the family’s love of cooking and attended Le Cordon Bleu in Paris. “When Mom passed away in 1976, I was entrusted to oversee the tradition and future of ‘Joy,’” he writes in that book’s foreword.
Guarnaschelli’s introduction acknowledges America had “a growing realization that what we eat can make us feel better and live longer,” and indicates that the book had been carefully adapted, including a whole new chapter on nutrition with “sound and practical advice on healthful diets.” The editors had also added recipes for people allergic to gluten and ingredient charts showing the amounts of calories, cholesterol, and fat. “There are ideas for substitutions to lower fat in recipes and reduced-fat recipes in the baking sections,” the introduction adds.
So it must’ve really hurt when a 2009 academic study from Cornell University tested 18 recipes which appeared in each edition of the book and concluded that “total caloric content increased for 14 out of the 18 recipes.” In a widely cited letter publicizing their results, a pair of Cornell researchers reported that “The average calories per recipe increased by 43.7 percent from 2123.8 calories to 3051.9 calories. Also, the mean average calories per serving increased in 17 out of the 18 recipes by 37.4 percent from 268.1 calories to 436.9 calories.”
Sounds bad, right? The academics acknowledged that “the results of this study are largely descriptive in nature,” but insisted that “they can be used as a basis for recommendations regarding weight maintenance in the home.” John Becker, the great-grandson of Irma Rombauer, later told the New Yorker that “We assumed that he was probably correct and that the recipes probably had increased in calories per serving.” After all, this was an Ivy League research institution. “If we had wanted to impugn the reputation of a sitting Cornell department head, I think we would’ve found a really tough row to hoe.”
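The percentages in the Cornell letter quoted above can be checked with a few lines of arithmetic. The sketch below is only a sanity check on the quoted figures, not a reconstruction of the study’s method, and the function name `percent_change` is our own. Computed directly from the quoted before-and-after numbers, the per-recipe figure matches the reported 43.7 percent, while the per-serving figures work out to roughly 63 percent rather than 37.4, so the letter presumably arrived at its per-serving percentage some other way.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, expressed in percent."""
    if before == 0:
        raise ValueError("percent change is undefined when the starting value is 0")
    return (after - before) / before * 100


# Figures quoted from the Cornell letter.
per_recipe = percent_change(2123.8, 3051.9)
per_serving = percent_change(268.1, 436.9)

print(f"Calories per recipe:  {per_recipe:+.1f}%")   # +43.7%
print(f"Calories per serving: {per_serving:+.1f}%")  # +63.0%
```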
But nine years later, they learned that one of the study’s two co-authors was facing serious criticism over his research methodology. Brian Wansink was the former executive director of the USDA’s Center for Nutrition Policy and Promotion, and in 2016 had written a blog post praising a graduate student for being willing to continually re-analyze experimental data until a pattern finally emerged.
“But that’s not how science is supposed to work,” Buzzfeed’s Stephanie M. Lee wrote last month, arguing that apparently Wansink “was retroactively creating hypotheses to fit data patterns that emerged after an experiment was over.”
Buzzfeed cites a psychologist at the University of Virginia who’s an “outspoken reformer” and the co-founder of the Center for Open Science, which pushes for more reproducibility in research. “This is not science, it is storytelling,” he complained, later adding that “There was the explicit goal of ‘Let’s just get something out of the data, use the data as a device to find something, anything, that’s interesting.'”
And Wansink’s studies sometimes contained other serious errors too. One study had claimed students between the ages of 8 and 11 were more likely to choose an apple instead of cookies if there was a sticker of Elmo on the apple. But even after that paper was retracted and then re-submitted, it still contained a major error in its methodology, Wansink acknowledged to Buzzfeed News: “the children were actually 3 to 5 years old.” (A critic of Wansink’s had become suspicious when the revised study’s data included comments like “no snack, didn’t wake up” and “picked neither was feeling sick after nap.”)
Buzzfeed also reported that in January Wansink’s collaborator on the “Joy of Cooking” study was no longer teaching at the University of New Mexico, where he had been an associate marketing professor since 2008.
The Rombauer family decided it was time to re-examine that data.
(THREAD) Inspired by @stpehaniemlee ’s new piece, we have decided to share this. We have the dubious honor of being a victim of @BrianWansink and Collin R. Payne’s early work. pic.twitter.com/s4NUd1YpqC
— Joy of Cooking (@TheJoyofCooking) February 27, 2018
Tweet and Savory
In a modern twist, the great-grandson of Irma Rombauer now runs the “Joy of Cooking” Twitter feed, and last month weighed in online.
2/the basic, gaping troubles such an analysis glosses over can be summarized with one word: LEFTOVERS! When analyzing the caloric density and portion size of a big mac , it’s pretty easy to assume that the whole thing will be eaten in one sitting. Not so with recipes.
— Joy of Cooking (@TheJoyofCooking) February 27, 2018
Sleuthing around, he’d eventually discovered a 2009 article which listed the 18 recipes tested. In a series of tweets, Becker reveals that by applying some “USDA-grade nutritional analysis software,” he’d duplicated the Cornell researchers’ experiment, and “the results they reported are incredibly different than what we found.”
13/ Regardless of their agenda (raising awareness of over-eating, calorie intake), Wansink is a bad researcher, and the rote repetition of his work needs to stop. Bad science does not make good public policy, or contribute to our ongoing disc re: health, cooking, and consumption.
— Joy of Cooking (@TheJoyofCooking) February 27, 2018
On February 27th he announced on Twitter that “We have a wealth of data and scribblings on this, and will provide it to anyone who is interested.” Soon another data analyst stepped in: James Heathers, who describes himself on Twitter as a “data thug,” attempted to replicate the Cornell study’s results.
Picture a feature-length underdog revenge montage, where the underdog is widely beloved cookbook @TheJoyofCooking, and the revenge weapon is SCIENCE
— Woman (@hels) March 21, 2018
Writing for the New Yorker, Helen Rosner calls Heathers “one of a platoon of swashbuckling statisticians who devote time outside of their regular work to re-analyzing too-good-to-be-true studies published by media-friendly researchers — and loudly calling public attention to any inaccuracies they find.”
Enjoy the food-yelling, mine beautifuls. https://t.co/2mGc10aMlC
— James Heathers (@jamesheathers) March 23, 2018
Heathers discussed his results in an essay on Medium, pointing out that “At the end of the day, even with every last detail ready and waiting for us in the cookbooks, we still need to make an alarming amount of assumptions to populate things like the calorie content and density.”
Heathers agrees there’s an issue with the construction of the Cornell study. “To produce this analysis, serving sizes just… well, appeared. There is no other word for it. Serving size was not listed, nor implied for just over HALF the recipes.” Heathers told the New Yorker, “The problem is not that it was added up wrong. It’s that there’s no real way to add it up right.” His essay points out that serving size is crucial for the results you’ll get, since “a summer salad for four people has more calories than a quarter of a Mars Bar.”
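To see why an unstated serving size matters so much, here is a minimal sketch of the arithmetic. It takes the average calories per recipe quoted from the Cornell letter (3051.9) and divides it by a few serving counts. The serving counts are purely hypothetical, chosen for illustration, and the function name is our own.

```python
def calories_per_serving(total_calories: float, servings: int) -> float:
    """Split a recipe's total calories evenly across an assumed number of servings."""
    if servings <= 0:
        raise ValueError("servings must be a positive integer")
    return total_calories / servings


# Average calories per recipe, as quoted from the Cornell letter.
TOTAL_CALORIES = 3051.9

# Hypothetical serving counts: when a recipe doesn't state one, someone has to pick.
for servings in (4, 6, 8):
    per_serving = calories_per_serving(TOTAL_CALORIES, servings)
    print(f"{servings} servings -> {per_serving:.1f} calories each")
```

The same pot of food yields a per-serving figure that halves when the assumed serving count doubles, so any per-serving comparison is only as good as the serving sizes behind it.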
But there’s an even bigger error with the study’s results on gumbo. “Basically, the ’36 and ’06 recipes share a name and little else.” One is just a chicken stew, while the more modern version contains fatty Andouille smoked pork sausage, and is thickened with flour and oil. He later refers to it as the study’s “great gumbo disaster.”
And finally, while the original study reported a 44 percent increase in calories, Heathers calculated only a 21 percent increase for the same recipes, and even that’s a little misleading. Heathers concedes that “You can see some very substantial increases in overall calories, but only for a few recipes. One is chowder, which replaces water with milk (+86 percent increase), the other is chicken a la king, which is affected by the fact that cream is optional in the ’36 recipe (and therefore not included) but its equivalent is mandatory (and therefore included) in the ’06 recipe (+134 percent increase).
“The result in question hinges on this kind of slipperiness.”
Heathers also criticized the small sample size (just eighteen recipes), pointing out to the New Yorker that the goulash recipe had 134 percent more calories, while the rice pudding recipe had 30 percent fewer. “That’s not a reliable pattern!”
The Dangers of Data Dredging
But is this just the tip of the iceberg? Even Buzzfeed acknowledges that “Wansink’s practices are part of a troubling pattern of strategic data-crunching across the entire field of social science.”
The Undark site (published by MIT’s journalism program) cites a practice known as data dredging (also referred to as p-hacking) — and a 2015 study which found that the practice “is widespread throughout science.” Last October the New York Times Magazine explained that while there’s always leeway to exclude highly unusual results, add additional subjects, or exclude data because of experimental glitches, “more often than not, those decisions — always seemingly justified as a way of eliminating noise — conveniently strengthened the findings’ results.”
And though the scientific principle insists that all results should be verified by an independent team, “for the majority of social-psychology results, even the most influential ones, this hadn’t happened… There was no incentive to replicate, in any case: Journals were largely not interested in studies that had already been done, and failed replications made people (maybe even your adviser) uncomfortable.”
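The data dredging described above is easy to demonstrate with a simulation. The sketch below is an illustration under stated assumptions, not a model of any particular study: it compares two groups of pure random noise (so there is no real effect to find), uses a simple z-test with a known standard deviation of 1, and treats p < 0.05 as “significant.” The function names, group size, and trial count are all our own choices.

```python
import random
from statistics import NormalDist, fmean


def noise_p_value(rng: random.Random, n: int) -> float:
    """Two-sided z-test p-value comparing two groups of pure noise (sigma known to be 1)."""
    group_a = [rng.gauss(0, 1) for _ in range(n)]
    group_b = [rng.gauss(0, 1) for _ in range(n)]
    # The difference of two means of n unit-variance samples has standard error sqrt(2/n).
    z = (fmean(group_a) - fmean(group_b)) / (2 / n) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))


def dredge_rate(outcomes: int, n: int = 20, trials: int = 1000,
                alpha: float = 0.05, seed: int = 0) -> float:
    """Fraction of simulated studies where at least one of `outcomes` noise-only tests is 'significant'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Keep testing new outcome measures until one of them crosses the threshold.
        if any(noise_p_value(rng, n) < alpha for _ in range(outcomes)):
            hits += 1
    return hits / trials


for k in (1, 5, 20):
    simulated = dredge_rate(k)
    expected = 1 - 0.95 ** k  # chance of at least one false positive in k independent tests
    print(f"{k:>2} outcomes tested: simulated {simulated:.2f}, theoretical {expected:.2f}")
```

With independent tests, the chance of at least one false positive among k outcomes is 1 − 0.95^k, which is about 64 percent for 20 outcomes. A researcher who keeps slicing the same noise until something turns up will usually find “something, anything, that’s interesting.”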
Undark points out that “While pillorying and shaming data masseurs may offer a sense of catharsis, it’s also a facile response to a far more complicated problem in which there appear to be plenty of blameworthy actors.”
And The New York Times Magazine reports that a movement to reform social science research, which began in 2011, is taking a very emotional toll on scientists “forced to confront the fear that what they were doing all those years may not have been entirely scientific.”
“All of a sudden you have people emailing other people, asking for their data,” complains Eli Finkel, a social psychologist at Northwestern, “and then writing blog posts accusing them of shoddy practices.”