08 October 2018

Science corrections - replication is much more important than retraction

(The original article appears at The Conversation under the title "Retraction of a journal article doesn't make its findings false.").
The American Medical Association recently retracted six papers, co-authored by food consumption and psychology researcher Brian Wansink, from three of its journals. These include two studies showing that larger bowls encourage us to eat more, as well as research showing that shopping while hungry leads us to buy more calorie-dense foods.
A prolific academic researcher, Wansink has provided many thought-provoking ideas about the psychology of food consumption through more than 500 publications that have collectively been cited more than 25,000 times.
His research has shown that people will eat a lot more from a bottomless soup bowl; they will eat more from larger portions, even if it is stale popcorn or food served in a dark restaurant; and they will eat less if a portion is made to appear larger using visual illusions.
Retractions are a permanent means by which journals endeavour to preserve the integrity of the scientific literature. They are typically issued for some form of misconduct, but a retraction does not necessarily mean the results are false.

Are retracted studies false?

A number of challenges have been made against more than 50 of Wansink’s publications. At present, 15 corrections have been published and 13 retractions have been made.
The retractions follow a range of allegations of misconduct including auto-plagiarism (copying your own work), data mismanagement and data manipulation. But none of this means Wansink’s results are entirely discredited.
The American Medical Association made its retractions because Cornell University (Wansink's employer) was unable to provide an independent evaluation of the studies in response to an Expression of Concern about them issued in May 2018.
The absence of such an evaluation does not prove his results are false.
Science relies far more on whether results are repeatable. And many of Wansink’s results – including some which have been retracted – have been replicated.
Two of the most recently retracted studies, showing that adults and children eat more from larger bowls, form part of a larger literature, as reflected in their having been cited nearly 300 and 40 times, respectively.

The bigger the plate size, the more people will eat (if they serve themselves). NeONBRAND/Unsplash

Importantly, multiple reviews of the scientific literature reveal that others have replicated the findings of Wansink and colleagues on how the plate or bowl size affects consumption.
In a meta-analysis I authored with others, the combined studies in this area show that doubling the plate size increases consumption by 40% on average if people are serving food onto the plate themselves. However, there is no effect of plate size on consumption if a fixed or constant amount of food is served on the plate. (Disclosure: this meta-analysis was published in a journal issue for which Wansink was one of the editors.) 

The meta-analysis also shows that a retraction does not necessarily falsify a result: doubling the plate size for people who serve themselves increases consumption by about 40%. In fairness, this meta-analysis contains one of Wansink's retracted papers. A re-analysis excluding that one study reduces the self-serving plate-size effect only slightly, from 41% to 39%. A more conservative approach would be to exclude all papers on which Wansink was an author. In this case, the self-serving plate-size effect falls to 38%.
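To make this kind of sensitivity analysis concrete, here is a minimal Python sketch. It assumes a fixed-effect, inverse-variance weighted average, which is one common way to pool studies; the actual meta-analysis may have used a different model (for example, random effects). The study labels, effect sizes, standard errors and the `wansink_author` flag are entirely hypothetical and are not the data behind the 41%, 39% and 38% figures.

```python
import math

# Hypothetical study records: (label, effect, standard_error, wansink_author).
# "effect" is an illustrative proportional increase in intake when plate size
# doubles (0.40 means +40%). These are NOT the meta-analysis data.
STUDIES = [
    ("A", 0.45, 0.10, True),
    ("B", 0.35, 0.08, False),
    ("C", 0.50, 0.15, False),
    ("D", 0.30, 0.12, True),
    ("E", 0.38, 0.09, False),
]


def pooled_effect(studies):
    """Fixed-effect inverse-variance pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for _, _, se, _ in studies]
    total = sum(weights)
    estimate = sum(w * eff for w, (_, eff, _, _) in zip(weights, studies)) / total
    return estimate, math.sqrt(1.0 / total)


def sensitivity(studies, exclude):
    """Re-pool after dropping every study for which exclude(study) is True."""
    return pooled_effect([s for s in studies if not exclude(s)])


full, _ = pooled_effect(STUDIES)
without_a, _ = sensitivity(STUDIES, lambda s: s[0] == "A")
without_flagged, _ = sensitivity(STUDIES, lambda s: s[3])
print(f"all studies:              {full:.1%}")
print(f"excluding study A:        {without_a:.1%}")
print(f"excluding flagged author: {without_flagged:.1%}")
```

Dropping a study shifts the pooled estimate in proportion to its weight and its distance from the pooled value, which is why removing one or a few studies from a larger literature often moves the result very little.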

Replication is more important than retraction

The problem of reproducing findings in science (called replication) is a much bigger issue than retractions. Retractions attract attention, but are relatively minor; replication does not attract attention, and is critically important.
The replication crisis facing the social sciences, health and medicine suggests that 50% or more of published findings may not be repeatable.
For instance, in the social sciences, a large team of researchers attempted to replicate 100 studies published in three high-ranking psychology journals. Only 36% of the replications found statistically significant results, and the average size of the observed effects was half of that seen in the original studies.
The high rate of replication failure arises, in part, from the arcane statistical approach used to analyse research data. In essence, researchers seek statistically significant findings. A result is typically called statistically significant when its p-value, the probability of obtaining data at least as extreme as those observed if there were no real effect, is less than 5%.
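The definition above can be illustrated with a permutation test, one of several ways to compute a p-value (not necessarily the method used in any study discussed here). This minimal sketch shuffles the group labels to mimic a world with no effect and counts how often a difference at least as large as the observed one appears. The bowl-size numbers in the usage lines are made up.

```python
import random
import statistics


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    Shuffling the pooled values simulates "no effect": any difference
    between the reshuffled groups is due to chance alone.
    """
    rng = random.Random(seed)
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # The +1 terms count the observed split itself, so p is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical grams eaten from small versus large bowls.
small_bowl = [210, 190, 205, 180, 220, 195]
large_bowl = [260, 240, 275, 230, 250, 265]
print(f"p = {permutation_p_value(small_bowl, large_bowl):.4f}")
```

A small p-value says only that such a difference would be rare if there were no effect; it does not give the probability that the effect is real.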
The statistical approach has a convoluted logic that many students, and even many academics, misunderstand. In simple terms, if 100 studies are conducted in which there is no effect, about five of them will still produce "statistically significant" results, even though there is no effect. That is, if I test a shark detection system in a freshwater swimming pool and it goes off fewer than five times in 100 trials, it has passed the statistical standard and we can safely conclude that it works, which is clearly absurd.
One common way people misunderstand this analysis is to conclude that only about 5% of published studies will be false. This is wrong, and it is why few seem to genuinely understand the suggestion, made for both medical and psychological research, that over 50% of published studies may be false.
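A quick simulation can check the "about five in 100" claim. This sketch assumes each hypothetical study draws 30 observations from a standard normal distribution, so the true effect is exactly zero, and applies a two-sided z-test with the standard deviation treated as known. Real studies typically use t-tests or other methods, but the idea of a 5% false-positive rate under no effect is the same.

```python
import math
import random
from statistics import NormalDist, fmean

STANDARD_NORMAL = NormalDist()


def null_study_p_value(rng, n=30):
    """Run one hypothetical study in which the true effect is exactly zero.

    Draws n observations from N(0, 1) and applies a two-sided z-test
    (standard deviation assumed known and equal to 1).
    """
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z = fmean(sample) * math.sqrt(n)
    return 2.0 * (1.0 - STANDARD_NORMAL.cdf(abs(z)))


def false_positive_rate(n_studies=10_000, alpha=0.05, seed=1):
    """Share of no-effect studies that nonetheless reach p < alpha."""
    rng = random.Random(seed)
    hits = sum(null_study_p_value(rng) < alpha for _ in range(n_studies))
    return hits / n_studies


print(f"share of null studies that are 'significant': {false_positive_rate():.3f}")
```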
Journals and academics wish to publish novel, statistically significant results. They tend to ignore studies with null results, which end up in a file drawer.

Successful replications are seen as adding nothing new, and failed replications (those that are not statistically significant) are uninteresting to publishers, even though they are critically important to science.
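The file drawer also helps explain why replications tend to find smaller effects, as in the 100-study project described above. This sketch assumes a small hypothetical true effect (0.2 standard deviations), 25 participants per study and a two-sided z-test with known standard deviation, then "publishes" only the studies with p < 0.05. All parameters are assumptions chosen for illustration.

```python
import math
import random
from statistics import NormalDist, fmean

STANDARD_NORMAL = NormalDist()


def run_study(rng, true_effect, n):
    """Return (estimated effect, two-sided p-value) for one hypothetical study."""
    sample = [rng.gauss(true_effect, 1.0) for _ in range(n)]
    estimate = fmean(sample)
    z = estimate * math.sqrt(n)  # z-test, standard deviation assumed known (1.0)
    p_value = 2.0 * (1.0 - STANDARD_NORMAL.cdf(abs(z)))
    return estimate, p_value


def file_drawer(true_effect=0.2, n=25, n_studies=5_000, alpha=0.05, seed=2):
    """Compare the average effect across all studies with the 'published' ones."""
    rng = random.Random(seed)
    results = [run_study(rng, true_effect, n) for _ in range(n_studies)]
    all_estimates = [est for est, _ in results]
    # Only statistically significant studies leave the file drawer.
    published = [est for est, p in results if p < alpha]
    return fmean(all_estimates), fmean(published), len(published) / n_studies


mean_all, mean_published, share_published = file_drawer()
print(f"average of all studies:       {mean_all:.2f}")
print(f"average of published studies: {mean_published:.2f}")
print(f"share published:              {share_published:.0%}")
```

Averaging every study recovers roughly the true effect, but the published subset overstates it considerably, so a faithful replication would be expected to find a smaller effect.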
A related problem is that academics may dredge through data and cherry-pick statistically significant results, a practice called p-hacking.
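One simple form of data dredging is to measure many outcomes and report whichever one reaches significance. This sketch assumes k independent outcomes with no real effect. It relies on the fact that, for a continuous test statistic, p-values are uniformly distributed when there is no effect, and compares a simulation with the exact formula 1 - (1 - 0.05)^k. Real outcomes are often correlated, which lowers the figure somewhat, but the direction holds.

```python
import random


def chance_of_any_significant(k, alpha=0.05):
    """Exact probability that at least one of k independent null tests has p < alpha."""
    return 1.0 - (1.0 - alpha) ** k


def simulate_dredging(k, alpha=0.05, n_studies=20_000, seed=3):
    """Share of hypothetical studies that 'find something' by testing k null outcomes.

    With no real effect (and a continuous test statistic), each p-value is
    uniform on [0, 1), so it is drawn directly with rng.random().
    """
    rng = random.Random(seed)
    lucky = sum(
        min(rng.random() for _ in range(k)) < alpha for _ in range(n_studies)
    )
    return lucky / n_studies


for k in (1, 5, 10, 20):
    exact = chance_of_any_significant(k)
    simulated = simulate_dredging(k)
    print(f"{k:>2} outcomes: exact {exact:.3f}, simulated {simulated:.3f}")
```

With 10 independent outcomes, the chance of at least one "significant" result is about 40%, even though nothing is going on.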
The misconduct of journals and academics, through their obsessive focus on statistically significant findings, is reflected in the replication crisis and the prevalence of p-hacking. The implication is that many, even most, published studies are probably false.
If Wansink differs from others, it is in his disarming openness in a 2016 blog post admitting to data dredging. This post attracted intense scrutiny from his peers and began a thorough examination of much of his research.

To put this into context, Wansink has published more than 500 articles. If 250 of them prove to be false in the sense that the results cannot be replicated, then he is on par with social and medical science in general.
The retraction of thirteen of Wansink’s articles - some of which have been replicated by others - is a blip that is receiving much more attention than it deserves.

Science ought to be interested in what is true, rather than baying about the missteps of its practitioners. If science were to follow the principle of 'wrong in one thing, wrong in everything' (falsus in uno, falsus in omnibus), it would be a very thin volume, and even then it would likely reflect only the findings of those whose errors have not been detected.
Science makes mistakes and missteps. Advances are achieved through new ideas and repeated testing.
Retractions may be important signals of reduced confidence in a finding, but they do not prove a finding false. This requires replication.
Science doesn’t provide certainty. The good scientist is one who embraces uncertainty. 
Claims of absolute certainty made by authoritative figures are probably false (cf. Clarke's first law).
Tim van der Zee, one of Wansink's leading detractors, states on his website: "I am wrong most of the time."
The challenge for scientists is to believe this. Few are prepared to accept that 50% or more of their published findings might be false. 

1 comment:

  1. Agree with the author. However, it is pretty obvious that a large percentage of quantitative research operates with false or inaccurate data. This is simply the reason meta-analysis exists.