Tuesday, October 11, 2011

Peer review: Is it all it's cracked up to be?

Sometimes, it seems as though every new day brings a new groundbreaking finding in the quest to understand autism. New genes discovered. New bits of brain found to be a different shape or size; to be over-activated, under-activated, or not properly connected. New tasks that people with autism are either exceptionally good or exceptionally bad at. New environmental factors linked to an ever so slightly increased chance of having an autistic kid.

With all this progress, it's a wonder we haven't, well, made more progress.

In truth, advances in scientific understanding are a little more gradual than the blizzard of press releases would have you believe. Science is also a fairly haphazard process. Less the sleek and indomitable machine of popular imagination, more like a drunkard trying to find the light switch. Sometimes, science gets it completely wrong and, like my best friend on a caravan holiday, ends up weeing in the oven. True story.

Keeping things as much as possible on the straight and narrow is the process of peer review. Before they can be published as journal articles, scientific papers must be vetted by other researchers in the field. These reviewers will report back to the editor of the journal who ultimately gives the thumbs up or the thumbs down. However, the fact that a paper has successfully run the peer review gauntlet is no guarantee that it's actually any good, that it's free from errors, or that the conclusions reached are justified.

Some journals, let's be frank, will publish any old rubbish. And even well-respected journals sometimes let things slip through the net.

Here's an example I covered a while back. In 2009, the prestigious journal, Biological Psychiatry, published a paper claiming that people with autism have extraordinary visual acuity. To cut a long story short, it’s now clear that there were major problems with the study and, as several subsequent studies have shown, people with autism seem to have visual acuity that is distinctly un-eagle-like - no better or worse than your average non-autistic person in the street. The reviewers didn’t spot the technical problem. Neither, I should confess, did I initially. But still a peer review #fail.

A more recent example, also coincidentally from Biological Psychiatry, involved a study trying to use MRI brain scans to diagnose autism. Remarkably, one of the measures derived from this process was able to predict the severity of communication difficulties – information that it hadn't been trained on. However, a quick look at Figure 3 in the paper showed a pretty glaring mistake.

Figure 3, Uddin et al, 2011, Biological Psychiatry

The severity scores for three of the autistic kids had been lost but, instead of excluding them from the analysis, the authors had accidentally included them, giving each person a score of zero, as if they didn’t have any communication difficulties. As it happens, the authors have confirmed that the result just about holds up when the analyses are done correctly, although the effect is somewhat diminished. [Update: A correction has now been published to this effect]. The point, nonetheless, is that the paper made it past the reviewers (and eight authors and an editor) without this problem being noticed.

Mistakes such as these are easy to make – and they’re not always so easy to spot. In the case of the brain scan paper, it was only because the actual data points were plotted in a figure in the paper that the error was even visible. Errors buried deep in the analyses may never be discovered. Peer review can't help.

Even when there are no technical problems and the statistical analysis is flawless, there's still no guarantee that the results mean what the authors think they mean. The conclusions drawn depend on assumptions about what the tests are actually measuring. If those assumptions are wrong then so too might be the conclusions.

A classic example, again from the autism literature. In 1991, Josef Perner and Sue Leekam reported a neat dissociation - kids with autism failed to comprehend that a person could believe something that wasn't true any more, but those same kids were perfectly able to understand that a photograph could show a scene that had since changed.

The conclusion at the time was that kids with autism must have a very specific problem with understanding other people's mental states, otherwise they would have found both tasks equally difficult. However, this story has gradually unravelled. As Perner and Leekam have latterly argued, the two tasks aren't really equivalent at all. In particular, a photograph isn't a false representation of the present (in the way that a false belief can be), but a true representation of the past. As such, the conclusion that autism involves a specific problem with mental states was not warranted.

In hindsight it all seems quite obvious. Indeed, it is really only with hindsight that we can see which ideas, studies, and methods were the ones worth following.

This, in essence, is why scientific progress is slow and haphazard. And while peer review does serve a function, at best it's a crude spam filter for weeding out those papers that are most obviously problematic. It isn't a stamp of scientific truth, because there is no such thing as scientific truth. We shouldn't be shocked or surprised when results don't hold up. Even good science can be wrong - and it frequently is.

What is absolutely critical is that any statement presented as being "science-based" is backed up with a clear report of how the data were collected and analysed. Then the whole world can see how you reached those conclusions; where the problems in your method, your analysis, your conclusions might lie.

Further reading:
The post that got me writing this:
And a response:


  1. Nice post, Jon! One quibble: I think you're really talking about traditional pre-publication peer review, and not peer review in general. The examples you give are cases where peer review failed because it was the job of just 2 or 3 people to catch every possible error in the paper, which isn't reasonable. In an open system where all reviews were public (and could be evaluated in turn), such problems would presumably be caught much more quickly and effectively--but the reviewers would still be scientific professionals for the most part.

  2. Absolutely. In my mind, post-publication peer review is part of that process of hindsight.

    But I think pre-publication peer review is what most people understand by peer review. When people say "But has it been peer reviewed?" this implies a finite process. Once it's been peer reviewed then it must be OK. Only that isn't true.

    I worry that too many scientists argue against pseudoscience by playing the "peer review" card (that NYT Neuromarketing piece being a good example). It completely misses the point, not least because there is plenty of pseudoscience that *has* been peer reviewed.

  3. 'Independent replication' rather than 'peer review' should be the rallying cry. Of course there are problems with peer review (and also with replications). I hope peer review is better than a spam filter, or else I have wasted a lot of time as a reviewer. Should you also take into account the rejections that happened after peer review?

    You give two clear examples (eagle eye vision and MRI diagnosis) where the SPAM filter was set too low. Luckily some critics spotted the blatant blunders pretty soon.

    The careful theoretical analysis of the false photograph task by Perner and Leekam (2008) resulted in the insight that it is after all not a decisive control for a Theory of Mind task (papers published in 1980s and 90s). This is an example that should not come under the heading of failure of peer review. It is an important example of progress in science. It built on prior publications, critically examining them. And furthermore, it is not the last word either! We constantly learn from previous work, even flawed work.

  4. Uta. Thanks for commenting. As always, it’s very much appreciated.

    The central message I was trying to get across was that, just because something has been published in a peer reviewed journal doesn't make it "scientific fact". In large part, I was motivated by the piece in yesterday's Guardian by Sumner and colleagues, proposing that science journalists should always copy-check their articles with the scientists involved. Their argument, as I read it, was that published science is peer-reviewed and so has already been critically evaluated, leaving the science journalist with no role beyond that of mouthpiece for the scientists involved. I think this is a ridiculous argument, not least because it’s an incredibly naïve view of peer review. But also because it misses out your crucial point, that science is never completely settled. Even when the peer review process has done its job in terms of the study design and analysis, the conclusions reached and the direction that future research takes as a consequence depend on the underlying assumptions that the researchers hold to be true.

    That last point was the one I was trying to make with the Leekam and Perner study. But I realise now that it is actually a really poor and confusing example, not least because it wasn’t Leekam and Perner who were arguing that the false photograph task was a control task for false belief. They were, as I understand it, testing a prediction (not supported) that kids with autism should be bad at both tasks.

    I do, nevertheless, think that the false photograph saga is a nice illustration of how old data can be reinterpreted in the light of new data. Initially there was lots of evidence that kids with autism struggled with false beliefs, and then evidence that they had no problems with false photographs. Now there’s evidence that they do have difficulties with false signposts (for anyone else reading this, the idea is that a signpost pointing the wrong way truly is a false representation of the world). So any theory now has to address the pattern of performance across three tasks. False photos was an important step (even if the interpretation was a misstep), and, crucially, the data from those studies are still valid, even if previous conclusions drawn from them are not.

  5. Here's another paper on peer review and why most published research findings are false.


  6. Uta. I only answered half your comment before.

    So I do think that peer review often adds value beyond weeding out the really bad papers (although if you're determined to get something published you can always publish it *somewhere*). I can think of numerous examples where reviewers have given us helpful advice or suggested literature that we've missed out on.

    Our latest paper in Perception is a good example - initially, one reviewer raised a serious concern with one of the tasks and the paper was rejected; we thought about it and realised he or she was possibly right, reanalysed the data and it actually came out even more clearly in favour of our original conclusions; so we resubmitted it to Perception and it got in with a footnote. Point being, other readers may have come to the same conclusion as Reviewer 1 and dismissed the paper after publication. In a way, it's similar to the situation with the Uddin MRI paper, except that the potential problem was caught pre- rather than post-publication.

    But to my mind, the most useful function of peer review is to force the writer to clarify what actually happened in the study and how the data were analysed (both before actual submission and in response to reviews). So while I said that a post on a website fulfils the criteria for scientific publication as much as a Nature article, that depends on the webpost being clearly articulated so that the study is reproducible. Peer review forces that to happen (to a certain degree at least) and if we ever did move towards a model that didn't involve pre-publication peer review, it would have to somehow address that issue.

  7. Interesting post Jon. I want to digress slightly from your main point, and comment on Theory of Mind tests. Although at a conceptual level we know what we mean by ToM, when it comes to operationalising the construct, it's a different story.

    False belief tests can be presented in a variety of formats. This means that whether or not a child 'passes' a false-belief test will depend on how well the child can process information in the sensory modalities used by the test format. The classic Sally-Anne test can be presented purely verbally (as a spoken or written story), purely visually (as a cartoon, a series of photographs, a film, a model, or a real life enactment), audio-visually (the two modalities in any combination), or with tactile and other sensory input as well.

    One would expect children with auditory processing difficulties to perform less well on the spoken story version as compared to audio-visual versions and that differences between visual presentations would emerge in the presence of visual processing difficulties. Jill de Villiers found, for example, that children with hearing difficulties tend to pass even non-verbal false belief tests later than children with normal hearing, and suggests that performance on false belief tests depends on command of syntax.

    In other words, the problem might not be with ToM as such, but with its component skills. The type of conclusion drawn from ToM tests would depend on whether one saw ToM as a hard-wired function or as an emergent property of information processing.

  8. Problems in passing 'Theory of Mind' tests are common in all groups presenting with damaged neuronal circuitry and are certainly not specific to autism:


    The same applies to the phenomenon of echolalia, the presence of which distinguishes verbal autistic children from children with specific childhood language disorders.

    Echolalia is a common feature in many groups with damaged neuronal circuitry, including stroke, brain tumour and Alzheimer's patients.

    Child psychiatry and psychology tend to focus exclusively on behaviour while ignoring the biology of autism.

  9. Last call...

    I point to ToM, and I point to a fabulous confabulation.

    That's all.