Wednesday, June 29, 2011

Why null ain't necessarily dull

This post was chosen as an Editor's Selection for ResearchBlogging.org.

Something slightly unusual happened this week. In a paper in the journal Vision Research, Simon Baron-Cohen and colleagues reported that they had found no statistically significant difference between the visual acuity of individuals with and without autism.

The study was a follow-up to a 2009 paper that claimed to show enhanced (or "eagle-eyed") visual acuity in autism. Following two particularly damning commentaries by experts in vision science, the Baron-Cohen group got together with the critics, fixed up the problems with the study, and tried to replicate their original findings. They failed.

While it's slightly concerning that the original study ever made it to publication, it's heartening that the authors took the criticism seriously, the concerns were addressed, and the scientific record was set straight fairly quickly. This is how science is supposed to work. But it's something that happens all too rarely.

In a brilliant piece in last weekend's New York Times, Carl Zimmer highlighted the difficulty science has in correcting itself. Wrong hypotheses are, in principle, there to be disproven but it's not always that straightforward in reality. In particular, as Zimmer points out, scientists are under various pressures to investigate new hypotheses and report novel findings rather than revisit their own or other people's old studies and replicate (or not) their results. And many journals have a policy of not publishing replication studies, even if the outcomes should lead to a complete reassessment of the original study's conclusions. 

There is, however, a deeper problem that Zimmer doesn’t really go into.

Most of the time, at least in the fields of science I'm familiar with, we’re in the business of null hypothesis testing. We're looking for an effect - a difference between two conditions of an experiment or two populations of people, or a correlation between two variables. But we test this effect statistically by seeing how likely it is that we would have made the observations we did if our hypothesis was wrong and there wasn’t an effect at all. If the tests suggest that it’s unlikely that this null hypothesis can account for the data, we conclude that there was an effect.

The criteria are deliberately strict. By convention, there has to be less than a 5% chance of obtaining data at least as extreme as yours if the null hypothesis were true before you can confidently conclude that an effect exists. This is supposed to minimize the occurrence of people making grand claims based on small effects that could easily have come about purely by chance. But the problem is that it doesn’t work in reverse. If you don’t find a statistically significant effect, you can’t be confident that there isn’t one. Reviewers know this. Editors know this. Researchers know that reviewers and editors know this. Rather than being conservative, null hypothesis testing actually biases the whole scientific process towards spurious effects entering the literature and biases against publication of follow-up studies that don't show such an effect. Failure to reject the null hypothesis is seen as just that - a failure.
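
To see why a non-significant result can't be read as "no effect", here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the effect size (0.3 standard deviations), the group sizes, and the use of a normal approximation for the p-value rather than a proper t-test. None of it comes from the studies discussed here.

```python
import random
from statistics import NormalDist, mean, variance

def two_sample_p(a, b):
    """Two-sided p-value for a difference in means (normal approximation)."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def detection_rate(n_per_group, true_effect, n_studies=2000, alpha=0.05, seed=1):
    """Fraction of simulated studies that come out 'significant' at alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # Hypothetical groups: unit-variance normal scores, means differ by true_effect.
        a = [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        if two_sample_p(a, b) < alpha:
            hits += 1
    return hits / n_studies

small = detection_rate(n_per_group=20, true_effect=0.3)       # real effect, small study
large = detection_rate(n_per_group=200, true_effect=0.3)      # same effect, big study
null_rate = detection_rate(n_per_group=20, true_effect=0.0)   # no effect at all

print(f"real effect, n=20 per group:  significant in {small:.0%} of studies")
print(f"real effect, n=200 per group: significant in {large:.0%} of studies")
print(f"no effect,   n=20 per group:  significant in {null_rate:.0%} of studies")
```

Under these assumptions, most of the small studies miss an effect that is genuinely there, so a single null result from a study of that size says very little on its own.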

This is something with which I'm well acquainted. My PhD was essentially a series of failures to replicate. To cut a very long story very short, a bunch of studies in the mid-90s had apparently shown that, during memory tasks, people with Williams syndrome rely less on the meanings of words and more on their sounds. I identified a number of alternative explanations for these results and, like a good little scientist, designed some experiments to rule them out. Lo and behold, all the group differences disappeared.

Perhaps not surprisingly, publishing these studies turned out to be a major challenge. One paper was rejected four times before being finally accepted. By this time, I'd finished my PhD, completed a post-doc on similar issues in Down syndrome, and published two papers arising from that study. In some ways, they were much less interesting than the Williams syndrome studies because they really just confirmed what we already knew about Down syndrome. But they contained significant group differences and were both accepted first time.

So the big question. How do you get a null result published?

One helpful suggestion comes from Chris Aberson in the brilliantly titled Journal of Articles in Support of the Null Hypothesis. He points out that you can never really say that an effect doesn’t exist. What you can do, however, is report confidence intervals on the effect size. In other words, you can say that, if an effect exists, it’s almost certainly going to be very small.
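
As a rough sketch of what reporting a confidence interval on an effect size might look like, the snippet below computes Cohen's d for two groups together with an approximate 95% interval. The scores are made up, the function name is hypothetical, and the standard error uses a common large-sample approximation, so treat it as an illustration rather than the procedure Aberson recommends.

```python
from statistics import NormalDist, mean, variance

def cohens_d_with_ci(a, b, confidence=0.95):
    """Cohen's d (pooled SD) with an approximate normal-theory confidence interval."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    d = (mean(a) - mean(b)) / pooled_var ** 0.5
    # Large-sample approximation to the standard error of d.
    se = ((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))) ** 0.5
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return d, d - z * se, d + z * se

# Made-up scores for two hypothetical groups of ten participants each.
group_a = [12, 15, 11, 14, 13, 16, 12, 14, 15, 13]
group_b = [13, 12, 12, 15, 12, 15, 13, 14, 14, 13]

d, low, high = cohens_d_with_ci(group_a, group_b)
print(f"d = {d:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

The width of the interval is the useful part. With only ten people per group it stays wide, so the data are compatible with anything from a sizeable effect in one direction to a sizeable effect in the other. A narrow interval around zero is what would justify saying that any effect is almost certainly very small.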

Another possibility is to go Bayesian. Rather than simply telling you that there is not enough evidence to reject the null hypothesis, Bayesian statistics provides information on how likely it is that the null hypothesis versus the experimental hypothesis is correct given the observed data. I haven't attempted this yet myself so I'd be interested to hear from anyone who has.
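
For a flavour of what this can look like, here is a minimal sketch using a BIC-based approximation to the Bayes factor for a two-group comparison. This is just one rough shortcut among several Bayesian approaches, it builds in a particular default prior, and the function name and data are invented for the example.

```python
import math
from statistics import mean, variance

def bf01_bic(a, b):
    """Approximate Bayes factor for the null over a group difference (BIC approximation).

    Values above 1 favour the null hypothesis; values below 1 favour a difference.
    """
    n1, n2 = len(a), len(b)
    n = n1 + n2
    df = n - 2
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    # BIC1 - BIC0 = -n * ln(1 + t^2 / df) + ln(n), and BF01 = exp((BIC1 - BIC0) / 2).
    return math.sqrt(n) * (1 + t * t / df) ** (-n / 2)

# Made-up scores: two similar groups, then two clearly different groups.
similar = bf01_bic([12, 15, 11, 14, 13, 16, 12, 14, 15, 13],
                   [13, 12, 12, 15, 12, 15, 13, 14, 14, 13])
different = bf01_bic([12, 15, 11, 14, 13, 16, 12, 14, 15, 13],
                     [17, 18, 16, 19, 17, 20, 18, 17, 19, 18])

print(f"similar groups:   BF01 = {similar:.2f}")
print(f"different groups: BF01 = {different:.4f}")
```

Unlike a p-value, the number has a direct reading in both directions: a BF01 of 4 would mean the data are about four times more likely under the null than under the alternative, as the approximation defines them.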

The strategy I've found really helpful is to look at factors that contribute to the size of the effect you're interested in. For example, in one study on context effects in language comprehension in autism, we were concerned that group differences in previous studies were really down to confounding group differences in language skills. Sure enough, when we selected our control group to have similar language skills to our autism group, we found no difference between the two groups. But more importantly, within each group, we were able to show that an individual's language level predicted the size of their context effect. This gave us a significant result to report and in itself is quite an interesting finding.

This brings me neatly to my final point. At least in research on disorders such as autism or Williams syndrome, a significant group difference is considered to be the holy grail. In terms of getting the study published, it certainly makes life easier. But there is another way of looking at it. If you find a group difference, you’ve failed to control for whatever it is that has caused the group difference in the first place. A significant effect should really only be the beginning of the story.


Tavassoli T, Latham K, Bach M, Dakin SC, & Baron-Cohen S (2011). Psychophysical measures of visual acuity in autism spectrum conditions. Vision Research. PMID: 21704058


Friday, June 24, 2011

Screening for autism in infants and toddlers

It’s widely believed that early intervention is crucial for long-term prognosis in autism and that the earlier the intervention begins the better. Getting in early, of course, requires that autistic children are identified at a young age. But even for more severe forms of autism, children are rarely diagnosed before three to four years of age. With this in mind, the American Academy of Pediatrics has recommended screening all toddlers for autism.

However, writing in next July’s issue of Pediatrics (the academy’s own journal), Mona Al Qabandi and colleagues argue against routine population-based screening for autism. Chief amongst their objections is that existing screening tools are simply not up to the task. Most of these screens involve a questionnaire given to parents, sometimes augmented with a brief phone interview. But they all have their problems. Some are insensitive, missing a large number of kids who go on to get an ASD diagnosis further down the line. Others are sensitive but not specific, hoovering up all kinds of kids, many of whom don’t have autism, and may not have any kind of developmental problems at all.
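
The cost of poor specificity is easiest to see with some arithmetic. The sketch below applies Bayes' rule to work out what fraction of children flagged by a screen would actually have the condition. The sensitivity and specificity values are hypothetical, chosen only to illustrate the point, and do not describe any particular screening tool.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a child flagged by the screen really has the condition."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)

# Hypothetical screens applied to a population where 1 in 100 children has an ASD.
modest = positive_predictive_value(sensitivity=0.90, specificity=0.90, prevalence=0.01)
strict = positive_predictive_value(sensitivity=0.90, specificity=0.99, prevalence=0.01)

print(f"90% specific screen: {modest:.1%} of flagged children have an ASD")
print(f"99% specific screen: {strict:.1%} of flagged children have an ASD")
```

Because the condition is rare, even a screen that sounds accurate flags far more unaffected children than affected ones, which is why specificity matters so much in a population-wide program.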

Al Qabandi et al. conclude that “none of the autism screening tests currently available has been shown to be able to fulfill the properties of accuracy… in a population-wide screening program”.

Similar conclusions were reached in an earlier review by Josephine Barbaro and Cheryl Dissanayake at the Olga Tennison Autism Research Centre in Melbourne. So they tried a different approach. Rather than relying on parental questionnaires, they set up a 'surveillance program', training community nurses to spot the signs of autism during regular infant health checks.

Each nurse attended a short two-and-a-half-hour workshop in which they were shown how to complete the screen. They were given a checklist with key behaviours to monitor, depending on the child’s age, and were trained to score each item as either typical, atypical, or absent. For instance, the item for “eye contact” read as follows:

"Has the child spontaneously made eye contact with you during the session? If not, interact with the child to elicit eye contact. Does s/he make eye contact with you?"

From an initial sample of almost 21 thousand children, 216 were identified as “at risk” of ASD by 24 months of age. Of these, 110 completed further assessment, including the ADOS and ADI-R. 89 of these kids received an ASD diagnosis, giving the surveillance program a positive predictive value of 81%. Of the remaining 21 children, all but one had developmental language disorders.

Calculating the screening program’s sensitivity is an inexact process at this stage. But assuming that the rates were similar for the children who did not undergo further assessment, Barbaro and Dissanayake estimated that approximately 175 ASD children would have been picked up. Dividing this by the total number of kids in the program gave an estimated prevalence of 1 in 119. This is reassuringly close to recent estimates of approximately 1 in 100 kids having an ASD, suggesting that the screen managed to pick up the majority of ASD kids in the initial sample.
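
The arithmetic behind these figures can be retraced in a few lines. The counts are the ones quoted above; the cohort size is not given exactly here ("almost 21 thousand"), so the last step back-calculates the size implied by the 1 in 119 figure rather than using a number from the paper.

```python
flagged = 216     # children identified as "at risk" by 24 months
assessed = 110    # flagged children who completed further assessment
diagnosed = 89    # assessed children who received an ASD diagnosis

# Positive predictive value among the children who were assessed.
ppv = diagnosed / assessed

# Assume the same rate holds for the flagged children who were not assessed.
estimated_cases = flagged * ppv

# Back-calculated, not taken from the paper: cohort size implied by 1 in 119.
implied_cohort = estimated_cases * 119

print(f"PPV: {ppv:.0%}")
print(f"Estimated ASD cases among flagged children: {estimated_cases:.0f}")
print(f"Implied cohort size: {implied_cohort:.0f}")
```

The implied cohort comes out a little under 21 thousand, which matches the description of the initial sample.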

To get a more accurate indication of sensitivity, however, the researchers will have to wait until the children enter school. Only then will they be able to work out how many children end up with an ASD diagnosis but weren’t picked up by the surveillance program.

While it’s still early days, the Melbourne study suggests that population-wide screening for autism is possible, at least in areas that already have comprehensive regular child health checks.


Barbaro, J., & Dissanayake, C. (2010). Prospective Identification of Autism Spectrum Disorders in Infancy and Toddlerhood Using Developmental Surveillance: The Social Attention and Communication Study. Journal of Developmental & Behavioral Pediatrics, 31 (5), 376-385. DOI: 10.1097/DBP.0b013e3181df7f3c

Al-Qabandi M, Gorter JW, & Rosenbaum P (2011). Early Autism Detection: Are We Ready for Routine Screening? Pediatrics. PMID: 21669896

Olga Tennison Autism Research Centre


Monday, June 6, 2011

Social Communication Disorder - A new category in DSM 5

This post was chosen as an Editor's Selection for ResearchBlogging.org.

A couple of weeks ago, I posted on a paper by Mandy and colleagues, which aimed to better characterise kids meeting current (DSM IV-TR) criteria for PDD-NOS (Pervasive Developmental Disorder Not Otherwise Specified). Their conclusion was that most of these kids had social and communication difficulties but not the repetitive and stereotyped behaviours (RSBs) that would have given them a full 'autistic disorder' diagnosis.

Under proposed revisions to diagnostic criteria (DSM 5), PDD-NOS is supposed to be subsumed by a broader category of "Autism Spectrum Disorder". However, Mandy et al. pointed out that the proposed criteria for Autism Spectrum Disorder require evidence of RSBs, and so would actually exclude most of their PDD-NOS kids.

In a new paper, Prof Francesca Happé, a member of the DSM-5 working group, outlines the rationale for the proposed DSM 5 changes affecting autism spectrum disorders. The paper overlaps to a large extent with her excellent blogpost on the SFARI website. However, she also references the Mandy et al. paper, acknowledging that many individuals with PDD-NOS may miss out on an Autism Spectrum Disorder diagnosis because they don't have repetitive or stereotyped behaviours.

Here's what she has to say:

“Recently, Mandy et al. raised concerns that many children currently receiving [a PDD-NOS] diagnosis will not meet proposed DSM-5 criteria for ASD because of a lack of restricted / repetitive behaviour. For these children, the proposed new neurodevelopmental diagnostic category of social communication disorder will be relevant. This diagnosis, it is hoped, will more clearly and accurately capture the pattern of impaired social and communication abilities seen in the largest subgroup now labeled PDD-NOS”.

On the DSM 5 website, the new disorder is defined more formally:

"Social Communication Disorder (SCD) is an impairment of pragmatics and is diagnosed based on difficulty in the social uses of verbal and nonverbal communication in naturalistic contexts, which affects the development of social relationships and discourse comprehension and cannot be explained by low abilities in the domains of word structure and grammar or general cognitive ability."

Effectively, SCD seems to be official recognition for what researchers and practitioners have previously referred to as "Pragmatic Language Impairment" rather than a replacement for PDD-NOS. The emphasis is very much on the communication side of things, particularly conversation skills, with a suggestion that social difficulties are a secondary consequence of impaired communication. That's my interpretation at least.

As Happé suggests, it seems likely that many people who currently reside in the PDD-NOS pigeon hole would meet the SCD criteria. However, I'm not sure that the criteria necessarily capture the extent of the issues they face. As Will Mandy mentioned in his comment to my post:

"Our clinical experience is that children with PDD-NOS (i.e. mainly individuals with severe autistic social-communication difficulties, but without high levels of repetitive and stereotyped behaviours) are similar to those with a full autism diagnosis in terms of their functional impairment."

How this will all play out in practice in terms of access to services and interventions, I don't pretend to know. I'd certainly welcome comments from people better informed than I.

Update [29/01/13]: SCD is now definitely included in DSM-5 and does appear to be a replacement for PDD-NOS but is classified as a language disorder rather than a "relative" of autism. More details here.


Happé F (2011). Criteria, Categories, and Continua: Autism and Related Disorders in DSM-5. Journal of the American Academy of Child and Adolescent Psychiatry, 50 (6), 540-2. PMID: 21621137

Related posts

What is PDD-NOS?

Further reading

Dorothy Bishop: Pragmatic language impairment: A correlate of SLI, a distinct subgroup, or part of the autistic continuum? [PDF]