The best outcome variable for very preterm newborns?

Death or ‘neurodevelopmental impairment’ (NDI) as a combined outcome has become a de facto standard primary outcome for neonatal clinical trials. Because many very preterm infants have developmental delay or intellectual and learning difficulties, and some have neurological problems, it is considered valuable to know whether an intervention affects those things. Increasing survival is a good thing to do, and reducing long-term problems is also a good goal. Obviously, if you die, you can’t have developmental delay, so these are “competing” outcomes, and there is a certain sense to combining them. The idea is that the primary outcome of a trial, or of a cohort study, should be “intact” survival.

For practical reasons, and to keep follow-up rates high, the duration of follow-up can’t be too long, which is why we end up using tools like the Bayley Scales of Infant Development (BSID). The BSID is useful as a screening tool for developmental delay. Version 3 covers 5 domains: 2 are evaluated from parent questionnaires, and 3 (motor, cognitive, and language) are evaluated from structured observations of the child. The BSID can be applied at different ages, and version 3 was normed on a US population that included (by design) 10% of subjects who had developmental difficulties. It is a useful tool for screening infants to see who needs further evaluation, perhaps referral, perhaps intervention and so on. At the ages that we usually do follow-up, that is, about 18 to 24 months for the reasons mentioned above, developmental delay, especially language delay, is very common in former very preterm infants.

But the BSID is not a tool for defining which children are impaired. A large proportion, in fact a majority, of infants who score below the -2SD cut-off on the BSID do not have cognitive impairment in the long term (between 66 and 80% of them, depending on the cohort you examine, for version 2 of the BSID). A few go the other way: they have scores above the threshold at 18 to 24 months but have intellectual difficulties when seen at 5 years. They are much less common.

Nevertheless, many groups refer to a low score as a ‘developmental impairment’, which is a distortion of the meaning of the word impairment. (Delay may not be the best word either, as it may be taken to imply that all children with low scores will catch up, but it is nevertheless a more accurate term for the substantial majority who do indeed “catch up”.)

This isn’t just a criticism of the BSID. It is the best studied of the developmental screening tests in our population, and I don’t know if other screening tests are any better; I would doubt it. The percentages quoted above are based on the BSID version 2 (which had a different structure from the BSID3 and didn’t differentiate between cognitive and language scores).

There hasn’t been nearly as much study of the BSID3, but one recent publication (not surprisingly, it is from Melbourne) shows poor predictive ability for that test also. They performed a Bayley 3 at 24 months on 100 very preterm infants (<30 weeks), and then a DAS2 at 4 years of age. The DAS2 is a test of cognitive abilities that produces a number of subscales and an overall score for GCA (General Conceptual Ability).

Basically, they found that as a predictor of DAS2 test scores the BSID3 was not much good. Which is a polite way of saying that it was crap.

They showed, for example, that among infants whose BSID3 cognitive composite score at 24 months was below 85, only half had a DAS2 GCA score below 85 at 4 years of age. Even though there was a significant positive correlation between the scores, there was a huge scatter. This is another message that we have to get across: just because there is a statistically significant correlation between two scores (such as the DAS2 and the BSID3) does not mean that one usefully predicts the other.
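To make the point about correlation versus prediction concrete, here is a minimal sketch in Python using entirely synthetic scores. Nothing in it comes from the Melbourne study: the correlation of 0.5, the sample size, the mean of 100 and SD of 15, and the function names are all assumptions chosen for illustration. It draws pairs of correlated "early" and "later" scores, then asks what fraction of children flagged below 85 on the early score are also below 85 later.

```python
import math
import random


def simulate_scores(n, rho, seed=0):
    """Draw n pairs of synthetic (early, later) scores, mean 100, SD 15,
    with population correlation rho. Purely illustrative, not real data."""
    rng = random.Random(seed)
    early, later = [], []
    for _ in range(n):
        z1 = rng.gauss(0, 1)
        z2 = rng.gauss(0, 1)
        early.append(100 + 15 * z1)
        # Mix z1 and z2 so the later score correlates with the early one at rho.
        later.append(100 + 15 * (rho * z1 + math.sqrt(1 - rho ** 2) * z2))
    return early, later


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def positive_predictive_value(early, later, cutoff=85):
    """Among children below the cutoff early, the fraction also below it later."""
    flagged = [l for e, l in zip(early, later) if e < cutoff]
    if not flagged:
        return float("nan")
    return sum(1 for l in flagged if l < cutoff) / len(flagged)


# Assumed values: 20000 simulated children, correlation 0.5.
early, later = simulate_scores(20000, 0.5)
r = pearson_r(early, later)
ppv = positive_predictive_value(early, later)
print(f"correlation: {r:.2f}, share of low early scorers still low later: {ppv:.2f}")
```

With a sample this large, a correlation of this size is overwhelmingly "statistically significant", yet in the simulation most children flagged by the early score are not below the cut-off later. The exact numbers depend on the assumed correlation and cut-off, so treat this as an illustration of the mechanism and not as an estimate for any real test.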

For many of the other comparisons they studied, the proportion of those with low BSID3 scores who turned out to have long-term impairments was even lower, as low as 12.5%. So, as a way of diagnosing impairment, the test is severely flawed.

Nevertheless, we use that test to make life and death decisions in the NICU. When a baby has a particular constellation of findings, we tell parents “the chance of your baby surviving without neurodevelopmental impairment is X%” (or is very low, or is insignificant, however you want to say it). What we actually mean is “a previous group of infants with similar findings, when studied at 2 years of age, had either died or had scores on a developmental screening test of less than 70, scores which have very little predictive value for later cognitive impairment or for how the infants function”.

Recent guidelines have recommended that infants from groups in which a high proportion perform poorly on developmental screening tests should not receive active intensive care.

I hope others share my outrage at this. It is a perversion of the literature, and a perversion of good medical practice. When we counsel parents we should be talking about things which affect the lives of families. Not Bayley scales. Or other developmental screening tests.

What should we do going forward?

We should stop using the term NDI for the combination of neurologic impairment (which will often be permanent) and developmental delay (which often is truly a delay). I suggest NIDD instead (neurologic impairment or developmental delay).
We should stop lumping death and NIDD together for counselling. Completely. I understand the need to put competing outcomes together for research, though even then it is artificial, and there are other approaches which have been developed. But having a baby die, or having a baby survive with disability or developmental delay, have such different meanings for many parents.
We should rethink that term, NIDD, and what we put into it. A baby with spastic quadriplegia and a very limited ability to communicate because of severe intellectual disability is not the same as a child with a DQ of 68 who then goes to normal school and ends up with an IQ of 80. Right now the definition puts them together, and makes everyone (especially obstetricians, it seems) think that a kid with “NDI” is going to have a horrible life.
We should ask parents (of babies who died, and of babies who survived with and without disability) what they think are important outcomes. Some families are completely disrupted by children without NIDD but who have conduct disorder, or other emotional problems. Some are almost unaffected by a child with a hemiplegia. There will not be one answer about what is important, but unless we know the range of answers we can’t really talk to parents about what is important to them.
We should focus on those babies whose health-related quality of life is significantly affected, and determine whether we can discriminate between them and those with little impact at an early age. Then we could see if we can really predict who will have severely affected QoL.
We should push research focused on how parents and families adapt to children with impairments, how we can limit their disabilities and minimize the resultant handicap.

1 Response to The best outcome variable for very preterm newborns?

ericruthford says:

3 December 2015 at 15:43

Reblogged this on They don't cry and commented:
This excellent post sums up much of what I’ve been hoping for in antenatal counseling. When they tell you “moderate to severe neurodevelopmental impairment” and “severe to profound neurodevelopmental impairment,” it sounds like “life without parole” for you, the parent, when “delayed” would be the better way of looking at it. Our son is delayed, and we might wait an extra year on kindergarten, but this isn’t as bad as it sounds in the terms they give you when they’re counseling you on whether you should intubate your child.