Longer-term outcomes: what should we measure? part 2

I have made my concerns about developmental screening tests including the Bayley clear over the years, including in the previous post, which might make what I am going to say now seem odd: I do not think we should stop screening for developmental delay in neonatal follow up!

The problem is not doing the Bayley evaluation; the problem is thinking of Bayley scores as a hard endpoint which diagnoses clinically important impairment.

The tests that we perform in follow up should be adjusted to our goals of follow up. If we wish to screen for developmental delay in order to offer intervention, then that is a worthwhile goal, and performing a test early will identify more babies with low scores, so an 18-month test might be reasonable. Brett Manley and a group of CAP investigators analysed the factors which were associated with an improvement in test scores from 18 month Bayley version 2 scores to an IQ test at 5 years; we found that the major factor associated with having an improvement was the socio-economic environment. Identifying infants at 18 months of age who have delayed development and who are also socio-economically deprived identifies a group of children at high risk of lower IQ scores, and probably therefore of difficulties at school, who could well benefit from early intervention programs, while those who are socially and economically advantaged will usually get higher scores anyway in the future.

On the other hand, if we want to predict later impairment, the predictive value of Bayley scores for later intellectual difficulties is poor, especially at 18 months. The screening tests become more useful with age, but even at 30 months they (at least the Bayley version 3) only correctly identify about 50% of infants. The table below (from the EXPRESS cohort) shows that, among infants defined as moderately “disabled” at 30 months of age, which in 2/3 of cases was the result of a moderately low Bayley 3 score, 44% were moderately or severely disabled at 6.5 years, again mostly because of low cognitive scores (using the WISC-IV).

You can also see from the table that infants with no disability at 30 months still sometimes had moderate or even severe disability at 6.5 years (Serenius F, et al. Neurodevelopmental Outcomes Among Extremely Preterm Infants 6.5 Years After Active Perinatal Care in Sweden. JAMA Pediatr. 2016).

Ideally, a follow-up program should be able to identify infants that will benefit from therapy as well as to determine the outcomes of our neonatal interventions. If 2 arms of a study have the same survival, then examining other aspects of the outcomes becomes of interest. Outcomes that affect the function of the child and their family are the ones we should focus on; the proportion of infants with a significantly reduced quality of life, although such infants are rare, would also be valuable information if it differed between the arms of a study. The newer versions of the Bayley scores include an evaluation of function, which in the Bayley version 4 is derived from the Vineland Adaptive Behaviour Scales, a well-supported scale for analysing function.

Outcomes of neonatal trials should be considered in a hierarchical fashion. As almost all survivors have an acceptable, good or excellent quality of life, survival should always be the primary outcome. The second level of the hierarchy should be impairments and clinical difficulties which affect function: disabling CP, blindness, gastrostomy feeding, recurrent hospital admissions, medical instrumentation at home, disruptive behaviour problems. The third level of the hierarchy should be things that affect function little but would be preferable to avoid: chronic medication use, developmental delay, need for physiotherapy. You may notice that I haven’t put intellectual limitations and learning difficulties in this schema, because I do not know where they should go. I think different families would probably score them differently, and different severities of those problems might put them in the second or third priority group.

Ways of analysing trials that prioritize adverse outcomes in that way have been developed, and could be much more useful in deciding between therapeutic approaches than “death or NDI”. I have blogged previously about one of them, the “win ratio”, in which results are compared between groups with prioritized outcomes, so death is the worst outcome, survival with serious long-term problems is next, and survival with moderately severe problems is next. By comparing the number of winners in the comparisons, you can analyse statistically which of two groups has the better outcome (Pocock SJ, et al. The win ratio: a new approach to the analysis of composite endpoints in clinical trials based on clinical priorities. European Heart Journal. 2012;33(2):176-82).
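To make the idea concrete, here is a minimal sketch of an unmatched win-ratio count. It is a simplification of my own for illustration, not the full method of Pocock et al. (which also handles time-to-event data and differing follow-up): I assume each infant has already been classified by their worst outcome at a fixed follow-up age, so comparing two infants down the hierarchy is the same as comparing their worst category. The category names, the function `win_ratio` and the example data are all hypothetical.

```python
from itertools import product

# Hypothetical outcome categories, ordered from worst (0) to best (3).
DEATH = 0
SERIOUS_PROBLEMS = 1    # survival with serious long-term problems
MODERATE_PROBLEMS = 2   # survival with moderately severe problems
NO_MAJOR_PROBLEMS = 3


def win_ratio(treatment, control):
    """Compare every treatment infant with every control infant.

    Each infant is represented by their worst outcome category.
    A pair is a "win" if the treatment infant has the better category,
    a "loss" if the control infant does, and a tie otherwise.
    Returns (wins, losses, ties, wins / losses).
    """
    wins = losses = ties = 0
    for t, c in product(treatment, control):
        if t > c:
            wins += 1
        elif t < c:
            losses += 1
        else:
            ties += 1
    ratio = wins / losses if losses else float("inf")
    return wins, losses, ties, ratio


# Made-up data, purely to show the mechanics (not from any trial).
treatment_arm = [DEATH, SERIOUS_PROBLEMS, NO_MAJOR_PROBLEMS, NO_MAJOR_PROBLEMS]
control_arm = [DEATH, SERIOUS_PROBLEMS, MODERATE_PROBLEMS, NO_MAJOR_PROBLEMS]

print(win_ratio(treatment_arm, control_arm))
```

A ratio above 1 means the treatment arm wins more pairwise comparisons than it loses. A real analysis would also need a confidence interval and a rule for handling ties, which this sketch leaves out.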

This is not just a theoretical methodology. There are now several trials, mostly in cardiology, that have used the win ratio as a way to take into account death as well as non-fatal complications as part of a composite outcome, where death is the most important outcome, but others, such as hospitalisation for cardiac events, are given secondary importance (Redfors B, et al. The win ratio approach for composite endpoints: practical guidance based on previous experience. Eur Heart J. 2020;41(46):4391-9).

There are also logistical reasons for continuing to perform developmental screening tests at around 2 years. Keeping infants in a follow-up program needs frequent contact, and waiting until a child has reached 5 years of age risks losing many more children and decreasing confidence in the results. Funding for trials which can’t report their outcomes for 7 or 8 years is also tricky. The CAP trial, for example, was funded initially for 18-month outcomes, and repeated grant applications to extend follow-up were then required, and were successfully obtained by Barbara Schmidt and her collaborators.

Finally, I have a question for my readers; is there any trial that has shown no difference in developmental outcomes at 2 years between two groups that has then found an important difference in neurological/intellectual/learning outcomes later on?

If the answer is no, and I can’t think of such a trial currently, then perhaps continuing to evaluate and report on developmental delay at around 2 years of age is reasonable, not as a hard endpoint of any clinical significance, but as a sort of screen to decide which trials should then get funding for 5- or 6-year outcomes. Studies showing no difference in neurological impairment or developmental delay at around 2 years may be extremely unlikely to show a difference later, while those that show a difference could then be funded to examine clinically important outcomes and outcomes important to families at around 5 or 6 years of age. I think that the analysis of the 2-year outcomes should prioritize survival as the most important of the outcomes, then as second priority neurological impairments (which are more likely to be long-term problems for the child), and then as third priority developmental delay. That analysis could be done using the win ratio method.

At later follow up we could measure outcomes which have an importance to families at 5 or 6 years, such as indicators of good health, behaviour problems, feeding difficulties, and those which could lead to adjustments in the way they are taught, such as measures of IQ and executive function.

Indeed we should be doing more research to find out which outcomes matter most to parents, such as the studies that Annie Janvier and her collaborators are doing.

Below is a YouTube video of a webinar that Annie gave about the outcomes that parents care about, and about her ongoing research on the topic.

About Keith Barrington

I am a neonatologist and clinical researcher at Sainte Justine University Health Center in Montréal

3 Responses to Longer-term outcomes: what should we measure? part 2

  1. Gil Wernovsky says:

    Hi Keith, Simply brilliant. As you well know, the follow up of term cardiac babies is nearly identical to the premature population. Couldn’t agree more—Bayley is very good for comparing toddlers to other toddlers, but minimal predictive validity for longer term outcomes. The more socially important outcomes such as executive function and social cognition don’t appear for at least a decade. So it’s an ongoing moving target, at the same time our neonatal practices change every decade or so.

    Thanks as always for your thoughtful posts. Hope to see you again soon.

    • Thanks, it’s really a problem, isn’t it! We can’t just wait for the outcomes that mean the most to individuals and parents, they take a long time, and no-one will fund us for outcomes a decade after the intervention.
      I sometimes think the only outcome that really matters is to ask people on their death bed if they had a good life! But to be more sensible, outcomes that make life more difficult for an individual, such as a learning disability, are certainly worth avoiding if we can. Perhaps the same is true even of developmental delay (and low Bayley scores): even if there is no difference in longer-term outcomes, if parents worry when their child develops more slowly than their peers, then fewer cases of developmental delay would be a worthwhile goal as long as other outcomes are equivalent.
      A lot of preterm babies have executive function problems, which make schooling more difficult, even if there is not much impact on quality of life. I know a lot of cardiac kids have similar problems, which are also not shown up by our Bayleys. Your Seminars review from 4 or 5 years ago was a great, insightful analysis of the outcomes (Ringle ML, Wernovsky G. Functional, quality of life, and neurodevelopmental outcomes after congenital cardiac surgery. Semin Perinatol. 2016;40(8):556-70); everyone who looks after those babies should read it.
      Hopefully, when we are all fully vaccinated and some of the craziness of the last 18 months starts to ebb away, we can meet at a conference at some exotic location!

  2. alex stevenson says:

    Dear Prof Barrington,

    This is just some fan mail from Harare to say thank you for your work and posts on neonatology.

    I am a newly qualified neonatologist and really enjoy your emails and I hope that they improve my ability to analyse studies, and potentially to design better ones.

    I see you enjoy birdwatching. We have some great birds here in Zim- I’m not much of a photographer but asked my friend Nick for some of his photos. Who knows, maybe one day you will visit.

    Thanks again,



