I have made my concerns about developmental screening tests including the Bayley clear over the years, including in the previous post, which might make what I am going to say now seem odd: I do not think we should stop screening for developmental delay in neonatal follow up!
The problem is not doing the Bayley evaluation; the problem is thinking of Bayley scores as a hard endpoint which diagnoses clinically important impairment.
The tests that we perform in follow up should be adjusted to our goals of follow up. If we wish to screen for developmental delay in order to offer intervention, then that is a worthwhile goal, and performing a test early will identify more babies with low scores, so an 18-month test might be reasonable. Brett Manley and a group of CAP investigators analysed the factors which were associated with an improvement in test scores from 18 month Bayley version 2 scores to an IQ test at 5 years; we found that the major factor associated with having an improvement was the socio-economic environment. Identifying infants at 18 months of age who have delayed development and who are also socio-economically deprived identifies a group of children at high risk of lower IQ scores, and probably therefore of difficulties at school, who could well benefit from early intervention programs, while those who are socially and economically advantaged will usually get higher scores anyway in the future.
On the other hand, if we want to predict later impairment, the predictive value of Bayley scores for later intellectual difficulties is poor, especially at 18 months. The screening tests become more useful with age, but even at 30 months they (at least the Bayley version 3) only correctly identify about 50% of infants. The table below (from the EXPRESS cohort) shows that, of the infants defined as moderately “disabled” at 30 months of age, which in 2/3 of the cases was a result of a moderately low Bayley 3 score, 44% were moderately or severely disabled at 6.5 years, again mostly because of low cognitive scores (using the WISC-IV).
You can also see from the table that infants with no disability at 30 months still sometimes had moderate or even severe disability at 6.5 years (Serenius F, et al. Neurodevelopmental Outcomes Among Extremely Preterm Infants 6.5 Years After Active Perinatal Care in Sweden. JAMA Pediatr. 2016).
Ideally, a follow-up program should be able to identify infants who will benefit from therapy as well as determine the outcomes of our neonatal interventions. If the 2 arms of a study have the same survival, then examining other aspects of the outcomes becomes of interest. Outcomes that affect the function of the child and their family are the ones we should focus on; and if the number of infants with a significantly reduced quality of life, rare as they are, differed between the arms of a study, that would also be valuable information. The newer versions of the Bayley scales include an evaluation of function, which in the Bayley version 4 is derived from the Vineland Adaptive Behaviour Scales, a well-supported scale for analysing function.
Outcomes of neonatal trials should be considered in a hierarchical fashion. As almost all survivors have an acceptable, good or excellent quality of life, survival should always be the primary outcome. The second level of the hierarchy should be impairments and clinical difficulties which affect function: disabling CP, blindness, gastrostomy feeding, recurrent hospital admissions, medical instrumentation at home, disruptive behaviour problems. The third level of the hierarchy should be things that affect function little but would be preferable to avoid: chronic medication use, developmental delay, need for physiotherapy. You may notice that I haven’t put intellectual limitations and learning difficulties in this schema, because I do not know where they should go. I think different families would probably score them differently, and different severities of those problems might put them in the second or third priority group.
Ways of analysing trials that prioritize adverse outcomes in that way have been developed, and could be much more useful in deciding between therapeutic approaches than “death or NDI”. I have blogged previously about one of them, the “win ratio”, in which results are compared between groups with prioritized outcomes, so death is the worst outcome, survival with serious long-term problems is next, and survival with moderately severe problems is next. By comparing the number of winners in the comparisons, you can analyse statistically which of two groups has the better outcome (Pocock SJ, et al. The win ratio: a new approach to the analysis of composite endpoints in clinical trials based on clinical priorities. European Heart Journal. 2012;33(2):176-82).
This is not just a theoretical methodology: there are now several trials, mostly in cardiology, that have used the win ratio to take into account death as well as non-fatal complications as part of a composite outcome, where death is the most important outcome, but others, such as hospitalisation for cardiac events, are given secondary importance (Redfors B, et al. The win ratio approach for composite endpoints: practical guidance based on previous experience. Eur Heart J. 2020;41(46):4391-9).
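For readers who like to see the mechanics, here is a minimal sketch of how an unmatched win ratio could be calculated for the kind of hierarchy described above. It is a simplification and not the method as published: outcomes are treated as simple yes/no values at a fixed follow-up age (the cardiology trials use time-to-event data), no confidence interval or significance test is calculated, and the names `Outcome`, `severity`, `compare` and `win_ratio`, as well as the tiny data set, are invented purely for illustration.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Outcome:
    """Hypothetical per-child outcome at follow-up (illustrative only)."""
    died: bool
    serious_impairment: bool = False  # e.g. disabling CP, blindness
    moderate_problem: bool = False    # e.g. developmental delay


def severity(o: Outcome) -> tuple:
    """Return a tuple ordered by priority; a lower tuple is a better outcome.

    Tuples compare left to right, so death is decided first, then serious
    impairment, then moderate problems. Lower levels are ignored after death.
    """
    if o.died:
        return (1, 0, 0)
    return (0, int(o.serious_impairment), int(o.moderate_problem))


def compare(a: Outcome, b: Outcome) -> int:
    """Return 1 if a has the better outcome, -1 if b does, 0 for a tie."""
    sa, sb = severity(a), severity(b)
    if sa < sb:
        return 1
    if sa > sb:
        return -1
    return 0


def win_ratio(treatment, control):
    """Compare every treated child with every control child.

    Returns (wins, losses, ties, ratio), where ratio = wins / losses
    from the point of view of the treatment group.
    """
    wins = losses = ties = 0
    for t, c in product(treatment, control):
        result = compare(t, c)
        if result > 0:
            wins += 1
        elif result < 0:
            losses += 1
        else:
            ties += 1
    ratio = wins / losses if losses else float("inf")
    return wins, losses, ties, ratio


# Invented example data: 5 children per arm.
treatment = [
    Outcome(died=True),
    Outcome(died=False, serious_impairment=True),
    Outcome(died=False, moderate_problem=True),
    Outcome(died=False),
    Outcome(died=False),
]
control = [
    Outcome(died=True),
    Outcome(died=True),
    Outcome(died=False, serious_impairment=True),
    Outcome(died=False, moderate_problem=True),
    Outcome(died=False),
]

wins, losses, ties, ratio = win_ratio(treatment, control)
print(f"wins={wins} losses={losses} ties={ties} win ratio={ratio:.2f}")
```

A ratio above 1 favours the treatment group. The point of the design is that a lower-priority outcome is only consulted when a pair of children is tied on everything more important, so a difference in developmental delay can never outweigh a difference in survival.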
There are also logistical reasons for continuing to perform developmental screening tests at around 2 years. Keeping infants in a follow-up program needs frequent contact, and waiting until a child has reached 5 years of age risks losing many more children and decreasing the confidence in the results. Funding for trials which can’t report their outcomes for 7 or 8 years is also tricky. The CAP trial, for example, was funded initially for 18-month outcomes, and then repeated grant applications to extend follow-up were required, and successfully obtained by Barbara Schmidt and her collaborators.
Finally, I have a question for my readers; is there any trial that has shown no difference in developmental outcomes at 2 years between two groups that has then found an important difference in neurological/intellectual/learning outcomes later on?
If the answer is no, and I can’t think of such a trial currently, then perhaps continuing to evaluate and report on developmental delay at around 2 years of age is reasonable, not as a hard endpoint of any clinical significance, but as a sort of screen to decide which trials should then get funding for 5- or 6-year outcomes. Studies showing no difference in neurological impairment or developmental delay at around 2 years may be extremely unlikely to show a difference later, while those that show a difference could then be funded to examine clinically important outcomes and outcomes important to families at around 5 or 6 years of age. I think that the analysis of the 2-year outcomes should prioritize survival as the most important of the outcomes, then as second priority neurological impairments (which are more likely to be long-term problems for the child), and then as third priority developmental delay. That analysis could be done using the win ratio method.
At later follow up we could measure outcomes which have an importance to families at 5 or 6 years, such as indicators of good health, behaviour problems, feeding difficulties, and those which could lead to adjustments in the way they are taught, such as measures of IQ and executive function.
Indeed we should be doing more research to find out which outcomes matter most to parents, such as the studies that Annie Janvier and her collaborators are doing.
Below is a YouTube video of a webinar that Annie gave about the outcomes that parents care about, and about her ongoing research on the topic.