Long term outcomes; the 2 year Bayley tells us very little

The Bayley Scales of Infant Development were created to screen babies for developmental delay, and can be used as one way of identifying children with potential problems, and then determining whether they might benefit from intervention. Unfortunately, they have become a way of measuring outcomes of neonatal interventions, and are often used to determine whether such a neonatal intervention is of benefit or not.

The 5 year follow up of the CAP (Caffeine for Apnea of Prematurity) trial was fascinating for me. We compared the results of 18 month Bayley scores (version 2) to IQ testing done at 5 years. Only 18% of babies who had a Bayley score below 70 at 18 months had an IQ score below 70 at 5 years.

I have added 2 lines to the graph that we published. The babies whose scores are represented by the dots to the left of the blue line are those who had a Bayley MDI <70. The children under the red line are those with an IQ score <70 at 5 years. The black line is from the original figure and shows the regression between the Bayley score and the change from that score to the IQ at 5 years: the lower the Bayley score, the more, on average, the score had increased by 5 years.

A new publication from the ELGAN study group has done a similar thing, but in a lot more detail. They defined adverse outcomes at 10 years of age, and compared the results of the Bayley version 2 scores (BSID-II) and motor evaluation (Gross Motor Function Classification System, GMFCS) at 2 years of age to IQ tests and other evaluations at 10 years. (Taylor GL, et al. Changes in Neurodevelopmental Outcomes From Age 2 to 10 Years for Children Born Extremely Preterm. Pediatrics. 2021)

At 2 years, they defined profound NDI as BSID-II MDI <50, PDI <50, or GMFCS 5, and moderate to severe NDI as BSID-II MDI 50 to 70, PDI 50 to 70, GMFCS 3 to 4, bilateral legal blindness, or bilateral hearing loss requiring amplification. The others were considered “none to mild”.

At 10 years, the expert panel they put together came up with these definitions: moderate impairment (IQ 55–70, GMFCS 3, bilateral hearing loss requiring amplification, bilateral legal blindness, Autism Spectrum Disorder (ASD) level 2, or epilepsy), severe impairment (IQ 35–54, GMFCS 4, or ASD level 3), or profound impairment (IQ <35, GMFCS 5, or ASD level 3 combined with IQ 35–54). They had data at both times for just over 800 babies <28 weeks gestation.
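To make the way these 10 year categories nest a little more concrete, here is a rough sketch in Python. It is only my reading of the definitions listed above, not the ELGAN group's code. The function name, the argument names, the use of 0 to mean "no GMFCS level or ASD diagnosis recorded", and the "none to mild" label for everyone else are all assumptions made for illustration. The categories are checked from worst to best, so a child meeting more than one criterion gets the most severe label.

```python
def classify_10y(iq, gmfcs=0, asd_level=0,
                 hearing_loss=False, blindness=False, epilepsy=False):
    """Hypothetical helper: assign a 10 year impairment category.

    iq           -- IQ score at 10 years
    gmfcs        -- GMFCS level 1 to 5 (0 = none recorded; an assumption)
    asd_level    -- ASD level 1 to 3 (0 = no diagnosis; an assumption)
    hearing_loss -- bilateral hearing loss requiring amplification
    blindness    -- bilateral legal blindness
    """
    # Profound: IQ <35, GMFCS 5, or ASD level 3 combined with IQ 35-54.
    if iq < 35 or gmfcs == 5 or (asd_level == 3 and 35 <= iq <= 54):
        return "profound"
    # Severe: IQ 35-54, GMFCS 4, or ASD level 3.
    if 35 <= iq <= 54 or gmfcs == 4 or asd_level == 3:
        return "severe"
    # Moderate: IQ 55-70, GMFCS 3, sensory loss, ASD level 2, or epilepsy.
    if (55 <= iq <= 70 or gmfcs == 3 or hearing_loss or blindness
            or asd_level == 2 or epilepsy):
        return "moderate"
    # Label for everyone else is assumed, borrowed from the 2 year wording.
    return "none to mild"


print(classify_10y(iq=50))               # severe (IQ alone)
print(classify_10y(iq=50, asd_level=3))  # profound (ASD level 3 plus IQ 35-54)
print(classify_10y(iq=95, epilepsy=True))  # moderate
```

Note that the same IQ of 50 lands in two different categories depending on the ASD level, which shows how much the label depends on the exact combination rules chosen.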

The first publication I remember that made this clear to me was by Maureen Hack (Hack M, et al. Poor Predictive Validity of the Bayley Scales of Infant Development for Cognitive Function of Extremely Low Birth Weight Children at School Age. Pediatrics. 2005;116(2):333-41). She showed that, of 78 babies with a 20 month MDI <70, only 29 had an 8 year IQ score <70. She also noted that the babies who were less likely to improve were those with neurosensory abnormalities.
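The arithmetic behind those numbers is worth spelling out. The short Python sketch below uses only the two counts quoted above from Hack's paper (the variable names are mine) and works out the positive predictive value of a low 20 month MDI for a low 8 year IQ.

```python
# Counts quoted above from Hack et al. (2005).
low_mdi_at_20_months = 78  # children with MDI <70 at 20 months
low_iq_at_8_years = 29     # of those, children with IQ <70 at 8 years

# Positive predictive value: share of early low scores confirmed at school age.
ppv = low_iq_at_8_years / low_mdi_at_20_months
no_longer_low = low_mdi_at_20_months - low_iq_at_8_years

print(f"Still below 70 at 8 years: {ppv:.0%}")      # 37%
print(f"No longer below 70: {no_longer_low} of "
      f"{low_mdi_at_20_months} ({1 - ppv:.0%})")    # 49 of 78 (63%)
```

In other words, nearly two thirds of the children with a low early score were no longer below 70 at school age.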

The new study also seems to show that infants with profound “NDI” at 2 years who had severe motor problems (GMFCS 4 or 5) were less likely to improve. Looking at their table 3, most of the babies with severe GMFCS scores either stayed profoundly impaired or worsened from 2 to 10 years; only a few improved from profound to moderate or severe.

Overall these data are rather encouraging: the proportion of babies with adverse outcomes, and in particular severe or profound impairments, is much lower at 10 years than the 2 year evaluation would suggest. Importantly, predictions for individual patients are quite unreliable.

Looking at things from another point of view, a publication from the NICHD network examining the outcomes at 2 years of age (Rysavy MA, et al. The relationship of neurodevelopmental impairment to concurrent early childhood outcomes of extremely preterm infants. J Perinatol. 2021) compared the Bayley version 3 results of close to 3,500 babies of 22 to 26 weeks gestation to other outcomes such as hospital readmission, surgery in infancy, feeding problems leading to gastrostomy tube placement, medication use, and medical equipment needs at home.

Although many of those outcomes were more frequent among infants with so-called “Neuro-Developmental Impairment” or NDI, they also occurred in infants without this label, and some outcomes, such as re-hospitalisation for respiratory illness or surgery, were very similar across groups of infants, from no “NDI” to severe “NDI”.

As the authors of this study note, many of the outcomes that are reported in this paper are things which are important to parents, and impact their families, but are usually not collected or presented in detail.

They also note the following:

NDI does not have a consensus definition. Published definitions vary widely across studies. Small variations in the definition of NDI can have a substantial influence on its rate in a population and on its association with specific variables. Despite this, studies of NDI in children born extremely preterm are used as the basis for recommendations to make treatment decisions, including whether to direct care toward survival or palliation, and are frequently used as a component of the primary outcome of major clinical trials.

You will have noted that I usually put “NDI” in quotation marks, as it is a term that I think should be abandoned. Most infants at 2 years of age are labeled as NDI because of a low score on the Bayley scales. But a low score on a developmental screening test is NOT an impairment. Although such scores are of some value for identifying infants who might benefit from further evaluation or intervention, they should not be used to determine whether a baby’s life is worth living.

The Bayley 4 is coming; I think it was released in 2019. I know it was re-standardized, but I don’t know whether the standardisation will prove to be more reflective of the general population, or whether the test will be more predictive of longer-term impairment. I doubt that any test designed to detect developmental delay in early childhood can predict school difficulties or persistent intellectual difficulties. Using such scores well beyond their initial intended purpose, such as for a definition of profound impairment, and then using the chance that a baby may have “profound impairment” in decision-making, is an enormous mistake.

About keithbarrington

I am a neonatologist and clinical researcher at Sainte Justine University Health Center in Montréal
