Bayley, Bayley, Bayley

That is, Bayley 3.

Back from the PAS meeting, followed by a resuscitation research workshop, and suffering from brain fatigue.

One thing that fatigues my brain is a statement I heard more than once: that version 3 of the Bayley scales exaggerates the performance of very preterm infants.

I think that is an erroneous interpretation of what we know.

The 3rd version of the Bayley Scales of Infant Development gives higher scores on the cognitive scale, when administered to the same preterm infants, than the MDI of the Bayley 2. Of course, the Bayley 2 did not have a language sub-scale; language skills were incorporated into the MDI. Higher scores don’t necessarily mean that the Bayley 3 is over-optimistic.

Some of this is a question of terminology; for example, Peter Anderson, and the amazing Victorian Group from Australia, published an article which they titled ‘Underestimation of Developmental Delay by the New Bayley-III Scale’. Their title and analysis are true in so far as the definition of developmental delay is: a score more than 2 SD below the mean of age-matched full-term controls with birth weight over the 10th percentile who consented to be part of the study. Even if the controls who participated (92% of them) are representative of the whole population, eliminating late preterms (8 to 10% of the population) and growth-restricted babies (10% of the population) will inflate scores a little compared to a completely unselected population. It is also possible that families of normal birth weight full-term babies with difficulties might have self-selected not to participate, but that is not certain. In that study, the number of babies with a score less than 70 was much lower than the number with a score below -2 SD, because the mean score of the controls was about 108 for both cognitive and language, with an SD of about 14. So 2 SD below the mean of the controls was about 80 on each scale.

The mean score of this control sample was well above 100 partly because of the factors I’ve just mentioned, but also because the standardization of the Bayley 3 tool included adding 10% of infants with behavioural, developmental, and physical issues to the standardization group.

If you want to identify children with a score more than 2 SD below the mean of a sample of non-growth-restricted full-term babies, then you need to do what Anderson et al did, and have such a control group. In that sense the title and implications of their study are correct.
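To make the cutoff arithmetic above concrete, here is a minimal sketch in Python. It uses only the approximate figures quoted in this post (control mean about 108, SD about 14) and the usual published norm of mean 100 and SD 15 that underlies a cutoff of 70. The normal-distribution step is my own simplifying assumption, not something taken from the study, so treat the proportions as illustrative only.

```python
from statistics import NormalDist

# Approximate figures quoted in the post for the full-term control group.
CONTROL_MEAN = 108.0
CONTROL_SD = 14.0

# Conventional cutoff on the published norms (mean 100, SD 15): 100 - 2 * 15.
NORM_CUTOFF = 100.0 - 2 * 15.0

# Cutoff defined relative to the local controls: mean - 2 SD.
control_cutoff = CONTROL_MEAN - 2 * CONTROL_SD

# Assumption (not from the study): control scores are roughly normal.
controls = NormalDist(mu=CONTROL_MEAN, sigma=CONTROL_SD)
share_below_control_cutoff = controls.cdf(control_cutoff)
share_below_norm_cutoff = controls.cdf(NORM_CUTOFF)

print(f"Cutoff relative to controls: {control_cutoff:.0f}")
print(f"Cutoff on published norms:   {NORM_CUTOFF:.0f}")
print(f"Share of controls below control cutoff: {share_below_control_cutoff:.2%}")
print(f"Share of controls below norm cutoff:    {share_below_norm_cutoff:.2%}")
```

The point of the sketch is only that the two definitions of "delay" sit about 10 points apart, so far fewer children fall below 70 than below the control-based threshold of about 80.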

Another way of looking at it, though, would be to ask whether the Bayley 3 can identify infants who will have problems later, or whether it can identify infants who will benefit from intervention.

A good step in this direction has just been made by the same group, who have compared Bayley 3 cognitive and language scores to an IQ-type test at 4 years of age (the DAS-II, apparently a test of reasoning and conceptual abilities).

One example of their results is this scatterplot:

It shows the (non)-relation between the 2-year Bayley 3 cognitive score and the non-verbal reasoning score on the DAS-II. As you can see, there is a positive correlation, but there is also a major spread on either side of the line. I can see only one baby with a DAS less than 70, who had a Bayley 3 of 70. The remaining Bayley 3 scores that were below 80 are all in the ‘normal range’ at 4 years on the DAS.

The graph comparing Bayley 3 language scores to the language subscore of the DAS-II is even more striking. The correlation between the Bayley 3 and the DAS is very poor; the line is almost horizontal.

At 4 years the mean non-verbal reasoning score of this group of preterm infants of less than 30 weeks gestation was 99.9. Low Bayley 3 scores were very poor at predicting who would have low DAS scores; positive predictive values were all less than 50%, even when using local reference data, and using a more severe cutoff.

This means 2 things:

1. We still can’t use the 2 year Bayley scores (even the Bayley 3) to define who should be considered impaired; most babies with low scores do not have low cognitive abilities later on.

2. The Bayley 3 is not sufficiently predictive to use to determine who should have further follow-up or intervention. The negative predictive values were very high though, so an infant who scores well on the 2 year Bayley 3 will probably continue to score well later on.
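For readers who want to see how the positive and negative predictive values discussed above are calculated, here is a small Python sketch. The function name `predictive_values` and the counts passed to it are hypothetical, chosen by me purely to illustrate the pattern (few low early scores, most of which do not turn out low later); they are not data from the study.

```python
def predictive_values(true_pos, false_pos, false_neg, true_neg):
    """Return (PPV, NPV) from the four cells of a 2x2 table.

    true_pos:  low 2-year score AND low 4-year score
    false_pos: low 2-year score, 4-year score not low
    false_neg: 2-year score not low, low 4-year score
    true_neg:  neither score low
    """
    flagged = true_pos + false_pos   # everyone with a low 2-year score
    cleared = true_neg + false_neg   # everyone without a low 2-year score
    ppv = true_pos / flagged if flagged else float("nan")
    npv = true_neg / cleared if cleared else float("nan")
    return ppv, npv


# Hypothetical counts for illustration only; NOT taken from the study.
ppv, npv = predictive_values(true_pos=1, false_pos=9, false_neg=4, true_neg=86)
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

With numbers of this shape the PPV is low because most children flagged at 2 years are fine at 4 years, while the NPV stays high because low later scores are uncommon overall.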

About Keith Barrington

I am a neonatologist and clinical researcher at Sainte Justine University Health Center in Montréal
