Neil Marlow has published a thoughtful and thought-provoking article addressing the question posed in the title: what are the appropriate outcomes when designing neonatal research studies?
It has become almost a rule that a multi-center trial of an intervention in neonatology, especially one planning to enroll very preterm infants, must have ‘survival without neurological or developmental disability’ as the primary outcome, with the ‘disability’ part measured at about 2 years.
There is some value to this outcome. Very preterm infants have high mortality and high morbidity in those domains, and infants who die cannot be developmentally delayed, so these are competing outcomes that need to be taken into account if we are trying to construct a dichotomous end-point. Like most things in trial design, it involves compromises: later follow up might be nice, but it would increase costs and drop-out rates. Follow up to 2 years allows good retention rates, if you work hard at getting parents to return, but it means you need to use developmental screening tools that were developed for very young infants.
There are, however, numerous problems with this approach, which are discussed in this piece (Marlow N. Is survival and neurodevelopmental impairment at 2 years of age the gold standard outcome for neonatal studies? Archives of Disease in Childhood – Fetal and Neonatal Edition. 2014).
I also think we need to rethink this approach, for the following reasons (which overlap with Neil Marlow’s reasoning):
1. Developmental delay is not a dichotomous outcome. It is convenient for research planning to think of children as either having impairment or not, but of course developmental delay is a continuum. Arbitrarily deciding that a Bayley score of 69 is delayed, but a score of 71 is not, misses all sorts of nuances in outcomes.
2. Developmental delay is not stable over time: many children labelled as delayed at 2 years have intellectual abilities in the long term that are close to, or above, average (in fact, according to the work of Maureen Hack, two thirds of them do). The improvement in developmental scores is correlated more closely with social advantages than with anything that occurs in the neonatal period.
3. Developmental delay has little or no influence on quality of life. Children with developmental delay can, and usually do, have an excellent QoL. If the purpose of your research project is to decide which therapies we should use in the future, then whether or not the therapy affects quality of life should be our main consideration. As far as I know, however, no therapy has been shown to affect quality of life, apart from effects on quantity of life. In other words, surfactant for RDS increases quality of life, because more children are alive to have a life of good quality!
4. Very few studies in preterm neonates have ever shown an effect on neurological impairment or developmental delay. Of all the interventions studied in very preterm babies, which have actually been confirmed to reduce developmental delay? Maybe someone should do a systematic review to answer that question. Off the top of my head there is caffeine (at least when defined by developmental screening at 2 years, but not when examined at 5 years), and then there is… well, that’s about it. So many of our decisions about which treatments are proven to be beneficial are really based on their impacts on survival, or on other morbidities, such as lung injury (another non-dichotomous outcome that we ‘dichotomize’ to facilitate compound outcomes).
5. Another important consideration is that the effects of an intervention on survival and on neurological or developmental outcomes may be in different directions, which means that a trial might be ‘negative’ but still have important results that should change practice. I don’t know if this has happened for the outcome of survival and 2-year developmental screening test scores, but it is analogous to what SUPPORT showed. SUPPORT was a negative trial: the composite primary outcome (survival without severe retinopathy) was not affected by different saturation target ranges, because the impacts on the two components of the outcome were in opposite directions.
Neil Marlow includes a discussion of the TIPP trial, which showed a reduction in severe brain injury, a reduction in serious pulmonary hemorrhage, and a reduction in the need for PDA ligation. But the study did not show an overall improvement in developmental outcomes with indomethacin prophylaxis compared to control. I think it was an excellent trial with reliable results, but the lack of improvement in the primary outcome has discouraged the use of prophylactic indomethacin. However, there was a reduction in severe hemorrhage from 13% to 9%. If the only effect of indomethacin on the brain or on development was the reduction in IVH, and if the patients who had a severe IVH had an increase in neurodevelopmental ‘impairment’, then any benefit would have been confined to the 4% of infants who escaped severe IVH, and the impact on the scores of the groups as a whole would have been very small. This study was therefore grossly underpowered, in the sense that it would not be able to show that a reduction in severe IVH of that magnitude had an overall effect on the developmental scores of the entire group.
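To make the dilution argument concrete, here is a rough sketch of the arithmetic. The 13% and 9% rates of severe hemorrhage come from the paragraph above; the two impairment risks (60% with severe IVH, 25% without) are purely hypothetical values chosen for illustration, not figures from TIPP, and the sample size uses the standard normal-approximation formula for comparing two proportions.

```python
from statistics import NormalDist


def overall_impairment(p_ivh, risk_with_ivh, risk_without_ivh):
    """Impairment rate in a whole group, mixing infants with and without severe IVH."""
    return p_ivh * risk_with_ivh + (1 - p_ivh) * risk_without_ivh


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate infants needed per arm to detect p1 vs p2 (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2


# HYPOTHETICAL impairment risks, for illustration only (not trial data).
RISK_WITH_IVH = 0.60
RISK_WITHOUT_IVH = 0.25

# Severe IVH rates quoted in the text: 13% in controls, 9% with indomethacin.
control = overall_impairment(0.13, RISK_WITH_IVH, RISK_WITHOUT_IVH)
treated = overall_impairment(0.09, RISK_WITH_IVH, RISK_WITHOUT_IVH)
difference = control - treated
needed = n_per_group(control, treated)

print(f"Impairment, control arm: {control:.4f}")
print(f"Impairment, treated arm: {treated:.4f}")
print(f"Absolute difference:     {difference:.4f}")
print(f"Infants needed per arm:  {needed:.0f}")
```

Under these assumptions the whole-group difference is just the 4% change in severe IVH multiplied by the gap between the two impairment risks, which is why the required sample size runs into the many thousands per arm. Different assumed risks change the exact numbers, but not the general point.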
In their answer to an article which questioned why prophylactic indomethacin was not more widely used, De Mauro et al stated:
“TIPP failed to demonstrate any long-term benefit of indomethacin prophylaxis, but the study also failed to prove the absence of long-term harm”
That is true, but I would say it is not the whole story, because you could equally well say, “TIPP failed to prove the absence of long-term benefit, but the study also failed to demonstrate any long-term harm.”
A study with no statistically significant overall effect on developmental outcome always leaves some possibility that developmental outcomes are actually improved, or actually harmed, depending on the width of the confidence interval around the result. TIPP gives some confidence that developmental outcomes are not dramatically harmed by indomethacin, but it remains possible that there is, in reality, some impact on development: the confidence interval of the study result gives us a range within which we can say, with 95% confidence, that the true effect lies. Doing the trial has reduced the uncertainty: it is likely that, for an infant similar to those eligible for the trial, the true odds ratio for having a Bayley score below 70 with prophylactic indomethacin, compared with none, lies between 0.8 and 1.4. Or, to put it in terms that I find easier to conceptualize, the effect of prophylactic indomethacin on the risk of having a Bayley score below 70 probably lies (with 95% confidence) somewhere between a 17% reduction and a 29% increase.
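For anyone who wants to see how an odds ratio translates into a change in risk, here is a minimal sketch. The conversion depends on the baseline risk in the control group; the 20% used here is an illustrative assumption, not a number taken from the trial report, and the function name is my own.

```python
def odds_ratio_to_relative_risk(odds_ratio, baseline_risk):
    """Convert an odds ratio to a relative risk for a given control-group risk.

    With control odds p0 / (1 - p0), the treated odds are OR times that,
    which gives a treated risk of OR * p0 / (1 - p0 + OR * p0).
    Dividing by p0 yields the relative risk.
    """
    return odds_ratio / (1 - baseline_risk + baseline_risk * odds_ratio)


# ASSUMED control-group risk of a Bayley score below 70 (illustrative only).
BASELINE_RISK = 0.20

# Odds ratio limits quoted in the text.
rr_low = odds_ratio_to_relative_risk(0.8, BASELINE_RISK)
rr_high = odds_ratio_to_relative_risk(1.4, BASELINE_RISK)

print(f"OR 0.8 -> relative risk {rr_low:.2f} ({(1 - rr_low) * 100:.0f}% reduction)")
print(f"OR 1.4 -> relative risk {rr_high:.2f} ({(rr_high - 1) * 100:.0f}% increase)")
```

With that assumed baseline, the converted limits land close to the 17% reduction and 29% increase quoted above. The rarer the outcome, the closer the odds ratio and the relative risk become; the more common it is, the more the odds ratio overstates the change in risk.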
Most survivors of the NICU do well, and almost all have a good quality of life. The history of neonatal intensive care research has shown that it is extremely difficult to demonstrate improved developmental outcomes at 2 years (as I discuss above), and developmental outcomes at later ages (when testing is more relevant for functional ability) have never been affected by a neonatal intervention. So maybe we should reconsider our outcomes. I think that trials should be designed and powered to examine effects on survival and on other serious short-term complications, such as severe cerebral hemorrhage, necrotizing enterocolitis and so forth, and that surveillance for other medium- and long-term outcomes, including developmental screening test scores, should be considered important for ensuring safety.
I also think we should be asking parents about these things. Is a reduction in severe brain hemorrhage an outcome that they think is important, even if we can’t prove an advantage in terms of Bayley scores?
Which is all very similar to what Neil Marlow says in his article. So he must be right.