The NeoProm primary publication is the result of a prolonged gestation; I remember meetings over 15 years ago when Cynthia Cole from Boston was suggesting the idea, and although she was not, I think, finally an author on any of the major trials or the Prospective Meta-Analysis, she was quite persuasive and helped to get the ball rolling.
Askie LM, et al. Association Between Oxygen Saturation Targeting and Death or Disability in Extremely Preterm Infants in the Neonatal Oxygenation Prospective Meta-analysis Collaboration. JAMA. 2018;319(21):2190-201. I am actually a little surprised that the title of this article starts with the word “association”; surely prospective randomized controlled trials are designed to investigate causation, rather than association?
What is NeoProm, and what is a prospective meta-analysis (PMA)? The idea behind PMA is that, to prove small, but clinically important, differences in outcomes we need to perform very large trials. In neonatology we look after about 10% of all human beings, but only for the first few days, or occasionally weeks, of life. When we are looking at an even smaller proportion of those live births, around 1%, that is, those born extremely preterm, then large international collaborations become essential for many of the important questions in the NICU.
Costs for such very large trials can become prohibitive for a single funding agency. Many large national bodies, such as NIH (USA), CIHR (Canada), NHMRC (Australia), and the MRC (UK), are willing to fund participants in other countries, but if you need to fund a trial in 5000 intensive care patients, requiring extensive data collection, it gets very expensive. One alternative is to perform a number of co-ordinated trials, funded by different agencies, with the intention of performing an Individual Patient Data (IPD) meta-analysis. You facilitate the IPD meta-analysis by agreeing beforehand on what data will be collected, and how, and what the definitions of certain outcomes will be. It is also important, of course, to pre-define the primary outcome and the important secondary outcomes. These restrictions on trial design are what make a PMA different from a post-hoc IPD meta-analysis.
A PMA is not as powerful as an individual trial with the same number of participants, because you have to use some of the statistical power to account for possible differences between trials. You also have to be careful that all the originally planned trials are truly committed to the PMA, otherwise you might end up with fragmentary publications, which might artificially inflate the apparent “significance” of the results.
The NeoProm collaboration is, I think, the first PMA in neonatology. Bravo to all the PIs, the steering committees, the local investigators (including me!) and in particular to Lisa Askie, who seems to still be sane (or at least as sane as she ever was) following the successful completion of this megaproject.
The Collaboration included SUPPORT, which was started and finished first with dates that overlapped the others, the BOOST-2 trials in Australia and in NZ, BOOST2-UK, and COT. In total there were indeed almost 5000 extremely preterm babies in the trials.
The primary outcome, agreed before the collaboration proceeded, was a composite of either death or “major disability” at 18 to 24 months corrected age. Major disability was any of the following: a Bayley Scales of Infant and Toddler Development, version 3 (Bayley-3), cognitive or language score of less than 85; severe visual loss (cannot fixate or is legally blind with visual acuity <6/60 in both eyes); cerebral palsy with a GMFCS level of 2 or higher; or deafness requiring hearing aids.
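To make the definition concrete, here is a minimal sketch of the composite as a small Python function. The class and field names (`Assessment`, `gmfcs_level`, and so on) are my own inventions for illustration, not anything from the NeoProm protocol or dataset, and the handling of missing scores is a simplification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Assessment:
    """Hypothetical record for one infant at 18 to 24 months corrected age."""
    died: bool
    bayley_cognitive: Optional[int] = None  # None = not assessed
    bayley_language: Optional[int] = None   # None = not assessed
    severe_visual_loss: bool = False
    gmfcs_level: Optional[int] = None       # None = no cerebral palsy
    deaf_needing_aids: bool = False


def major_disability(a: Assessment) -> bool:
    """Any one of the four components is enough."""
    scores = (a.bayley_cognitive, a.bayley_language)
    low_bayley = any(s is not None and s < 85 for s in scores)
    cerebral_palsy = a.gmfcs_level is not None and a.gmfcs_level >= 2
    return (low_bayley or a.severe_visual_loss
            or cerebral_palsy or a.deaf_needing_aids)


def primary_outcome(a: Assessment) -> bool:
    """Composite: death OR major disability, weighted identically."""
    return a.died or major_disability(a)


# A surviving infant whose only finding is a language score of 84
# lands in the same primary-outcome category as an infant who died.
survivor = Assessment(died=False, bayley_cognitive=95, bayley_language=84)
print(primary_outcome(survivor))
print(primary_outcome(Assessment(died=True)))
```

Both calls print `True`, because the composite records only whether any qualifying event occurred, not which one.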
Basically there was no impact of the different saturation targets (high 80’s vs low 90’s) on the primary outcome.
I guess I could stop there, but you know me!
I have serious concerns about how this result might be interpreted, and about the relevance of this primary outcome.
A Bayley-3 score of under 85 at 2 years of age is NOT A DISABILITY. The Bayley test is a somewhat useful, over-sensitive screening test (over-sensitive as most screening tests should be) for developmental delay. The MAJORITY of infants with a Bayley under 85 at 24 months do not have any long-term impairment. Very few of them have a disability, and even fewer have a handicap. It is certainly not equivalent to being dead.
The major part of the “disability” outcome was a low Bayley score: of the 1429 babies with “major disability”, 1319 had a low Bayley score, 213 had cerebral palsy, 120 were deaf and 48 had serious visual problems. There were 896 deaths in total by 2 years of age. (In the supplemental information you can find that the Bayley-3 language or cognitive composite scores were under 70 for 443 of the infants.)
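A little arithmetic on those counts shows how much of the composite rests on the Bayley score. This is a sketch using only the numbers quoted above. The "Bayley only" figure is a lower bound that I derive by assuming every baby with cerebral palsy, deafness or visual loss is a different child, which is the least favourable case for my argument.

```python
# Counts as quoted above for the babies with "major disability".
total_major_disability = 1429
components = {
    "low Bayley score (<85)": 1319,
    "cerebral palsy": 213,
    "deafness": 120,
    "serious visual problems": 48,
}

# Share of the "major disability" group with each component.
for name, n in components.items():
    print(f"{name}: {n} ({100 * n / total_major_disability:.1f}%)")

# The components add up to more than the total, so some babies
# met more than one criterion.
component_sum = sum(components.values())
overlap = component_sum - total_major_disability
print("sum of components:", component_sum, "excess over total:", overlap)

# Lower bound on babies classified on the Bayley score ALONE: even if the
# three other components never overlapped with one another, they could
# account for at most 213 + 120 + 48 babies.
non_bayley_max = component_sum - components["low Bayley score (<85)"]
bayley_only_min = total_major_disability - non_bayley_max
print("Bayley-only, at least:", bayley_only_min,
      f"({100 * bayley_only_min / total_major_disability:.1f}%)")
```

On these figures, at least 1048 of the 1429 babies (about 73%) were labelled as having a “major disability” solely because of a Bayley score under 85.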
I understand the problem of competing outcomes, that a baby who is dead cannot have a developmental delay at 24 months, but there are other ways of dealing with competing outcomes that do not imply equivalence: hierarchical outcome evaluations, for example, which evaluate the most important outcome first.
What should the primary outcome be for large neonatal studies? Almost all of our survivors have acceptable quality of life, so I think that survival should be the primary outcome of any of these studies. There might be some reasonable disagreement, however, about the place of very profound disability with inability to communicate. Although it is rare, there are reasonable people, and parents of extremely preterm babies, who find such an outcome to be equivalent to death, in terms of the value to them and the infant. Perhaps the first outcome to evaluate could be a composite of death and very profound impairment (inability to communicate). I also think that the decision regarding primary, and other, outcomes should be made in collaboration with parent partners, and former preterm infants.
After this, other aspects of longer-term outcomes could be evaluated.
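One way to picture a hierarchical evaluation is a pairwise comparison in the style of a win-ratio analysis: every infant in one group is compared with every infant in the other, looking at the most important outcome first and moving to the next tier only when the first is tied. The sketch below is only an illustration of that idea. The two tiers, the names (`Infant`, `compare`, `tally`) and the eight toy infants are all invented, and nothing here is NeoProm data or the analysis the collaboration performed.

```python
from itertools import product
from typing import NamedTuple


class Infant(NamedTuple):
    died_or_profound: bool   # tier 1: death or profound impairment
    later_impairment: bool   # tier 2: any other impairment in survivors


def compare(a: Infant, b: Infant) -> int:
    """Return 1 if a did better than b, -1 if worse, 0 if tied."""
    # Tier 1 decides the comparison whenever the two infants differ on it.
    if a.died_or_profound != b.died_or_profound:
        return -1 if a.died_or_profound else 1
    # Both had the tier-1 event: tie, lower tiers cannot be assessed.
    if a.died_or_profound:
        return 0
    # Both escaped tier 1: only now does the less important tier count.
    if a.later_impairment != b.later_impairment:
        return -1 if a.later_impairment else 1
    return 0


def tally(group_a, group_b):
    """Count wins, losses and ties for group_a over all cross-group pairs."""
    results = [compare(a, b) for a, b in product(group_a, group_b)]
    return results.count(1), results.count(-1), results.count(0)


# Invented toy data, four infants per arm.
higher_target = [Infant(False, False), Infant(False, True),
                 Infant(False, False), Infant(True, False)]
lower_target = [Infant(True, False), Infant(False, True),
                Infant(False, False), Infant(True, False)]

wins, losses, ties = tally(higher_target, lower_target)
print(wins, losses, ties)  # 8 3 5
```

The point of the design is that a low developmental score can never offset a death. It only matters when comparing two infants who both survived without profound impairment.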
As profound impairment with inability to communicate is so rare among our extremely preterm babies, the result of this PMA would likely be much the same whether the primary outcome was death or profound disability, or death alone. The PMA shows an absolute decrease in mortality before discharge of 2.4%, and a relative decrease of 17%, when the higher saturation targets are used, compared to lower targets. The 95% confidence intervals for that outcome do not include ‘no difference’, so the difference is unlikely to be due to chance alone. As the article in JAMA is (too) ready to point out, that is a secondary outcome. The components of a primary outcome are, strictly, always secondary outcomes, but I think they have a different status to other outcomes which are not directly related to the primary, such as, for this study, bronchopulmonary dysplasia (BPD).
Many years ago when we started on this adventure I thought that BPD would probably be less frequent in the low saturation group, as you would need less oxygen and lower mean airway pressures during ventilation, and less non-invasive ventilation afterwards; but there really is no signal at all for BPD. There are of course more babies in the high saturation target group receiving oxygen at 36 weeks gestation, but other indicators of lung injury severity, such as the proportion going home on oxygen (supplemental appendix), the age at which babies on home oxygen came off it, and hospital readmissions in the first years of life, were the same between groups.
One thing I hadn’t particularly been expecting was that the lower saturation group would have more severe Necrotising Enterocolitis (that is, needing surgery or dying), but that does seem to be the case, with 9% in the low saturation group, and 7% in the high saturation group.
The implication of this for clinical practice is that NeoProm confirms that the only target SpO2 range which is evidence-based is between 90 and 95%. Anything else (such as 88 to 92% for example) is speculative and would require other studies of many thousands of babies. I don’t think that is likely to happen in the near future.
The main adverse outcome of higher saturations is worse retinopathy, leading to an increase in the need for treatment, from 11% to 15%. As noted in the individual trials, even though treatment was more frequent, visual outcomes at 2 years of age were not different. We need to find other ways of decreasing retinopathy, in particular improving nutritional standards and growth outcomes.
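To put the trade-offs on a common scale, the absolute differences quoted in this post can be converted to events per 1000 babies and to a number needed to treat or harm. This is a back-of-envelope sketch: it uses only the rounded figures given above, treats them as point estimates, and ignores their confidence intervals. The function names are mine.

```python
# Absolute differences in percentage points, as quoted in this post,
# for higher versus lower saturation targets (rounded figures).
mortality_reduction = 2.4          # fewer deaths before discharge
severe_nec_reduction = 9.0 - 7.0   # 9% (low target) vs 7% (high target)
rop_treatment_increase = 15.0 - 11.0  # 11% (low) vs 15% (high)


def per_1000(points: float) -> int:
    """Percentage-point difference expressed as events per 1000 babies."""
    return round(points * 10)


def number_needed(points: float) -> float:
    """Babies managed with the higher target per one event changed."""
    return 100 / points


print("fewer deaths per 1000:", per_1000(mortality_reduction))
print("fewer severe NEC per 1000:", per_1000(severe_nec_reduction))
print("more ROP treatments per 1000:", per_1000(rop_treatment_increase))
print("NNT, death:", round(number_needed(mortality_reduction)))
print("NNH, ROP treatment:", round(number_needed(rop_treatment_increase)))
```

On these rounded numbers, using the higher target for 1000 babies would mean roughly 24 fewer deaths and 20 fewer cases of severe NEC, at the cost of about 40 more babies treated for retinopathy.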