Aiming for slightly higher oxygen saturation targets (low 90s, compared to the high 80s) in the very preterm neonate decreases mortality and surgical necrotizing enterocolitis, while increasing retinopathy and the need for retinopathy treatment. In the long term there is no impact on disability, visual impairment or hospital readmissions during the first year among survivors.
That two-sentence summary of the current state of play is based on many years of important international collaboration, summarized in the publication of NeOProM. This was a prospective meta-analysis which analyzed individual patient data from very nearly 5000 randomized babies (Askie LM, et al. Association Between Oxygen Saturation Targeting and Death or Disability in Extremely Preterm Infants in the Neonatal Oxygenation Prospective Meta-analysis Collaboration. JAMA. 2018;319(21):2190-201).
There has been some controversy regarding these results, as some trials showed smaller effects on mortality than others, but there is in fact very little heterogeneity in the results. Some of the difference in results can be traced to the calibration algorithm debacle. An intensive analysis of the impacts of that problem has just been published, which includes an explanation of the issue with the pulse oximeters, supplied by Masimo.
The authors explain:
Masimo reported that this [the calibration problem] reflected their decision to adjust the calibration of their oximeters so that at values >87% the displayed SpO2 were increased by 1%-2%. As well as fewer values than expected between 87% and 90%, this manufacturer-generated artifact returned more SpO2 values than expected >90%, thus affecting both target groups in the NeOProM trials. By elevating SpO2 readings of 88% and 89% to greater displayed values, the artifact would be expected to make the low target group range of 85%-89% narrower and harder to target. By elevating SpO2 values in the range 90%-95% by 1%-2% above the true value the artifact would mean that actual achieved SpO2 values in the high target range with the original oximeters were lower than intended, narrowing the difference in SpO2 between groups.
I have never fully understood why Masimo did this. I have heard that it was to prevent alarms due to minor desaturations, and to make the oximeters more attractive to anaesthetists, but that doesn’t make a lot of sense to me. In any case this messed up the trials a little bit. When the problem was identified, Masimo adjusted the calibration of the oximeters, and oximeters with new software were used for the latter part of BOOST Australia, BOOST UK, and COT.
Ben Stenson and colleagues have now re-analyzed data from babies in the Australia and UK BOOST trials and showed that the revised algorithm led to babies being in their target range for a longer period of time, especially those in the low target group. Babies in the low target group were actually in their target range for 40% longer (relatively) when the revised algorithm was in use than with the initial calibration. The absolute difference in time spent by the low saturation babies in the lower target range was smaller, about 5.5%. This includes periods of time when the infant was not receiving oxygen, which is reasonable for this analysis, because a baby with exactly the same true saturation would get oxygen in the high sat group, but not in the low sat target group. If the authors had removed data from babies in room air with saturations above the high target, i.e. those whose lungs had improved and who no longer needed O2 whichever group they were in, then the differences in targeting would likely be substantially greater.
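To see how a 40% relative increase can sit alongside an absolute difference of only about 5.5%, here is a back-of-envelope sketch in Python. It assumes that both figures describe the same quantity (the proportion of time the low target babies spent in their target range) and that the 5.5% is a difference in percentage points. The implied baseline and revised values it prints are my own arithmetic, not numbers reported by the authors, and the function name is just for illustration.

```python
def implied_time_in_range(relative_increase: float, absolute_gain_points: float):
    """Back out the baseline and revised proportions of time in range.

    relative_increase: e.g. 0.40 for "40% longer (relatively)".
    absolute_gain_points: e.g. 5.5 for a gain of 5.5 percentage points.
    Assumes both figures refer to the same underlying proportion.
    """
    # revised = baseline * (1 + relative_increase)
    # revised - baseline = absolute_gain_points
    # => baseline = absolute_gain_points / relative_increase
    baseline = absolute_gain_points / relative_increase
    revised = baseline + absolute_gain_points
    return baseline, revised


baseline, revised = implied_time_in_range(0.40, 5.5)
print(f"Implied time in range, original calibration: {baseline:.2f}%")
print(f"Implied time in range, revised calibration:  {revised:.2f}%")
```

Under those assumptions the low target babies would have spent somewhere around 14% of the time in range with the original calibration and around 19% with the revised one, which is why a large relative change corresponds to a modest absolute one.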
The Cochrane review of all of the oxygen saturation targeting trial data showed that the overall trial results for mortality were not heterogeneous (I-squared = 0) and that overall there was a 16% relative increase in mortality with low saturation targeting (20% vs 17%).

When analyzed by the oximeter calibration, the use of a low saturation target with the new, more accurate, algorithm led to a 38% relative increase in mortality (absolute mortality 22% vs 16% with the higher target).
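For anyone who wants to check the arithmetic, here is a minimal Python sketch that converts the rounded mortality percentages quoted above into a risk ratio, a relative increase and an absolute difference. The function name is mine, and the number needed to harm is a standard derived quantity that I have added; it is not quoted in the reviews. Because the inputs are rounded percentages rather than the trial-level data, the results will only approximate the pooled estimates.

```python
def compare_risks(risk_low_target: float, risk_high_target: float) -> dict:
    """Compare two mortality proportions (each between 0 and 1)."""
    risk_ratio = risk_low_target / risk_high_target
    absolute_difference = risk_low_target - risk_high_target
    return {
        "risk_ratio": risk_ratio,
        "relative_increase_pct": (risk_ratio - 1) * 100,
        "absolute_difference_points": absolute_difference * 100,
        # Number of babies targeted low for one additional death
        "number_needed_to_harm": 1 / absolute_difference,
    }


# Rounded percentages quoted in the text (low target vs high target)
revised = compare_risks(0.22, 0.16)  # revised calibration algorithm
overall = compare_risks(0.20, 0.17)  # all trial data combined

for label, result in (("revised algorithm", revised), ("overall", overall)):
    print(label)
    for name, value in result.items():
        print(f"  {name}: {value:.2f}")
```

With the revised algorithm, 22% vs 16% gives a risk ratio of about 1.375, which is the 38% relative increase after rounding, and an absolute difference of 6 percentage points. The overall figures of 20% vs 17% give about 18% from the rounded percentages, slightly different from the pooled 16%, presumably because the meta-analysis weights the individual trials and does not work from rounded totals.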

The publication of the NeOProM collaboration confirms this using the individual patient data, with identical results and practically identical confidence intervals. All Masimo oximeters now have the revised algorithm. The only evidence-based saturation targets for very preterm infants are 91 to 95%, which should be the default for preterm infants.
Is it possible that other targets would be even better? I know there are some centres that still use targets that are different to 91 to 95%, and I think it is feasible that another target range might be better than 91 to 95%: perhaps even higher targets would further reduce mortality? Perhaps an intermediate target might reduce RoP without increasing mortality? Such thoughts are not unreasonable but are unsupported by any evidence. I think it is unlikely that the efforts of the NeOProM collaboration will be reproduced to examine other target ranges in the near future. The only way to do so, I think, would be to perform cluster-randomized, or individually-randomized, registry-based trials, with very large numbers and low-cost data collection.
The experience of these trials should make us more than ever aware of the risks of observational studies, which, before these prospective RCTs, suggested strongly that survival and outcomes would not be worse with lower saturation targets, but that RoP would be less frequent. Only half of that turned out to be true.
This experience should also make us even more wary of composite outcomes. The combined outcome of “death or disability” was only slightly affected by the different saturation targets. So-called disability was 41% (lower target) vs 40% (higher target). Being substantially more frequent than death, this component dominated the primary outcome and caused the whole composite to be not ‘statistically significant’: the relative risk was 1.04 (95% CI 0.98 to 1.09) for the entirety of the results. When subdividing the data according to the calibration algorithm there does seem to be an effect of the target range on the composite outcome with the revised algorithm.
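A rough illustration of that dilution, as a Python sketch. It makes two assumptions that are mine and not the authors': that the 41% and 40% disability figures are proportions among survivors, and that the rounded overall mortality figures from the Cochrane review quoted earlier (20% vs 17%) can stand in for the NeOProM mortality rates. The function name is hypothetical, and the output is an approximation for illustration only, not a re-analysis.

```python
def composite_risk(death: float, disability_among_survivors: float) -> float:
    """Risk of 'death or disability', assuming disability is only
    counted among the babies who survive."""
    return death + (1 - death) * disability_among_survivors


# Rounded figures from the text; see the stated assumptions above
low_target = composite_risk(death=0.20, disability_among_survivors=0.41)
high_target = composite_risk(death=0.17, disability_among_survivors=0.40)

rr_death = 0.20 / 0.17
rr_composite = low_target / high_target

print(f"Death or disability, low target:  {low_target:.3f}")
print(f"Death or disability, high target: {high_target:.3f}")
print(f"Risk ratio for death alone:       {rr_death:.2f}")
print(f"Risk ratio for the composite:     {rr_composite:.2f}")
```

Under these assumptions a risk ratio of about 1.18 for death shrinks to about 1.05 for the composite, close to the reported 1.04, simply because the more frequent component barely differs between groups.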

But again, when we examine the data, there was no impact on ‘disability’ with the revised calibration algorithm, RR 1.05 (95% CI 0.91 to 1.22); the impact was solely on mortality. (And don’t get me started on whether a developmental screening test score below an arbitrary cutoff is actually a disability; just search for Bayley on this blog to see my opinion about that!)
There has been some concern about the analysis of the data by unplanned secondary analysis. But an unplanned analysis which is performed because the intervention changed unexpectedly (on this occasion due to the discovery of this anomaly in the calibration) is entirely different to performing a secondary analysis which is suggested by looking at the data, and seeing an interesting finding that you then analyze. The situation faced by the NeOProM investigators is analogous to a secondary analysis of a drug trial which is required because the formulation of the medication is changed during the trial, substantially increasing bio-availability of the active drug. It would be a failure to NOT perform a secondary analysis of the data according to drug formulation.
A study from Ottawa suggests one reason why higher saturations may lead to lower mortality: they analyzed the development of pulmonary hypertension before and after increasing their saturation targets (Laliberte C, et al. Target oxygen saturation and development of pulmonary hypertension and increased pulmonary vascular resistance in preterm infants. Pediatr Pulmonol. 2019;54(1):73-81). These observational data seem to show that the development of higher pulmonary vascular resistance, and of frank pulmonary hypertension, is more frequent with target saturations of 88 to 92% (their previous target) than with 90 to 95%, the current target.
Of course there is a limit: hyperoxia can also cause pulmonary hypertension, as I will discuss in the next post.















