Doubts about BOOSTING saturations?

Posted on 22 February 2016 by Keith Barrington

I received a very thoughtful comment from Reese Clark, who many of you will know as a leader in neonatology whose many years of experience and important scientific contributions to neonatology make him someone worth listening to.

He has doubts about the reliability of the BOOST-II results, and therefore about the oxygen saturation target ranges that should be used. He makes 2 points: he notes that mortality was getting better during the period when lower saturations were being introduced, and he refers to the meta-analysis by Manja et al. (Manja V, et al. Oxygen saturation target range for extremely preterm infants: A systematic review and meta-analysis. JAMA Pediatrics. 2015;169(4):332-40.)

I will refer to the systematic review first, because I didn’t comment on it when it was first published:

The systematic review by Manja, in fact, showed that death before hospital discharge was significantly increased by targeting low oxygen saturations, and that necrotizing enterocolitis was also increased. They downgraded the quality of evidence using, they stated, the GRADE criteria. But some of their reasons given for downgrading the evidence are bizarre, and not consistent with those guidelines at all.

For each of the outcomes they give these two reasons for downgrading them:

c. The pulse oximeter algorithm was modified midway through the study owing to a calibration correction, and this caused a deviation from SpO2 values.

d. The separation of SpO2 values obtained was not as planned in the study design/protocol. The median SpO2 value in the restricted arm (planned SpO2 of 85%-89%) was higher than 90% in some studies (Figure 1).

c. I don’t see how the change in the calibration would lead to downgrading the evidence; the trials were carried out as designed, and, when the calibration error was discovered, this was noted so that the analyses could take this into account if need be. It also is not entirely true: there was no oximeter calibration change in SUPPORT or in BOOST-II NZ.

d. This is just not true. The separation of SpO2 values actually obtained was not part of the study protocol. The protocol was to compare the saturation target ranges, not the saturations actually achieved. This is like saying a trial of an anti-hypertension drug is lower quality because the blood pressure was not lowered as much as expected. If you still see a significant difference in outcomes, despite the intervention being less successful than planned, isn’t that a major red flag?

Two other reasons for downgrading the evidence for the outcome “death before hospital discharge” are given as:

e. This was not a prespecified outcome in the Benefits of Oxygen Saturation Targeting II trial, which was prematurely stopped because of this outcome.

f. Only 4 of the 5 eligible trials reported on the outcome of death before hospital discharge (the Canadian Oxygen Trial group did not).

e. This is evidence of good research practice. If children are dying more in one arm of a trial than another, by a highly statistically significant (more than 3 standard deviations) degree, then to wait another 2 years, allowing continued enrollment, would be a criminally unethical thing to do. In addition there are very few deaths between discharge and two years, so the difference is likely to remain.

f. Why should this lead to downgrading the evidence? It is the quality of the included trials for each outcome that is important, not whether all trials reported the outcome.

At the time the Manja paper was published there were data regarding mortality at 24 months from 3 of the trials (SUPPORT, COT and BOOST-NZ). Mortality was increased by 16%, or in absolute terms, by 27 per 1000 infants, with the lower saturation target. This was not statistically significant (but not far off, 95% confidence intervals from 0.98-1.37), and this evidence was downgraded to “moderate” quality for reasons c and d above. The new results from the BOOST-II studies show a relative increase in mortality of 20%, and an absolute risk difference of 35 per 1000 infants (all oximeters combined), which is remarkably close to the pooled results from the previous studies.
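As a rough arithmetic check on how close these two estimates are, the sketch below back-calculates the comparison-group mortality implied by each pair of figures. It assumes the simple single-table identity, absolute risk difference = baseline risk × (relative risk − 1), which pooled meta-analytic estimates need not satisfy exactly. The function name is hypothetical, and the outputs are illustrative only, not results reported by any of the trials.

```python
def implied_baseline_per_1000(relative_increase, ard_per_1000):
    """Back-calculate the comparison-arm risk (per 1000 infants) from a
    relative increase in risk and an absolute risk difference (per 1000).

    Assumption: ARD = baseline * (RR - 1), so baseline = ARD / (RR - 1).
    """
    if relative_increase <= 0:
        raise ValueError("relative_increase must be positive")
    return ard_per_1000 / relative_increase


# Figures quoted in the post
earlier = implied_baseline_per_1000(0.16, 27)  # SUPPORT, COT and BOOST-NZ pooled
boost2 = implied_baseline_per_1000(0.20, 35)   # BOOST-II, all oximeters combined

print(round(earlier), round(boost2))  # 169 175
```

Under that assumption, both pairs of figures imply a comparison-arm mortality of roughly 170 to 175 per 1000, so the two sets of results are describing much the same size of effect.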

To return to the first issue in the new comment, i.e. the fact that survival was improving during the period that lower saturations were being sporadically and inconsistently introduced: I think this is really questionable as evidence of the impact of lower saturation targets. It may be that survival was improving despite the lowering of saturation targets; in fact I think that a lot of the improved survival was due to changes in obstetrical attitudes and interventions, as extremely preterm babies are often delivered in much better condition these days than they used to be. The only way to reliably answer the question of the impact of saturation targeting practices is to perform the kind of large RCTs that we have performed.

I don’t see any other way of interpreting these data than to admit that lower saturation targets lead to higher mortality from a variety of causes, as well as an increase in necrotizing enterocolitis. We might not like it (I don’t like it) but I can’t see any other valid explanation of this weight of evidence from high quality trials enrolling 5000 infants.

Reese Clark has now sent me some more interesting comments which I will put in the next post, and then discuss, probably in a third post.

About Keith Barrington

I am a neonatologist and clinical researcher at Sainte Justine University Health Center in Montréal


3 Responses to Doubts about BOOSTING saturations?

Veena Manja, Satyan Lakshminrusimha says:

22 February 2016 at 13:53

We thank Dr. Barrington for reviewing our paper. We would like to respond to the points made in this blog post to provide a more complete picture and address misunderstandings. We also want to remind the readers that this review was conducted prior to the release of the BOOST-II Australia and UK follow-up paper. The new results may alter the conclusions of the systematic review. Our responses to Dr. Barrington’s comments are shown after each of his comments.

The systematic review by Manja et al, in fact, showed that death before hospital discharge was significantly increased by targeting low oxygen saturations, and that necrotizing enterocolitis was also increased. They downgraded the quality of evidence using, they stated, the GRADE criteria. But some of their reasons given for downgrading the evidence are bizarre, and not consistent with those guidelines at all.
For each of the outcomes they give these two reasons for downgrading them:
c. The pulse oximeter algorithm was modified midway through the study owing to a calibration correction, and this caused a deviation from SpO2 values.
d. The separation of SpO2 values obtained was not as planned in the study design/protocol. The median SpO2 value in the restricted arm (planned SpO2 of 85%-89%) was higher than 90% in some studies (Figure 1).
c. I don’t see how the change in the calibration would lead to downgrading the evidence, the trials were carried out as designed, and, when the calibration error was discovered, this was noted so that the analyses could take this into account if need be. It also is not entirely true. There was no oximeter calibration change in SUPPORT or in BOOSTII-NZ.

Response: The downgrading was not for the modification of the algorithm but for the error in the original algorithm. The error led to inconsistencies between the intended oxygenation target and the SpO2 values that were actually obtained, resulting in significant overlap between the 2 arms of the study.
In SUPPORT and BOOST II NZ, there was no difference in their pre-specified primary endpoint between the 2 groups.

d. This is just not true. The separation of SpO2 values actually obtained was not part of the study protocol. The protocol was to compare the saturation target ranges, not the saturations actually achieved. This is like saying a trial of an anti-hypertension drug is lower quality because the blood pressure was not lowered as much as expected. IF you still see a significant difference in outcomes, despite the intervention being less successful than planned, isn’t that a major red flag?

Response: This is not a valid comparison. A trial of antihypertensives that aims to lower blood pressure but does not achieve the target BP reduction (presuming that the comparator, i.e. placebo, group did not have any blood pressure changes causing an overlap of blood pressures between the 2 groups) is different from a trial where the comparison is between 2 target ranges of O2 saturation with significant overlap between the 2 arms in the actual SpO2 obtained. When the degree of overlap is of the extent seen in these trials (graphs included in supplementary material of original publications), what is being compared becomes unclear. If the trial of antihypertensives had significant overlap of blood pressures between the intervention and comparator arms, it would be judged to be of lower quality as far as studying the effects of blood pressure lowering on an outcome is concerned.
If a study is being conducted to compare human milk to formula in preterm infants, and if the infants randomized to human milk are fed formula > 50% of the time, what is being compared becomes unclear. In these studies, the median saturation achieved by the 85-89% arm was >89%, reducing the separation between the groups.
The general assumption that better separation between the two groups would have resulted in higher mortality in the 85-89% target range may not be true. In fact, a post hoc analysis by the COT trial concluded as follows: “Centers with greater separation between the median true SpO2 in the two SpO2 target groups observed lower rather than higher rates of death or disability at 18 mo in the 85% to 89% than in the 91% to 95% group.” (Ref: [2014] [1400.5] Do the Effects of Targeting Higher vs Lower SpO2 in Extremely Preterm Infants Differ Between Centers With More or Less Separation Between Median SpO2 in the Two Groups? A Subgroup Analysis of the Canadian Oxygen Trial (COT).)

Two other reasons for downgrading the evidence for the outcome “death before hospital discharge” are given as:
e. This was not a prespecified outcome in the Benefits of Oxygen Saturation Targeting II trial, which was prematurely stopped because of this outcome.
f. Only 4 of the 5 eligible trials reported on the outcome of death before hospital discharge (the Canadian Oxygen Trial group did not).
e. This is evidence of good research practice. If children are dying more in one arm of a trial than another, by a highly statistically significant (more than 3 standard deviations) degree, then to wait another 2 years, allowing continued enrollment, would be a criminally unethical thing to do. In addition there are very few deaths between discharge and two years, so the difference is likely to remain.

Response: In SUPPORT, the pre-specified outcome in the original protocol was death before 36 weeks PMA (which was not significantly different between the 2 groups); this was changed to death before hospital discharge (which was significantly different) without providing a rationale for the change. In the other trials, neither mortality at 36 weeks nor mortality before hospital discharge was a pre-specified outcome.
The downgrading was not for premature termination of the trial but for the fact that the increase in mortality was not expected when the trial was planned and so was not a pre-specified outcome. Polin and Bateman explain in an editorial in the New England Journal of Medicine in 2013 (Polin RA, Bateman D. Oxygen-saturation targets in preterm infants. N Engl J Med. 2013;368(22):2141-2. doi: 10.1056/NEJMe1305534. PubMed PMID: 23642082.): ‘In all the studies, given the high expected rate of death among premature infants, death was included as an outcome because it competed with ROP as a risk, not because a difference in mortality was expected as a result of differences in oxygenation.’ Our confidence in this effect estimate is lower because of the post-hoc nature of this analysis. We are not at all suggesting that an unexpected increase in mortality as seen in BOOST II should have resulted in action any different than the one taken by the study investigators.

f. Why should this lead to downgrading the evidence? It is the quality of the included trials for each outcome that is important, not whether all trials reported the outcome.

Response: This was noted in the outcomes section but was not cited as an argument to downgrade the evidence in the article (the 2 points that led to downgrading were the poor separation of the SpO2 in the 2 groups and the outcome of mortality at hospital discharge not being pre-specified). The outcome of death/disability at 24 months was not available for a significant cohort of the BOOST II study, and this was cited since incomplete analysis of data may lead to biased assessments.

At the time the Manja paper was published there were data regarding mortality at 24 months from 3 of the trials (SUPPORT, COT and BOOST-NZ). Mortality was increased by 16%, or in absolute terms, by 27 per 1000 infants, with the lower saturation target. This was not statistically significant (but not far off, 95% confidence intervals from 0.98-1.37), this evidence was downgraded to “moderate” quality for reasons c and d above. The new results from the BOOST-II studies show a relative increase in mortality of 20%, and an absolute risk difference of 35 per 1000 infants (all oximeters combined). Which is remarkably close to the pooled results from the previous studies.

Response: We agree that the results from BOOST II add to the available data and should be considered in planning studies involving future oxygen saturation targets, recommendations and policies.

Respectfully submitted,
Veena Manja and Satyan Lakshminrusimha

- keithbarrington says:
  
  22 February 2016 at 18:05
  
  Thanks for that great comment, and the clarifications. I still don’t agree that the fact that the separation between groups was less than expected is a valid reason to downgrade the evidence. Take the example you give: in a high quality study designed to look at breast milk, in which some of the babies were going to get formula, if the babies got more formula than expected, but you still saw a significant difference in the outcome, then you perform an ITT analysis, and recognize that this is what really happens in the real world. There is still a difference in the outcome, the study was conducted as it was designed, and the study is still high quality, giving you real-life information about the impact of the intervention.
  So the less than expected separation makes this a real-life evaluation of what would happen if babies were treated with a target of 85 to 89 compared to 91 to 95%. Surely that is what we want from clinical research, to know what is likely to happen in real life. Do you not agree?
  
Satyan Lakshminrusimha and Veena Manja says:

23 February 2016 at 03:09

Dear Dr. Barrington,

Thank you for your insightful comment. We agree with you that the intention to treat principle applies to this situation if the difficulty in maintaining saturation is similar to what is likely to happen in real life.

However, as pointed out by the COT investigators, the masking algorithm played a major role in reducing the separation between the two groups. In the low-target group, the displayed saturation increased from 84% to 88% when the true saturation changed from 84% to 85%, creating a zone of instability and a tendency for the bedside provider to increase FIO2. Similarly, in the high-target group, the displayed saturation decreased from 96% to 92% when the true saturation changed from 96% to 95%, creating a zone of instability and a tendency for the bedside provider to decrease FIO2 (Schmidt et al J Pediatr 2014). The net effect of these two phenomena was reduced separation (2-3% instead of 6%) between the two groups.
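The jump described above can be illustrated with a deliberately simplified sketch. It assumes a fixed 3-point offset applied only while the true saturation is between 85% and 95%, which reproduces the four values quoted in this comment but is not the trials' actual masking algorithm; the function name and the exact offset range are assumptions made for illustration.

```python
def displayed_spo2(true_spo2, group):
    """Simplified masking (assumption): shift the reading by 3 points,
    upward in the low-target group and downward in the high-target group,
    only while the true SpO2 lies in the 85-95% range."""
    offset = {"low": 3, "high": -3}[group]
    if 85 <= true_spo2 <= 95:
        return true_spo2 + offset
    return true_spo2


# A 1-point change in true saturation produces a 4-point change on the display
for group, values in (("low", (84, 85)), ("high", (96, 95))):
    for true in values:
        print(group, true, displayed_spo2(true, group))
```

In this sketch a 1-point change in true saturation at the edge of the offset range moves the displayed value by 4 points, which is the zone of instability referred to above.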

In the BOOST-I trial, the masking algorithm was simple (display +/- 2% throughout). The investigators could achieve a median of 93% (IQR 90 to 96%) in the standard saturation group (target saturation range 91 to 94%) and a median of 97% (IQR 94 to 98%) in the high saturation group (target saturation range 95 to 98%). In the BOOST-II UK trial, even with the revised algorithm, the median saturation was 90% in the low target group (intended target range 85-89%) and 93% in the high target group (intended target range 91-95%). With the original algorithm, the median values were 91% in the low target group and 92% in the high target group (BOOST-II UK).

Difficulty in maintaining saturation within the 85-89% target may also be due to the inherent nature of the oxygen-hemoglobin dissociation curve (as elegantly pointed out in the BOOST-II UK/Australia follow-up discussion). The higher range includes the plateau of the oxygen-hemoglobin dissociation curve, where oxygen saturation fluctuates less with changing PaO2 and the slope of the SpO2 vs. FIO2/PaO2 curve is flatter. In contrast, the slope of the oxygen-hemoglobin dissociation curve is steep in the 85-89% range resulting in higher fluctuation in saturation with small changes in PaO2 and FIO2. If this turns out to be the major reason for poor separation between the groups, we agree with your point that this is similar to a real-life situation.

SUPPORT, BOOST-II and COT are well-conducted trials addressing a very important question in neonatology. The study oximeters (errors in the original algorithm and the effect of the masking algorithm on maintaining the target range) may have at least in part influenced the results. The planned meta-analysis based on individual patient data will hopefully shed more light on this issue.

We sincerely thank Dr. Reese Clark and you for this discussion and thoughtful comments on this topic.
