Which is worse: death or a low Bayley score? Comparing composite outcomes between groups, taking into account clinical priorities.

I keep harping on about this issue as I think we make a mistake in the design of our research studies when we include death and a much less important outcome in composite outcomes. For example, in the STOP-BPD trial, the primary outcome “death or BPD at 36 weeks” was “not significantly different” between groups, but death before 36 weeks was somewhat lower in the steroid group than in controls. There were more survivors to 36 weeks in the steroid group but almost exactly the same proportion of survivors with BPD in each group (65% vs 66%); therefore there were numerically more babies with BPD in the steroid group. Surely being alive with BPD is better than being dead! The distribution of moderate and severe BPD also favoured the steroid group, but that is not factored into the primary outcome analysis.

Methods exist for analyzing not only death but also the severity of the BPD, giving greater importance to death than to severe BPD, and lesser importance again to moderate BPD. Those methods can also take into account time to an event, so that, for example, a baby who no longer needs oxygen at 6 months of age is considered to have a less severe outcome than a baby who still needs oxygen at 12 months.

Death between 36 weeks and discharge is not better than death before 36 weeks. All deaths before discharge (there are very few after discharge in the first years of life) should be considered an equally adverse outcome, which is worse than surviving with BPD. Of course in adults, many of whom in cardiac research studies are even older than me, delaying death is a good thing to do; indeed it is the main aim of many interventions, as preventing death is not an option! In newborn infants, delaying death by a few days or weeks is not necessarily a good outcome.

The problem of composite outcomes whose components differ in clinical importance is not unique to neonatology. In cardiac research, for example, the outcomes often include death and hospitalisation for cardiac reasons. In those studies the time to the adverse outcome may be included in the analysis, along with ways of comparing patients, so that a patient who dies is considered to have a worse outcome than one who survives but is hospitalised, and one who is hospitalised 2 years after the intervention is considered to have a better outcome than one who is hospitalised after 2 weeks. The time to a patient dying is clearly also important.

As a result, ways of analysing composite outcomes that take into account the clinical importance of each component have been developed, in particular the Win Ratio. Patients are compared in pairs, one from each treatment group. If one has a longer survival than the other, then the treatment they received is considered the winner. If in the next pair one patient dies and the other is hospitalised, a traditional analysis in which “death or hospitalisation” is the primary outcome would consider both patients to have equivalently bad outcomes. In win ratio analysis the patient who survived despite being hospitalised is considered the winner, and their treatment is given an extra point, and so on.

When subjects are paired by design, the analysis is actually quite straightforward. If patients are not paired, then you can compare every result from the treatment group to every result from the control group, and the analysis gets much more complicated, especially calculating the confidence intervals of the win ratio, which seems to require heavy duty bootstrapping. One can also take stratification into account, and only compare patients within the same stratum.
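For readers who like to see the mechanics, here is a minimal Python sketch of the unpaired, every-patient-against-every-patient comparison described above. It is an illustration under simplifying assumptions, not the method of any particular trial: the `Patient` record, the day counts and the two small groups are all invented, everyone is assumed to have the same length of follow-up (no censoring), and there is no stratification. Death is compared first; hospitalisation is only used to break a tie on death.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional


@dataclass
class Patient:
    # Hypothetical record: day of death or of first hospitalisation,
    # None meaning the event never happened during follow-up.
    death_day: Optional[float] = None
    hosp_day: Optional[float] = None


def later_is_better(a, b):
    """Return +1 if event time a is the better one, -1 if b is, 0 if tied."""
    if a is None and b is None:
        return 0
    if a is None:          # a never had the event, b did
        return 1
    if b is None:
        return -1
    return (a > b) - (a < b)   # a later event beats an earlier one


def compare(treated, control):
    """Hierarchical comparison: death first, then hospitalisation."""
    result = later_is_better(treated.death_day, control.death_day)
    if result != 0:
        return result
    return later_is_better(treated.hosp_day, control.hosp_day)


def win_ratio(treated_group, control_group):
    wins = losses = ties = 0
    for t, c in product(treated_group, control_group):  # every possible pair
        r = compare(t, c)
        if r > 0:
            wins += 1
        elif r < 0:
            losses += 1
        else:
            ties += 1
    ratio = wins / losses if losses else float("inf")
    return wins, losses, ties, ratio


# Invented example data, three patients per group.
treated = [Patient(), Patient(hosp_day=700), Patient(death_day=400)]
control = [Patient(hosp_day=14), Patient(death_day=100), Patient()]

wins, losses, ties, ratio = win_ratio(treated, control)
print(wins, losses, ties, round(ratio, 2))  # 5 3 1 1.67
```

With real data you would also need to deal with censoring and with the confidence interval, which is where the complications mentioned above come in.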

I thought I would try this out on the recent data from the Inositol trial. The parts of the composite outcome at 2 years are dichotomous (death, yes or no; “NDI”, yes or no), so I did not need the primary data set to do this. In the Inositol group there were 60 deaths, who were therefore losers in all their comparisons with control babies, except for comparisons with the 39 control babies who died, where there was no preference. Similarly, the Inositol babies with “NDI” were winners when compared with the control deaths, but losers when compared with control survivors without “NDI”.

As there were 287 Inositol babies with known death or NDI outcomes, and 289 controls, there were 82,943 possible comparisons (which I evaluated one by one, of course!!). The controls won 31,901 comparisons, Inositol won 29,171 comparisons, and 21,871 comparisons were null. The win ratio in favour of the controls was therefore 31,901/29,171 = 1.09. (To be honest, Excel is very good at copying large ranges, and if you know how to use relative and absolute cell addresses it is a bit laborious but not too difficult.)
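The arithmetic in that paragraph is easy to check. The short Python sketch below uses only the tallies quoted above (the variable names are just labels chosen for this illustration): it confirms that wins, losses and null comparisons add up to the number of possible pairs, and recomputes the win ratio for the controls.

```python
# Tallies as quoted in the text above.
n_inositol, n_control = 287, 289
control_wins, inositol_wins, null_comparisons = 31_901, 29_171, 21_871

# Every Inositol baby is compared with every control baby.
total_pairs = n_inositol * n_control

# Sanity check: the three tallies should account for every pair.
accounted_for = control_wins + inositol_wins + null_comparisons

# Win ratio from the controls' point of view: wins divided by losses.
win_ratio_controls = control_wins / inositol_wins

print(total_pairs, accounted_for, round(win_ratio_controls, 2))  # 82943 82943 1.09
```

Note that the null comparisons do not enter the ratio itself; they only reduce the number of pairs that have a clear winner.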

The calculation of the p-value is then not too complex, and using the formulae in the supplementary data of this article I calculated that z=2.45, which gives a one-sided p-value <0.01 (about 0.014 two-sided). I hope I have calculated correctly. I tried to use an SAS program that was supplied to me, but never having used SAS before I have not yet been able to get it working, even though I downloaded SAS and followed some initial tutorials trying to learn it a bit (the things I do for my readers). The same source kindly sent me a program for calculating the 95% CI of the win ratio, but again I could not get it to work, so I used an approximate method from one of Pocock’s publications, which gave 1.02 and 1.17.
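As a rough cross-check of those figures, here is a small Python sketch that converts a z statistic into a p-value and builds an approximate confidence interval. The interval formula is an assumption on my part: the text does not say which approximation was used, so the sketch uses one simple option, taking the standard error of the log win ratio as log(win ratio)/z and working on the log scale. It happens to give the same interval as quoted above, but that does not prove it is the same method. The function names are invented for the illustration.

```python
import math


def p_values_from_z(z):
    """One-sided and two-sided p-values for a standard normal z statistic."""
    one_sided = 0.5 * math.erfc(abs(z) / math.sqrt(2))
    return one_sided, 2 * one_sided


def approximate_ci(win_ratio, z, critical=1.96):
    """Approximate CI, ASSUMING SE of log(win ratio) = log(win ratio) / z."""
    log_wr = math.log(win_ratio)
    se = log_wr / z
    return math.exp(log_wr - critical * se), math.exp(log_wr + critical * se)


z = 2.45                      # z statistic quoted in the text
win_ratio = 31_901 / 29_171   # tallies quoted in the text

one_sided, two_sided = p_values_from_z(z)
lower, upper = approximate_ci(win_ratio, z)

# One-sided p is about 0.007, two-sided p is about 0.014.
print(round(one_sided, 3), round(two_sided, 3))
# The interval rounds to 1.02 and 1.17.
print(round(lower, 2), round(upper, 2))
```

A bootstrap over patients would be the more careful way to get the interval for unpaired data; this shortcut simply reuses the z statistic.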

If you use the standard methods, as used in the publication, which give equal weight to death and “NDI” (which is mostly low Bayley scores), there is no significant difference between Inositol and control in the combined outcome. If you use the Win Ratio method, which takes account of the fact that being dead is worse than a low Bayley score, you find that, in any pair of patients with a clear winner, the odds that the winner was the one who got placebo were 1.09, with a 95% CI of 1.02 to 1.17 (one-sided p<0.01).

That means you are significantly more likely to be a winner if you got placebo than if you got Inositol, using the same data that say there is no significant difference in “death or NDI”.

It has been said, using the example of the SUPPORT trial, “does it really matter that the primary outcome variable was not statistically different? Everyone can read that the lower saturation group had more mortality”. My response is that yes, that is true for SUPPORT (even though, in fact, unadjusted comparisons of death using a chi-square are not “significantly” different between groups), but it is not true for the Inositol trial. That trial was suspended because of a manufacturing issue, and the unexpectedly higher mortality in the intervention group was not significantly greater at the time of evaluation of the primary outcome (50 in the Inositol group and 33 in controls, chi-square with Yates’ correction=2.86, p=0.09). The trial also had more RoP in the treated group than in the controls, which made the combined outcome better in controls.

The great advantage of the win ratio method is that it can be used, as mentioned above, to give more importance to death than to severe BPD, and also more importance to severe BPD than to moderate BPD, etc. Preferably, we could include in a composite outcomes that are more clinically important, such as the number of rehospitalisations in the first 2 years, or the duration of home oxygen therapy.

Sometimes parts of a composite may be difficult to prioritize, such as non-surgical NEC and severe BPD. Which is worse? They both have adverse long term impacts as well as short term morbidity. I guess that if we asked a group of parents, the answers to that particular comparison would be mixed, so they could be given equal weight in the analysis.
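To make the idea of ranked outcomes with some equal weights concrete, here is a minimal Python sketch. Everything in it is hypothetical: the tier numbers, the outcome labels and the two groups of 20 babies are invented purely to show the bookkeeping. Each baby is represented by their single worst outcome, a lower tier is a worse outcome, and non-surgical NEC and severe BPD share a tier, so a pair with one of each counts as a tie.

```python
from collections import Counter

# Hypothetical priority tiers: lower number = worse outcome.
# Outcomes that share a tier are treated as equally bad.
TIER = {
    "death": 0,
    "severe_bpd": 1,
    "nonsurgical_nec": 1,
    "moderate_bpd": 2,
    "none": 3,
}


def tally(treated, control, tier=TIER):
    """Count wins, losses and ties for the treated group over all pairs."""
    treated_counts = Counter(tier[outcome] for outcome in treated)
    control_counts = Counter(tier[outcome] for outcome in control)
    wins = losses = ties = 0
    for t_tier, t_n in treated_counts.items():
        for c_tier, c_n in control_counts.items():
            pairs = t_n * c_n          # number of pairs with this tier combination
            if t_tier > c_tier:        # treated baby had the less severe outcome
                wins += pairs
            elif t_tier < c_tier:
                losses += pairs
            else:
                ties += pairs
    return wins, losses, ties


# Invented groups of 20 babies each, one worst outcome per baby.
treated = ["death"] * 2 + ["nonsurgical_nec"] * 3 + ["moderate_bpd"] * 4 + ["none"] * 11
control = ["death"] * 4 + ["severe_bpd"] * 4 + ["moderate_bpd"] * 4 + ["none"] * 8

wins, losses, ties = tally(treated, control)
print(wins, losses, ties, round(wins / losses, 2))  # 176 100 124 1.76
```

Changing the priorities is then just a matter of editing the tier table, for example after asking parents how they would rank the outcomes.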

Trials are now being designed using methods such as this, and methods for sample size calculation have been published. (Redfors B, et al. The win ratio approach for composite endpoints: practical guidance based on previous experience. Eur Heart J. 2020;41(46):4391-9).

Other methods for analysing composite outcomes that incorporate the clinical importance of the parts of the outcome are also available, such as the Weighted Composite Outcome method (Capodanno D, et al. Computing Methods for Composite Clinical Endpoints in Unprotected Left Main Coronary Artery Revascularization: A Post Hoc Analysis of the DELTA Registry. JACC Cardiovasc Interv. 2016;9(22):2280-8).

The time has surely come to design trials, especially those in which mortality is a potential outcome (the majority of neonatal ICU trials), using methods that take account of the clinical importance of the various outcomes that we are measuring, and to abandon methods that imply that death and BPD, or death and retinopathy, or death and low Bayley scores, are equivalent.

About Keith Barrington

I am a neonatologist and clinical researcher at Sainte Justine University Health Center in Montréal