Composite outcomes for research; this is how to do it!

As regular readers of the blog will know, I have been very critical of some very important, otherwise excellent, trials over one vital part of their design: the use of composite outcomes such as “death or BPD”, “death or RoP”, or, worst of all, “death or NDI”, which is a composite of composites. The use of these outcomes is, however, understandable for two reasons. Firstly, the outcomes are often competing: if you die before 36 weeks you can’t have a research diagnosis of BPD, even if you die of lung injury, so a study investigating a method to reduce lung injury has to take into account the babies who die, otherwise the result might be misleading. Secondly, composite outcomes may reduce the required sample size, although this is not always the case, and they may do the opposite.

The problems with this approach are numerous. If the components of the primary outcome change in opposite directions then the result may be null, despite a clinically important difference between the treatments. For example, the STOP-BPD trial showed no difference in the composite outcome of death or BPD, despite having fewer deaths at discharge. But surely surviving with BPD is a preferable outcome to dying. In the most famous example, SUPPORT, the result was again null, with no difference in the primary outcome of death or severe retinopathy, but there were more deaths in the low saturation group, and fewer cases of severe RoP. Again, surely being alive and having laser retinal surgery is to be preferred over being dead.

These composite outcomes imply that the parts of the outcomes are equivalent in importance, which, as those two examples illustrate, is often not true. However, there are alternatives. Several have been proposed, including the win ratio, which I have discussed several times. A few studies have been published using these techniques, although none as yet in neonatology. One that just caught my eye is this new trial in adults with heart failure and preserved ejection fraction (Shah SJ, et al. Atrial shunt device for heart failure with preserved and mildly reduced ejection fraction (REDUCE LAP-HF II): a randomised, multicentre, blinded, sham-controlled trial. The Lancet. 2022;399(10330):1130-40).

The intervention was the installation of an inter-atrial shunt by catheterisation to decompress the left atrium, an intervention previously shown to have haemodynamic advantages, and potential clinical benefit. It is basically like creating a permanent secundum ASD of 8 mm diameter. There are a number of potentially competing outcomes of clinical importance for patients with this condition, including death, stroke, progression of heart failure and so on. Just as in neonatology, if you die you can’t get worsening heart failure, so the primary outcome was a hierarchical composite and the primary analysis was a form of the win ratio.

The primary efficacy endpoint was a hierarchical composite of cardiovascular death or non-fatal ischaemic stroke up to 12 months post-randomisation; rate of total (first plus recurrent) heart failure events (defined as admissions to hospital or urgent visits to a health-care facility for intravenous diuresis, or intensification of oral diuretics) up to 24 months post-randomisation, analysed when the last randomised patient completed 12 months of follow-up; and change in KCCQ overall summary score between baseline and 12 months.

The KCCQ is the Kansas City Cardiomyopathy Questionnaire, and the score is a continuous variable of clinical status. The statistic that was used to compare the intervention and sham procedure groups was the win ratio, and the p-value was calculated using a method that can integrate dichotomous, recurrent, and continuous outcomes, something called the Finkelstein-Schoenfeld approach.
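As a rough illustration of how such a test can work, here is a minimal Python sketch of the unstratified Finkelstein-Schoenfeld statistic as I understand it: every patient is scored against every other patient in the pooled sample, the scores of the treated patients are summed, and that sum is compared with its permutation variance. The function names and the toy data are hypothetical, the trial's own analysis (which also has to handle follow-up time and missing data) is certainly more involved, and the normal approximation is only meaningful for realistic sample sizes.

```python
import math

def finkelstein_schoenfeld_z(treated, control, compare):
    """Unstratified Finkelstein-Schoenfeld style test (illustrative sketch).

    compare(a, b) must return 1 if a did better than b, -1 if worse, 0 if tied.
    Returns (z, two-sided p) using a normal approximation.
    """
    pooled = list(treated) + list(control)
    n_t, n = len(treated), len(pooled)
    # U_i: net score of patient i against every other patient, regardless of group.
    u = [sum(compare(pooled[i], pooled[j]) for j in range(n) if j != i)
         for i in range(n)]
    # Test statistic: sum of the scores of the treated patients.
    t_stat = sum(u[:n_t])
    # Permutation variance of that sum under the null hypothesis.
    variance = n_t * (n - n_t) / (n * (n - 1)) * sum(x * x for x in u)
    if variance == 0:
        return 0.0, 1.0  # every comparison tied: no evidence either way
    z = t_stat / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

def higher_is_better(a, b):
    """Toy pairwise rule for a single continuous outcome."""
    return (a > b) - (a < b)

# Made-up numbers, far too few for a real test, just to check the arithmetic.
z, p = finkelstein_schoenfeld_z([3, 4], [1, 2], higher_is_better)
```

Because the test only needs a rule that says which patient of a pair did better, the same machinery can accept a hierarchical rule that mixes dichotomous, recurrent, and continuous outcomes.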

The authors describe the win ratio calculation in the supplementary materials clearly: “The first patient is compared to every patient, one at a time, and this first patient is assigned a score of 1/0/-1 for each comparison if this first patient has a better (did not experience CV death/ischemic stroke and the comparator patient did), same, or worse (experienced CV death/ischemic stroke and the comparator patient did not) outcome, respectively. For every pairwise comparison where the score is 0, the first patient is assigned a score of 1/0/-1 depending on whether he/she has a better (less HF events than the comparator patient), same (same number of HF events as the comparator patient), or worse outcome (more HF events than the comparator patient), respectively. Finally, for every pairwise comparison where the score is still 0, the first patient is assigned a score of 1/0/-1 depending on whether he/she has a better (change in 12-month KCCQ score at least 5 points larger than the comparator), same (change in 12-month KCCQ score within +/-5 points of comparator) or worse (change in 12-month KCCQ score at least 5 points lower than the comparator). This algorithm is then repeated for every patient in the study.”
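The pairwise scoring described in that passage can be sketched in a few lines of Python. This is a simplified illustration, not the trial's code: the `Patient` fields, the function names, and the six patients are all made up, and the sketch ignores differences in follow-up time and missing KCCQ scores. The win ratio here is computed in the usual way, as wins divided by losses over every treated-versus-control pair.

```python
from dataclasses import dataclass
from itertools import product

@dataclass
class Patient:
    death_or_stroke: bool  # CV death or non-fatal ischaemic stroke
    hf_events: int         # total heart failure events
    kccq_change: float     # change in KCCQ score, baseline to 12 months

def compare(a, b, kccq_margin=5.0):
    """Return 1 if a did better than b, -1 if worse, 0 if tied at every level."""
    # Level 1: death or stroke; avoiding the event is better.
    if a.death_or_stroke != b.death_or_stroke:
        return -1 if a.death_or_stroke else 1
    # Level 2 (only reached on a tie): fewer heart failure events is better.
    if a.hf_events != b.hf_events:
        return 1 if a.hf_events < b.hf_events else -1
    # Level 3 (only reached on a tie): KCCQ change, with a 5-point margin.
    diff = a.kccq_change - b.kccq_change
    if diff >= kccq_margin:
        return 1
    if diff <= -kccq_margin:
        return -1
    return 0

def win_ratio(treated, control):
    """Count wins and losses over all treated-vs-control pairs."""
    scores = [compare(t, c) for t, c in product(treated, control)]
    wins, losses = scores.count(1), scores.count(-1)
    ratio = wins / losses if losses else float("inf")
    return wins, losses, ratio

# Hypothetical patients, for illustration only.
treated = [Patient(False, 0, 10), Patient(False, 2, 3), Patient(True, 1, 0)]
control = [Patient(False, 0, 2), Patient(False, 1, 8), Patient(True, 1, 2)]
wins, losses, ratio = win_ratio(treated, control)
```

With these made-up patients there are 4 wins, 4 losses, and 1 tie among the 9 pairs, so the win ratio is 1.0. The key design feature is that a lower level of the hierarchy is only consulted when the pair is tied at every level above it.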

The result was a win ratio of 1.0, which means that overall there was no advantage or disadvantage of the procedure compared to a sham procedure on the components of the primary outcome, when considered in this hierarchical fashion. There were very few deaths or strokes in either group, and in both groups the KCCQ score tended to increase (which means an improvement in symptoms).

One of the interesting things about this trial is that the calculated sample size was a manageable 300, despite outcomes which are somewhat uncommon, and relatively small changes in the continuous score, which suggests that this approach might be readily applicable to neonatal multi-centre RCTs.
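To make the contrast with a conventional composite concrete, here is a small Python sketch using entirely hypothetical counts for a two-arm neonatal trial with 100 babies per arm. The function name and the numbers are invented for illustration. Both arms have the same rate of “death or BPD”, but a hierarchy that counts death first, then BPD among pairs where both babies survived, separates them.

```python
def hierarchical_counts(a, b):
    """Wins, losses and ties for group a versus group b.

    Each group is a dict of counts: 'died', 'bpd' (survived with BPD),
    'well' (survived without BPD). Death is compared first; BPD is only
    compared when both babies in a pair survived.
    """
    a_alive = a["bpd"] + a["well"]
    b_alive = b["bpd"] + b["well"]
    # Level 1: the baby in a survived and the baby in b died (or the reverse).
    wins = a_alive * b["died"]
    losses = a["died"] * b_alive
    # Level 2: both survived, so compare BPD status.
    wins += a["well"] * b["bpd"]
    losses += a["bpd"] * b["well"]
    total_pairs = (a["died"] + a_alive) * (b["died"] + b_alive)
    return wins, losses, total_pairs - wins - losses

# Hypothetical arms: fewer deaths but more BPD in arm A.
arm_a = {"died": 10, "bpd": 30, "well": 60}
arm_b = {"died": 20, "bpd": 20, "well": 60}

composite_a = (arm_a["died"] + arm_a["bpd"]) / 100  # 0.40
composite_b = (arm_b["died"] + arm_b["bpd"]) / 100  # 0.40
wins, losses, ties = hierarchical_counts(arm_a, arm_b)
win_ratio = wins / losses
```

In this invented example the composite is 40% in both arms, yet arm A wins 3000 pairs and loses 2600 (with 4400 ties), a win ratio of about 1.15 in favour of the arm with fewer deaths.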

One disappointment I do have with this trial is that there is no mention of whether patients or families were involved in developing the outcomes. It has become essential that there is at least some consultation with those impacted by the conditions we are investigating during development of trial designs, especially when it comes to designing the primary outcomes. In addition to ensuring that trials collect data on a group of core outcomes, such consultation should ensure that the primary outcomes are what matter most to patients (or, in our case, former patients, and families of neonatal patients), and that any hierarchy in an analysis of a composite outcome follows what they believe to be most important.

About Keith Barrington

I am a neonatologist and clinical researcher at Sainte Justine University Health Center in Montréal.

2 Responses to Composite outcomes for research; this is how to do it!

Colin Morley says:

21 March 2022 at 16:40

Thank you Keith. This is very interesting because I have worried about composite outcomes. The analysis of the study you presented is very complicated and difficult for my shrinking brain to understand. I think that all neonatal trials should be designed and analysed so they can be easily read and understood by the readers, in our case clinical neonatologists. I suspect that the majority would not understand this analysis and so the research would be lost on them.

- Keith Barrington says:
  
  21 March 2022 at 18:44
  
  Thanks Colin, it is a bit tricky, but I think we could explain it a little better, at least to explain that the hierarchy of outcomes counts death first, and only if the paired comparison is equivalent (both die or both survive) does the analysis then proceed to BPD (or whatever the next step is). I don’t think neonatologists are any less able to understand than adult cardiologists! But anything new would take more explanation; I don’t think many neonatologists currently understand Generalized Estimating Equations either, or Analysis of Covariance.