Anti-VEGF vs laser therapy for retinopathy: not worse, but not not worse?

A newer anti-VEGF drug, aflibercept, has been developed and evaluated in retinopathy therapy. It works differently to the “-mab” drugs we have been using. Those are monoclonal antibodies (hence “mab”) directed against VEGF, whereas aflibercept is a recombinant fusion protein that acts as a decoy receptor and mops up VEGF (intercepts it, I guess, hence the generic name). It is also used to treat colon cancer (it was approved for that indication a few years ago) and wet macular degeneration, much like bevacizumab. I don’t know what the price is for the tiny doses used for newborn retinal disease, but I guess it won’t be cheap!

The newly published article (Stahl A, et al. Effect of Intravitreal Aflibercept vs Laser Photocoagulation on Treatment Success of Retinopathy of Prematurity: The FIREFLEYE Randomized Clinical Trial. JAMA. 2022;328(4):348-59) is a randomized trial with a 2:1 randomization ratio, which enrolled 118 babies with retinopathy needing intervention (zone I stage 1+, zone I stage 2+, zone I stage 3, zone I stage 3+, zone II stage 2+, zone II stage 3+, or aggressive posterior RoP). The study was designed as a non-inferiority trial, with the primary outcome being treatment failure. If either eye had active RoP at 24 weeks post treatment, or either eye had retinal detachment (or another structural adverse outcome) then the treatment was determined to be a failure. Of course, if only one eye was treated, then only one was evaluated for success/failure.

You can see that there was little difference between the 2 groups; aflibercept was slightly better in terms of treatment failure, or at least it was not worse. But according to the twisted syntax of the non-inferiority trial, it wasn’t not-worse!

The explanation is that they planned the trial such that, if the lower limit of the 95% confidence interval for the difference between groups was above -5%, they would conclude that aflibercept was not inferior. But the difference between groups was 3.4%, with a 95% confidence interval of -8% to infinity. The lower limit, -8%, is below -5%, so non-inferiority was not shown. The upper limit is bizarre. A 95% CI up to infinity means that there is a 5% chance that aflibercept is more than infinitely better than laser!
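As a rough illustration of the decision rule described above, here is a minimal Python sketch. The counts (64/75 and 31/38) are hypothetical, chosen only to land near the reported success rates, and the normal-approximation lower bound is my own assumption rather than the method used in the paper, so the numbers will not match the publication exactly.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_lower_bound(success_new, n_new, success_ctrl, n_ctrl, level=0.95):
    """Return the difference in success proportions (new minus control)
    and its one-sided lower confidence bound (normal/Wald approximation)."""
    p_new = success_new / n_new
    p_ctrl = success_ctrl / n_ctrl
    diff = p_new - p_ctrl
    se = sqrt(p_new * (1 - p_new) / n_new + p_ctrl * (1 - p_ctrl) / n_ctrl)
    z = NormalDist().inv_cdf(level)  # about 1.645 for a one-sided 95% bound
    return diff, diff - z * se


MARGIN = -0.05  # the prespecified non-inferiority margin described in the text

# HYPOTHETICAL counts, not taken from the paper: roughly 85% and 82% success.
diff, lower = one_sided_lower_bound(64, 75, 31, 38)
non_inferior = lower > MARGIN

print(f"difference = {diff:+.1%}, lower bound = {lower:+.1%}, no upper bound")
print("non-inferiority shown" if non_inferior else "non-inferiority NOT shown")
```

With these made-up counts the point estimate favours the new treatment, but the lower bound falls below -5%, which is the same pattern as the trial result. A one-sided bound like the one in this sketch has no upper limit by construction.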

The control group was quite small in this trial, and the failure of laser therapy was lower than their sample size calculations expected. Of the 43 randomized to laser, only 38 actually had laser, and for some reason the primary analysis is not by Intention to Treat, but only includes the 38 actually receiving laser. I can’t see anything in the statistical analysis plan to justify not including the 5 who did not receive laser treatment, but, as far as I can see, they at least weren’t analysed as intravitreal injection subjects (I don’t know what treatment they actually received, if any). The largest numbers of patients were from Russia (18) and Japan (17) with smaller numbers from Turkey, Bulgaria and Romania, and then very small numbers from each of a variety of other countries around the world.

One difference between this trial and the “-mab” trials such as RAINBOW is that the laser therapy had fewer failures. They thought that around 72% of laser therapy would be successful, whereas they actually had 82% success. The study was, as a result, underpowered. Definitions of failure were not very different between the trials, but in RAINBOW there were many more babies in the laser group (18/74) who failed and then received rescue ranibizumab.

The authors try to explain the difference in failures, but I don’t think it needs any explanation. In such a small group of controls, 72% and 82% are almost identical proportions; they had approximately the proportion of treatment failures that would be expected.
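To put a number on how imprecise a proportion from 38 babies is, here is a small sketch using a Wilson score interval. The count of 31 successes out of 38 is hypothetical (it is simply the closest whole number to 82%), and the choice of the Wilson method is mine, not something taken from the paper.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, level=0.95):
    """Two-sided Wilson score confidence interval for a single proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# HYPOTHETICAL count: 31 of 38 is about 82% success in the laser group.
low, high = wilson_interval(31, 38)
expected_success = 0.72  # the success rate assumed in the sample size calculation

print(f"observed {31 / 38:.0%}, 95% interval {low:.0%} to {high:.0%}")
print("72% is inside the interval:", low < expected_success < high)
```

Under these assumptions the interval is wide enough to include the expected 72% comfortably, which is the sense in which the two proportions are almost identical in a group this small.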

What that means is that, although aflibercept was not inferior to laser in the primary outcome expressed as a simple percentage, we cannot say with more than 95% confidence that it isn’t worse than laser. The results are compatible with the possibility of aflibercept having a greater failure rate than laser.

If you want to understand a bit more about non-inferiority trials, you could do worse than watch the NEJM YouTube video (not words I thought I would ever write). Or better still, read this article from the NEJM from a few years ago, which includes the following figure for interpretation of different possible results.

You can see that the figure doesn’t include an example which matches this trial (called FIREFLEYE). It does have an example which it calls “noninferiority and inferiority”, which is a confusing phrase: “the test treatment is worse but at the same time not worse”. That could be restated as “the test treatment was worse, but the confidence intervals did not exceed our prespecified margin, so it might be an acceptable alternative”.

The type of findings that this study had could be represented by the open circle in the imaginary results below, which I guess could be labelled “superiority but not noninferiority” (smiley ironic emoji). Or you could say “the results were numerically better, but not better enough to be sure they are better and, in fact, they are statistically compatible with a chance that they are worse”. Or maybe we should just say “inconclusive”.
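The interpretation rules discussed above can be written down as a few comparisons between the confidence limits, zero, and the margin. This is only a sketch: the function name and the category labels are my own simplification and are not the wording of the NEJM figure. The difference is taken as new treatment minus control, so positive values favour the new treatment.

```python
def interpret(lower, upper, margin=0.05):
    """Classify a confidence interval for (new minus control) against a
    non-inferiority margin. Labels are illustrative, not an official scheme."""
    if lower > 0:
        return "superior"  # whole interval above zero
    if lower > -margin:
        if upper < 0:
            # statistically worse, but by less than the margin
            return "non-inferior and inferior"
        return "non-inferior"
    if upper < 0:
        return "inferior"  # worse, and possibly by more than the margin
    return "inconclusive"  # interval spans both zero and the margin


# The interval reported for this trial: -8% to infinity, with a 5% margin.
print(interpret(-0.08, float("inf")))

# A few made-up intervals for comparison.
for lo, hi in [(0.01, 0.10), (-0.03, 0.06), (-0.04, -0.01), (-0.20, -0.06)]:
    print(f"{lo:+.0%} to {hi:+.0%}: {interpret(lo, hi)}")
```

With the reported limits, the lower bound is beyond the margin while the interval still includes zero and everything above it, so the sketch returns “inconclusive”.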

The study was underpowered to be able to be confident about the relative efficacy of aflibercept. It looks to be probably about as effective as laser, when compared to a group of babies in whom laser worked fairly well. Long term ocular and visual outcomes will, I guess, likely be much better than with laser, if they are similar to the longer term ocular and visual outcomes of ranibizumab or bevacizumab, but of course that remains to be proven with longer follow up. Long term outcomes other than ocular/visual outcomes also need to be studied, as we will see in the next post.

About Keith Barrington

I am a neonatologist and clinical researcher at Sainte Justine University Health Center in Montréal
