Neonatal Research

Measure gastric residuals? Safe to stop?

Posted on 30 April 2019 by Keith Barrington

A new RCT published in JAMA Pediatrics compared growth and other clinical outcomes between infants <33 weeks gestation and <1250g who were managed with or without routine gastric residual measurements. (Parker LA, et al. Effect of Gastric Residual Evaluation on Enteral Intake in Extremely Preterm Infants: A Randomized Clinical Trial. JAMA Pediatr. 2019) I was quite interested to read this when I saw the title; unfortunately, the way it was analyzed and the way it is presented make it nearly impossible to interpret. In addition, there is at least one major error in the data presented.

The first problem is that although the “standard care” group had residuals measured, there is no indication of how they were interpreted. In the protocol, which is provided as a supplemental file, the only mention of the feeding standards is as follows:

In addition, the nurse assesses the infant for any signs or symptoms of feeding intolerance or NEC (i.e., abdominal distension and/or tenderness, increased abdominal girth, visible bowel loops, presence of emesis, and visible blood in the stool). It is standard protocol to aspirate RGC prior to each feeding. However, for this study, this will only occur in infants randomized to Group 1.

What was done with any of this information is not described. Was the volume considered important? The colour? Presumably they didn’t aspirate prior to each feed in order to ignore the findings.

The next big problem is the primary outcome: “weekly enteral nutrition measured in mL/kg for 6 weeks after birth”. I am not sure what that means. Did they add all the intake over 6 weeks and compare between groups? Did they compare after each week, and so do 6 comparisons? Apparently, from the protocol, the plan was to do a t-test designed for groups with unequal variance (“Welch adjusted” they call it). But the analysis which is presented is a Generalized Linear Mixed Model, which is a term that doesn’t tell me anything; it seems to have been some sort of repeated measures test, which therefore should account for the multiple comparisons.

So what did they find? What were the primary outcome data for the two groups? I don’t know. Nowhere in the manuscript are the primary outcome results given. They do give a p-value however! In table 2 the first group of numbers are for weekly feedings in mL/kg/d and the p-value for Treatment is 0.048, but the actual numbers are written as NA. The next group of numbers are for the “simple main effect” and give some numbers which are not consistent with anything else they have written; for example, for week 6 the numbers are “128.4 (119.9 to 136.9)” and “141.6 (133.2 to 150.0)”. According to the methods this should be the weekly feeding volume, which seems quite unlikely. I presume this is either the daily volume on the last day of week 6, or the averaged daily volume over the 6th week. And I have to guess that the figures in parentheses are mean plus or minus 1 standard deviation, but that is never specified.

As far as I can tell then, by week 6 the babies were receiving inadequate feeds if they didn’t measure gastric residuals, and even more inadequate feeds if they did! To only achieve 140 mL/kg/d after 6 weeks of feeds in a group of babies with a mean of about 27 weeks and 900 grams seems to be well below what we should be achieving. As a result the growth outcomes are very poor: a 27 week baby weighing 900 grams should by 6 weeks of age be weighing about 1400g but, from one of the few results that are presented as interpretable data, both groups weighed just over 1100g (which I think are means adjusted for covariates).

Many of the results are presented as “least square means”, which is SAS jargon (SAS being a particular stats software package) for means adjusted for covariates, which again makes them difficult to interpret. Some of them are presented as the “mean estimated log weights” in the abstract, and sometimes in the abstract they are completely unexplained: “the no residual group were discharged 8 days earlier (4.21 [95% CI, 4.14-4.28] vs 4.28 [95% CI, 4.19-4.36]; P = .01)”. 4.21 what? (I could have written WTF? but I am too polite).

It is not really surprising that not measuring aspirates would accelerate feed progression, even though here the weekly increase is from a desperately slow 18 mL/kg/d to an extremely slow 21 mL/kg/d. The big question is, is it safe?

Here again there are problems. In the abstract and in the text it is stated that the Odds for developing NEC in the intervention vs control group are 0.58 [95% CI, 0.18-0.19] vs 0.026 [95% CI, 0.006-0.109], which would be a 22-fold increase in the Odds of NEC, or an Odds Ratio of 22. But of course an Odds of NEC in the intervention group of 0.58 would mean that there were 25 cases of NEC and 44 without NEC, so that isn’t likely either, especially as the odds doesn’t lie within its 95% confidence interval, which is impossible.

There is some potential clarification from the body of the article: in table 5 it is noted that the “odds” of NEC was 0.058 (0.018, 0.19), and in the results, at the end of the section describing the subjects, it is noted that 4 patients in the intervention group were withdrawn for NEC. Four out of the 69 intervention patients makes an incidence, a rate, or a frequency of 5.8% or 0.058. But it does not make an Odds of 0.58; the Odds of NEC is 4/65 (NEC/no NEC), which is 0.061. It looks like there were probably 2 cases of NEC among the 74 standard care group, for an incidence of 2.7%, and an odds of 0.0278.

After slogging my way through all these results it appeared that there were about twice as many cases of NEC in the intervention group as in the controls. I thought I was getting this all clear when I looked at the flow chart, the CONSORT figure, which states that there were 7 cases of NEC in the intervention group, and 4 cases in the controls. Which completely messes up all my attempts to understand this article. If there were 7 cases of NEC, then the incidence of NEC among the intervention babies is actually 10.1%, and the odds is 0.113 (7/62), compared to 4 controls, with a frequency of 5.4% and an odds of 0.057.
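The distinction between incidence and odds that trips up the paper is easy to check by hand. Here is a minimal sketch in Python, assuming the group sizes quoted above (69 intervention, 74 standard care) and the two possible sets of NEC counts; the function name and the scenario labels are mine, for illustration only.

```python
def incidence_and_odds(cases, total):
    """Return (incidence, odds) for `cases` events among `total` infants."""
    non_cases = total - cases
    incidence = cases / total   # events divided by everyone
    odds = cases / non_cases    # events divided by non-events
    return incidence, odds

# Counts and denominators as quoted in the post; which set of counts is
# the right one is exactly what the paper leaves unclear.
scenarios = {
    "intervention, 4 NEC": (4, 69),
    "standard care, 2 NEC": (2, 74),
    "intervention, 7 NEC": (7, 69),
    "standard care, 4 NEC": (4, 74),
}

for label, (cases, total) in scenarios.items():
    incidence, odds = incidence_and_odds(cases, total)
    print(f"{label}: incidence {incidence:.1%}, odds {odds:.3f}")
```

With rare events the two measures are close, but they are never the same number, and neither of them is anywhere near 0.58.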

In the discussion the authors state “we found no differences in incidence of NEC” which is clearly untrue, the incidence of NEC was quite different between groups. A true statement would have been “the difference in incidence of NEC that we found has very wide compatibility limits, which include a possibility of a large reduction or a major increase in NEC”.

I think this paper is a complete failure of the review and editorial process of JAMA Pediatrics (and of galley editing); how this could have been published in this form I don’t understand. It could have been a nice little RCT adding a bit more data to the question of measuring residuals, and should most clearly have stated that there was inadequate power to determine safety, and that the confidence intervals for the incidence of NEC are extremely wide. (If we assume that there were 4 cases of NEC in the intervention/no residuals group, and 2 in the controls, then the relative risk of NEC is 2.15 with 95% compatibility limits of 0.4 and 11. If there were 7 cases vs 4 cases, the RR is 1.99, 95% CL 0.6-6.5.) We should also note that there were 6 deaths in the standard/measured residual group, and only 1 in the intervention/no residual group, which gives an RR of 0.19, 95% CL 0.02 to 1.5.
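For anyone who wants to redo this sort of calculation, here is a rough sketch of a relative risk with log-scale (Katz) 95% limits. The method is my assumption about a reasonable way to get such limits, not something taken from the paper, and the denominators (69 and 74) are the group sizes quoted earlier.

```python
import math

def relative_risk(cases_a, total_a, cases_b, total_b, z=1.96):
    """Relative risk of group A vs group B with log-scale (Katz) limits."""
    rr = (cases_a / total_a) / (cases_b / total_b)
    # Standard error of log(RR) for two independent binomial proportions.
    se_log = math.sqrt(1 / cases_a - 1 / total_a + 1 / cases_b - 1 / total_b)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper

# Assumed scenario: 4 NEC cases of 69 (no residuals) vs 2 of 74 (residuals).
rr, lower, upper = relative_risk(4, 69, 2, 74)
print(f"RR {rr:.2f}, 95% limits {lower:.1f} to {upper:.1f}")
```

For 4 vs 2 cases this gives an RR of a little over 2 with limits of roughly 0.4 and 11, in line with the figures above. With so few events the limits are driven almost entirely by the case counts, which is why they are so wide.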

As it is we still are not clearly any the wiser, after a trial where it is not clear what was done or what was found.

I don’t take note of residual volumes, I have worked at one place which had not measured them for 15 years, and in 2 other places we stopped routinely measuring residuals completely while I was there. All that observational data suggests no benefit, and potential nutritional harms from measuring gastric residuals, but some stronger data, to convince other units to stop the practice if it is indeed safe, would have been helpful to improve nutritional outcomes of our very preterm babies.

Do transfusions trigger NEC? or does anemia?

Posted on 16 April 2019 by Keith Barrington

I am still unconvinced that transfusion associated NEC is a real thing, I think it is possibly a real phenomenon, but I am not sure how to know for sure.

Some of the best evidence, I think, comes from the PINT trial, a randomized trial of transfusion thresholds. The preterm infants in the high threshold group received many more transfusions, but did not have more NEC; in fact they had less NEC, 5.3% vs 8.5% (RR with restrictive transfusion 1.62, 95% CI 0.8, 3.26). The other 2 RCTs included in the Cochrane systematic review that reported NEC were much smaller and did not contribute much to the meta-analysis, which thus gives the same overall RR of 1.62.

In the RCTs of later use of erythropoietin, babies in control groups had many more transfusions, but the Cochrane systematic review does not show a major difference in NEC, RR 0.88 with epo, (95% CI 0.45, 1.7).

In contrast, the Cochrane systematic review of early use of erythropoietin does show less NEC with epo, and, of course, the epo babies also had fewer transfusions.

An observational study from 2016 might explain some of the confusion. They suggest that severe anemia might be associated with NEC, rather than red cell transfusion. (Patel RM, et al. Association of Red Blood Cell Transfusion, Anemia, and Necrotizing Enterocolitis in Very Low-Birth-Weight Infants. JAMA. 2016;315(9):889-97). They used the data from a prospective cohort study of transfusion related CMV in preterm infants; because of variations in practice, as indications for transfusions were not standard, they could attempt to analyze the separate impacts of transfusion and of severe anemia, defined as a hemoglobin less than 80 g/L (8 g/dL). They included 600 VLBW infants, who had 42 episodes of at least stage 2 NEC. About half of the babies were transfused, and they were smaller, less mature and sicker than non-transfused infants, and 18% had at least one hemoglobin under 80.

The rate of NEC was increased in VLBW infants who received RBC transfusions compared with infants who did not (cause-specific HR, 2.33 [95% CI, 1.18-4.60]; P = .01)….

In multivariable analysis, including adjustment for birth weight, duration of breastfeeding, illness severity, severity of anemia, duration of antibiotic treatment, and center, any RBC transfusion in a given week was not independently associated with an increased rate of NEC (cause-specific HR, 0.44 [95% CI, 0.17-1.12]; P = .09) or mortality (cause-specific HR, 1.36 [95% CI, 0.27-6.82]; P = .71)…. In a given week, VLBW infants with severe anemia had a higher estimated rate of NEC compared with VLBW infants without severe anemia (adjusted cause-specific HR, 5.99 [95% CI, 2.00-18.0]; P = .001).

Of course because transfusion is used to treat anemia, and babies with more severe anemia are more likely to be transfused, these are things that are difficult to separate, but these data do at least suggest that it is severe anemia, rather than transfusion which increases NEC.

I think this all might add together: with early epo there is less severe anemia, and thus, if the association is actually causative, there should be somewhat less NEC; in the PINT trial the high transfusion threshold group were unlikely to develop severe anemia, and so were less likely to develop NEC. In normal clinical practice we are more likely to transfuse the most anemic babies, and thus there is an apparent association between transfusion and NEC. Confirmation of this from another database, and analysis of the TOP trial when completed (I think enrolment has finished and outcome assessment should finish this year) will be important to answer these questions.

The study that I am blogging about is fairly old news, for my blog, from 2016, but I was reminded of it as we have been working on developing standardized transfusion criteria, and by a couple of recent publications:

Does severe anemia really affect the gut? A prospective study from Turkey measured fatty acid binding proteins in very anemic babies before and after transfusion (Ozcan B, et al. Severe Anemia Is Associated with Intestinal Injury in Preterm Neonates. American journal of perinatology. 2019). Intestinal FABP and liver FABP are apparently good markers of intestinal injury, and previously liver FABP has been shown to increase with NEC of all grades, and I-FABP only with very severe NEC. In this new study I-FABP was only slightly higher among anaemic babies than among controls, but liver-FABP was appreciably higher, and remained high 48 hours after transfusion.

Another relevant recent publication is from a mouse model (Arthur CM, et al. Anemia induces gut inflammation and injury in an animal model of preterm infants. Transfusion. 2019;59(4):1233-45). In this study they correlated cytokine concentrations in preterm infants with their hemoglobin levels; more anemic samples had higher Interferon alpha levels. They then performed a mouse study, gradually bleeding the mice to anemia (PIA is phlebotomy-induced anemia) and performing a number of fascinating analyses of their intestines.

Gradual induction of PIA in a pre‐clinical model resulted in significant hypoxia throughout the intestinal mucosa, including areas where intestinal macrophages reside. PIA‐induced hypoxia significantly increased macrophage pro‐inflammatory cytokine levels, while reducing tight junction protein ZO‐1 expression and increasing intestinal barrier permeability.

Preventing severe anemia with a combined approach of delayed cord clamping and erythropoietin should lead to less NEC if these findings are real; a systematic review of delayed cord clamping did show a bit less NEC, RR=0.88 [95% CI 0.65–1.18], although they marked the quality of this evidence as low (Fogarty M, et al. Delayed Versus Early Umbilical Cord Clamping for Preterm Infants: A Systematic Review and Meta-Analysis. Am J Obstet Gynecol. 2017). I am not sure how effective delayed clamping is in preventing late severe anemia; I don’t think that has been reported often in the studies, but early hemoglobin is, of course, higher. But a few weeks later, after being in the ICU for a while, with multiple blood sampling and intercurrent illnesses, the effects of delayed clamping on late severe anemia might well be dissipated. On-going trials of erythropoietin for brain-protection in preterm infants may also be able to answer questions about anemia and NEC, depending on doses and duration.

Thinking about it, I am not sure why many of us went off the routine use of erythropoietin. I guess we were all focused on trying to reduce donor exposure, which is generally unaffected with current transfusion practices. I think avoiding blood transfusions and reducing severe anemia are probably valuable goals in themselves. Maybe we should rethink erythropoietin/darbepoetin routine use.

Platelet transfusions don’t close the PDA, but they may increase IVH

Posted on 13 April 2019 by Keith Barrington

I would never have actually thought to ask the question whether platelet transfusion might close the PDA, although early thrombocytopenia is associated with persistent PDA, and platelet plugs seem to be part of the mechanism of closure. A group in India have just published an RCT in preterm infants with a PDA (hemodynamically significant, whatever that means) who had a platelet count under 100,000 (Kumar J, et al. Platelet Transfusion for PDA Closure in Preterm Infants: A Randomized Controlled Trial. Pediatrics. 2019). Gestational age averaged 30 weeks, and they were enrolled at a mean of 3 days of age. Median time to PDA closure was identical in the group randomized to receive transfusion (10, 15 or 20 mL/kg depending on the count) and the control group, at 72 hours in each group, data based on repeated echo every 24 hours until closed. All babies also received ibuprofen or acetaminophen. 44 babies were enrolled, and of the 22 in the transfusion group there were 9 new IVH (4 severe, grade 3 or 4) after enrolment, compared to 2 new IVH among the controls (both severe).

In the much older study by Maureen Andrew and colleagues (Andrew M, et al. A randomized, controlled trial of platelet transfusions in thrombocytopenic premature infants. The Journal of pediatrics. 1993;123(2):285-91), preterm infants with a platelet count less than 150,000 were randomized to be transfused or not. 12/78 transfused babies developed a serious grade 3 or 4 IVH, and 9/79 controls. The 33% increase in IVH was “not statistically significant” they said, but as you all know that doesn’t mean that it isn’t real!

In the recent PLANET2 trial there were more serious bleeding episodes in the transfused babies than in the controls, and apparently most of them were IVH, I don’t have access to those numbers, but whatever they are, the effect appears to be in the same direction.

I would like to see a meta-analysis, which would have some limitations given the 3 different thresholds in those 3 trials (which are as far as I know the only RCTs of platelet transfusion at different thresholds), but if the PLANET2 data are indeed consistent, and with a much greater power than the 2 other small trials, that would be very powerful data. It would confirm that not only are platelet transfusions in general ineffective in preventing bleeding at these 3 threshold levels, but they likely increase the risk of IVH.

Why would that be the case? It may be that transfusing adult platelets to babies with newborn plasma, which is already hypercoagulable, causes the effect, either by capillary damage, or by causing infarctions which then become hemorrhagic, or some other mechanism. It could just be the effect of volume expansion, which can certainly cause lesions in newborn beagle puppies (see Laura Ment’s studies from the 80’s and 90’s), and which many observational studies have correlated with IVH. Platelets are often given somewhat faster than red cell transfusions, often over 1 hour (it does not appear to have been specified in PLANET2: the dose was 15 mL/kg, but the duration isn’t mentioned in the protocol). Volume expansion is also probably more effective than with saline, much of which rapidly leaks out of the circulation. I think either some impact on overall coagulation/anticoagulation balance or hemodynamic changes, or both, may be responsible for the apparent increase in IVH.

Sail Away, Sail Away…

Posted on 8 April 2019 by Keith Barrington

You could probably guess that a post about the SAIL trial (Kirpalani H, et al. Effect of Sustained Inflations vs Intermittent Positive Pressure Ventilation on Bronchopulmonary Dysplasia or Death Among Extremely Preterm Infants: The SAIL Randomized Clinical Trial. JAMA. 2019;321(12):1165-75.) would have to be accompanied by this, as it was when I reported on the presentation at last year’s PAS:

This is the multicenter randomized controlled trial of sustained inflations at the onset of resuscitation for very preterm infants less than 27 and at least 23 weeks gestation. Enrolled babies received face mask CPAP for up to 30 seconds, and if they needed PPV (i.e. apneic or gasping or heart rate <100) then they were randomized to sustained inflation or standard NRP. Sustained inflation babies started with a 15 second inflation at 20 cmH2O, they were then evaluated on CPAP and, if apneic or gasping or heart rate < 100, they switched to standard NRP, if those things didn’t apply they received a second sustained inflation to 25 cmH2O for 15 seconds. All of which was rather arbitrary, in terms of indications, pressures, and durations, but there wasn’t any reliable data to make more evidence based choices (and still isn’t).

The primary outcome of the study was the infamous “death or BPD”, which I have criticised here frequently enough, I think, but just to be really annoying; being dead and having oxygen at 36 weeks PMA are not equivalent, and a composite outcome which combines them risks the real potential that they could change in opposite directions, and show no effect, or that mortality changes will be overwhelmed by the much more frequent occurrence of BPD. Mortality as one outcome and BPD among survivors, as another outcome makes much more sense. Even better would be a measure of lung injury which reflects respiratory outcomes of importance to families.

As many of you will know by now, the study was stopped by the DSMC after enrolment of 460 patients because of an excess of early deaths (under 48 hours of age) in the sustained inflation group, many of which were considered to be possibly associated with the intervention. As well as stopping the trial the DSMC mandated a Bayesian analysis, which revealed that it was highly unlikely that sustained inflation would be shown to be preferable if the study had continued, and that either a null result, or an advantage of standard care were far more likely results.

This is an important trial with an important message: if you want to do sustained inflation, don’t do it like this. If you want to do sustained inflation using a substantially different approach, you had better do a high quality study with careful surveillance for adverse effects, and don’t do it outside of an RCT.

Failing that, I think that sustained inflation as routine initiation of resuscitation of the preterm infant should be laid to rest.

The authors have done what other trials have also done recently, which is to report BPD at 36 weeks, or death at 36 weeks as being the components of the primary outcome, I still don’t understand this, as it means that death after 36 weeks without BPD is considered a good outcome! Why not survival to discharge as part of the composite? The authors collected survival to discharge (it is secondary outcome number 22), but I cannot see the result in the article or appendix.

My recent discussions about significance and how to refer to results are well illustrated by the following sentence from the discussion.

An unexpected excess mortality rate with sustained inflation in the first 48 hours of life led to early trial closure, although mortality at 36 weeks’ postmenstrual age was not different.

Well, pardon me, but as far as I am concerned 20.9% IS different to 15.6%, they are clearly different numbers! Because the difference between 2 numbers is not “statistically significant” does not make them the same. As you can see from the survival curves below, they are a bit closer together at 12 weeks than they are at 7 days, but they remain different. It would be accurate to say, ‘the p value for the difference in death at 36 weeks is 0.17 with a relative risk of 1.3’, and to note that ‘relative differences in mortality at 36 weeks which are compatible with the data range from a 10% decrease with sustained inflation to a 90% increase’; but not just to say they are “not different”.

Other secondary outcomes vary between those which are practically identical between groups, such as severe IVH (9.8% vs 10.4%), and those which are very different, e.g. pneumothorax (5.1% with SI vs 9% standard NRP). None of them were “statistically significant”.

Almost simultaneously, the following appeared in print: Tingay DG, et al. Gradual Aeration at Birth is More Lung Protective than a Sustained Inflation in Preterm Lambs. Am J Respir Crit Care Med. 2019. This is a very interesting study in preterm lambs examining a sustained inflation strategy, where they used 35 cmH2O and maintained it until there was no more volume entering the lungs, and then for another 10 seconds. This was compared to ventilation with PEEP, and to a 3rd strategy of ventilation with PEEP plus progressive increases in PEEP until compliance was maximized, at which time PEEP was progressively decreased. The sustained inflation group had very uneven lung aeration, and increased signs of lung injury. This confirms, I think, that we could still have some benefit from finding novel ways of ensuring early adequate uniform lung inflation, but simple sustained inflation is not the answer, at least in the immature lung.

To p or not to p, what is the alternative?

Posted on 3 April 2019 by Keith Barrington

I started writing the previous post several weeks ago, and, of course, the ideas are not original with me, in fact, a whole recent issue of “The American Statistician” is dedicated to not just trying to eliminate talk of statistical “significance”, but to provide alternatives.

One of the problems is illustrated by this figure from an editorial in “Nature” which discusses that journal issue (Amrhein V, et al. Scientists rise up against statistical significance. Nature. 2019;567(7748):305-7). The figure shows real life data from 2 studies:

For example, consider a series of analyses of unintended effects of anti-inflammatory drugs. Because their results were statistically non-significant, one set of researchers concluded that exposure to the drugs was “not associated” with new-onset atrial fibrillation…. and that the results stood in contrast to those from an earlier study with a statistically significant outcome.

Now, let’s look at the actual data. The researchers describing their statistically non-significant results found a risk ratio of 1.2 (that is, a 20% greater risk in exposed patients relative to unexposed ones). They also found a 95% confidence interval that spanned everything from a trifling risk decrease of 3% to a considerable risk increase of 48% (P = 0.091; our calculation). The researchers from the earlier, statistically significant, study found the exact same risk ratio of 1.2. That study was simply more precise, with an interval spanning from 9% to 33% greater risk (P = 0.0003; our calculation).

It is ludicrous to conclude that the statistically non-significant results showed “no association”, when the interval estimate included serious risk increases; it is equally absurd to claim these results were in contrast with the earlier results showing an identical observed effect.
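The “our calculation” p-values in that passage can be approximately recovered from the risk ratios and intervals alone. Below is a small sketch of one way to do that; it assumes the intervals are symmetric on the log scale (a Wald-type interval), which is my assumption and not something stated in the editorial, and the function name is made up for illustration.

```python
import math

def p_from_ratio_interval(ratio, lower, upper, z=1.96):
    """Approximate two-sided p-value from a risk ratio and its 95% interval.

    Assumes the interval is exp(log(ratio) +/- z * SE), i.e. symmetric
    on the log scale.
    """
    se_log = (math.log(upper) - math.log(lower)) / (2 * z)
    z_score = math.log(ratio) / se_log
    # Two-sided tail area of the standard normal distribution.
    return math.erfc(abs(z_score) / math.sqrt(2))

# Same point estimate (1.2), different precision.
p_wide = p_from_ratio_interval(1.2, 0.97, 1.48)    # "non-significant" study
p_narrow = p_from_ratio_interval(1.2, 1.09, 1.33)  # "significant" study
print(f"wide interval: p = {p_wide:.3f}")
print(f"narrow interval: p = {p_narrow:.4f}")
```

Both calls use the identical risk ratio; the only thing that changes between them is the width of the interval, which is the whole point of the example.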

Similar things happen all the time in our field, where results with wide confidence intervals which cross a relative risk of 1 are reported as showing “no effect” or “no statistically significant effect”.

Here is a real neonatal example: the classic interpretation of the Davidson study would be that inhaled NO does not prevent ECMO in term babies with hypoxic respiratory failure, as the 95% confidence intervals for their RR of 0.64 include 1.0. The classic interpretation of the other two studies is that inhaled NO does prevent ECMO, but one, NINOS, had a relative risk that was actually less extreme than Davidson, at 0.71, with confidence intervals that don’t include 1. In reality all 3 studies show about the same effect, two being more precise than the third. In some (most) journals you would have to state the results in that way, and would not be allowed, when reporting the Davidson trial, to note the fact that ECMO was less frequent after iNO (although clearly it was), because it is not “statistically significant”.

I think we have to be ready to embrace uncertainty, to realize that dichotomizing our research into reports of things that work and things that don’t work, is unhelpful and may retard clinical advances.

The whole issue of “The American Statistician” is devoted to “moving to a world beyond p<0.05” and the opening editorial is well worth the read (Wasserstein RL, et al. Moving to a World Beyond “p < 0.05”. The American Statistician. 2019;73(sup1):1-19). One of the major themes is to stop saying “statistically significant” as a term, as the distinction between the statistical and the ordinary world meaning of “significant” is now hopelessly lost.

no p-value can reveal the plausibility, presence, truth, or importance of an association or effect. Therefore, a label of statistical significance does not mean or imply that an association or effect is highly probable, real, true, or important. Nor does a label of statistical nonsignificance lead to the association or effect being improbable, absent, false, or unimportant. Yet the dichotomization into “significant” and “not significant” is taken as an imprimatur of authority on these characteristics. In a world without bright lines, on the other hand, it becomes untenable to assert dramatic differences in interpretation from inconsequential differences in estimates. As Gelman and Stern famously observed, the difference between “significant” and “not significant” is not itself statistically significant.

So what should we do? There are useful suggestions at the end of that editorial, and the authors of each paper were asked to come up with positive suggestions, rather than just a list of “don’t”s.

Overall the suggestions are given the mnemonic “ATOM”: Accept uncertainty, be Thoughtful, Open and Modest.

One specific suggestion is that we might continue to report P-values, but as exact continuous values, (p = 0.08, or 0.46) without any threshold implications by the use of < or > notation. I think that could be useful as a way to eliminate the tyranny of p<0.05. It could reduce the risk of “p-hacking”, which is the tweaking of analysis, or even of data, in the search for a p-value which is just under 0.05. They further suggest that such exact p-values should be accompanied by other ways to present the results, such as s-values, Second generation p-values (SGPV), or the false positive risk, all of which they explain, and all of which themselves carry difficulties or unknowns.

Another suggestion is to refer to what are now called confidence intervals as “compatibility intervals”, the idea being that you would state that your result is most compatible with a range of effect sizes between Y and Z, rather than concluding that if the 95% confidence interval includes 1 the difference is not real, but, if it just excludes 1, then there is a real difference between the results. (That would be no better than relying on p<0.05).

The nexus of openness and modesty is to report everything while at the same time not concluding anything from a single study with unwarranted certainty. Because of the strong desire to inform and be informed, there is a relentless demand to state results with certainty. Again, accept uncertainty and embrace variation in associations and effects, because they are always there, like it or not. Understand that expressions of uncertainty are themselves uncertain. Accept that one study is rarely definitive, so encourage, sponsor, conduct, and publish replication studies. Then, use meta-analysis, evidence reviews, and Bayesian methods to synthesize evidence across studies.

I would recommend anyone involved in designing and analysing research to read the editorial and the article which immediately follows it (Ioannidis JPA. What Have We (Not) Learnt from Millions of Scientific Papers with P Values? The American Statistician. 2019;73(sup1):20-5) which is a review of many studies that John Ioannidis has published which show the insidious impacts of the term “statistical significance” and the focus on testing for p-values below a threshold.

One unexpected benefit of eliminating the words “significant” and “significantly” as well as their opposites would be a reduction in the number of words in a manuscript, which could be used for other things. In the recent publication from the Stop-BPD trial that I posted about recently, the words significant and significantly were used 19 times.

In contrast, I am currently revising an article for publication, and it is actually quite difficult! It is so ingrained to think of p<0.05 being significant that trying to come up with other ways of talking about the results of statistical tests can require some actual thought about the meaning of your results!

More seriously, the tyranny of p<0.05 and the use of the words “significant” and “non-significant” lead to a distortion of the English language. For example, a study with 100 patients per group might find that one group has a mortality of 10% and the other has a mortality of 20% (p=0.075). It would be dangerous and misleading to state “there was no difference in mortality” just because the p-value was too large (“p>0.05”, or “NS”).

This is also not a “trend”, a word which implies that things are moving in that direction. It is a real finding in the results, but like all real findings it can only give an estimate of what the actual difference would be if the 2 treatments were given to the entire population. That actual difference is unknowable, and we should be more careful about pretending we know what the actual difference is. Any result from a trial is only an estimate of the true impact of the intervention being tested, an estimate which gets closer to the probable true impact as the compatibility intervals become smaller, as long as there are no biases in the trial.

It is also, I think, wrong to suggest that the difference is “non-significant” only because of lack of numbers. That always presupposes that a larger trial would have found the same proportional difference (100/1000, vs 200/1000), and that it would then become significant (p<0.001, sorry about the < sign, but the software doesn’t give actual p-values when they are that small!). In reality a larger study might show a mortality difference anywhere within, or beyond, the compatibility intervals of the initial trial.

A better way of presenting those data would be the actual continuous p-value from Yates corrected chi-square, which is 0.075, the actual risk difference in deaths, 0.2 – 0.1, that is, 0.1, and the 95% compatibility intervals of that difference, which are -0.07 to +0.26. So the sentence in the results should read something like, “there was a 10% absolute difference in mortality between groups (10% vs 20%, p=0.075), a difference which is most compatible with a range of impacts on mortality between a 7% increase and a 26% decrease”. That is longer than saying “no difference in mortality”, but it has the advantage of being true, and of using some of the words you saved by eliminating “significant” from the paper. It also alerts readers and future researchers that there is a potential for substantial differences in a major clinically important outcome, which does not happen when the terms non-significant, NS, p>0.05, or no impact, are used.
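As a check on the arithmetic in the worked example (10/100 vs 20/100 deaths), here is a minimal sketch of a Yates-corrected chi-square test using only the standard library. The function name `yates_chi_square_p` is a hypothetical helper, not something from the editorial or any package. The compatibility interval is not recomputed, because the post does not say which method produced it.

```python
import math

def yates_chi_square_p(events_a, n_a, events_b, n_b):
    """Two-sided p-value from a Yates-corrected chi-square test on a 2x2 table."""
    a, b = events_a, n_a - events_a   # group A: deaths, survivors
    c, d = events_b, n_b - events_b   # group B: deaths, survivors
    n = n_a + n_b
    # Continuity correction: shrink |ad - bc| by n/2, never below zero.
    corrected = max(abs(a * d - b * c) - n / 2, 0)
    chi2 = n * corrected ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))

p_small = yates_chi_square_p(10, 100, 20, 100)      # the trial in the example
p_large = yates_chi_square_p(100, 1000, 200, 1000)  # same proportions, 10x larger
risk_difference = 20 / 100 - 10 / 100

print(f"100 per group: p = {p_small:.3f}, risk difference = {risk_difference:.2f}")
print(f"1000 per group: p below 0.001? {p_large < 0.001}")
```

The second call only shows what would happen if a larger trial found exactly the same proportions, which is the assumption questioned above.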

I am going to do my best to avoid thinking of statistical tests as yes or no, true/not true, effective/not effective, and to avoid the word “significant” in my publications. I wonder how long it will be until an editor tells me that doesn’t work, and I have to say it, or makes me say “no difference” because p>0.05.

Posted in Neonatal Research | Tagged Research Design, statistics | 2 Comments

To p or not to p, that is the question.

Posted on 1 April 2019 by Keith Barrington

I can’t claim credit for this title, although I wish I could. I copied it from an article published in an ENT journal (Buchinsky FJ, Chadha NK. To P or Not to P: Backing Bayesian Statistics. Otolaryngol Head Neck Surg. 2017;157(6):915-8).

I think the word “significant” should be banned. (Not in life; I am not a fascist; you can say whatever you want is significant, but in medical research there is so much confusion about the term that we would be better to never use it!)

I think authors who find a potentially positive result in a good quality study should be allowed to say things like “if there were no other unanticipated biases in our research design, the likelihood that our results are due solely to random variation is less than 1 in 20”, which is less sexy, but more accurate, compared to saying “our results were significant”. (If there are any real statisticians out there reading this, and I say anything which is not accurate, please let me know, I only have basic statistical training and would be happy to be corrected!)

It would certainly be much better than assuming that p<0.05 means that you definitely found an effect, or that p>0.05 means that there is nothing there!

In this blog I usually try to avoid the term “statistically significant” (or not), as the term is often used to imply “proven effect” as compared to “proof of no effect”. I hope we all know that the threshold, where p=0.051 means no effect, and p=0.049 means proven effect, is nonsense. As a result, some journals have banned the reporting of p-values and even confidence intervals. I think this is extreme; we should be able to report confidence intervals, but perhaps multiple confidence intervals (90, 95, and 99%) should be demanded, along with appropriate wording, similar to what I suggested above. The risk is that a 95% confidence interval which excludes unity will be considered to be proof that there is a real difference, which is no better than using a p-value threshold. The differing confidence intervals could be used to give an overall estimate of an effect, and its potential ranges.
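To make the idea of reporting several intervals concrete, here is a minimal sketch that computes 90, 95, and 99% intervals for a risk difference. The counts (15/100 vs 25/100) are hypothetical and chosen only for illustration, and the simple Wald (normal approximation) interval is an assumption; other interval methods would give somewhat different limits.

```python
import math
from statistics import NormalDist

def risk_difference_intervals(events_a, n_a, events_b, n_b,
                              levels=(0.90, 0.95, 0.99)):
    """Wald intervals for the risk difference (group B minus group A)."""
    p_a, p_b = events_a / n_a, events_b / n_b
    diff = p_b - p_a
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    intervals = {}
    for level in levels:
        # Two-sided critical value, e.g. about 1.96 for the 95% level.
        z = NormalDist().inv_cdf(0.5 + level / 2)
        intervals[level] = (diff - z * se, diff + z * se)
    return diff, intervals

# Hypothetical trial: 15/100 events in one group, 25/100 in the other.
diff, intervals = risk_difference_intervals(15, 100, 25, 100)
for level, (low, high) in intervals.items():
    print(f"{level:.0%} interval: {low:+.3f} to {high:+.3f}")
```

With these made-up counts the 90% interval excludes zero while the 95% and 99% intervals include it, which shows how arbitrary a single threshold is.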

In this blog I probably sometimes get caught up in the usual patterns of referring to p-values, but usually I try to say something like “not likely to be due to chance alone”, which does not mean that a difference is necessarily due to a real effect of the intervention, but that the data would be unlikely if you picked the numbers at random out of a soup of numbers. All sorts of things might cause a p-value to be less than 0.05 when you compare outcomes between 2 groups with a different intervention, only a minority of which are due to a true impact of the intervention.

One recent paper that I liked was by Doug Altman and a group of co-workers (Greenland S, et al. Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. Eur J Epidemiol. 2016;31(4):337-50). They list the many errors that people make when talking about statistical test results; when I read the list it makes me think of the many similar errors I have read, and probably made myself.

A study with an unknown bias might well provide a “significant” p-value when there is no real effect of the intervention, just as a study with a “non-significant” p-value might report a major advance in medicine.

The authors of that recent paper put it this way :

It is true that the smaller the P value, the more unusual the data would be if every single assumption were correct; but a very small P value does not tell us which assumption is incorrect. For example, the P value may be very small because the targeted hypothesis is false; but it may instead (or in addition) be very small because the study protocols were violated, or because it was selected for presentation based on its small size. Conversely, a large P value indicates only that the data are not unusual under the model, but does not imply that the model or any aspect of it (such as the targeted hypothesis) is correct; it may instead (or in addition) be large because (again) the study protocols were violated, or because it was selected for presentation based on its large size.

There have been recent publications suggesting that the critical P-value should be shifted to a much smaller number (such as p<0.005), particularly for epidemiological, rather than interventional studies. But I think that will just shift the problem, and will make it harder to find really useful beneficial effects, or potentially harmful results.

Abandoning the term “statistically significant” should be enforced, and will force us to make more nuanced and reasonable evaluations of our data.

Posted in Neonatal Research | Tagged Research Design, statistics | Leave a comment

Partnering with parents

Posted on 31 March 2019 by Keith Barrington

For a few years now Annie Janvier in our unit has been developing programs of partnership with families. Using contacts with mostly “veteran parents”, and occasionally veteran patients, we have developed partnerships in research, patient care, and education.

The “PAF” team (équipe Partenariat Famille) have now published a report of how such family partnerships can be developed, how their impacts can be evaluated, and how our partnerships have developed and expanded as a result of those evaluations (Dahan S, et al. Beyond a Seat at the Table: The Added Value of Family Stakeholders to Improve Care, Research, and Education in Neonatology. J Pediatr. 2019;207:123-9.e2). Last year we published a review of integration of parents in research endeavours (Janvier A, et al. Integrating Parents in Neonatal and Pediatric Research. Neonatology. 2019;115(4):283-91) and included in that review some of our endeavours and our research about family participation specifically in research. The group also published a review article about what has been published about family participation in the NICU (Bourque CJ, et al. Improving neonatal care with the help of veteran resource parents: An overview of current practices. Seminars in fetal & neonatal medicine. 2018;23(1):44-51).

The new article is an in-depth evaluation of the PAF team development, evaluation, and improvement, some of the mistakes made along the way, and some principles, many of which are probably generalizable, that can be used to help in the process.

https://ars.els-cdn.com/content/image/1-s2.0-S0022347618316986-ympd10451-fig-0002_lrg.jpg

The title, I think, is apposite, although many of us have been discussing how to involve parents over the past few years, often the involvement of parents has been seen as a “nice extra”. In contrast, I think we should consider that everything that we do will benefit from the full integration of resource parents in our teams, and that having a token parent seat at the table is not enough.

For anyone who doesn’t have full text access to the Journal of Pediatrics, Annie gave me permission to include the following link in this blog post: https://authors.elsevier.com/a/1YnEL55CrsVAw (the first 50 people accessing the link can download a free full text).

The PAF initiative costs very little, but there are some costs, mostly for parking, snacks, our wall of hope, and other minor costs. Our goal for fundraising this year is only $12,000 (Canadian), please consider making a small donation to our team. If you like this blog, please consider making a large donation!

Please follow the link to our fundraising page and click on “Donate Now”.

Posted in Neonatal Research | Leave a comment

How should we evaluate heart rate during neonatal resuscitation?

Posted on 28 February 2019 by Keith Barrington

Many babies receive some sort of “resuscitation” during their transition from intra-uterine to extra-uterine life.

How do we decide when a baby needs intervention? A baby who is active and breathing is usually left alone, a baby who is neither of those things might need intervention, and many of our decisions are based on the baby’s heart rate.

Bradycardia= needs ventilation. Mild bradycardia= optimize ventilation and reassess, good heart rate = observe and wait. I like things to be simple!

Recent studies have focused on heart rate determination as the best indication that adaptation is appropriate, but that begs the question: how do we determine heart rate? Should we listen to their heart sounds, palpate their pulses, or watch their ECG? It seems that getting an accurate heart rate is faster with immediate ECG application (Katheria A, et al. A pilot randomized controlled trial of EKG for neonatal resuscitation. PLoS One. 2017;12(11):e0187730) and that this might lead to more rapid institution of corrective actions. But electrical activity of the heart does not mean that it is pumping well; in animal models pulseless electrical activity is frequent. Many immature animals, after resuscitation, have periods of electrical activity without mechanical activity. If that happens with babies, then we may have to readjust our algorithms; presence of an ECG signal does not mean that you necessarily have adequate cardiac function.

A group of us interested in the issues have been discussing this for a while, and decided to write a brief article, focusing on the results from Po-Yin Cheung and Georg Schmolzer’s lab in Edmonton. (That, I always like to point out, used to be my lab! (here’s one example) but Po-Yin and Georg are doing better work from that lab than I ever did.) Patel S, et al. Pulseless electrical activity: a misdiagnosed entity during asphyxia in newborn infants? Archives of disease in childhood Fetal and neonatal edition. 2018. The new article notes that PEA (which I always used to call electro-mechanical dissociation (EMD)) occurs frequently in animals that have been exposed to clinically relevant models of perinatal asphyxia.

Does this actually happen in human newborns? Yes. Luong D, et al. Cardiac arrest with pulseless electrical activity rhythm in newborn infants: a case series. Archives of disease in childhood Fetal and neonatal edition. 2019. Four cases are reported in this article, and I know personally of two others, I wasn’t able to get them into the article (of which I am co-author), but this is not something that is vanishingly rare; how frequent is it? We really don’t know, but I think we should investigate that somehow.

What I think this means is that, when resuscitating depressed newborns, the ECG might be very helpful to get an accurate heart rate quickly, and if the heart rate is slow we should respond according to NRP algorithms.

At some point we should confirm that there is actually cardiac contraction, not just electrical activity. If the infant starts to move and breathe, that is probably enough evidence. BUT, if the ECG heart rate is present but the baby isn’t improving, we should immediately evaluate whether there is sufficient cardiac activity.

In the cases we report there was ECG activity, but no actual cardiac function detectable. When that was recognized and interventions followed, all the babies were severely damaged, and they all died. I wonder, if the situation had been recognized faster, could there have been better outcomes? We could even ask if those babies would have been better treated without the ECG.

Maybe the introduction of the ECG as a routine measure of cardiac activity during neonatal resuscitation has been an error?

How should we determine that the heart is actually contracting effectively? I think if the pulse oximeter is giving a reliable signal, at the same rate as the ECG, that means there is at least some arterial pulsation in the right wrist/hand and probably perfusion is at least minimally effective: if the pulse oximeter is not (yet) functioning, then palpation of the pulses may be adequate, or perhaps clear heart sounds are enough evidence that the heart is actually moving…

I’m not sure what the best approach is, but recognizing that the ECG only identifies electrical activity, and that actual cardiac pumping is what the baby needs, is the first step.

Posted in Neonatal Research | Tagged asphyxia, Resuscitation | 3 Comments

Death or oxygen, which is worse?

Posted on 27 February 2019 by Keith Barrington

We have a big problem in neonatal research. We have constructed composite outcomes that have become the “standard of design”, but are not of much use for anyone. Because we are, rightly, concerned that death and other diagnoses may be competing outcomes, we often use as the primary outcome measure “death or BPD” or “death or severe retinopathy” or “death or neurodevelopmental impairment”. We have done this because dead babies can’t develop BPD, or developmental delay.

The idea, of course, is that we want to see if an intervention will improve survival without lung injury, for example. There are two problems with this. The first arises if the composite outcome is more frequent, but neither part of the outcome is individually significantly affected. What then? The other problem is that we might well find that death is less frequent but that lung injury is more frequent. And what then? If the composite outcome is unchanged, then strictly speaking we can only say that the study found no effect on the outcome, and analyses of the parts of the composite outcome are considered secondary analyses.

This happens. The SUPPORT trial showed no effect of oxygen saturation targets on the primary outcome, but the low target babies had more mortality, while the high target babies had more retinopathy.

Study designs like this are effectively equating the parts of the primary outcome in importance for the analysis.

By studying the outcome of “death or BPD” we are effectively saying that an adverse outcome is being dead or being on low-flow oxygen at 36 weeks. I don’t think many readers of this blog would agree, if they themselves were critically ill, that surviving with a need for long-term domiciliary oxygen and being dead were equivalent.

This has again become painfully clear with the publication of the STOP-BPD trial. (Onland W, et al. Effect of Hydrocortisone Therapy Initiated 7 to 14 Days After Birth on Mortality or Bronchopulmonary Dysplasia Among Very Preterm Infants Receiving Mechanical Ventilation: A Randomized Clinical Trial. JAMA. 2019;321(4):354-63). This was a very high quality, important trial of hydrocortisone in ventilator dependent babies. Infants less than 1250 g birthweight and <30 wk gestation were randomized to placebo or to hydrocortisone 1.25 mg/kg/dose 4 times a day for a week, then 3 times a day for 5 days, then twice a day for 5 days then once a day for 5 days.

They had to be ventilator dependent at 7 to 14 days of age with a respiratory index (product of mean airway pressure and the fraction of inspired oxygen) equal to or greater than 3.5 for more than 12 h/d for at least 48 hours.

Which would mean for example a mean airway pressure of 8 and an FiO2 of 0.44.

During the initial months of the trial, participating centers noted that many infants receiving ventilation and considered at high risk of BPD had a respiratory index of less than 3.5 and were treated with corticosteroids outside the trial. Based on this feedback, the respiratory index threshold was reduced to 3.0 and finally to 2.5 (in May 2012 and December 2012, respectively) via approved protocol amendments.

By the end of the trial, then, an infant at 7 days of age, with a mean airway pressure of 8 on 32% oxygen or more would have been eligible.
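The threshold arithmetic can be sketched in a few lines. This is only the respiratory index part of the entry criteria as described above (mean airway pressure multiplied by FiO2); the duration requirements (more than 12 h/d for at least 48 hours) are deliberately left out, and the function names are hypothetical.

```python
def respiratory_index(mean_airway_pressure, fio2):
    """Product of mean airway pressure and fraction of inspired oxygen."""
    return mean_airway_pressure * fio2

def meets_threshold(mean_airway_pressure, fio2, threshold):
    """True if the respiratory index is at or above the trial threshold."""
    return respiratory_index(mean_airway_pressure, fio2) >= threshold

# Thresholds over the course of the trial, as stated in the publication.
thresholds = {"initial": 3.5, "May 2012": 3.0, "December 2012": 2.5}

for label, threshold in thresholds.items():
    print(f"{label}: MAP 8, FiO2 0.32 meets threshold? "
          f"{meets_threshold(8, 0.32, threshold)}")
```

A baby on a mean airway pressure of 8 and 32% oxygen has an index of 2.56, so would only have met the final threshold of 2.5.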

The definition of BPD was oxygen requirement at 36 weeks (with an O2 reduction test if needing less than 30%). Death was also recorded to 36 weeks for the primary outcome. Which means that dying between 36 weeks and discharge would be considered a good outcome, if you didn’t have BPD.

The primary outcome occurred in 128/181 hydrocortisone babies (70.7%), and 140/190 controls (73.7%). In other words there was no impact of the hydrocortisone, which is what the abstract states. But at 36 weeks there were significantly, and substantially, more babies who received hydrocortisone alive than controls, 84.5% vs 76.3%, which was “statistically significant” p=0.048. Between 36 weeks and hospital discharge there were several deaths in each group, and the difference had narrowed slightly, with 80% of hydrocortisone babies and 71% of control babies being alive, p=0.06.

This happened despite a very high rate of open-label hydrocortisone use in the control babies. In fact 108 of the 190 control babies received hydrocortisone.

The protocol is available with the publication, and it notes the following :

In case of life threatening deterioration of the pulmonary condition, the attending physician may decide to start open label corticosteroids therapy in an attempt to improve the pulmonary condition. At that point in time the study medication is stopped and the patient will be recorded as “treatment failure”.

This could occur during the 21 days of study drug use. In addition, physicians could give steroids after the 21 days of the study drug:

Late rescue therapy outside study protocol (late rescue glucocorticoids): Patients still on mechanical ventilation after completion of the study medication, i.e. day 22, may be treated with open label corticosteroids.

I’m not quite sure about this, but I think that 86 of those 108 control babies who received hydrocortisone got it during the 21 days study drug window, and 22 others received steroids after the study drug period. In the hydrocortisone group I can see no indication of how many got open-label steroids during the study drug period, but there are 6 who got steroids after the end of that period.

The substantial difference in mortality occurred despite the very high proportion of babies randomized to control who received hydrocortisone, which will of course dilute the potential impact of the intervention.

There are modest differences in BPD between the groups, with the hydrocortisone babies having slightly more (100 cases vs 95), but if you express this result as “BPD among survivors”, the numbers are actually identical; just over 65% in each group.
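The figures quoted so far can be tied together with a little arithmetic. This sketch assumes that deaths before 36 weeks equal the composite count minus the BPD cases (a baby counted for the primary outcome either died or had BPD, not both); that assumption is not stated in the post, but it reproduces the survival percentages quoted earlier.

```python
# Counts quoted in the post for each arm of STOP-BPD.
arms = {
    "hydrocortisone": {"n": 181, "death_or_bpd": 128, "bpd": 100},
    "control": {"n": 190, "death_or_bpd": 140, "bpd": 95},
}

for name, arm in arms.items():
    # Assumption: composite events that are not BPD are deaths before 36 weeks.
    arm["deaths"] = arm["death_or_bpd"] - arm["bpd"]
    arm["survivors"] = arm["n"] - arm["deaths"]
    arm["survival_pct"] = 100 * arm["survivors"] / arm["n"]
    arm["bpd_among_survivors_pct"] = 100 * arm["bpd"] / arm["survivors"]
    print(f"{name}: {arm['deaths']} deaths, "
          f"{arm['survival_pct']:.1f}% alive at 36 weeks, "
          f"BPD in {arm['bpd_among_survivors_pct']:.1f}% of survivors")
```

On that assumption there were 28 deaths among the hydrocortisone babies and 45 among the controls, and BPD among survivors is essentially the same in both arms.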

I think the best interpretation of this study would be as follows: eligible babies who received immediate hydrocortisone, compared to those who waited and only received hydrocortisone in the case of a “life-threatening” deterioration, were less likely to die, but, if they survived had the same likelihood of developing BPD.

I hope there is neurological and developmental follow up planned for this trial, although the power of the study to say very much, when so many control babies received hydrocortisone, will be quite limited.

This is now a huge problem, the published article states there is no effect of hydrocortisone, but that is not what I get from the data.

Here is the cute graphic that accompanies the paper

Effect of Hydrocortisone 7-14 Days After Birth in Very Preterm Infants Receiving Mechanical Ventilation

What can we do about this? Based on this study, the use of hydrocortisone in a similar dose, to infants with substantial oxygen requirements after 7 days of age would be a reasonable choice. Waiting for life threatening deterioration (it would be interesting to know what that meant to the attending physicians!) seems to increase your risk of dying. I think it is unlikely that any neurological or developmental impacts of hydrocortisone are severe enough to be worse than dying, and I just hope that any long term outcome study of these infants does not use the outcome “death or low Bayley scores”.

Analyzing the deaths differently using survival curves gives the following, with a p-value suggesting that this is unlikely to be due to chance alone. I know it’s a bit more than .05, but there is only 1 chance in 17 that completely random numbers would give a difference like this :

https://cdn.jamanetwork.com/ama/content_public/journal/jama/937780/joi180157f2.png?Expires=2147483647&Signature=hTq13Xvb7CsNt4iGV2YbDuLylAyVKsFskeO4jGl-XR6W9ihP9Q0XV-8XD7qq1u6T9YHsWah4bxUVcNRetg35BknOyw-SIXqB~w9PCowxd3BK0ul8p3AW3W6vA4rPZ1xs692EplAQbCHUO8kO8sabQc5oz1pYmtXkfy8YYN08jU02zWEk2tkzFz101~tnugmfOeeBmeVxcEXpOgPonfnbHUNcRDSE6ZFwY0u5JjSI2j02wkWBS9TC00K825Y8uXCSs~RnBByZXc4~lLqxqly6LZ0Z-Qit1oSyfZqJi57eWT4JDQj1lthE17nAPVXhYNWXwYNm3Ft-5paaUTbCBpAcvA__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA

I think we have to stop using “death or BPD” as a composite dichotomous outcome for our studies.

There are alternatives, even when death and the other outcome of interest are competing.

One way is to analyze the same data differently. One method, for example, is to compare each baby’s outcome to all of the babies in the other group. A baby who dies receives zero points in comparison to the other group babies who died, and -1 point in comparison to the other group babies who survived. Each surviving baby with BPD is then scored +1 point in comparison with the other group babies who died, zero points in comparison with the other group babies with BPD and -1 point in comparison with the surviving babies without BPD, and babies without BPD score +1 in comparison with babies who died or survived with BPD, and score 0 in comparison with babies who survived without BPD. The ratio of winning to losing babies is then referred to as the “win ratio”.

This is a variant of the method used by the study I discussed in my last post, Beitler et al examining different ways of determining optimal PEEP. Finkelstein DM, Schoenfeld DA. Combining mortality and longitudinal measures in clinical trials. Statistics in Medicine. 1999;18(11):1341-54. In fact it is more generally applicable, and there have been multiple publications about the method (and other related methods) as well as many publications using the methods, mostly in cardiology, where composite outcomes including death or a revascularization procedure, as one example, are common, but recognized to have differing weights. Pocock SJ, et al. The win ratio: a new approach to the analysis of composite endpoints in clinical trials based on clinical priorities. European Heart Journal. 2012;33(2):176-82.

For example, suppose you ran a study with 20 babies per group, and the results showed that group A had 5 deaths and 10 survivors with BPD, while group B had 10 deaths and 5 survivors with BPD. Our usual analysis would say there was no impact on “death or BPD”. The analysis that I have just suggested, in contrast, gives a score of -10 to each one of the dead babies in group A, and -15 to those in group B. The BPD babies each score +5 in group A and 0 in group B, and the survivors without BPD score +15 in both groups. The win ratio for the trial is 3.0 for group A, as there are 15 babies who win overall in most of their pairwise comparisons, and 5 who lose. Calculating the p-value for this is complicated, but well described, as are methods for calculating the confidence interval of the win ratio.
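The scores in this example can be reproduced with a short sketch. The function names and the dictionary layout are hypothetical. `babies_winning_vs_losing` follows the per-baby counting used in the text (babies with a positive cumulative score divided by babies with a negative one). `pairwise_win_ratio` is a different summary, total pairwise wins divided by total pairwise losses, included only for comparison; it is not the figure quoted in the text.

```python
# Outcomes ranked from worst to best; a higher rank wins a comparison.
RANK = {"died": 0, "bpd": 1, "no_bpd": 2}

def cumulative_scores(own, other):
    """Score of one baby with each outcome, summed over every baby in `other`."""
    scores = {}
    for outcome, rank in RANK.items():
        wins = sum(n for o, n in other.items() if RANK[o] < rank)
        losses = sum(n for o, n in other.items() if RANK[o] > rank)
        scores[outcome] = wins - losses
    return scores

def babies_winning_vs_losing(own, other):
    """Babies with a positive cumulative score divided by those with a negative one."""
    scores = cumulative_scores(own, other)
    winners = sum(n for o, n in own.items() if scores[o] > 0)
    losers = sum(n for o, n in own.items() if scores[o] < 0)
    return winners / losers

def pairwise_win_ratio(own, other):
    """Total winning pairs divided by total losing pairs (a different summary)."""
    wins = sum(n * m for o, n in own.items() for p, m in other.items()
               if RANK[o] > RANK[p])
    losses = sum(n * m for o, n in own.items() for p, m in other.items()
                 if RANK[o] < RANK[p])
    return wins / losses

# 20 babies per group, so 5 survivors without BPD in each.
group_a = {"died": 5, "bpd": 10, "no_bpd": 5}
group_b = {"died": 10, "bpd": 5, "no_bpd": 5}

scores_a = cumulative_scores(group_a, group_b)
scores_b = cumulative_scores(group_b, group_a)
print("Group A scores:", scores_a)
print("Group B scores:", scores_b)
print("Group A, babies winning vs losing:", babies_winning_vs_losing(group_a, group_b))
print("Group A, pairwise wins vs losses:", pairwise_win_ratio(group_a, group_b))
```

The two summaries differ because the first counts babies and the second counts pairs, so it matters which one a trial prespecifies.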

Effectively, what this kind of analysis does is to rank the adverse outcomes, death being scored before BPD.

I would be fascinated to see what the results of STOP-BPD would look like if this kind of analysis were performed. The win ratio of the hydrocortisone group works out to 5.2 by my calculation, compared to 3.4 for the controls. It could be that such a difference is statistically significant, and such an analysis might enable future trials to be designed using this method.

You can also with this technique examine different severities of BPD, with BPD being scored as moderate vs severe. This kind of analysis can also include longitudinal quantitative measures, such as duration of home oxygen therapy, or number of admissions after discharge. Things which are, I would suggest, far more important to parents than whether the oxygen is stopped before or after 36 weeks.

Before there are any other trials counting death and BPD as equally important outcome measures, or death and retinopathy, or death and developmental delay, or “death, BPD, NEC, LOS, IVH, ROP” we should reconsider how we measure and analyze outcomes. We should be including outcomes that are important to families, rank them according to their relative importance to parents, and analyze them using methods which are now well validated which take into account their relative importance.

Posted in Neonatal Research | Tagged mortality, Randomized Controlled Trials, Research Design, steroids | 6 Comments

Clinical evaluation vs Technology

Posted on 26 February 2019 by Keith Barrington

Two recent trials in adult ICU patients ask very interesting questions, questions which are only linked by testing something clinically simple versus a more technologically demanding evaluation.

The first was comparing the use of serum lactate concentrations versus capillary filling time in adults with septic shock. (Hernandez G, et al. Effect of a Resuscitation Strategy Targeting Peripheral Perfusion Status vs Serum Lactate Levels on 28-Day Mortality Among Patients With Septic Shock: The ANDROMEDA-SHOCK Randomized Clinical Trial. JAMA. 2019;321(7):654-64). If capillary filling is a valid indicator of peripheral perfusion, then it should react more quickly to changes in perfusion than the lactate. For example, if an intervention improves perfusion, then capillary filling should improve immediately, but the improvement in metabolism leading to lactate clearance and then actual reductions in serum lactate will take much longer.

This might be most important in countries with limited access to laboratory results, but in all countries, if our clinical evaluation of perfusion is accurate and can be followed in almost real-time and responds more quickly to changes in actual perfusion, then it would be an advantage for all of us to include it in our protocols.

28 hospitals in 5 South American countries participated in a trial which enrolled 424 adults with septic shock. The protocols for following either serum lactate or capillary filling time were clearly documented, and mostly followed. Cap filling was performed in a highly standardized way with a glass slide used to compress a nail bed of hand until blanched for 10 seconds, then released and the return of colour was monitored with a stop-watch.

To spare you all the details of the protocols, they were very similar apart from the methods used to evaluate perfusion.

Mortality was 35% in the cap filling group and 43% in the lactate group. This was not “statistically significant” but it looks to me like a big deal! I guess I should say now that the study was underpowered, but having read this really clear blog post about using that phrase (https://towardsdatascience.com/why-you-shouldnt-say-this-study-is-underpowered-627f002ddf35) I will say that the study was underpowered to detect really important differences in mortality, differences that anyone who has septic shock might be interested in.

In fact the 95% confidence intervals for the hazard ratio only just include 1.0 (0.55 to 1.02) and the p-value was 0.06, which might not conventionally be “significant”, but is at least highly suggestive. The patients in the cap refill group received about 0.5 L less fluid during the resuscitation phase, and had less organ dysfunction. Again suggesting that maybe we often give too much fluid to patients with septic shock, leading to organ dysfunction and death.

The second trial was in adults with severe ARDS, and the choice of PEEP between a clinical approach, adjusting the PEEP according to the FiO2, and a more physiologic invasive approach requiring the estimation of pleural pressure by inserting an esophageal pressure catheter. (Beitler JR, et al. Effect of Titrating Positive End-Expiratory Pressure (PEEP) With an Esophageal Pressure-Guided Strategy vs an Empirical High PEEP-Fio2 Strategy on Death and Days Free From Mechanical Ventilation Among Patients With Acute Respiratory Distress Syndrome: A Randomized Clinical Trial. JAMA. 2019). Of note, the PEEP could be increased to 24 (!) cm H2O if the adult was in 100% oxygen in the FiO2 directed group, and was as high as 36 (!!) cmH2O in the esophageal pressure group, and mortality was about 30% in each group.

There were no advantages shown to the more invasive strategy, so the simple clinically directed adjustment of PEEP was equally effective.

One reason for discussing this study is that they don’t analyze “BPD or chronic lung disease” as an outcome! The primary outcome and the method of analysis are fascinating :

The prespecified primary end point was a ranked composite score that incorporated death and days free from mechanical ventilation through day 28, calculated in such a manner that death constitutes a worse outcome than fewer days off the ventilator. Time free from mechanical ventilation was calculated as the number of days between successful liberation from the ventilator and study day 28. Each patient was compared with every other patient in the study and assigned a score (tie: 0, win: +1, loss: −1) for each pairwise comparison based on whom fared better. If one patient survived and the other did not, scores of +1 and −1 were assigned, respectively, for that pairwise comparison. If both patients in the pairwise comparison survived, the assigned score depended on which patient had more days free from mechanical ventilation: the patient with more days off the ventilator received a score of +1, while the patient with fewer days received a score of −1. If both patients survived and had the same number of days off the ventilator, or if both patients died, they both were assigned a score of 0 for that pairwise comparison. For each patient, scores for all pairwise comparisons were summed, resulting in a cumulative score for each patient. These cumulative scores were ranked and compared between treatment groups via the Mann-Whitney technique.

I think there are some very important lessons to be learned here. You can incorporate potentially competing outcomes without giving them the same importance in the analysis. In this analysis death is clearly considered more important than getting extubated more quickly.
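The ranked composite described in the quoted methods can be sketched as follows. The six patients are entirely hypothetical, each represented as a `(died, ventilator_free_days)` pair, and only the Mann-Whitney U statistic is computed by hand (no p-value), so this is an outline of the scoring logic rather than the trial's actual analysis code.

```python
def compare(a, b):
    """+1 if patient a fared better than b, -1 if worse, 0 if tied."""
    died_a, vfd_a = a
    died_b, vfd_b = b
    if died_a and died_b:
        return 0                       # both died: tie
    if died_a != died_b:
        return -1 if died_a else 1     # survival always beats death
    return (vfd_a > vfd_b) - (vfd_a < vfd_b)  # both survived: compare days

def cumulative_scores(patients):
    """Sum of pairwise scores for each patient against every other patient."""
    return [sum(compare(p, q) for j, q in enumerate(patients) if j != i)
            for i, p in enumerate(patients)]

def mann_whitney_u(scores_a, scores_b):
    """U statistic for group A: pairs where A ranks higher, ties count half."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in scores_a for b in scores_b)

# Hypothetical patients: (died, ventilator-free days through day 28).
group_a = [(False, 20), (False, 10), (True, 0)]
group_b = [(False, 10), (True, 0), (True, 0)]

all_scores = cumulative_scores(group_a + group_b)
scores_a = all_scores[:len(group_a)]
scores_b = all_scores[len(group_a):]
u_a = mann_whitney_u(scores_a, scores_b)

print("Group A cumulative scores:", scores_a)
print("Group B cumulative scores:", scores_b)
print(f"U for group A: {u_a} of {len(scores_a) * len(scores_b)} possible")
```

Death is handled before ventilator-free days are even looked at, which is what gives the two outcomes different importance in the analysis.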

The next post will take this discussion further.

Posted in Neonatal Research | Tagged ARDS, Assisted ventilation, Sepsis, Shock | Leave a comment
