The CAP babies are now 11

I was privileged to be part of the CAP trial group, a pivotal neonatal trial that showed improved Bayley scores and improved motor function at 18 to 21 months corrected age, among infants randomized before 10 days of age, with a birthweight under 1251 g, and considered eligible for caffeine therapy.

By 5 years of age there continued to be some advantages of caffeine treatment, compared to placebo, mostly in the motor domain.

Schmidt B, et al. Academic Performance, Motor Function, and Behavior 11 Years After Neonatal Caffeine Citrate Therapy for Apnea of Prematurity: An 11-Year Follow-up of the CAP Randomized Clinical Trial. JAMA Pediatr. 2017.

A new publication presents data from 920 of the 1200 eligible children, which is quite a remarkable achievement. There were actually 2000 in the original trial, but for various reasons (mostly language issues for the standardized testing, and centers who decided not to participate in the longer follow-up) only 1200 of them were eligible for this follow-up.

Functional impairment was less frequent with caffeine than with placebo, but the 95% confidence intervals of the adjusted Odds Ratio were 0.59 to 1.02, which means we can ‘only’ have about 93% confidence that the Odds of escaping impairment are improved by caffeine compared to placebo.

That might be “not significant” in some terms, but I think that’s a pretty significant degree of confidence, for a treatment lacking in any serious side effects.
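As a rough check on that 93% figure, the confidence interval can be back-converted into a p-value. This is only a sketch: it assumes the published interval is a Wald interval that is symmetric on the log-odds scale, and the function name is my own, not anything from the paper.

```python
import math

def p_value_from_or_ci(lower, upper, z_crit=1.959964):
    """Approximate two-sided p-value implied by a 95% CI for an odds ratio.

    ASSUMPTION: the interval is a Wald interval, symmetric on the log scale.
    """
    log_lower, log_upper = math.log(lower), math.log(upper)
    log_or = (log_lower + log_upper) / 2          # point estimate, log scale
    se = (log_upper - log_lower) / (2 * z_crit)   # standard error of log OR
    z = log_or / se
    normal_cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_two_sided = 2 * (1 - normal_cdf)
    return math.exp(log_or), p_two_sided

# Interval quoted above for functional impairment: 0.59 to 1.02
odds_ratio, p = p_value_from_or_ci(0.59, 1.02)
print(f"implied OR {odds_ratio:.2f}, two-sided p {p:.3f}")
```

With these inputs the p-value comes out at about 0.07. Reading 1 minus that as a "degree of confidence" is informal shorthand rather than a formal probability that caffeine works, but it is where the 93% comes from.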

This new publication confirms that there is no sign of an adverse impact of caffeine at this age. The academic achievement mean scores are almost identical between the caffeine and placebo groups, with slightly more caffeine children being more than 2 SD below the mean.

The motor scores (Movement ABC) showed more subjects below -2 SD on the various subscales, which was described as being “motor impaired”, in the control group than among the caffeine-treated subjects. I don’t know much about the Movement ABC, so I am not sure if “motor impairment” by this measure really has an impact on the child’s life. Only one third of the children with “motor impairment” had cerebral palsy, which was more frequent among placebo babies, 6% compared to 4.3% with caffeine.

We can be reassured that there has never been evidence of adverse effects of caffeine, now up to 11 years of age, when given as in the CAP trial. It would be great to know if longer durations of treatment (up to term, or even after), or higher doses, could further improve these outcomes. I would be a bit reluctant to try higher doses, especially following these publications (Vesoulis ZA, et al. Early High-Dose Caffeine Increases Seizure Burden in Extremely Preterm Neonates: A Preliminary Study. Journal of Caffeine Research. 2016;6(3):101-7. McPherson C, et al. A pilot randomized trial of high-dose caffeine therapy in preterm infants. Pediatr Res. 2015;78(2):198-204), which showed potential serious harm.

But a trial of prolonging caffeine therapy, at the point when it would normally be considered to be stopped, would really be worthwhile. Caffeine would probably, at least temporarily, reduce hypoxic events. Hypoxic events leading up to discharge, and beyond, have been associated with worse developmental outcomes. I think a large multi-center RCT, randomizing children to caffeine or placebo when they have their clinically indicated caffeine stopped, and continuing to term, with dose adjustments to take into account the accelerating clearance of caffeine at this developmental stage, could have real benefits for future babies in our care.

Posted in Neonatal Research | Tagged , , | 2 Comments

Are preterm babies frequently iodine deficient?

Williams F, et al. Supplemental Iodide for Preterm Infants and Developmental Outcomes at 2 Years: an RCT. Pediatrics. 2017.

We all need some iodine in order to make thyroid hormones (iodine doesn’t actually do anything else for us as far as I know), but do preterm babies get enough? They may need more than older children as they have very small stores, but it also is easy to make them toxic. One of my colleagues and friends, Tony Ryan, I remember as a fellow showed that using iodine containing liquids to lavage the peritoneum during surgery for perforated bowel caused hypothyroxinemia.

I also remember starting a research project to prove whether echocardiographic signs of cardiac output were reliable, performing a dye-dilution cardiac output study on a full term baby, to compare with ultrasound parameters. The dye I used, indocyanine green, contained a lot of iodine, so I had previously decided to do thyroid function studies to make sure that there was no impact. The very first patient had seriously deranged thyroid function after the first study, so the research stopped dead in its tracks (the baby did fine).

Preterm babies receive very little iodine during intravenous feeding, and it has been thought that many preterm babies are deficient, contributing to low thyroxine levels which are statistically associated with poorer developmental outcomes.

So if preterm babies really are frequently iodine deficient, then giving more should improve thyroxine production, which might possibly improve developmental outcomes (if the link between low T4 and slower development is causal). In contrast if too much is given, then transient thyroid suppression can occur. So we need a large, well designed, adequately powered, pragmatic clinical trial to answer the questions.

And here it is.

1273 babies in the UK who were all less than 31 weeks gestation were randomised in the first 2 days of life to receive sodium iodide, or placebo. They received 30 microg/kg per day, either intravenously or enterally until 34 weeks post-menstrual age. The primary outcome variable was the Bayley III scores at 2 years corrected age.

There was absolutely no detectable effect of supplementation with sodium iodide on outcomes. There was little effect on T4 or TSH either, with a small elevation of TSH levels in the iodide supplemented group in some groups at some postnatal ages, but no impact on T4.

288 infants were defined as being hypothyroid, with T4 less than the 10th percentile, which looks odd at first: how can 20% be below the 10th percentile? Infants were called hypothyroid if they were below that cutoff on any one of 3 occasions (day 7, 14 or 28), which explains this. On subgroup analysis, among hypothyroid infants, there were some signs of a minor benefit:

These figures show the mean differences between iodide supplemented and control infants for the various domains of the Bayley Scales. The hypothyroid babies who received iodide have slightly higher scores than controls on the cognitive and language scales. The euthyroid babies in contrast had very slightly lower scores.

The implications are, I think, that, overall, our babies are already receiving as much iodine as they need for their thyroid function, and for their development. There may be a small subgroup who are hypothyroid and might benefit from a supplement, but it is difficult to identify them, and not certain that there is a clinically significant impact.
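As an aside on the percentile puzzle above, a few lines of arithmetic show why flagging babies on any one of 3 occasions pushes the proportion well above 10%. This is a sketch under a loud assumption: I treat the three measurements as either fully independent or perfectly correlated, which brackets the real situation but describes neither the trial data nor its analysis.

```python
def prob_flagged(p_single=0.10, occasions=3):
    """Chance of at least one value below the cutoff.

    ASSUMPTION: the measurements on each occasion are independent.
    """
    return 1 - (1 - p_single) ** occasions

upper_bound = prob_flagged()   # three independent measurements
lower_bound = 0.10             # three perfectly correlated measurements
print(f"expected flagged fraction between {lower_bound:.0%} and {upper_bound:.0%}")
```

Real T4 values on days 7, 14 and 28 are correlated within a baby, so an observed proportion of around 20% sitting between those two extremes is what one would expect.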


Early low dose hydrocortisone seems to not affect medium term development; PREMILOC outcomes at 22 months.

The PREMILOC trial was a multi-center RCT of hydrocortisone, 0.5 mg/kg twice per day for 7 days followed by 0.5 mg/kg per day for 3 days, given starting within 24 hours of age to infants of 24 to less than 28 weeks gestation.

Neurological and developmental follow-up has just been published (Baud O, et al. Association between early low-dose hydrocortisone therapy in extremely preterm neonates and neurodevelopmental outcomes at 2 years of age. JAMA. 2017;317(13):1329-37.)
There were 523 infants initially enrolled, and 406 survived to 2 years of age; 93% of those were seen at between 21 and 23 months corrected age, for examination and evaluation with standardized instruments.

You probably remember that the primary outcome of the trial was survival without BPD, which was somewhat improved by the intervention (60% compared to 51% in controls). This was a result of fewer deaths (18% compared to 23%) and less BPD (22% compared to 26%), neither of which components of the primary outcome was individually significant. In this follow-up study the authors note that after the 36 week end of the main data collection there were a further 8 deaths, 7 in the control group and 1 in the hydrocortisone group, 5 of which were from severe BPD (4 vs 1). (These deaths were also reported as the deaths before discharge in the initial publication, but I don’t think the causes were noted.)

All of the babies followed had a standardized neurologic evaluation, but unfortunately only 80% of them had the revised Brunet-Lézine evaluation of developmental progress, which gives a developmental quotient, standardized, as usual, with a population mean of 100 and SD of 15.

Basically there were no differences between the groups on neurological signs of impairment, or developmental scores. For example there were 6% of the hydrocortisone and 5% of the control group who developed cerebral palsy. Mean Global Development score was 91.7 in the hydrocortisone group and 91.4 in the control group.

I guess one could say that if there is less BPD and no increase in neuro or developmental adverse effects, we should think of using this as routine therapy?

But the group also report clinically important respiratory outcomes up to 2 years of age:

You can see from their table 2 that there is no sign of better respiratory health (or incidentally any effect on growth outcomes) among the survivors, with some of the minor differences being in one direction, some in the other direction.

Which calls into question again the use of oxygen at 36 weeks, as an outcome for RCTs even when combined with an oxygen reduction test, as in this trial. If kids are more likely to be out of oxygen at 36 weeks, but no more likely to go home on oxygen (14 babies in each group) and not more likely to have respiratory problems in follow-up, then the significance of getting extubated earlier, or needing oxygen for fewer days is questionable, at least the significance to families.

I think those outcomes are indeed benefits to families; it’s much better to see your baby with CPAP or non-invasive ventilation than intubated. But if there is no clear long-term benefit, then we should be pretty certain that there is no harm before instituting this as routine therapy.

Currently, is there any other evidence of harm from this approach?

In the initial data from this trial, late onset sepsis was higher (31% vs 25% had at least one episode), NEC was higher (7% vs 5%), GI perforation was higher (5% vs 4%), use of insulin for hyperglycemia was higher (38% vs 34%) and severe RoP was higher (2% vs 1%). All of these could be due to chance, but the study was not powered to detect such small, but potentially important, differences. Indeed, in one subgroup, the most immature infants, the impact of steroids on late onset sepsis was quite different, 40% vs 23%, and their analysis showed this was unlikely to be due to chance. It’s interesting that, in the on-line supplementary appendix, the major difference in late onset sepsis arose after the end of the treatment period.

It is also interesting that this dose of hydrocortisone had no evident impact on blood pressures, nor on the use of dopamine.

I think that, given all of these worrying differences between the groups favoring control, with no evidence of long-term benefit, and with the only evidence of short-term benefit being shorter intubation and a shorter duration of oxygen therapy, we should not introduce this regime as a routine in our patients.

There is a minor difference in survival with the hydrocortisone treatment though, with 19% mortality before discharge (and before 2 years) compared to 25% in the control group. I calculate the 95% confidence intervals of this 6% difference as being between 13% fewer deaths and 1% more deaths, using early low dose hydrocortisone in similar babies.
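For anyone who wants to redo that interval, here is a minimal sketch using a simple Wald interval for a difference in proportions. The group sizes are an assumption on my part: I have just split the 523 enrolled infants roughly in half, so substitute the actual denominators from the paper if you have them. The function name is illustrative only.

```python
import math

def risk_difference_ci(p_treated, n_treated, p_control, n_control, z_crit=1.96):
    """Wald 95% confidence interval for a difference in two proportions."""
    diff = p_treated - p_control
    se = math.sqrt(p_treated * (1 - p_treated) / n_treated
                   + p_control * (1 - p_control) / n_control)
    return diff, diff - z_crit * se, diff + z_crit * se

# ASSUMPTION: 523 enrolled infants split roughly evenly between the groups
diff, low, high = risk_difference_ci(0.19, 261, 0.25, 262)
print(f"difference {diff:+.1%}, 95% CI {low:+.1%} to {high:+.1%}")
```

Under those assumptions the interval runs from about 13 percentage points fewer deaths to about 1 percentage point more, in line with the figures quoted above.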

Unfortunately, I think I have to say that this therefore warrants further study. A larger trial with enough power to detect a 5% difference in mortality should be performed, perhaps in a region where survival at 24 and 25 weeks is higher than the 65% seen in this French multi-center trial (compared with, for example, 78% in the CNN database from 2015).
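To give a feel for how large such a trial would be, here is a back-of-envelope sample size sketch using the standard normal-approximation formula for comparing two proportions. The 25% control mortality is taken from this trial and 20% is simply 5% lower; the 80% power and two-sided alpha of 0.05 are my assumptions, not a design anyone has proposed.

```python
import math
from statistics import NormalDist

def n_per_group(p_control, p_treated, alpha=0.05, power=0.80):
    """Approximate sample size per arm to compare two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    n = (z_alpha + z_beta) ** 2 * variance / (p_control - p_treated) ** 2
    return math.ceil(n)

# ASSUMPTION: 25% vs 20% mortality, 80% power, alpha 0.05
print(n_per_group(0.25, 0.20))
```

That works out to roughly 1,100 babies per group, about four times the size of PREMILOC, and a lower baseline mortality would change the number again.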

I think a future trial should not use oxygen need at 36 weeks as the definition of bronchopulmonary dysplasia. Other definitions have been suggested, such as in this recent publication from the CNN (Isayama T, et al. Revisiting the Definition of Bronchopulmonary Dysplasia: Effect of Changing Panoply of Respiratory Support for Preterm Neonates. JAMA Pediatr. 2017;171(3):271-9.) In this study, the measure from the neonatal period that best discriminated those who had serious respiratory morbidity after discharge (when seen at 18 month follow-up) was the need for oxygen or respiratory support (anything that gave positive pressure, including high-flow cannulae at more than 1.5 litres per minute) at 40 weeks post-menstrual age.

Serious respiratory morbidity was defined as either (1) 3 or more rehospitalizations after NICU discharge owing to respiratory problems (infectious or noninfectious); (2) having a tracheostomy; (3) using respiratory monitoring or support devices at home such as an apnea monitor or pulse oximeter; and (4) being on home oxygen or continuous positive airway pressure at the time of assessment between 18 and 21 months corrected age.

Just as important is a recognition that lung injury in the newborn is a continuous spectrum, and that dividing it into 2 categories, with and without lung injury, is an artificial distinction designed to aid research design, not to help babies or their families. A description of long term respiratory morbidity between groups is essential, rather than a label based on an intermediate outcome. Mortality, in contrast, is truly a dichotomous outcome, and if it can possibly be improved by low dose early hydrocortisone, then we should pursue that possibility with more studies.


Longer-term outcomes of very preterm babies, what should we measure, when and why?

Two recent articles have discussed the issue of what outcomes we should measure to analyze neurological and developmental progress in the preterm baby. Both are thoughtful critical pieces that say many things that we need to think about as we follow our patients.

McCormick MC, Litt JS. The Outcomes of Very Preterm Infants: Is It Time to Ask Different Questions? Pediatrics. 2017;139(1).

This review/opinion piece describes some of the limitations of our current approach, and how many important outcomes are not routinely evaluated. It does, unfortunately, refer to many studies that have evaluated Bayley scores as if they measured IQ (for example Betty Vohr’s 2004 study, comparing outcomes between NICHD network sites on 20 month Bayleys, is referred to as showing variations in IQ. IQ, which has major limitations as a measure of outcomes itself, does at least have some correlation with school success and difficulty, it should not be confused with developmental quotients from a BSID exam, which have no clear correlation with functional outcomes). Near the end they state the following:

there is a need to shift to multifaceted conceptual frameworks accounting for physiologic and environmental influences on health and development. Broadly construed, such models should incorporate longitudinal observations of function and changes in function due to maturation, family dynamics, and social environmental contexts. Of particular importance is the identification of appropriate interventions to buttress the child’s ability within his or her familial environment.

In other words, trying to reduce outcomes to a single number (such as a Bayley cognitive composite score) is a reductionist approach that is absurd; we need to examine the range of the child’s abilities, functions, behaviour and emotional life in order to help them.

Kilbride HW, et al. Prognostic neurodevelopmental testing of preterm infants: do we need to change the paradigm? J Perinatol. 2017.

This second article reiterates many of the issues I have been ranting about on this blog for a while. They start by discussing why one might want to do neurodevelopmental testing of preterm infants:

(1) the results can help determine which children should receive early intervention or enhanced educational services; (2) the assessments can be used as outcome measures in research protocols to determine whether specific neonatal interventions lead to better results and (3) such information may also be used to inform clinicians and parents about the appropriateness of providing care for certain groups of infants.

I think there are also other reasons for performing such testing; preparing parents for the future, and increasing the understanding of the patterns and developmental trajectories of very preterm babies, are 2 examples.

The authors then describe some of the tests that are used, focusing on the 3 editions of the Bayley Scales of Infant Development. They note the now well-publicised shifts in the norms of those tests, and then, after a short section discussing the adult and adolescent outcomes of the very preterm baby, discuss whether early developmental testing can be used to predict later intellectual function test scores.

In general, the ability to predict cognitive outcomes at school age from infancy and preschool ages has been described as a conundrum. The elusive nature of estimates of IQ stability may be due to differences in sample selection, data analytic approaches, the presence of appropriate control groups as well as validity of assessment instruments, as discussed earlier. Even in the best of testing circumstances, defining impairment in early childhood is imprecise and is likely to over-estimate level of disability.

They note that there are major socio-economic impacts on development of the very preterm baby, and that those factors become more important over time; the CAP study cohort was a good example of this, the change in scores between the 18 month Bayleys and the 5 year WPPSI was greater in children whose parents had more social advantages.

IQ scores, from testing close to school age, are more closely associated with school performance than earlier developmental testing, but we should ask whether even those scores can, or should, be used as a way of determining whether a child’s life is worthwhile or not. For that is the implication of our use of developmental or IQ testing as a way of dichotomizing the lives of the survivors of NICU into those who are impaired and non-impaired, or intact and non-intact, or disabled and non-. Whatever the terminology the outcome calculators have the advantage of not just relying on gestational age to predict outcomes, but the huge disadvantage that they are used by practitioners to predict which side of the dichotomous outcome “survival without disability” compared to “dead or disabled” a baby will likely fall.

In reality the outcomes of our babies are not dichotomous, being dead is not the same as being disabled, all types of disability are not the same, and how a child with impairment experiences their own life and how they impact a family are not dichotomous phenomena, good or bad, either.

Telling a parent-to-be that a child has a predicted 21% chance of ‘survival without profound impairment’, in the example they use, actually means that they have a 33% chance of survival, and that among survivors 64% do not have very low scores on Bayley-II testing at 20 months of age or disabling cerebral palsy. Saying that to parents requires that we know something about the outcome data that our statements are based on, and the major limitations of those data:

Categorization of children based on composite findings should be limited to outcome measurements for research purposes. Providers who counsel families prenatally regarding risk for extreme preterm or other difficult newborn conditions need to fully understand the implications of 24-month neurodevelopmental findings to avoid using terminology that overstates what is known.

I don’t fully agree with the first sentence there though; I think we need to rethink how we use composite outcomes when we design research. As I’ve mentioned previously, the SUPPORT oxygen targeting trial was actually a negative trial: the composite outcome of “death or severe retinopathy” was not significantly different between groups. Only the individual parts of that outcome were different, with death being higher and retinopathy being lower with the lower saturation targets. But to demonstrate that, the authors had to do 3 analyses (the composite, RoP alone, and survival alone), which inflates the risk of a type 1 error, and it has been suggested that this should be taken into account in the analysis. Other ways of analyzing trials with potentially competing outcomes have been proposed, instead of creating potentially confusing composites. I don’t actually think anyone really wanted to know the impact of different oxygen saturation targets on “survival without severe RoP”. We wanted to know whether it was safe to aim for the lower targets that some people were already using (we were really asking that question about longer term outcomes, not expecting a difference in mortality), and whether they really further reduced RoP.
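To put a number on that inflation, here is a small sketch of the familywise error rate for 3 tests, together with the simplest (Bonferroni) correction. It assumes the 3 tests are independent, which is clearly not true when one of them is a composite of the other two, so treat the result as a rough upper guide rather than the true error rate for SUPPORT.

```python
def familywise_error(alpha=0.05, tests=3):
    """Chance of at least one false positive across several tests.

    ASSUMPTION: the tests are independent of one another.
    """
    return 1 - (1 - alpha) ** tests

def bonferroni_threshold(alpha=0.05, tests=3):
    """Per-test significance threshold that keeps the overall rate near alpha."""
    return alpha / tests

print(f"familywise error with 3 tests: {familywise_error():.1%}")
print(f"Bonferroni per-test threshold: {bonferroni_threshold():.4f}")
```

Under independence the chance of at least one spurious "significant" result rises from 5% to a little over 14%, which is why a finding in one component of a non-significant composite deserves some caution.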

To return to the comment about prenatal counselling, I have to agree with the authors, we should completely avoid presenting outcomes as a risk of a composite outcome compared to not having that composite outcome. The risks of death and of potentially life-affecting impairments must be presented separately, some parents will want to explore the different kinds and severities of various potential outcomes, some will want much less detail, or only focus on the chances of the most severely limiting outcomes.  It is important that we don’t just note something like “parents do not want a handicapped child” without exploring what that means to them. In studies when parents have been asked what they meant by phrases such as that one (and there aren’t many such studies), they generally state that to them an outcome which would make them accept withholding or withdrawal of life-sustaining interventions is a child that ‘cannot think’, or has “no ability to communicate”. In other words, certainly not a low Bayley score or learning difficulties at school, but the most profound limitations.
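The arithmetic behind the 21% example above shows how much a composite hides. The sketch below just multiplies out the two figures quoted from the article (33% survival, 64% of survivors without profound impairment); the variable names are mine, and the three-way split is only as good as those two inputs.

```python
p_survival = 0.33                  # chance of survival, from the example above
p_unimpaired_if_survives = 0.64    # survivors without profound impairment

# The three distinct outcomes that the composite lumps into two
p_death = 1 - p_survival
p_survive_unimpaired = p_survival * p_unimpaired_if_survives
p_survive_impaired = p_survival * (1 - p_unimpaired_if_survives)

print(f"death: {p_death:.0%}")
print(f"survival without profound impairment: {p_survive_unimpaired:.0%}")
print(f"survival with profound impairment: {p_survive_impaired:.0%}")
```

Presented this way, "79% dead or profoundly impaired" turns out to be mostly mortality, with about 12% of all babies surviving with profound impairment, which is a very different conversation to have with parents.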

A brand new publication by the parents of an extremely preterm baby and Mark Hudak, a neonatologist from Florida, has just appeared. The father writes a blog, “They don’t cry”, that I have often visited (it is unfortunately not mentioned in the article, and includes a great video of the baby, now a 4 year old child, reading a dinosaur book). The article recounts the experiences of the parents, and is well worth reading; if you don’t have access to the article, Eric Ruthford (the dad) recounts some of the same experiences in the early posts on his blog. One horrifying interaction came just before his son, Gabriel, was born:

When birth became imminent at 22 weeks and 6 days, 2 neonatologists counseled us that standard practice was to not resuscitate infants born before 23 weeks and 0 days and that many neonatologists in our region believed that resuscitation was unethical in the 22nd week.

The neonatologist who arrived 30 minutes after Miri’s water broke said, “At this stage, I don’t recommend that babies should be intubated because the results are so poor. If you give birth after midnight—that’s just the line for when we’ll intervene—I’ll be the one who comes and resuscitates the baby, but my heart won’t fully be in it.”

I hope the neonatologist who said that, and suggested that the approach would change at midnight (is that on the first stroke of midnight, or was he going to wait until the 12 chimes had all rung?) is embarrassed by that now. Apparently he did finally come to resuscitate Gabriel at 11:20 pm at 22 weeks and 6 days, and did a good job according to the parents, who offer the following advice:

  • Physicians should seek to understand the values and motivations that underlie the wishes that parents express. If parents ask the physician to not resuscitate their infant, the physician can probe this by saying, “What, in your mind, are some reasons for this decision?” Although some may think this is insensitive, an honest response will help illuminate underlying parental concerns and allow the physician to speak directly to them.

Our motivations were driven both by our religious value that all life, no matter how brief, glorifies God and by our belief in Gabriel’s autonomy—if he could survive, we owed him that chance.

  • When an infant is going to be born in to the “gray zone” in which resuscitation is a parental choice, the physician can say, “Your child will be welcome in our nursery.” Such an approach would have greatly diminished our stress without introducing bias either way and would have affirmed Gabriel as a person. Miri remembers being especially frustrated during the antenatal counseling that the doctors talked about him as a medical condition, not as Gabriel—we had picked his name at that point—or even as, “your baby.” Miri viewed 22 weeks and 6 days as a description of her condition, not as a way of describing Gabriel, and she regarded the statistics relating gestational age to outcomes as being similarly impersonal.

  • The physician can talk about the differences between a child who lives an hour in the delivery room versus one who lives for a few days or weeks in the NICU. Some parents might believe that a short goodbye would be easier. Other parents might feel worse if they did not give their child a chance to survive. We were in the latter camp, and were sobered but not dissuaded when the doctor who recommended against resuscitation told us that setbacks and failures in an infant’s treatment become harder to take later on. In our 5-month NICU stay, Gabriel did have setbacks that frightened us and we often feared that he might not survive: but we never had second thoughts about our decision to offer him a chance for life.

  • For some parents, statistics about functional outcomes will influence decisions. Optimally, outcomes should be more robustly descriptive. “Profound to severe disability” and “severe to moderate disability” sounded to us like “life without parole.” It would be helpful to hear directly from the parents of a premature infant about their perception of their child’s happiness—and their own. For parents concerned about their child’s future abilities, a visit from a pediatric neurologist or developmental specialist who can provide first-hand knowledge about the daily lives of former premature infants could be similarly instructive. For parents concerned about the expense of care and about their inability to leave money in their wills to a potentially disabled adult, a visit from a financial case worker could help. Alternatively, an online system or binder with printed materials might convey information in all 3 areas.

The parents’ thoughts are accompanied by a thoughtful discussion by the neonatologist, who states:

They suggest that parents have an opportunity to talk with other parents of premature infants who survived with disability. Perhaps neonatologists should have the same opportunity to challenge their biases. An increasing literature attests to the fact that many disabled survivors of prematurity self-report an acceptable quality of life and do not regret their survival. And should not that be a key consideration for all of us?

The final section is written by the 3 authors together; it ends:

Exploring the fundamental motivations behind parental desires can guide information sharing to be more illuminating than a recitation of survival statistics or graded descriptions of long-term neurodevelopment that do not meaningfully convey a child’s potential abilities. Under similar circumstances, 2 sets of parents may reach different but nonetheless supportable informed decisions. A physician often has these discussions thinking what he or she would decide in a similar circumstance. Yet in the gray zone, the physician is obliged to put aside personal bias to forge a partnership with the parents and to support their most informed decision on behalf of themselves and their child.

That picks up some vitally important issues. Policies and position statements have in the past focussed on ensuring that we tell parents all of the bad things that can happen, and all the potential limitations of extremely preterm babies. When do we tell them the positives? What most preterm babies can do, how they positively impact the lives of their families, along with the difficulties?

The outcomes that we should be measuring should be broader and related to function and abilities. They should be reported in ways which describe the range of capacities of our graduates, and show their abilities, not just their disabilities. We should not lump together outcomes which have very different implications for parents, and for the child themselves. As Saroj Saigal and I wrote in an editorial once, (Barrington KJ, Saigal S. Long-term caring for neonates. Paediatr Child Health. 2006;11(5):265-6) we should be proud of the way that neonatologists invented the field of outcomes research, but we need to do still more, to ensure that we don’t just identify and measure problems but study ways to lessen their impacts and further improve the lives of our patients.


Single Family Rooms in the NICU

We have just moved to a brand new NICU, with 80 beds in 60 single family rooms and 10 twin rooms. It is enormous, and beautiful; each room has a parent space with a smallish pull-out bed (not enough room for a couple to sleep, maybe that was the idea!). At the same time as moving we had to renew all our monitors, and we added some ventilators and got rid of others, so that we now have only 2 kinds of ventilator, the VN500 and a few creaky Sensormedics, with the others that we occasionally used no longer in service. We also, around the same time, changed the way we constitute the teams doing service, so we now have 5 teams instead of 4 and divide up the babies differently.

All of which is a preamble to saying that if we compare our previous outcomes, in our mostly double-room setup before the move, with our future outcomes, in the mostly single rooms with much more space for families, then even though we have the same group of neonatologists and haven’t made any huge change in clinical protocols, so many things have changed that to ascribe any differences to just the NICU environment would be questionable.

This means of course that observational studies are very limited; any study comparing outcomes with historical controls needs to be viewed with a touch, or more, of scepticism. Even though we might ascribe any improvement in BPD incidence (for example) to the move to single rooms, it might well be a combination of other unrelated factors that is responsible.

It is also important, I think, to distinguish between single patient rooms and single family rooms; some single room NICUs have very limited space for families, and the impacts may be very different from those in NICUs with family-room concepts.

I really like our new unit, even though I say that having been involved in much of the planning (not right at the beginning with the choice of a single family room design, nor right at the end with some of the final details being settled): but is an NICU like that good for babies? and for families?

How to answer a question like that scientifically? Clearly we can’t randomly admit babies to an NICU with single family rooms, single patient rooms, or an NICU with larger rooms having several babies in them. We can either do historical control studies (with limitations such as those I have already discussed) or we can study contemporary groups in different NICUs and try to correct for all the potential differences between the groups. We might be able to look at a single group or region where they have both types of NICU, and where patient admission was pseudo-random (i.e. not based on patient characteristics, but based on other factors such as bed availability).

There are two publications that demonstrate the problems with these approaches:

Pineda RG, et al. Alterations in brain structure and neurodevelopmental outcome in preterm infants hospitalized in different neonatal intensive care unit environments. J Pediatr. 2014;164(1):52-60 e2.

Vohr B, et al. Differential Effects of the Single-Family Room Neonatal Intensive Care Unit on 18- to 24-Month Bayley Scores of Preterm Infants. The Journal of pediatrics. 2017.

The first study, from Terrie Inder’s time in St Louis, compared outcomes between babies admitted to the single room wing of a new NICU and those admitted during the same period, but to the traditional “airplane hangar” NICU. Admission was based on bed availability, and the outcomes the group studied were brain imaging, short term functional outcomes (aEEG and neurological exams), and neurodevelopmental progress, including language, at 2 years of corrected age. 136 infants less than 31 weeks gestation were included, with 127 having most of the measures, and then 107 being eligible for follow-up (after deaths and dropouts) of whom 86 were seen. At 2 years the language scores were 5 points lower among the babies in the single rooms (1/2 a standard deviation). Why would this be?

An important new study from the same group has analyzed the type of noise that preterm babies are exposed to in the two types of environment. They used an automated analyzer which divided the recordings into periods with speech, distant voices, electronic sounds, other noise and silence (Pineda R, et al. Auditory Exposure in the Neonatal Intensive Care Unit: Room Type and Other Predictors. The Journal of pediatrics. 2017). Each recording episode lasted 16 hours, starting before 10 am. There were more periods of silence in the single rooms, and fewer distant words. The duration of exposure to meaningful words was very short in both types of environment, and increased towards discharge: only around 8 minutes per 16 hour period at birth, up to about 30 minutes per 16 hour period at term.

This certainly all suggests to me that there is a great opportunity in single rooms, to increase exposure to parental, and other positive human voice sounds. Encouraging parents to talk to, sing to, and read to, their babies, and even to record their voices doing those things so the baby can hear sounds that might encourage speech development should be studied more. Is there a saturation effect? Should voice exposures be limited to when the baby is awake?

The study by Betty Vohr compares human milk intake and developmental outcomes before and after their group moved to a single-family room NICU; about 300 babies under 1250 g birth weight are compared.

Human milk provision increased after the move, particularly after the first 3 weeks, and Bayley III language and cognitive scores improved, with a correlation between those 2 outcomes.

A previous study from this group showed that language outcomes were critically dependent on parental involvement (Lester BM, et al. 18-Month Follow-Up of Infants Cared for in a Single-Family Room Neonatal Intensive Care Unit. The Journal of pediatrics. 2016). When they analyzed hours spent in kangaroo care, breast-feeding and involvement with other care procedures, they found that there was more parental involvement in the single rooms, and the babies with higher parental involvement had better cognitive and language scores at 2 years.

I guess what we need is a systematic review, et voilà! (Servel AC, Rideau Batista Novais A. Les chambres familiales en néonatologie : effets sur le nouveau-né prématuré, ses parents et l’équipe soignante. Revue systématique de la littérature. Archives de Pédiatrie. 2016;23(9):921-6). This group searched PubMed for studies in the last 15 years that have evaluated impacts of a single family room design on babies, families and staff. They eliminated studies of single patient rooms without extra family space. They found 12 publications with varying designs and sample sizes, including one randomized trial, despite my comments at the top.

That randomized trial was in two level 2 nurseries in Sweden, which had built new spaces for families. Patients were randomized if there was a bed available in both the new and the older 4-bedded spaces, and if a parent could stay for 24 hours a day for the hospitalisation. Babies were admitted either after birth or from the local level 3 NICU. That study showed shorter hospitalisation in the single room, by about 5 days on average (mostly among the babies under 30 weeks on subgroup analysis).

All the other studies were observational with differing designs; the authors of the review note that there seems to be improved weight gain in two studies, and an increase in exclusive breast-feeding at discharge in one study; another study showed decreased nosocomial sepsis. From the parents’ point of view there was an increase in satisfaction in one study, and a greater sense of intimacy with their baby in another. In contrast, parents in one study had a greater sense of isolation, having fewer interactions with other parents, and fewer with the care team.

The nursing and medical staff felt that they worked in a better environment (3 studies), had higher satisfaction scores (1 study) and had higher quality of work life (1 study).

These results are possibly subject to all sorts of biases: it often isn’t clear which were the primary outcome variables, and which were chosen after the data were collected; there are response biases, as staff who have no choice about the NICU design (it is impossible to go back to a large multi-patient room once you have built a new single family unit) might well score their new circumstances better, because they have no choice really but to make the best of their new situation; and so on.

Nevertheless this review suggests mostly improved outcomes in single family rooms, with concerns about family isolation, and decreased aural stimulation.

Finding ways to overcome the downsides of these rooms, while maintaining those advantages might well help to improve many different outcomes of our premature infants.


Running for Neonates, and their families

On April the 23rd I will be running a half marathon, as part of the PAF-Néonat team of Sainte Justine Hospital.

We are raising funds for the partnering with families program, which involves parents in clinical care, research and education in our neonatal service.

We have a quite innovative program and want to expand it further.

Our team for the run includes children, parents, and professionals, who will be running anything from 0.5k (children only!), 5k, 10k or the 21 kilometer half marathon.

To make a donation click on this link; at least 95% of the funds raised go directly to support our program.


Reading Research: Subgroups and Observational studies

In publications of randomized controlled trials, subgroup analyses are frequently performed. The idea behind such analyses is to determine whether one group or another has a different result to the overall results, for example, whether boys or girls have more benefit from an intervention. Sometimes this is done to try to salvage some possibly positive results when the overall result is negative, sometimes to try to refine indications for interventions based on the results.

The first thing to realize is that, just because of random variation, it would be bizarre if every subgroup had exactly the same result from an intervention. Simply because, to use my own example, girls had more improvement in a particular outcome than boys, does not mean that the difference is due to some biologic difference between them; it may just be chance, and the next trial might show more impact in boys than in girls.

Interpretation of subgroup analyses always has to be taken with a grain (or even a handful) of salt.

When you examine the results of your trial and then decide to do a subgroup analysis based on a suspicion that the girls did better, you are entering dangerous territory. Such post-hoc subgroup analyses should be avoided like the plague; it is far too easy to be led astray. If by chance blond babies did much better with the intervention and brunettes only did slightly better, and you notice in your data set that this is the case, and then do statistical analysis to show that the results are significant in blonds, and not in brunettes, what should you do? The best idea is to not do such analyses. Stick with subgroup analyses that were decided before the study was started, based on a reasonable supposition that one group or another might have a different response. Deciding a priori on a small number of subgroups that might feasibly have different responses (and not a priori listing every subgroup that you can think of) is the first step. Then the statistical analysis requires an evaluation of the interaction between the intervention and the subgroup: it is not enough to show a significant result in one group and not in another; it requires a statistical test to show that the responses are actually different, and that such a difference is unlikely to be due to chance.
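To make that last point concrete, here is a minimal sketch of one common way to test an interaction between two subgroups: compare the two log odds ratios directly, using the standard error of their difference. The counts are invented purely for illustration (they are not from any trial), the function names are my own, and a real analysis would more usually fit a regression model with an interaction term.

```python
import math

def log_odds_ratio(events_trt, n_trt, events_ctl, n_ctl):
    """Log odds ratio and its standard error for one subgroup's 2x2 table."""
    a, b = events_trt, n_trt - events_trt   # treated: events, non-events
    c, d = events_ctl, n_ctl - events_ctl   # control: events, non-events
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se

def two_sided_p(z):
    """Two-sided p-value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def subgroup_p(table):
    """p-value for the treatment effect within a single subgroup."""
    log_or, se = log_odds_ratio(*table)
    return two_sided_p(log_or / se)

def interaction_test(table_1, table_2):
    """Ratio of the two odds ratios, and the p-value for their difference."""
    lor1, se1 = log_odds_ratio(*table_1)
    lor2, se2 = log_odds_ratio(*table_2)
    diff = lor1 - lor2
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
    return math.exp(diff), two_sided_p(diff / se_diff)

# Hypothetical counts: (events treated, n treated, events control, n control)
girls = (30, 200, 50, 200)
boys = (42, 200, 50, 200)

print("p within girls:", round(subgroup_p(girls), 3))
print("p within boys: ", round(subgroup_p(boys), 3))
ratio, p_int = interaction_test(girls, boys)
print("ratio of odds ratios:", round(ratio, 2), " interaction p:", round(p_int, 3))
```

With these made-up numbers the effect is "significant" in the girls and "not significant" in the boys, yet the test of interaction is nowhere near significant, which is exactly the trap described above.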

Even when you do all that, the only way to be sure that the difference is real, is to do a prospective trial, which might only include the group who had the apparent benefit, if the overall study was a null trial. Post hoc subgroup analyses are not usually strong enough evidence to even do that, which is why a clear statement of whether a subgroup analysis was decided before or after commencing the trial is important, and why publication of protocols, including a description of planned subgroup analyses, is important.

Sometimes things change during a trial. I remember a trial of an established medication in which the company changed the preparation part way through the trial, which changed bio-availability dramatically, and mandated a subgroup analysis that was not planned before starting. Of course in such a circumstance the publication should describe exactly what was done and why, and why the subgroup analysis became important. Something similar happened in the oxygen targeting trials: when Masimo recalibrated the oximeters in use in several of the trials, the changes in saturations actually achieved required a subgroup analysis.

A publication from 2012 investigated claims of significant subgroup effects in RCTs, and showed that only 50% reported a significant test of interaction (and only 2/3 of those actually reported the test or gave the data). Sun X, et al. Credibility of claims of subgroup effects in randomised controlled trials: systematic review. BMJ. 2012;344:e1553.
That study included a list of criteria for deciding whether a claim of a subgroup effect might be reliable:

Ten criteria used to assess credibility of subgroup effect

  • Was the subgroup variable a baseline characteristic?

  • Was the subgroup variable a stratification factor at randomisation?*

  • Was the subgroup hypothesis specified a priori?

  • Was the subgroup analysis one of a small number of subgroup hypotheses tested (≤5)?

  • Was the test of interaction significant (interaction P<0.05)?

  • Was the significant interaction effect independent, if there were multiple significant interactions?


  • Was the direction of subgroup effect correctly prespecified?

  • Was the subgroup effect consistent with evidence from previous related studies?

  • Was the subgroup effect consistent across related outcomes?

  • Was there any indirect evidence to support the apparent subgroup effect—for example, biological rationale, laboratory tests, animal studies?
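The criterion about testing only a small number of subgroup hypotheses comes down to simple arithmetic, sketched below. It assumes, for simplicity, that the tests are independent, each is done at a threshold of 0.05, and there is no true subgroup effect anywhere; the function name is my own.

```python
def chance_of_false_positive(n_tests, alpha=0.05):
    """Probability that at least one of n independent tests is 'significant'
    by chance alone, when there is no real effect in any of them."""
    return 1 - (1 - alpha) ** n_tests

for k in (1, 5, 10, 20):
    print(k, "subgroup tests:", round(chance_of_false_positive(k), 2))
```

Under those assumptions the chance of at least one spurious "significant" subgroup is about 23% with 5 tests, 40% with 10 and 64% with 20.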

A new publication in JAMA Internal Medicine (Wallach JD, et al. Evaluation of Evidence of Statistical Support and Corroboration of Subgroup Claims in Randomized Clinical Trials. JAMA internal medicine. 2017) specifically looked at subgroup analyses in published RCTs. The investigators examined whether such analyses were performed, whether appropriate statistical tests of interaction were performed, how common significant differences were, and then whether any follow-up studies had been done. They found 64 RCTs with 117 analyses making claims of important subgroup differences, and:

Of these 117 claims, only 46 (39.3%) in 33 articles had evidence of statistically significant heterogeneity from a test for interaction. In addition, out of these 46 subgroup findings, only 16 (34.8%) ensured balance between randomization groups within the subgroups (eg, through stratified randomization), 13 (28.3%) entailed a prespecified subgroup analysis, and 1 (2.2%) was adjusted for multiple testing. Only 5 (10.9%) of the 46 subgroup findings had at least 1 subsequent pure corroboration attempt by a meta-analysis or an RCT. In all 5 cases, the corroboration attempts found no evidence of a statistically significant subgroup effect.

Most claims of a subgroup difference, then, are not supported, even by the evidence in the actual publications where the claims are made (note to anyone involved in peer review: make sure that statistical tests of interaction are reported before accepting that subgroup differences might be real). In the few cases where later studies tried to determine whether there really were subgroup differences, they were all negative.

In neonatology, one study which answered most of the above criteria is from the CAP trial: Davis PG, et al. Caffeine for Apnea of Prematurity Trial: Benefits May Vary in Subgroups. The Journal of pediatrics. 2010;156(3):382-7.e3. That secondary analysis showed that age at starting treatment (a baseline characteristic, but not a prespecified subgroup, or a factor for stratification) had a significant impact on the age of extubation and the age of stopping oxygen. Starting treatment before 3 days had a greater impact than after 3 days, and the interaction was significant, at least for postmenstrual age at last extubation and postmenstrual age of finally stopping CPAP. That publication also showed that caffeine had a greater impact on neurodevelopmental outcome among the infants who were receiving positive pressure ventilatory support at randomization. Both of these findings are biologically plausible, and both are accompanied by subgroup differences for other outcomes which (even if the interactions were not statistically significant) were in the same direction, such as a reduction in bronchopulmonary dysplasia.

Observational studies also need to be carefully interpreted. Methods for adjusting for baseline risk differences in cohort studies, such as multivariate regression, propensity analysis and instrumental variable analysis, might help to balance groups for prognostic variables, but there will always remain the potential for unknown prognostic variables to bias the results. A fantastic new addition to the “Users’ guides to the medical literature” series in JAMA has just been published. Agoritsas T, et al. Adjusted analyses in studies addressing therapy and harm: Users’ guides to the medical literature. JAMA. 2017;317(7):748-59. A great read for anyone who uses the medical literature and sometimes reads observational studies, which I think is most of us. They describe the various methods of adjustment (in non-statistician language, thankfully) including the “instrumental variable analysis” which was new to me as a term, but the concept is simple. When variations in the application of a treatment occur which are not related to prognosis, then you can use that variation as a substitute for randomization. In other words, if a treatment is applied differently in one hospital compared to another (such as inhaled NO in the very preterm) but the hospitals treat the same kind of patients, with the same risk characteristics, then you can use that fact to mimic cluster randomized allocation. The problem is that even the statisticians can’t agree exactly how to do that, and there is still a possibility of imbalance in other prognostic factors.
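For anyone who wants to see the instrumental variable idea in action, here is a toy simulation. Everything in it is invented: a hidden prognostic factor makes babies both more likely to be treated and more likely to have a higher outcome score, while the hospital (the "instrument") changes how often the treatment is used but has no other effect on outcome. The estimator shown is the simplest one (often called the Wald estimator), chosen for illustration; it is not necessarily the approach described in the Users' guide.

```python
import random

def simulate(n=200_000, true_effect=1.0, seed=0):
    """Simulated cohort of (hospital, treated, outcome) rows."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        hospital_b = rng.random() < 0.5      # instrument: which hospital
        hidden = rng.gauss(0, 1)             # unmeasured prognostic factor
        # Hospital B treats more often; the hidden factor also drives treatment.
        p_treat = 0.2 + 0.5 * hospital_b + 0.2 * (hidden > 0)
        treated = rng.random() < p_treat
        outcome = true_effect * treated + 2.0 * hidden + rng.gauss(0, 1)
        rows.append((hospital_b, treated, outcome))
    return rows

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def naive_estimate(rows):
    """Treated minus untreated, ignoring the hidden confounder."""
    return (mean(y for z, t, y in rows if t)
            - mean(y for z, t, y in rows if not t))

def iv_estimate(rows):
    """Wald estimator: outcome difference between hospitals divided by
    the difference in the proportion treated between hospitals."""
    dy = mean(y for z, t, y in rows if z) - mean(y for z, t, y in rows if not z)
    dt = mean(t for z, t, y in rows if z) - mean(t for z, t, y in rows if not z)
    return dy / dt

rows = simulate()
print("true effect: 1.0")
print("naive estimate:", round(naive_estimate(rows), 2))
print("instrumental variable estimate:", round(iv_estimate(rows), 2))
```

In this simulation the naive comparison overstates the benefit, while the instrumental variable estimate lands close to the true effect. That only works because the simulated hospitals really do treat the same kind of patients; in real data that assumption cannot be checked for the unmeasured factors, which is the weakness noted above.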

The authors of the article end with a list of major publications that reported observational studies showing a positive or negative effect of a medication, which was disproved by prospective randomized trials:

Comparative effectiveness research relying on observational studies using conventional or novel adjustment procedures risks providing the misleading effect estimates seen with hormone replacement for cardiovascular risk, β-blockers for mortality in noncardiac surgery, antioxidant supplements for healthy people, and statins for cancer. If RCTs cannot be conducted, it will remain impossible to determine whether adjusted estimates are accurate or misleading

The abstract ends with this sentence: “Although all these approaches can reduce the risk of bias in observational studies, none replace the balance of both known and unknown prognostic factors offered by randomization.”


