Combining the results from Clinical Trials (Pulse Article 2001)

This article is part of a series on Critical Reading.

In an article on sub-group comparisons I warned about the danger of paying too much attention to results from patients in particular sub-groups of a trial, arguing that the overall treatment effect is usually the best measure for all the patients.

In the same way, when the results of all available clinical trials are combined in a Systematic Review (for example in a Cochrane review), care is still required in the interpretation of the results from each individual trial, and the main focus is on the pooled result, which gives the average effect across all the trials. The results are often displayed in a forest plot, as demonstrated below. The result of each trial is represented by a rectangle (larger for the bigger trials), and the horizontal line through each rectangle indicates that trial's 95% confidence interval. The diamond at the bottom is the pooled result, and its confidence interval is the width of the diamond.
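The pooling behind the diamond can be sketched in a few lines. This is a minimal illustration of fixed-effect (inverse-variance) pooling, using made-up log odds ratios and standard errors rather than the actual trials in the figure: each trial is weighted by the inverse of its variance, so larger trials pull the average more, and the pooled standard error is narrower than any single trial's.

```python
import math

# Hypothetical (log odds ratio, standard error) for three trials.
# Illustrative numbers only, not taken from the review in the figure.
trials = [(-0.10, 0.45), (0.25, 0.60), (0.05, 0.30)]

# Fixed-effect (inverse-variance) pooling: weight each trial by 1/SE^2.
weights = [1 / se**2 for _, se in trials]
pooled = sum(w * est for (est, _), w in zip(trials, weights)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

# 95% confidence interval back on the odds-ratio scale
# (this interval is what the width of the diamond represents).
lo = math.exp(pooled - 1.96 * pooled_se)
hi = math.exp(pooled + 1.96 * pooled_se)
print(f"Pooled OR {math.exp(pooled):.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

Note that the pooled standard error is smaller than the smallest individual standard error, which is why the diamond is narrower than any single trial's line.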

Hospital admissions for acute asthma were rare in each trial (shown in the data columns for Holding Chamber and nebuliser), so the uncertainty of the individual trials is seen in wide confidence intervals; when the trials are pooled, the uncertainty shrinks to a much narrower estimate. The pooled odds ratio of one indicates no difference between delivery methods for beta-agonists in acute asthma as far as admission rates are concerned, but the estimate is still imprecise and compatible with both a halving and a doubling of the odds of being admitted to hospital. So we have to say that we do not know whether there is a difference in the rate of admissions between the two delivery methods.
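The point about rare events producing wide intervals can be seen from a single 2x2 table. This sketch uses invented counts (a few admissions in each arm) and Woolf's formula for the standard error of the log odds ratio; the resulting interval easily spans both a halving and a doubling of the odds.

```python
import math

# Hypothetical 2x2 table for one small trial (made-up counts):
#                   admitted   not admitted
# holding chamber       3           47
# nebuliser             4           46
a, b, c, d = 3, 47, 4, 46

odds_ratio = (a * d) / (b * c)
# Woolf's formula: SE of log(OR) from the four cell counts.
se = math.sqrt(1/a + 1/b + 1/c + 1/d)
lo = math.exp(math.log(odds_ratio) - 1.96 * se)
hi = math.exp(math.log(odds_ratio) + 1.96 * se)
print(f"OR {odds_ratio:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

With only three and four admissions, the interval runs from well below 0.5 to well above 2, so this trial on its own cannot distinguish the two delivery methods.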

Before all the results are combined it is wise to carry out statistical tests to look for Publication Bias. There is evidence that clinical trials with positive results are more likely to be published in major journals, and in the English language, than similar trials reporting negative results. When only published studies are combined, this leads to a tendency to overestimate the benefits of treatment. The easiest way to look for this is a funnel plot, in which the result of each trial is plotted against the size of the study. Chance variation means that small studies should show more random scatter in both directions around the pooled result. If the small studies all show positive results, there is a suspicion that other small studies with negative results exist but were not published. The funnel plot shown below is taken from a Cochrane review of the use of Nicotine gum for smoking cessation and is reasonably symmetrical.
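One common way to put a number on funnel-plot asymmetry is Egger's regression test, which is not mentioned in the original article but follows the same logic: regress each trial's standardised effect on its precision, and an intercept far from zero suggests the small, imprecise studies are reporting systematically larger effects. The data here are invented for illustration, deliberately constructed so that the smallest studies show the biggest effects.

```python
# Hypothetical (log odds ratio, standard error) pairs; the small studies
# (large SE) have been given larger effects to mimic publication bias.
studies = [(0.8, 0.50), (0.5, 0.40), (0.3, 0.30), (0.15, 0.20), (0.1, 0.10)]

# Egger's test: regress standardised effect (est/SE) on precision (1/SE).
y = [est / se for est, se in studies]
x = [1 / se for _, se in studies]
n = len(studies)
mx, my = sum(x) / n, sum(y) / n
slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx)**2 for xi in x)
intercept = my - slope * mx
print(f"Egger intercept: {intercept:.2f}")  # far from zero => asymmetry
```

A symmetrical funnel (like the nicotine gum example) would give an intercept close to zero; in practice the intercept is also tested formally against its standard error, which is omitted here for brevity.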


A further important check is to look for Heterogeneity. The individual trials will again show chance variation in their results, and in a Systematic Review it is usual to test whether the differences between trials are larger than would be expected by chance alone. The forest plot above shows that the Heterogeneity in this set of trials is quite low. However, if significant Heterogeneity is shown (in other words, the results are more diverse than expected), it is recommended to explore the reasons why this may be. Although statistical adjustments can be made to incorporate such Heterogeneity (using a so-called Random Effects Model), this should not be accepted uncritically. It may be more sensible not to try to combine the trial results at all.
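The usual heterogeneity statistics can be sketched as follows, again with invented per-trial estimates. Cochran's Q sums the weighted squared deviations from the fixed-effect average; if Q is much larger than its degrees of freedom, the results are more diverse than chance alone would produce. I-squared expresses that excess as a percentage, and the DerSimonian-Laird tau-squared is the between-trial variance a Random Effects Model would add to each trial's weight.

```python
import math

# Hypothetical per-trial log odds ratios and standard errors,
# chosen so that the trials disagree more than chance would explain.
trials = [(-0.2, 0.30), (0.1, 0.25), (0.9, 0.35), (0.6, 0.40)]

w = [1 / se**2 for _, se in trials]
fixed = sum(wi * est for (est, _), wi in zip(trials, w)) / sum(w)

# Cochran's Q: weighted squared deviations from the fixed-effect estimate.
q = sum(wi * (est - fixed)**2 for (est, _), wi in zip(trials, w))
df = len(trials) - 1

# I^2: percentage of variation beyond chance (floored at zero).
i2 = max(0.0, (q - df) / q) * 100

# DerSimonian-Laird tau^2: between-trial variance for a random-effects model.
c = sum(w) - sum(wi**2 for wi in w) / sum(w)
tau2 = max(0.0, (q - df) / c)
print(f"Q={q:.2f} on {df} df, I^2={i2:.0f}%, tau^2={tau2:.3f}")
```

As the article cautions, a non-zero tau-squared widens the pooled interval but does not explain *why* the trials disagree; that question still needs a clinical answer.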

An example of this can be found in the BMJ in October 1999, in which a group from Toronto published a meta-analysis of Helicobacter eradication (1). The statistical tests showed considerable Heterogeneity between the trials, which was largely ignored by the authors. Inspection of the trials shows that there were two types: some measured outcomes at six weeks using single treatments, while others used triple therapy and measured dyspepsia at one year. There is no good clinical reason to put these together, and this may well explain the diversity of the results (2).

The message is to use your common sense when deciding whether the differences between the outcomes measured and the treatments used in each trial mean that it is safer not to calculate a single average result (not least because the average is not easy to interpret and apply to clinical practice).

1. Jaakimainen RL, Boyle E, Tuciver F. Is Helicobacter pylori associated with non-ulcer dyspepsia and will eradication improve symptoms? A meta-analysis. BMJ 1999;319:1040-4.

2. Cates C. Studies included in meta-analysis had heterogeneous, not homogeneous, results. BMJ 2000;320:1208.