Subgroups compared (BMJ 2003)

The Statistics Notes series in the BMJ in January 2003 included a nice article by Altman and Bland demonstrating a way of testing whether there is a statistically significant difference between subgroups in a clinical trial or meta-analysis.

The key question is whether the effect in one subgroup of patients differs significantly from the effect in another, and the point estimate and confidence interval for each subgroup can be used to test this. The P-values for the individual subgroups should NOT be compared, because they address the wrong question. This was discussed in an earlier Statistics Notes article in 1996.

The problem is that the individual P-value for each subgroup merely tells us the probability of obtaining results at least as extreme as those seen in that subgroup if the null hypothesis is true. The null hypothesis is the same for both subgroups, namely that there is no difference between the experimental and control treatments. However, the chance of any given result occurring under the null hypothesis depends heavily on the number of patients in the group: exactly the same point estimate gives a smaller and smaller P-value as the group gets larger. Thus a small group may have a non-significant P-value, while a large group with the same point estimate can have a much smaller, significant one.
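To make this concrete, here is a minimal Python sketch using entirely hypothetical numbers: two subgroups with identical event rates (12% on one treatment, 10% on the other) that differ only in size. It uses a simple two-sided z-test for the difference between two proportions with an unpooled standard error; the function name and all figures are illustrative assumptions, not data from any trial.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p(events_a, events_b, n_per_arm):
    """Two-sided P-value from a z-test comparing two proportions (unpooled SE)."""
    p_a = events_a / n_per_arm
    p_b = events_b / n_per_arm
    se = sqrt(p_a * (1 - p_a) / n_per_arm + p_b * (1 - p_b) / n_per_arm)
    z = (p_a - p_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical subgroups: identical event rates (12% vs 10%), different sizes
small = two_sided_p(60, 50, 500)       # 500 patients per arm
large = two_sided_p(600, 500, 5000)    # 5,000 patients per arm
print(f"small subgroup P = {small:.3f}")   # about 0.31
print(f"large subgroup P = {large:.4f}")   # about 0.0014
```

The point estimate (a 2 percentage point difference) is the same in both cases; only the number of patients changes, and with it the P-value.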

This is not surprising if we think of tossing a coin, where the null hypothesis is that heads and tails are equally likely. If you get 60% heads after tossing a coin ten times you will not be surprised, but if it is still 60% after a thousand tosses the coin becomes decidedly suspect. The chance of getting at least 6 heads in 10 tosses of an unbiased coin is much higher than the chance of getting at least 600 heads in 1000 tosses.
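The coin example can be checked exactly with the binomial distribution. This short Python sketch (the function name is hypothetical) counts the favourable outcomes with `math.comb` and divides by the total number of equally likely sequences, which assumes a fair coin.

```python
from math import comb

def prob_at_least(heads, tosses):
    """Exact probability of at least `heads` heads in `tosses` tosses of a fair coin."""
    favourable = sum(comb(tosses, k) for k in range(heads, tosses + 1))
    return favourable / 2 ** tosses   # exact integer ratio, converted to float

ten = prob_at_least(6, 10)
thousand = prob_at_least(600, 1000)
print(ten)        # 0.376953125 (386/1024)
print(thousand)   # of the order of 1e-10
```

The same 60% proportion is unremarkable in 10 tosses but extraordinarily unlikely in 1000.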

This demonstrates one of the weaknesses of over-reliance on P-values when looking at the results of clinical trials: they are strongly influenced by the number of patients in each group. The confidence interval (CI) is much more informative, as it gives an idea of where the trial suggests the true effect of the treatment lies. (Technically, if the trial were repeated 100 times, the 95% confidence intervals would include the true population effect in about 95 of those trials.) Put more loosely, we can be 95% confident that the true population effect of the treatment lies within the 95% confidence interval.
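The "repeated trials" interpretation can be illustrated by simulation. This Python sketch is only an illustration under stated assumptions: outcomes are normally distributed with a known standard deviation (so the simple z-based interval is exact), and the true mean, sample size and number of repeats are arbitrary choices. It repeatedly draws a hypothetical trial sample and counts how often the 95% CI contains the true value.

```python
import random
import statistics

def ci_coverage(true_mean=0.5, sd=1.0, n=50, repeats=10_000, seed=1):
    """Fraction of simulated trials whose 95% CI contains the true mean."""
    rng = random.Random(seed)          # fixed seed for a reproducible run
    half_width = 1.96 * sd / n ** 0.5  # known-SD interval, same for every trial
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        m = statistics.fmean(sample)
        if m - half_width <= true_mean <= m + half_width:
            hits += 1
    return hits / repeats

coverage = ci_coverage()
print(coverage)   # close to 0.95
```

Any single interval either contains the true value or it does not; the 95% describes how often the procedure succeeds over many repetitions.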

The width of the confidence interval is also affected by the number of patients included and narrows as groups get larger, but comparing the confidence intervals from two subgroups is much more informative than just comparing their P-values. If the two subgroup confidence intervals do not overlap, you can start to wonder whether there is an important difference between the subgroups. The Altman and Bland paper shows how to assess this more accurately.

A word of warning on subgroups: before attaching too much importance to differences between subgroups, you need to check whether the groups were defined a priori and whether the division is based on good biological or other grounds. Richard Horton reminded us of the danger of relying on subgroup analysis in his Lancet editorial (From star signs to trial guidelines. Lancet 2000;355:1033-4). There is also an entertaining paper by Freemantle in the BMJ in 2001, with a useful set of questions to ask about subgroups, entitled “Interpreting the results of secondary end points and subgroup analyses in clinical trials: should we lock the crazy aunt in the attic?”

Whilst as clinicians we would like to know how well a treatment would work in the patient sitting in front of us, what clinical trials give us is the average effect of the treatment on a population of patients. It takes very large numbers of patients to tease out whether individual groups benefit more or less than the overall average. One recent example comes from the ALLHAT trial (Major outcomes in high-risk hypertensive patients randomised to angiotensin-converting enzyme inhibitor or calcium channel blocker vs diuretic: The Antihypertensive and Lipid-Lowering Treatment to Prevent Heart Attack Trial (ALLHAT). JAMA 2002;288(23):2981-97). In this trial a total of 33,357 participants aged 55 years or older with hypertension and at least one other CHD risk factor, from 623 North American centres, were randomised to different drug regimens. The overall conclusion was that chlorthalidone produced cardiovascular outcomes at least as good as lisinopril or amlodipine, but is this true in diabetics as well as in non-diabetics? The trial included enough patients to compare the results in different subgroups, including diabetic and non-diabetic patients.

The relative risk of combined cardiovascular disease in non-diabetics in ALLHAT, comparing lisinopril with chlorthalidone, was 1.12 (95% CI 1.05 to 1.19), whilst in diabetics it was 1.08 (95% CI 1.00 to 1.17). Because the diabetics are a smaller group, their result does not reach statistical significance: the 95% CI just includes no difference (relative risk 1.0). It is tempting to conclude from this that diabetics do not show the same advantage for chlorthalidone that is seen in the non-diabetics. However, this conclusion is unreliable because it is based on the wrong question. We do not want the P-value for chlorthalidone versus lisinopril within the diabetics alone, because it depends heavily on how many diabetics were included in the trial.

What we want to test is whether the diabetics differed significantly from the non-diabetics in their response to the different treatments. This can be tested using the method outlined in the recent Altman and Bland paper, and comparing diabetics with non-diabetics shows that their responses are very similar. The ratio of the relative risks in the two subgroups is 1.04 (95% CI 0.86 to 1.25). In other words, the difference between the patient groups is not statistically significant, nor is the confidence interval very wide, so I would take the overall trial result to apply to diabetics as well. The data from this trial suggest that thiazides may be back in first place for hypertension in diabetics as well as non-diabetics.
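A minimal Python sketch of the Altman and Bland style of calculation: each relative risk is put on the log scale, its standard error is recovered from the width of its 95% CI, the two standard errors are combined as the square root of the sum of their squares, and the difference in logs is converted back into a ratio with its own CI and P-value. The function names are hypothetical. The inputs are the ALLHAT figures quoted above for non-diabetics (1.12, 95% CI 1.05 to 1.19) and diabetics (1.08, lower limit 1.00); the diabetic upper limit of 1.17 is an assumption, being the value implied by a log-symmetric interval around 1.08 with a lower limit of 1.00.

```python
from math import exp, log, sqrt
from statistics import NormalDist

Z95 = 1.96  # multiplier for a 95% confidence interval

def log_se_from_ci(lower, upper, z=Z95):
    """Standard error of log(ratio), recovered from the limits of its 95% CI."""
    return (log(upper) - log(lower)) / (2 * z)

def compare_ratios(est1, ci1, est2, ci2, z=Z95):
    """Ratio of two relative risks with its 95% CI and a two-sided P-value."""
    d = log(est1) - log(est2)                      # difference on the log scale
    se = sqrt(log_se_from_ci(*ci1, z) ** 2 + log_se_from_ci(*ci2, z) ** 2)
    p = 2 * (1 - NormalDist().cdf(abs(d / se)))   # z-test for the interaction
    return exp(d), exp(d - z * se), exp(d + z * se), p

# Non-diabetics vs diabetics; diabetic upper limit 1.17 is assumed (see above)
ratio, lo, hi, p = compare_ratios(1.12, (1.05, 1.19), 1.08, (1.00, 1.17))
print(f"ratio {ratio:.2f} (95% CI {lo:.2f} to {hi:.2f}), P = {p:.2f}")
# ratio 1.04 (95% CI 0.94 to 1.15), P = 0.48
```

With these rounded inputs the sketch reproduces the ratio of 1.04 but gives a narrower interval than the one quoted above, which may reflect different inputs to the calculation; both intervals include 1 and point to the same conclusion.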