Why are 95% confidence intervals more informative than P values?
This is an introductory article from the Update series on Statistics
The sheer volume of material published in medical journals each week is well beyond any of us to keep up with. To save us from drowning in information, the writers of systematic reviews aim to collect together and appraise all the evidence from appropriate studies addressing a focussed clinical question. The Cochrane Collaboration has been working at this task for the past twenty years and, in September 2016, there were 7038 completed reviews on the Cochrane Database of Systematic Reviews and a further 2520 protocols that will become reviews in the future.
The File-Drawer Problem
So what was wrong with the traditional narrative review from an expert in the field? The previous emphasis has been on understanding the mechanisms of disease and combining this with clinical experience to guide practice.(1) The main problem with this approach is that we all have our preferred way of doing things, and there is a natural tendency to take note of articles that fit in with our view. We may cut these out and keep them in our filing cabinet, whilst articles that do not agree are filed in the rubbish bin. This means that when asked to review a topic it is natural for an expert to go to the drawer and quote all the data that support their favoured approach.
What is a Systematic Review?
So how is a systematic review different? Let’s start with a definition:
Systematic review (synonym: systematic overview): A review of a clearly formulated question that uses systematic and explicit methods to identify, select and critically appraise relevant research, and to collect and analyse data from the studies that are included in the review. Statistical methods (meta-analysis) may or may not be used to analyse and summarise the results of the included studies.
The difference here is that the way the papers were found and analysed is clearly stated. The reader still needs to be satisfied that the search for papers was wide enough to obtain all the relevant data. Searching Medline alone is rarely enough, and if only English language papers are included this may leave out potentially important evidence.
All Cochrane reviews start as a published protocol; this states in advance how the review will be carried out (searching for data, appraising and combining study data). There is therefore some protection against the danger of post-hoc analysis, in which reviewers find that by dividing up the trials in a particular way spurious statistical significance can be generated in sub-groups of patients or treatment types.
Is the Question focussed?
But we have moved on to thinking about how the review was carried out before checking whether the question being addressed is an important one. The PICO structure set out in the first article in this series(2) can be used here to check that the Patient groups, Interventions used, Comparator treatment and Outcomes are sensible. Watch out in particular for surrogate outcomes that may not relate well to the outcome that matters to the patient. One example of this can be found in trials relating influenza vaccine to the prevention of asthma exacerbations. Some trials measure antibody levels to the flu-vaccine given, but what really matters is whether asthmatics have fewer exacerbations or admissions to hospital, and there is precious little data from randomised controlled trials about this (3).
What was the quality of the trials found?
A further issue to think about in Systematic Reviews is whether the type of included studies is appropriate to the question being asked. In a previous article in this series(4) the problem of bias was discussed. In general, for questions related to treatment, I would expect the review to focus on randomised controlled trials, as this will minimise the bias present in the included studies. Whilst Meta-analysis can be used to combine the results of observational studies, this is unreliable because they may all suffer from the same bias, and this bias will be carried into the pooled result from all the studies.
When looking at randomised controlled trials the reviewers should report whether the allocation of patients to the treatment and control groups was adequately concealed (allocation concealment). Allocation is best decided remotely after the patient is entered into the trial; even opaque sealed envelopes can be held up to a bright light by trialists who want to check which treatment the next patient will receive. Poor allocation concealment, failure to blind and poor reporting quality in reviews have all been shown to be associated with overoptimistic results of randomised controlled trials.(5)
Publication bias remains a problem, in that studies that may happen to produce results that are statistically significant are more likely to be published than ones that do not, since editors of medical journals like to have a story to present. This will never be fully overcome until all trials are registered in advance and the publication of results becomes mandatory (whether they show significant differences or not).
The results of a Systematic Review are often shown graphically as a Forest plot (6). An example from the 2013 update of a Cochrane Review comparing Spacers with Nebulisers for delivery of Beta-agonists(7) is shown below.
Figure 1 Forest Plot of Hospital Admissions for Adults and Children with Acute Asthma when treated with Beta-agonist delivered by Holding Chamber (Spacer) compared to Nebuliser (edited to include data in the 2013 update of the review).
The left hand column lists the included studies, which have been sub-grouped into those relating to adults and children. The columns listed ‘Holding Chamber’ and ‘Nebuliser’ list the proportion of patients in each group admitted to hospital and the Relative Risk of admission is shown next to them as a graphical display. Admission is undesirable so the squares and diamonds to the left of the vertical line favour the spacer group. The size of the blue square relates to the weight given to each study in the analysis; this is listed in the next column and generally increases for larger studies. The width of the horizontal line is the 95% confidence interval for each study and this is reported in text in the final column.
The pooled results from adults are shown in the top diamond, and those for children in the lower diamond. By combining all the studies in children we can be 95% sure that the true relative risk of admission when using a spacer lies between 0.47 and 1.08 in comparison with using a nebuliser. There is no significant difference between the two methods, because this interval includes 1 (no difference). The confidence interval suggests that, in children, spacers may reduce the risk of admission by up to 53% in comparison with nebulisers and are unlikely to increase it by more than 8%.
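As a rough illustration of how the limits of a relative risk confidence interval translate into percentage changes, the following Python sketch converts the interval quoted above for children. The function name is hypothetical and is not part of the review or of any statistical package; it simply restates the arithmetic.

```python
def percent_change_in_risk(relative_risk):
    """Percentage change in risk implied by a relative risk.

    Negative values are reductions, positive values are increases.
    """
    return (relative_risk - 1.0) * 100.0


# 95% confidence interval for the relative risk of admission in children
# (spacer compared with nebuliser), as quoted in the text.
lower, upper = 0.47, 1.08

best_case = percent_change_in_risk(lower)   # largest plausible reduction
worst_case = percent_change_in_risk(upper)  # largest plausible increase

print(f"Spacer: {best_case:+.0f}% to {worst_case:+.0f}% change in risk of admission")
```

Because the interval runs from a reduction to an increase, it includes the possibility of no difference at all, which is why the pooled result is not statistically significant.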
So how can these results be translated into clinical practice? This question will be the focus of the next article in this series.
1. Haynes RB. What kind of evidence is it that Evidence-Based Medicine advocates want health care providers and consumers to pay attention to? BMC Health Serv Res 2002;2(1):3 http://www.biomedcentral.com/1472-6963/2/3
2. Cates C. Evidence-based medicine: asking the right question. Prescriber 2002;13(6):105-9.
3. Cates CJ, Jefferson TO, Bara AI, Rowe BH. Vaccines for preventing influenza in people with asthma (Cochrane Review). In: Cochrane Library: Update Software (Oxford); 2000.
4. Po A. Hierarchy of evidence: data from different trials. Prescriber 2002;13(12):18-23.
5. Bandolier. Bias. Bandolier 2000;80-2:1-5 http://www.jr2.ox.ac.uk/bandolier/band80/b80-2.html
6. Lewis S, Clarke M. Forest plots: trying to see the wood and the trees. BMJ 2001;322(7300):1479-1480.
7. Cates CJ, Welsh EJ, Rowe BH. Holding chambers (spacers) versus nebulisers for beta-agonist treatment of acute asthma. Cochrane Database of Systematic Reviews 2013, Issue 9. Art. No.: CD000052. DOI: 10.1002/14651858.CD000052.pub3. (added in 2016)
Reproduced with permission and edited October 2016.
Which diabetic patients benefit from statin treatment?
This is the question that the HPS study (1) sought to answer in 5963 adults with diabetes (aged 40 to 80 years). Previous studies on statins have included small numbers of diabetic patients, and showed results that were in keeping with the overall benefit of statins in other groups of patients. However, the wrong conclusion can be drawn if the results from a small subgroup of patients do not show a significant effect (as the confidence interval will be wide when fewer patients are included).
The HPS study was powered to reliably detect a reduction in risk of a quarter by including about 3000 diabetic patients in each arm of the study (one group receiving simvastatin 40mg daily and the other a placebo tablet). When all the patients having a first vascular event were measured over the five-year study period, the rate in the placebo group was 25.1% and in the statin group was 20.2%. This represents a relative risk of 0.76 (95% confidence interval 0.72 to 0.81) and is most unlikely to represent a chance finding (p < 0.0001).
We now have the results from a large enough study to show that statins are effective at preventing vascular events in diabetes, which supports the previous evidence that diabetics seem to derive similar benefit to other risk groups.
An advantage of the large size of the study is that sub-group analysis can be carried out. This showed that the proportional reduction in risk is largely independent of the type of diabetes, the degree of glycaemic control when the statin was started and also was not detectably altered by the lipid concentrations. The benefit with statins was similar whether the patients had a cholesterol level of greater or less than 5 mmol/L when they started on the statin. Patients derived significant benefit whether their initial cholesterol level was raised or not, and the test for difference between groups (or heterogeneity) was negative (p value of 0.7).
Relative Risk and Absolute Difference
Once the study has shown that the relative risk is similar in all diabetics studied the issue for implementation is how much absolute benefit the patients will obtain from being given the statin treatment. This will be determined by the baseline risk of the diabetics being considered, and the rate of major vascular events over 5 years was 36% in patients with arterial disease and diabetes, and 13% in those with diabetes and no previous arterial disease. Therefore when the diabetics without pre-existing arterial disease are considered, although the relative risk is very similar the event rates are lower. In the placebo group 13.5% suffered a vascular event over 5 years, whilst 9.3% of those on simvastatin suffered a similar event. The figures for the diabetics whose pre-treatment LDL cholesterol was under 3.0 with no known occlusive arterial disease were 11.1% in the placebo group and 8.0% in the simvastatin group.
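The step from event rates to absolute benefit can be sketched in a few lines of Python. This is only an illustration using the observed five-year rates quoted above; the function names are hypothetical, and the NNT is rounded up to the next whole patient, which is the convention used later in this article.

```python
import math


def absolute_risk_reduction(control_rate, treated_rate):
    """Difference in event rates (as proportions) between control and treatment."""
    return control_rate - treated_rate


def number_needed_to_treat(control_rate, treated_rate):
    """NNT, rounded up to the next whole patient."""
    arr = absolute_risk_reduction(control_rate, treated_rate)
    if arr <= 0:
        raise ValueError("No absolute benefit, so an NNT is not defined")
    return math.ceil(1.0 / arr)


# Five-year event rates quoted in the text: (placebo, simvastatin).
groups = {
    "Diabetes, no arterial disease": (0.135, 0.093),
    "Diabetes, no arterial disease, LDL under 3.0": (0.111, 0.080),
}

for label, (placebo, statin) in groups.items():
    arr = absolute_risk_reduction(placebo, statin)
    nnt = number_needed_to_treat(placebo, statin)
    print(f"{label}: ARR {arr:.1%}, NNT {nnt}")
```

These figures come from the observed rates in each subgroup, so they will not exactly match NNTs calculated by applying a single overall risk ratio to a rounded baseline risk.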
So what does this mean in practice? The writers of the paper suggest that it is time to move away from making decisions about statins for diabetic patients on the basis of their initial cholesterol level and instead look at their overall risks of vascular disease. Stopping smoking, reducing blood pressure and reducing cholesterol are all of benefit in such patients and the challenge to the health care providers is to decide how to tackle all three areas. The threshold for starting statins should be determined by overall cardiovascular risk, and the level set for treatment will need to be determined by the local health economy. I suspect that this may generate a vigorous debate.
Visual Rx pictures (Cates Plots)
I have used the Statin Calculator on this website with an overall risk ratio of 0.75 to generate smiley face plots of 100 patients given statins from two levels of risk and these are shown below. The largest benefit is seen in the patients with both diabetes and arterial disease (five year NNT is 12), and although the relative risk is the same in those diabetics with no arterial disease, the absolute benefit is less (five year NNT is 31). The benefits of treatment may be underestimated as 17% of patients in the placebo group took non-study statins during the course of the study. The inclusion of subsequent vascular events would also make the event rates larger and the NNT smaller.
Diabetics with occlusive arterial disease
The picture shows that the 64 green faces will be free from a vascular event on placebo as well as on simvastatin. The 27 red faces will suffer a vascular event over the five years even if all the patients are given simvastatin, but if all 100 patients are put on simvastatin the 9 patients with yellow faces will avoid a vascular event that they would have suffered on placebo. The NNT is therefore 12 because this is the number that need to be put on simvastatin for five years to prevent one vascular event. The calculation to get the NNT is 100 divided by 9 (100/9 = 11.1) and rounded up to the next whole number, which is 12.
Cates plot showing the impact of simvastatin on vascular events over five years in Diabetics with occlusive arterial disease
Diabetics without occlusive arterial disease
In this case 87 out of 100 patients treated with simvastatin would not have suffered a vascular event anyway (the green faces) and the 10 red faces still suffer an event in spite of the statin, but the 3 yellow faces are saved from having a vascular event. As we do not know who these 3 patients will be, all 100 have to be given the statin, meaning that the NNT is 31. In other words 31 patients from this group need to be treated to prevent one vascular event.
Cates plot showing the impact of simvastatin on vascular events over five years in Diabetics without occlusive arterial disease
You may be wondering why the NNT is not 34, since 100 divided by 3 is 33.3 which would round up to 34. In fact the three yellow faces actually represent a drop in the treatment group risk to 9.75% (a risk difference of 3.25%), as shown in Table 1 below. The 3.25% risk difference is rounded to the three yellow faces in the Cates plot. Have a go yourself using the Statin calculator and you will see what I mean (but remember to enter 13% as the baseline risk). You should obtain a data table in Visual Rx which looks like this:
Table 1. Table of Natural frequencies.
Treatment with statins to reduce the risk of heart attacks and strokes
Outcome: a heart attack, stroke or bypass surgery
Duration: 10 years
| Control group risk | Treatment group risk (95% CI) | NNTB (95% CI) |
| 13% | 9.75% (9.10% to 11.05%) | 31 (NNTB 26 to NNTB 52) |
In the control group 13 people out of 100 had a heart attack, stroke or bypass surgery over 10 years, compared to 10 (95% CI 9 to 11) out of 100 for the active treatment group.
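The arithmetic behind the two Cates plots can be sketched in Python as follows. This is not the code used by Visual Rx, and the function name is hypothetical. It assumes that the face counts are rounded to whole people, while the NNT is calculated from the unrounded risk difference and then rounded up.

```python
import math


def cates_plot_counts(baseline_risk, risk_ratio, people=100):
    """Split `people` into green, red and yellow faces.

    green  = no event with or without treatment
    red    = event even with treatment
    yellow = event avoided because of treatment
    """
    if not 0.0 < risk_ratio < 1.0:
        raise ValueError("This sketch only handles a beneficial risk ratio")
    treated_risk = baseline_risk * risk_ratio
    risk_difference = baseline_risk - treated_risk
    red = round(people * treated_risk)
    yellow = round(people * risk_difference)
    green = people - red - yellow
    # The NNT uses the unrounded risk difference, rounded up.
    nnt = math.ceil(1.0 / risk_difference)
    return {"green": green, "red": red, "yellow": yellow, "nnt": nnt}


# Diabetics with and without occlusive arterial disease (baseline 36% and 13%).
print(cates_plot_counts(0.36, 0.75))
print(cates_plot_counts(0.13, 0.75))
```

With a baseline risk of 13% the risk difference is 3.25%, which displays as three yellow faces but gives an NNT of 31 rather than 34.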
1. Heart Protection Study Collaborative Group. MRC/BHF Heart Protection Study of cholesterol-lowering with simvastatin in 5963 people with diabetes: a randomised placebo-controlled trial. Lancet 2003;361:2005-16.
The Prescriber series on evidence-based medicine aims to provide the reader with an easy-to-follow guide to a complex topic. Using practical examples, the articles will help you apply evidence-based medicine to daily practice. In this final article, we look at how cost-effectiveness is calculated.
Previous articles in this series have described the statistical methods used to find out whether treatments are effective in clinical trials and, before embarking on cost-effectiveness analysis, it is wise to check first that there is good evidence that the treatment works. There are many extra levels of uncertainty when costs are considered, as this article demonstrates, and it is important to ensure that the foundational evidence of clinical benefit is in place before building a cost-effectiveness analysis that may rest on a treatment that has not reliably been shown to be better than placebo.
Let us take as an example the recent report on the benefit of ramipril (Tritace) in the secondary prevention of stroke from the Heart Outcomes Prevention Evaluation (HOPE) investigations.1 This was a large study of 9297 high-risk patients over 55 who were treated with either 10mg ramipril daily or placebo for an average of 4.5 years. The study showed a highly statistically significant 32 per cent reduction in the risk of stroke – 95 per cent confidence interval (CI) of 16-44 per cent reduction. The risk of fatal stroke was reduced by 61 per cent (95 per cent CI of 33-78 per cent reduction) over the 4.5 years of the study.
So here we have convincing evidence that ramipril was better than placebo. Subsequent correspondence, however, has pointed out that the presentation of the results concentrates on relative rather than absolute benefits and there is no mention of the potential costs involved in preventing strokes with ramipril.2 Various points are made in the letters, both about the way the results are presented and about the remaining uncertainty in relation to whether this effect is specific to ramipril (or ACE inhibitors in general) or is a general benefit of blood pressure reduction.
In order to carry out a cost-effectiveness analysis, the consequences of a treatment must be measurable in suitable units (those that measure an important outcome),3 so in this case the unit could be one stroke. By making the unit of analysis ‘one stroke prevented’, the costs of caring for stroke can be set on one side and the costs of different treatments can be calculated. There is debate about whether non-NHS costs should be included, but for simplicity we will restrict ourselves to the ingredient costs of the drugs used for stroke prevention and ignore the costs of blood tests for monitoring.
We find that because strokes only occurred in 4.9 per cent of the patients in the placebo group, the impressive 32 per cent relative risk reduction actually translates into an absolute risk reduction of 1.5 per cent and a number needed to treat (NNT) of 66 people (95 per cent CI of 49 to 128) for 4.5 years to prevent one stroke.
The cost of the drug for 66 people for this length of time is about £58 000 (£196 per year each), and the confidence intervals of the NNT translate into a range of £43 000 to £113 000 to prevent one stroke. The temptation to make direct cost comparisons with the results of other drugs in reducing stroke is strong but care needs to be exercised.
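The drug cost figures above follow directly from the NNT, the duration of treatment and the annual cost per patient. A minimal Python sketch, using only the numbers quoted in the text and a hypothetical function name, is:

```python
def drug_cost_per_event_prevented(nnt, years, annual_cost_per_patient):
    """Ingredient cost of treating `nnt` patients for `years` to prevent one event."""
    return nnt * years * annual_cost_per_patient


YEARS = 4.5          # average duration of treatment in the trial
ANNUAL_COST = 196.0  # pounds per patient per year, as quoted in the text

# NNT point estimate and the two ends of its 95% confidence interval.
for label, nnt in [("point estimate", 66), ("lower limit", 49), ("upper limit", 128)]:
    cost = drug_cost_per_event_prevented(nnt, YEARS, ANNUAL_COST)
    print(f"NNT {nnt} ({label}): about £{cost:,.0f} per stroke prevented")
```

Only ingredient costs are counted here, in line with the simplification made earlier; monitoring costs and the savings from strokes avoided are left out.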
There is a recognised difficulty in comparing NNTs for different treatments that do not have the same duration4 and this can be overcome by looking at cost-effectiveness, because the duration of treatment is then taken into account. More events will be prevented with longer trial durations but the costs of treatment will go up in parallel, so the cost per event should stay the same whatever the duration considered.
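The point about duration can be illustrated with a short Python sketch. It rests on a simplifying assumption that is not taken from the trial: that the absolute risk reduction builds up at a constant rate each year. The yearly figure of 0.33% is purely illustrative, and the function name is hypothetical.

```python
def cost_per_event_prevented(arr_per_year, years, annual_cost_per_patient):
    """Cost per event prevented when the absolute benefit accrues evenly over time."""
    arr_over_period = arr_per_year * years   # more events prevented over longer periods
    nnt = 1.0 / arr_over_period              # so the NNT falls as duration rises
    return nnt * years * annual_cost_per_patient  # while cost per patient rises in step


# Illustrative values only: an assumed constant absolute risk reduction of
# 0.33% per year and the £196 annual drug cost quoted earlier.
ARR_PER_YEAR = 0.0033
for years in (1, 2, 4.5, 10):
    nnt = 1.0 / (ARR_PER_YEAR * years)
    cost = cost_per_event_prevented(ARR_PER_YEAR, years, 196.0)
    print(f"{years} years: NNT about {nnt:.0f}, cost about £{cost:,.0f} per event prevented")
```

Under this assumption the NNT changes with every duration, but the cost per event prevented does not, which is why cost-effectiveness is easier to compare across trials of different lengths.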
There is, however, a residual problem in relation to any kind of absolute treatment effect (including cost-effectiveness). The size of absolute benefit is closely related to the baseline risk of the patient being treated, so high-risk patients will tend to show lower NNT and lower costs per event saved. This is because the relative risk reduction tends to be fairly consistent across different levels of baseline risk. This is demonstrated in the ramipril results where the relative risk reduction is very similar for patients with high and normal blood pressure,1 but those with higher blood pressure have higher absolute risks of stroke and therefore derive more benefit from treatment.
A further example of this relates to the cost of using statins. To prevent one cardiovascular event, fewer patients need to be given a statin when they are used for secondary prevention (where the baseline risk is high) in comparison with primary prevention (lower baseline risk). For this reason, before comparing costs between trials or meta-analyses of different treatments against placebo, it is important to check that the baseline risk of the patients in the placebo group is similar. In fact the patients included in the Heart Protection Study5 did have similar baseline risks of stroke and a similar duration of treatment.
Here it is reasonable to compare the costs of using a statin and this works out as more expensive, at around £100 000 to prevent one stroke – this allows for the fact that some placebo arm patients ended up on a statin and not all the active patients stayed on treatment. Aspirin is about two orders of magnitude cheaper, at around £500 per stroke prevented, but hopefully most patients will be receiving this already.
Head-to-head comparisons of different interventions in a single trial can overcome the above difficulties, but in order to generate the power required to reliably detect small differences, prohibitively large numbers of patients need to be recruited. This in turn raises a further question about whether the costs of finding the answer outweigh the benefits of knowing it!
It is a mistake to think that economic analysis is only about minimising the costs of the treatment itself; if this were the only concern all asthmatics would be treated with oral steroids (the cheapest option). Clearly this ignores the known risks of long-term systemic treatment with oral steroids and would be entirely unethical.
In some situations, however, there is enough reliable information to persuade us that different treatments lead to similar outcomes, and in this instance a cost minimisation approach can be used. An example of this is the use of different delivery devices in asthma. A systematic search of the literature6 found that there is little evidence for any of the devices producing superior outcomes in clinical trials, so a cost minimisation analysis was carried out in which the costs of the devices were directly compared. Since a metered-dose inhaler with spacer is the cheapest method available this is the preferred first-line delivery method to try, but of course this does not mean that some patients will not need dry powder devices or breath-activated inhalers.
In some cases treatments cannot be directly compared using one of the simpler methods above as the treatments alter quality and quantity of life. Many of the treatments used in cancer fall into this category and assessments have to be made that incorporate both mortality and quality of life (QoL).
One way of judging how much people value their current health status is by using a standard gamble technique. Patients are asked to consider the theoretical possibility of having a treatment for their condition that had a chance of leaving them in perfect health or causing death; the odds of each outcome are adjusted until they are unsure whether to accept the treatment or not, and this can be used to rate their current QoL. This information can then be turned into quality-adjusted-life-years (QALYs) to allow the results of treatments for different diseases to be compared.
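As a rough sketch of how a standard gamble result feeds into QALYs, the Python below treats the probability of perfect health at the point of indifference as the utility of the patient's current state, and multiplies it by the years spent in that state. The patient and the figures are hypothetical, and real analyses usually add refinements (such as discounting future years) that are not shown here.

```python
def utility_from_standard_gamble(indifference_probability):
    """Utility of the current health state on a 0 (death) to 1 (perfect health) scale.

    `indifference_probability` is the chance of perfect health at which the
    patient cannot choose between the gamble and staying as they are.
    """
    if not 0.0 <= indifference_probability <= 1.0:
        raise ValueError("Probability must lie between 0 and 1")
    return indifference_probability


def qalys(utility, years):
    """Quality-adjusted life years: years lived weighted by their utility."""
    return utility * years


# Hypothetical patient: indifferent when the gamble offers an 80% chance of
# perfect health (and a 20% chance of death), with 10 years in this state.
u = utility_from_standard_gamble(0.80)
print(f"Utility {u:.2f}, QALYs over 10 years: {qalys(u, 10):.1f}")
```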
Since all economic analysis requires assumptions to be made about the cost of treatments and the value of outcomes, it is usual to carry out a sensitivity analysis to see how much the results of the analysis vary when the assumptions are altered. In particular, it may be necessary to predict what would happen beyond the timescale of the trials by using modelling techniques. If the results are very unstable when the assumptions are adjusted, this should be made clear and the reader will need to interpret the analysis with more caution.
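A very simple sensitivity analysis can be sketched in Python by recalculating the drug cost per stroke prevented across a grid of assumptions. The NNT values are the point estimate and confidence limits from the ramipril example; the alternative annual drug prices of £150 and £250 are hypothetical and are included only to show the method.

```python
from itertools import product


def cost_per_stroke_prevented(nnt, years, annual_cost):
    """Drug cost of treating `nnt` patients for `years` to prevent one stroke."""
    return nnt * years * annual_cost


# Assumptions to vary. The NNT values are the point estimate and confidence
# limits quoted earlier; the alternative drug prices are hypothetical.
nnt_values = [49, 66, 128]
annual_costs = [150.0, 196.0, 250.0]
YEARS = 4.5

results = {
    (nnt, cost): cost_per_stroke_prevented(nnt, YEARS, cost)
    for nnt, cost in product(nnt_values, annual_costs)
}

for (nnt, cost), total in sorted(results.items(), key=lambda item: item[1]):
    print(f"NNT {nnt:>3}, £{cost:.0f}/year: £{total:,.0f} per stroke prevented")

lowest, highest = min(results.values()), max(results.values())
print(f"Range: £{lowest:,.0f} to £{highest:,.0f}")
```

A wide range between the lowest and highest results is a sign that the conclusion depends heavily on the assumptions and should be read with caution.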
Decisions have to be made
In the real world, medical needs will always exceed the ability of any healthcare system to meet them, and hard choices have to be made every day about how best to use the resources that are available to us. To make these choices, the best available evidence of treatment efficacy (usually from systematic review of the results of randomised controlled trials) has to be combined with an economic analysis.
These are the processes used by the National Institute for Clinical Excellence (NICE), and they should be as transparent as possible so that we can see how the decisions were reached, even if we do not agree with all of them.
Table 1. Glossary of terms
Cost-effectiveness analysis: A form of economic study design in which consequences of different interventions may vary but can be expressed in identical natural units; competing interventions are compared in terms of cost per unit of consequence
Cost minimisation analysis: An economic study design in which the consequences of competing interventions are the same and in which only inputs are taken into consideration; the aim is to decide which is the cheapest way of achieving the same outcome
Cost-utility analysis: A form of economic study design in which interventions producing different consequences in both quality and quantity of life are expressed as utilities; the best known utility measure is the quality-adjusted-life-year or QALY; competing interventions can be compared in terms of cost per QALY
Sensitivity analysis: A technique that repeats the comparison between inputs and consequences, varying the assumptions underlying the estimates – in doing so, sensitivity analysis tests the robustness of the conclusions by varying the items around which there is uncertainty
I would like to thank Professor Miranda Mugford for permission to use the glossary of terms from Elementary Economic Evaluation in Health Care and for helpful comments on this article.
1. Bosch J, Yusuf S, Pogue J, et al. Use of ramipril in preventing stroke: double blind randomised trial. BMJ 2002; 324:699-702.
2. Badrinath P, Wakeman AP, Wakeman JG, et al. Preventing stroke with ramipril. BMJ 2002;325:439.
3. Jefferson TO, Demicheli V, Mugford M. Elementary economic evaluation in health care. 2nd ed. London: BMJ Books, 2000;132.
4. Smeeth L, Haines A, Ebrahim S. Numbers needed to treat derived from meta-analyses – sometimes informative, usually misleading. BMJ 1999;318:1548-51.
5. MRC/BHF Heart Protection Study of cholesterol lowering with simvastatin in 20 536 high-risk individuals: a placebo-controlled randomised controlled trial. Lancet 2002;360:7-22.
6. Brocklebank D, Ram F, Wright J, et al. Comparison of the effectiveness of inhaler devices in asthma and chronic obstructive airways disease; a systematic review of the literature. Health Technol Assess 2001;5:1-149.
Have the results changed?
Yes and No! The overall difference made by antibiotics remains very similar in terms of the Odds Ratio of being in pain at 2 to 7 days; this is still 0.6 and has not been changed much by the data from the new trial in children under 2 years of age (1). The Odds Ratio from this study is also in keeping with the overall effect at 0.55 but what is strikingly different is the proportion of children in this group who are still in pain on placebo treatment. For all of the other studies combined the proportion still in pain at 2 to 7 days is 14% (145/1005) whilst in the Damoiseaux study the figure is 70% (89/123). When all the studies are combined together the proportion in pain is 21% (234/1128) and combined with the pooled Odds Ratio of 0.6 this means that seven out of every 100 treated will see a benefit, and leads to an overall NNT of 14 as was shown in an article in Prescriber on putting evidence into practice. This overall Cates plot is shown below, but it conceals differences between the Damoiseaux study and the other trials.
If we take the pooled Odds Ratio of 0.6 as the best measure of effect and enter this in Visual Rx with the Control event rate of 14% (from all the other trials) this will give an NNT of 19, as five (yellow faces) will benefit for every 100 treated. This is very similar to the original review and is shown in the Figure below:
In contrast, when the same Odds Ratio of 0.6 is applied to the 70% Control Event Rate in the Damoiseaux trial in children under 2, there will be 12 (yellow faces) who benefit for every 100 treated, as shown in the Cates plot below, and this will give an NNT of 9.
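The way a single Odds Ratio produces different absolute benefits at different Control Event Rates can be sketched in Python. This is not the Visual Rx code; the function names are hypothetical. The sketch reports the unrounded NNT, because the whole-number NNTs quoted in the text depend on the rounding convention used by the calculator.

```python
def treated_risk_from_odds_ratio(control_risk, odds_ratio):
    """Apply an odds ratio to a control event rate and return the treated risk."""
    control_odds = control_risk / (1.0 - control_risk)
    treated_odds = control_odds * odds_ratio
    return treated_odds / (1.0 + treated_odds)


def benefit_from_odds_ratio(control_risk, odds_ratio):
    """Return (people per 100 who benefit, unrounded NNT)."""
    treated_risk = treated_risk_from_odds_ratio(control_risk, odds_ratio)
    risk_difference = control_risk - treated_risk
    return 100.0 * risk_difference, 1.0 / risk_difference


POOLED_ODDS_RATIO = 0.6
for label, control_risk in [
    ("All trials combined", 0.21),
    ("Other trials", 0.14),
    ("Damoiseaux (under 2 years)", 0.70),
]:
    per_100, nnt = benefit_from_odds_ratio(control_risk, POOLED_ODDS_RATIO)
    print(f"{label}: about {per_100:.0f} per 100 benefit, unrounded NNT {nnt:.1f}")
```

The odds have to be converted back to a risk before the difference is taken, which is why the absolute benefit is not simply proportional to the Control Event Rate.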
This would suggest that in children similar to those studied by Damoiseaux in the younger age group, there is more benefit in using antibiotics than in those from the other trials. Perhaps the age of the child should be taken into consideration, as well as the level of fever and systemic illness (2) in making a decision about deferring antibiotics in children with ear infections.
Please also see the paper on discharging ears to help decide which children may benefit most from antibiotics for their ear infections.
1. Damoiseaux RAMJ, van Balen FAM, Hoes AW, Verheij TJM, de Melker RA. Primary care based randomised, double blind trial of amoxicillin versus placebo for acute otitis media in children aged under 2 years. BMJ 2000;320(7231):350-354.
2. Little P, Gould C, Moore M, Warner G, Dunleavey J, Williamson I, et al. Predictors of poor outcome and benefits from antibiotics in children with acute otitis media: pragmatic randomised trial. BMJ 2002;325(7354):22-8.
An introduction to Evidence Based Medicine
What do you mean by evidence-based medicine? Whilst the term evidence based medicine (EBM) is probably familiar to most readers, it is worth pausing initially to think about what we understand by the term. The claim that a position is “evidence based” can be used to try to silence any questions or argument. On the contrary, asking questions about the evidence for any suggested course of action is at the heart of EBM philosophy. I can do no better than to quote the introduction to one of my favourite books in this area, Follies and Fallacies in Medicine(1), in which the authors describe themselves as suffering from incurable “scepticaemia”.
The aim of our book is to reach inquisitive minds, particularly those who are still young and uncorrupted by dogma. We offer no solutions to the problems we raise because we do not pretend to know of any. Both of us have been thought to suffer from scepticaemia* but are happy to regard this affliction, paradoxically, as a health promoting state. Should we succeed in infecting others we will be well content. *Scepticaemia: An uncommon generalised disorder of low infectivity. Medical school education is likely to confer life-long immunity.
The first step towards using EBM to inform our daily practice is to be prepared to question whether we always know the best course of action or have looked at the evidence that underpins the decisions that we make.
We are certainly influenced by our own past experience, what our colleagues do and what experts tell us. These often enlighten us and inform our practice, but we must also be aware that experiences are subject to chance variation, and that the person who is closest at hand may not give the best advice. For example, the experience of the last patient with a condition is not necessarily the best pointer for the next one. What we were taught in medical school may also now be out of date. We do well, however, to remember that our own experience and those of our patients are always important and worth exploring. How many times have you had the experience of suddenly understanding why a patient has presented with a longstanding headache when they let slip that a friend at work had been diagnosed as having a brain tumour?
What EBM is not
Whilst it is invaluable to know what the evidence is in relation to problems that we have to investigate and treat, you may be surprised to learn that the advocates of EBM would be the first to agree that evidence is only a small part of making clinical decisions (see box).
"First, evidence alone is never sufficient to make a clinical decision. Decision-makers must always trade the benefits and risks, inconvenience, and costs associated with alternative management strategies, and in doing so consider the patient's values." Users Guides to the Medical Literature(2)
EBM is not a kind of cookbook medicine full of easy answers to difficult questions, and it can be quite time-consuming. In general as we dig into the evidence we find that there is much that is unknown, but tolerance of uncertainty is well known to us in primary care, and in my experience sharing this uncertainty carefully with patients is often surprisingly well received.
'For every complex problem there is a simple answer, and it's wrong.' HL Mencken
Why is EBM important?
There is an ever-increasing quantity of medical literature published each week and keeping up to date is a huge challenge. It is simply not possible to read all the relevant literature (even in our areas of special interest), so how can we stay in touch with recent developments? If you have written a personal learning plan I wonder whether this is a recognised problem and how you plan to address it?
Increasingly we are put under pressure by patients who have read about a new treatment in the paper or found an article on the Internet, or by consultants who advocate particular referral or treatment pathways for patients with particular symptom presentations. So how are we to respond?
The medical literature is a powerful resource for us, but we have to recognise that it serves many different needs. Those who commission and carry out medical research need somewhere to publish the findings of their work. This may be of high or low quality, and it is not necessarily safe to assume that publication of a paper in a peer-reviewed journal means you can believe all that the authors say. Just look at the subsequent correspondence if you want to see what I mean!
The bottom line is whether this paper means that I should change what I am currently doing, and in order to assess this some basic skills are needed. Many of these, including some explanation of statistical concepts, will be covered in later articles in this series, but the first useful skill is being able to turn a vague concern into an answerable question.
We need to be able to pose a question that reliable research studies can answer. The structure of such a question in relation to treatment options has four parts and can be summarised using the acronym PICO. We need to consider the Patient’s problem, the Intervention suggested, the possible Comparative treatments and the Outcomes that matter (see Box).
Thus “Does my child need antibiotics for this ear infection?” might be rephrased “In children with acute otitis media, how much difference do antibiotics make in comparison with paracetamol alone, in terms of duration of pain, deafness, recurrent infections and serious complications?”
Once we have determined the question that we want to ask, we can move on to decide what is the most valid evidence to answer the question and how to find it.
Archie Cochrane’s Challenge
I was impressed as a student by Archie Cochrane’s book ‘Effectiveness and Efficiency’ in which he pointed out that we could be as efficient as we like in providing medical care, but that if it is not effective care we are wasting our time(3). He set out a challenge in 1979 as follows(4):
It is surely a great criticism of our profession that we have not organised a critical summary, by specialty or subspecialty, updated periodically, of all relevant randomised controlled trials.
In response to this challenge the Cochrane Collaboration prepares and updates such summaries in the form of systematic reviews of the best evidence available, and there are now over 1,000 of these on the Cochrane Library. Whilst there will inevitably be gaps in this database for some time to come, increasing numbers of reviews do address issues related to primary care.
I would be the first to admit that Cochrane reviews are not light reading, but a later article in this series will address the subject of how to understand systematic reviews. Moreover part of the purpose of publications such as Clinical Evidence is to summarise the results of Cochrane reviews in a concise understandable format.
EBM in daily practice
If we want to practise better medicine we will need to keep up to date with new developments and decide how to integrate them into our practice. The concept of Clinical Governance challenges us to demonstrate whether we have been able to measure changes in our practice as a result. This can be challenging and exciting but we have to be realistic about how much can be achieved in the face of numerous demands made upon us and the volume of uncertainties that we face every day. We also need to avoid efficiently implementing treatments that are not effective!
There is little point wasting time looking for answers that probably do not exist, and in my experience the quickest place to start looking is in a synopsis of published research that has already been assessed for quality, such as Clinical Evidence or Best Evidence (an electronic summary of Evidence Based Medicine Journal and ACP Journal Club). Whilst searching Medline may be more familiar, the best data tends to be buried in a sea of other material. Again this will be dealt with in more depth in a future article.
So if all this sounds like hard work – it is! But it is worth it and it can be fun, so look out for the future topics in this series that may change the way you read journals and perhaps even how you practise in the future.
1. Skrabanek P, McCormick J. Follies and Fallacies in Medicine. 3 ed: Tarragon Press; 1998.
2. Guyatt G, Rennie D. Users’ Guides to the Medical Literature: AMA Press; 2001.
3. Cochrane A. Effectiveness and Efficiency: The Nuffield Provincial Hospitals Trust; 1971.
4. Cochrane A. 1931-1971: a critical review, with particular reference to the medical profession. In: Medicine for the year 2000. London: Office of Health Economics; 1979. p. 1-11.
Reproduced with permission.
In 2001 the Lancet reported the results of the PROGRESS trial, in which Perindopril was used to lower the blood pressure of patients following a stroke. There were some problems in the analysis of the results from this trial because the investigators were allowed to choose one of two alternative regimens for the patients. In one group the patients were randomised to Perindopril or placebo, and in the other group, where the doctors felt it was appropriate to use a thiazide diuretic as well as Perindopril, the patients were randomised to Perindopril and Indapamide or double placebo.
In the analysis of the trial results the combination of Perindopril and Indapamide resulted in statistically significant benefits to the patients in terms of prevention of stroke and of major vascular events, whereas Perindopril alone did not reach statistical significance. It is already known that Indapamide improves clinical outcomes so it is possible that the main benefit in the combined drug group comes mostly from the Indapamide, and it is not possible to separately assess the added benefit of Perindopril.
If the second group had been randomised to all get Indapamide and then either Perindopril or Placebo on top it would have been reasonable to combine the results from all the included patients to make an overall assessment of the benefit attributable to Perindopril. The combined results presented in the paper are seriously confounded by the addition of Indapamide in some of the patients and are therefore uninterpretable.
I am grateful to the GP Trainers in Kingston (London) who suggested looking at this paper in a Critical Appraisal Workshop. When we asked what question this study was trying to answer it became clear that the two regimens were addressing different questions: the first is assessing the benefit of Perindopril against placebo, the second Perindopril and Indapamide against double placebo. These are quite distinct and if these were two separate studies I would not be happy to combine them into a single result in a Meta-analysis (as the trialists have done in the paper). What is more they carry out a test for heterogeneity (difference) between the two regimens and find highly significant differences in outcome for both stroke and all adverse events (p< 0.001 in Figure 5 of the original paper).
We concluded that a thiazide diuretic is worth considering to lower the blood pressure of all patients following a stroke, but the case for Perindopril is unproven. The editorial team at the Lancet seem to think this is a fair criticism as they have published my letter making this point, and if you want to look for yourself at the paper or the letters they are both available on the Lancet Website.
The lowering of blood pressure after stroke. Cates C. The Lancet – Vol. 358, Issue 9297, 08 December 2001, Page 1993
“In the study of the perindopril protection against recurrent stroke study (PROGRESS) Collaborative Group (Sept 29, p 1033), the first group were given perindopril alone and did not differ significantly from the combination therapy group in rate of stroke or major vascular events by comparison with placebo. The second group were given perindopril and indapamide, but this treatment was compared with double placebo.
If all patients in the second group were given indapamide and randomly assigned perindopril or placebo, it would make sense to combine the results from both groups, on the basis that both were placebo comparisons of perindopril, but with different co-interventions. Use of the second placebo means that the two groups are actually answering separate questions of the efficacy of perindopril alone (group one) and in combined treatment (group two).
Pooling the results of the two groups makes little sense to me under these circumstances, since the known efficacy of indapamide is a serious confounding factor. Moreover, the question arises of how much benefit perindopril adds to use of indapamide alone. In view of the 10% of patients taking perindopril who withdrew in the run-in period and the surprising lack of efficacy in relation to the average fall in blood pressure noted by Jan Staessen and Jiguang Wang, my take-home message from this report would be to start patients on a thiazide diuretic after a stroke or transient ischaemic attack. Addition of perindopril might be beneficial, but, unfortunately, the double placebo in this study makes the study design unsuitable to address that question directly.”
Response to: PROGRESS Collaborative Group. Randomised trial of a perindopril-based blood-pressure-lowering regimen among 6105 individuals with previous stroke or transient ischaemic attack. Lancet 2001;358:1033-41
This article is part of a series on Critical Reading.
In an article on sub-group comparisons I warned about the danger of paying too much attention to results from patients in particular sub-groups of a trial, arguing that the overall treatment effect is usually the best measure for all the patients.
In the same way, when the results of all available clinical trials are combined in a Systematic Review (for example in a Cochrane review) care is still required in the interpretation of the results from each individual trial, and the main focus is on the pooled result giving the average from all the trials. The results are often displayed in a forest plot as demonstrated below. The result of each trial is represented by a rectangle (which is larger for the bigger trials) and the horizontal lines indicate the 95% confidence interval of each trial. The diamond at the bottom is the pooled result and its confidence interval is the width of the diamond.
As hospital admissions for acute asthma were rare in each trial (shown in the columns of data for Holding Chamber and nebuliser), the uncertainty of the individual trials is seen in wide confidence intervals, but when these are pooled together the uncertainty shrinks to a much narrower estimate. The pooled odds ratio of one indicates no difference shown between delivery methods for beta-agonists in acute asthma as far as admission rates are concerned, but the estimate is still imprecise and compatible with either a halving or a doubling of the odds of being admitted to hospital. So we have to say that we do not know whether there is a difference in the rate of admissions between the two delivery methods.
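To see why pooling narrows the confidence interval, here is a minimal sketch in Python. It uses made-up admission counts (not the data from the review) and a simple inverse-variance fixed-effect method with a 0.5 continuity correction added to every cell. That is an assumption for illustration; a Cochrane review may well use a different method, such as Mantel-Haenszel or Peto.

```python
import math

def log_odds_ratio(events_a, total_a, events_b, total_b):
    """Return (log odds ratio, standard error) for one trial.

    0.5 is added to every cell of the 2x2 table so that trials with
    no events in one arm can still contribute (an illustrative choice).
    """
    a = events_a + 0.5
    b = total_a - events_a + 0.5
    c = events_b + 0.5
    d = total_b - events_b + 0.5
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se

def confidence_interval(log_or, se):
    """95% confidence interval on the odds ratio scale."""
    return math.exp(log_or - 1.96 * se), math.exp(log_or + 1.96 * se)

def pool_fixed_effect(trials):
    """Inverse-variance pooled odds ratio and its 95% confidence interval."""
    results = [log_odds_ratio(*trial) for trial in trials]
    weights = [1 / se ** 2 for _, se in results]
    pooled = sum(w * lor for w, (lor, _) in zip(weights, results)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    low, high = confidence_interval(pooled, pooled_se)
    return math.exp(pooled), low, high

# Hypothetical trials: (admissions, patients) for each of the two arms.
hypothetical_trials = [(3, 50, 4, 48), (2, 40, 2, 42), (5, 80, 4, 78)]

if __name__ == "__main__":
    for trial in hypothetical_trials:
        lor, se = log_odds_ratio(*trial)
        print("trial OR %.2f, 95%% CI %.2f to %.2f"
              % ((math.exp(lor),) + confidence_interval(lor, se)))
    print("pooled OR %.2f, 95%% CI %.2f to %.2f"
          % pool_fixed_effect(hypothetical_trials))
```

Because the pooled standard error is calculated from the sum of all the trial weights, it is always smaller than that of any single trial, which is why the diamond is narrower than the individual horizontal lines.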
Before all the results are combined it is wise to carry out statistical tests to look for Publication Bias. There is evidence that positive results from clinical trials are more likely to be published in major journals, and in the English language, than similar trials that report negative results. When published studies are combined this leads to a tendency to overestimate the benefits of treatment. The easiest way to look for this is using a funnel plot, in which the result of each trial is plotted against the size of the study. Chance variations mean that small studies should show more random scatter in both directions around the pooled result. If all the small studies are showing positive results there is a suspicion that other small studies with negative results exist but were not published. The funnel plot shown below is taken from a Cochrane review of the use of Nicotine gum for smoking cessation and is reasonably symmetrical.
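The idea behind reading a funnel plot can be sketched crudely in code. The function below is only an illustration with invented numbers: it takes each study's effect (for example a log odds ratio) and its standard error as a stand-in for study size, works out an inverse-variance pooled effect, and counts how many of the smaller studies fall on each side of it. It is not a formal test for publication bias, and the cut-off for a "small" study (standard error above the median) is my own assumption.

```python
import statistics

def funnel_summary(studies):
    """studies: list of (effect, standard_error) pairs.

    Returns (pooled effect, small studies below it, small studies above it),
    where 'small' means a standard error above the median.
    """
    weights = [1 / se ** 2 for _, se in studies]
    pooled = sum(w * e for w, (e, _) in zip(weights, studies)) / sum(weights)
    median_se = statistics.median(se for _, se in studies)
    small = [e for e, se in studies if se > median_se]
    below = sum(1 for e in small if e < pooled)
    above = sum(1 for e in small if e > pooled)
    return pooled, below, above

# Hypothetical data: small studies scattered on both sides of the pooled result.
balanced = [(-0.2, 0.1), (-0.6, 0.4), (0.2, 0.4), (-0.5, 0.3), (0.1, 0.3)]
# Hypothetical data: every small study favours treatment (suspicious).
lopsided = [(-0.2, 0.1), (-0.9, 0.4), (-0.8, 0.4), (-0.5, 0.3), (-0.6, 0.3)]

if __name__ == "__main__":
    print(funnel_summary(balanced))
    print(funnel_summary(lopsided))
```

A lopsided count among the small studies is only a prompt to look harder, since with few trials an uneven scatter can easily arise by chance.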
A further important check is to look for Heterogeneity. The individual trials will again show chance variation in their results and in a Systematic Review it is usual to test whether the differences are larger than those expected by chance alone. The Forest plot above shows that the Heterogeneity in this set of trials is quite low. However if significant Heterogeneity is shown (in other words the results are more diverse than expected) it is recommended to explore the reasons why this may be. Although statistical adjustments can be made to incorporate such Heterogeneity (using a so called Random Effects Model) this should not be accepted uncritically. It may be more sensible not to try to combine the trial results at all.
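One common way of putting a number on this is Cochran's Q statistic, often reported alongside the I-squared percentage. The sketch below assumes each trial is summarised by an effect (such as a log odds ratio) and its standard error, and the figures are invented. I-squared is not mentioned in the reviews discussed here; it is included only as a widely used companion to Q.

```python
def heterogeneity(studies):
    """studies: list of (effect, standard_error) pairs.

    Returns (Q, degrees of freedom, I-squared as a percentage).
    Q is the weighted sum of squared differences between each trial
    and the inverse-variance pooled effect.
    """
    weights = [1 / se ** 2 for _, se in studies]
    pooled = sum(w * e for w, (e, _) in zip(weights, studies)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, (e, _) in zip(weights, studies))
    df = len(studies) - 1
    # I-squared: share of the variation beyond what chance would explain.
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i_squared

# Hypothetical trials that agree closely with each other.
similar = [(0.1, 0.2), (0.1, 0.3), (0.1, 0.25)]
# Hypothetical trials pointing in opposite directions.
diverse = [(-1.0, 0.2), (1.0, 0.2)]

if __name__ == "__main__":
    print(heterogeneity(similar))
    print(heterogeneity(diverse))
```

If the trials differ only by chance, Q is expected to be close to its degrees of freedom (the number of trials minus one); a value well above that is the signal to go back and ask whether the trials are really addressing the same question.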
An example of this can be found in the BMJ in October 1999 in which a group from Toronto published a meta-analysis of Helicobacter eradication (1). The statistical tests showed considerable Heterogeneity between the trials that was largely ignored by the authors. Inspection of the trials shows that there were two types; some with outcomes measured at six weeks using single treatments and others using triple therapy and measuring dyspepsia at one year. There is no good clinical reason to put these together and this may well explain the diversity of the results (2).
The message is to use your common sense when deciding whether the differences between the outcomes measured and the treatments used in each trial mean that it is safer not to calculate a single average result (not least because the average is not easy to interpret and apply to clinical practice).
1. Jaakkimainen RL, Boyle E, Tudiver F. Is Helicobacter pylori associated with non-ulcer dyspepsia and will eradication improve symptoms? A meta-analysis. BMJ 1999;319:1040-4
2. Cates C. Studies included in meta-analysis had heterogeneous, not homogeneous, results. BMJ 2000;320:1208
This article is part of a series on Critical Reading.
Controlled clinical trials are designed to investigate the effect of a treatment in a given population of patients, for example when aspirin is given to patients with ischaemic heart disease. Inevitably there will be differences between the patients included in the trial (men versus women, older versus younger, hypertensive versus non-hypertensive).
It is tempting to look at the effects of treatment separately in different types of patient in order to decide who will benefit most from being given the treatment. Although this analysis of the sub-groups of patients is widely carried out in the medical literature, it is not very reliable, and the ISIS-2 trial gives a clear example of how this can be misleading (1). The trial looked at the effect of aspirin given after acute myocardial infarction, and when the results were reported the editorial team at the Lancet wished to publish a table of sub-group analyses. The authors agreed as long as the first line in the table compared the effects in patients with different birth signs (2).
The analysis showed that aspirin was beneficial in all patients except those with the star signs of Libra and Gemini. This served as a warning against the over-interpretation of the results of the other sub-groups reported in the paper. The problem is that the play of chance can lead to apparently significant differences between sub-groups, and these are really only helpful in very large trials which show really big overall differences in the treatment and control groups.
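A little arithmetic shows how easily the play of chance can produce such a result. The sketch below assumes that each sub-group is tested separately at the conventional 5% level, that the tests are independent, and that the treatment truly works equally well in everyone. These are simplifying assumptions of mine, not a description of how the ISIS-2 analysis was carried out.

```python
def chance_of_spurious_subgroup(n_subgroups, alpha=0.05):
    """Probability that at least one of n independent sub-group tests
    looks 'significant' at level alpha when there is no real difference."""
    return 1 - (1 - alpha) ** n_subgroups

if __name__ == "__main__":
    # Twelve star signs, each tested at the 5% level: roughly 0.46.
    print(round(chance_of_spurious_subgroup(12), 2))
```

In other words, with twelve star signs there is nearly an even chance that at least one of them will appear to respond differently, even when nothing is going on.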
Two examples of the use of sub-group analysis are somewhat contentious. The first was reported in the Lancet (3) and looked at the evidence from different trials of mammography to try to reduce deaths from breast cancer. The overall result from all the trials together showed mammography to be of significant benefit, but the authors looked at the characteristics of the trials and felt that some were more reliable than others. The data from these selected trials did not show a benefit from mammography. On this basis the authors concluded that screening for breast cancer was unjustified.
Use of aspirin
Similarly a recent paper in the BMJ suggested that aspirin may not be useful for primary prevention in patients with mildly elevated blood pressure on the basis of the results of patients in this sub-group (4). I would suggest that before deciding about aspirin for such patients you ask yourself whether you would still treat those with the Libra and Gemini birth signs with aspirin following an MI. Moreover if patients on aspirin for secondary prevention of ischaemic heart disease ask whether they should stop if their blood pressure is up a bit, my answer would be no.
The bottom line is that the best overall estimate of the effect of a treatment comes from the average effect on all the patients and not from the individual sub-groups (5). Sub-group analysis is generally best restricted to the realm of generating hypotheses for further testing rather than evidence that should change practice.
1. Horton R. From star signs to trial guidelines. Lancet 2000;355:1033-34
2. ISIS-2 Collaborative Group. Randomised trial of intravenous streptokinase, oral aspirin, both, or neither among 17,187 cases of suspected myocardial infarction. Lancet 1988;ii:349-60
3. Gotzsche PC, Olsen O. Is screening for breast cancer with mammography justifiable? Lancet 2000;355:129-34
4. Meade TW, Brennan PJ, on behalf of the MRC General Practice Research Framework. Determination of who may derive the most benefit from aspirin in primary prevention; subgroup results from a randomised controlled trial. BMJ 2000;321:13-7
5. Gotzsche PC. Why we need a broad perspective on meta-analysis. BMJ 2000; 321:585-6