Do I need to change my practice? (Pulse Article 2001)

This article is part of a series on Critical Reading.

When speaking to registrars about critical appraisal, one of the commonest questions is “How do I decide whether the paper is good enough to warrant a change in my current practice?” In the article on asking a good question I described how to break down the question addressed by a research paper into its four components; having done this, you next have to decide whether the findings of the paper are likely to be important to you and especially to your patients.

Is it valid?

In particular, is the approach described in the paper worth trying on the next patient who presents with the relevant condition? To answer this we need to look at the validity of the paper in question. Two types of validity have been described: internal validity, which relates to the mechanisms of the study itself, and external validity, which has more to do with whether the results of the paper can be extrapolated to the patients in our own practice. In the rest of this article I will concentrate on internal validity, using as an example an imaginary study of olive oil for children with acute otitis media.

Choosing controls

The key issue for internal validity is how the comparison group is chosen in relation to the patients who are given the experimental treatment. In a case series (for example, a set of 6 patients who are given a new treatment in routine practice) there may be no comparison group at all, so the immediate concern is that the patients might have achieved a good result anyway. For example, I might tell you that I have treated a series of 100 children with acute otitis media with warm olive oil and that 85 were better within a few days. This sounds impressive until you look at the results of placebo treatment in antibiotic trials for this condition and find a similar recovery rate.

Better than a case series would be a case-control study, in which the records of patients who had prolonged pain following ear infections were checked to see how many had been given olive oil; this proportion could then be compared with the proportion of olive oil use in patients who did not have prolonged pain. The problem now is being sure that the two groups of children do not differ in other ways that influenced olive oil use, and this is rarely possible.

Better still, two groups of children could be compared by offering parents the choice of whether or not to use the oil; this would constitute a prospective cohort study, but uncertainty remains about possible important differences between those who chose the oil and those who refused it.

Overcoming Bias

In both the case-control and the cohort study designs, the threat to internal validity comes from bias in the choice of the comparison group (selection bias), as well as other possible biases that arise because both patient and doctor know which treatment the patient has received. It will be no surprise to you that the only secure way around these biases is a randomised controlled trial, preferably double-blind; these will be addressed in the next article.

HRT and heart disease

So are any of these biases important? They certainly can be, and a couple of examples may help to show how. In the early non-randomised studies of hormone replacement therapy (HRT), the results suggested that women on HRT had lower rates of heart disease, and HRT was therefore advocated as a measure to reduce the risk of ischaemic heart disease (1). Some of the authors of these early studies did point out problems, particularly that deaths from road traffic accidents were also less common in the group receiving HRT. More recent evidence from randomised controlled trials (such as the HERS study (2)) has not confirmed the protective effect, and it is probable that the women who opted for HRT differed from the control group in other ways and may have had generally lower risk factors for heart disease.
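To see how self-selection alone can produce an apparent benefit, here is a minimal sketch with invented numbers; the strata, risks and choice probabilities are all hypothetical and are not taken from the HRT studies. HRT is assumed to have no effect at all, but lower-risk women are assumed to be more likely to choose it.

```python
# Hypothetical illustration: HRT has NO effect on heart disease here,
# yet a non-randomised comparison makes it look protective.

# All figures are invented for illustration.
# name: (share of population, baseline risk of heart disease, probability of choosing HRT)
strata = {
    "lower-risk": (0.5, 0.05, 0.8),
    "higher-risk": (0.5, 0.15, 0.2),
}


def observed_risks(strata):
    """Return (risk in HRT group, risk in non-HRT group) when women self-select.

    Because HRT is assumed to have no effect, each woman's risk is her
    stratum's baseline risk whichever group she ends up in.
    """
    hrt_people = hrt_events = 0.0
    none_people = none_events = 0.0
    for share, risk, p_choose in strata.values():
        hrt_people += share * p_choose
        hrt_events += share * p_choose * risk
        none_people += share * (1 - p_choose)
        none_events += share * (1 - p_choose) * risk
    return hrt_events / hrt_people, none_events / none_people


def randomised_risks(strata):
    """With random allocation each arm receives the same mix of strata."""
    overall = sum(share * risk for share, risk, _ in strata.values())
    return overall, overall


hrt, no_hrt = observed_risks(strata)
print(f"Self-selected: HRT {hrt:.2f}, no HRT {no_hrt:.2f}, RR {hrt / no_hrt:.2f}")
r_hrt, r_none = randomised_risks(strata)
print(f"Randomised:    HRT {r_hrt:.2f}, no HRT {r_none:.2f}, RR {r_hrt / r_none:.2f}")
```

In this made-up example the self-selected comparison suggests a relative risk of about 0.54, while random allocation gives a relative risk of 1, which is the true (null) effect.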

Preventing Teenage Pregnancy

Another example is a cross-sectional survey in the BMJ reporting the association between teenage pregnancy rates and practice characteristics in different areas (3). The results included this statement: “On multivariate analysis, practices with at least one female doctor, a young doctor, or more practice nurse time had significantly lower teenage pregnancy rates. Deprivation and fundholding remained significantly associated with higher teenage pregnancy rates.” The problem here is that we have no evidence that the age or sex of the doctors caused the lower rates of pregnancy, and the unexplained association between fundholding practices and higher pregnancy rates should perhaps ring some alarm bells. No one suggested that the end of fundholding would solve the teenage pregnancy problem!

A fuller discussion of association and causation can be found in Follies and Fallacies in Medicine (Tarragon Press) (4), which I would recommend as both amusing and informative background reading for all registrars.


1. Barrett-Connor E, Grady D. Hormone replacement therapy, heart disease and other considerations. Annu Rev Public Health 1998;19:55-72.

2. Hulley S, Grady D, Bush T, et al. Randomised trial of estrogen plus progestin for secondary prevention of coronary heart disease in postmenopausal women. JAMA 1998;280:605-13.

3. Hippisley-Cox J, Allen J, Pringle M, Ebdon D, McPhearson M, Churchill D, Bradley S. Association between teenage pregnancy rates and the age and sex of general practitioners: cross sectional survey in Trent 1994-7. BMJ 2000;320:842-5.

4. Skrabanek P, McCormick J. Follies and Fallacies in Medicine. Tarragon Press.

Asking a good question (Pulse Article 2001)

This article is part of a series on Critical Reading.

Where do you start when trying to judge papers in medical journals? All too often we are in a hurry and glance briefly at the title and then at the conclusion of the abstract. However, I would suggest that you try to get inside the mind of the writer of the article and work out why they carried out this piece of work. This is easier if you have a structure to work to, and I suggest using a four-part question at this point.

  1. What are the characteristics of the Patients in the trial?
  2. What is the Intervention being studied?
  3. What is it Compared with?
  4. What Outcomes are measured?

Take a piece of paper and jot down the answers to the four questions above, and you will have a neat summary of the question that your paper is trying to answer: the characteristics of the patients in the trial, the main intervention studied, what it was compared with and what outcomes were measured. You can remember the headings using the acronym PICO (Patient, Intervention, Comparison and Outcome).
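If you prefer to keep your notes electronically, the four headings map naturally onto a small record. This is a minimal Python sketch; the class name `PicoQuestion` and the wording of its `summary` method are my own illustration, filled in with the antibiotics for acute otitis media example discussed later in this article.

```python
from dataclasses import dataclass


@dataclass
class PicoQuestion:
    """The four parts of the question a paper is trying to answer (hypothetical helper)."""
    patients: str
    intervention: str
    comparison: str
    outcomes: str

    def summary(self) -> str:
        # Turn the four jotted-down answers into a one-line question.
        return (f"In {self.patients}, does {self.intervention}, "
                f"compared with {self.comparison}, affect {self.outcomes}?")


aom = PicoQuestion(
    patients="children with acute otitis media",
    intervention="an antibiotic",
    comparison="placebo",
    outcomes="recovery within a few days",
)
print(aom.summary())
```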

Is this an important question?

If you have been able to identify the four parts of the question that the paper is trying to answer, the next thing to ask yourself is whether the answer is going to be relevant to you and the patients you are looking after. Much research is driven by academic or industrial interest, and the question may not be relevant to you.

All too often the outcomes chosen are surrogates that are easy to measure but may not reliably indicate whether the treatment will be of real benefit to the patient. Also the comparison may be with the wrong alternative treatment, or the patients in the trial may not be representative of those seen in your practice. Two examples may help to illustrate the point.

Antibiotics for Acute Otitis Media

There is no shortage of randomised controlled trials comparing one antibiotic with another for the treatment of acute otitis media, and this is an important issue for pharmaceutical companies introducing new antibiotics. However, the first question to answer is whether any antibiotic is needed at all, and this cannot be assessed by comparing two antibiotics with each other. What is needed is evidence from trials comparing antibiotic with placebo to decide how much overall difference antibiotics make, and indeed the evidence from all the identified trials of this type showed a limited benefit from antibiotics, balanced by side effects of the treatment (1).

Nebulised Steroids in Asthma

Here again the crucial question is what nebulised steroids are compared with. The obvious alternative delivery method is a spacer and metered-dose inhaler, since the two delivery methods appear to be equally effective for delivering beta-agonists in acute asthma (2). In spite of this, there are very few randomised controlled trials that compare these two delivery methods for steroids. Nebulised fluticasone has been shown to reduce the requirement for oral steroids in severe asthmatics when compared with placebo, but to my mind this is not really the key issue. After all, nebulised steroids cost considerably more than spacer delivery, so in this instance we need clear evidence of superiority against spacers, not against placebo.

In a nutshell

So, in summary, use the four-part question to summarise what the paper is about, and then decide whether it is a question worth spending the time to read in more detail. Consider whether the question is an important one; if it is, you will then need to think about the validity of the research method used before taking too much notice of the results. This will be the subject of the next article in this series.


1. Del Mar C, Glasziou P, Hayem M. Are antibiotics indicated as initial treatment for children with acute otitis media? A meta-analysis. BMJ 1997;314:1526-9.

2. Cates CJ, Rowe BH. Holding chambers versus nebulisers for beta-agonist treatment of acute asthma (Cochrane Review). In: The Cochrane Library, Issue 2, 2000. Oxford: Update Software.

Critical Reading Articles Overview

This section contains a series of articles on critical reading. Six of these were originally written for Pulse magazine in 2001 and were edited in 2016. There are also articles from a series in Update in 2005. Other articles highlight bias that can occur in the way research is reported and draw attention to the sorts of problem that may be worth looking for when reading the medical literature.

NICE 2000 guidance on the use of zanamivir for influenza

Purpose of the NICE guidance: to target zanamivir to at-risk patients with a high likelihood of having influenza. Hence treatment is restricted to such patients, and only when the level of influenza-like illness circulating in the community has been confirmed to be above 50/100,000. Fever over 38°C and a clinical picture of flu (sudden onset of illness with muscle pains and dry cough) are also required before treatment with zanamivir is considered, as many people think they have flu when they have much milder viral illnesses.

Benefits of the treatment: very modest, with only a one-day reduction in the duration of illness and a 7% reduction in complications requiring antibiotics. In other words, 14 patients need to be treated with zanamivir for one patient to avoid the need for antibiotics. There is no proven benefit in terms of reducing hospital admission or mortality. (See picture below.)
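The step from a 7% reduction to the number needed to treat is simply the reciprocal of the absolute risk reduction. A minimal sketch, assuming the 7% is an absolute reduction (which is what the quoted figure of 14 implies); the function name is my own:

```python
import math


def nnt_from_arr(absolute_risk_reduction: float) -> float:
    """Number needed to treat = 1 / absolute risk reduction (given as a proportion)."""
    if absolute_risk_reduction <= 0:
        raise ValueError("NNT is only defined here for a positive risk reduction")
    return 1 / absolute_risk_reduction


nnt = nnt_from_arr(0.07)   # 7% fewer complications requiring antibiotics
print(f"NNT = {nnt:.1f}")
print(f"Rounded up: {math.ceil(nnt)}")
```

1/0.07 is about 14.3, which the guidance quotes as 14; NNTs are often rounded up to the next whole patient, which would give 15.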

Side effects: zanamivir can cause wheezing in patients with asthma or COPD, so such patients are advised to have their reliever inhaler to hand when they take the treatment! In a study in healthy asthmatics, one in 13 developed wheezing.

Children: zanamivir is not licensed for children under 12.

Workload: NICE recognise that this guidance could cause a considerable extra workload (in terms of telephone calls and home visits), and practices may wish to prepare a plan to deal with this eventuality. A possible scenario would be 2 extra visits per GP per day, with an unknown number of extra telephone calls from patients enquiring about their suitability for treatment. A questionnaire has been prepared for nurses to use in triaging telephone queries from patients; presumably this will be used by NHS Direct, but it could also be implemented at practice level. However, issuing prescriptions for zanamivir without seeing the patient seems unwise, in view of the possibility of complications (such as pneumonia) and the fact that it was a new ‘black triangle’ medication.

Cates plot on preventing complications of flu by using zanamivir

If 100 patients are all given zanamivir for a flu-like illness, 74 will not suffer a complication requiring antibiotics anyway (shown as green smiling faces below), 20 will still need antibiotics (shown as red faces) and 6 (shown as yellow faces) will be saved from having antibiotics by the use of zanamivir.
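The three colours can be derived from two risks: the risk of needing antibiotics without zanamivir and the risk with it. Here is a minimal sketch; the 26% untreated risk is not stated directly but is implied by the figures above (20 red plus 6 yellow faces), and the function name is hypothetical.

```python
def cates_plot_counts(control_risk: float, treated_risk: float, n: int = 100):
    """Split n treated patients into the three faces of a Cates plot.

    green  - no event with or without treatment
    red    - event happens even with treatment
    yellow - event prevented by treatment (control minus treated events)
    Assumes treatment lowers the risk (treated_risk <= control_risk).
    """
    red = round(n * treated_risk)
    yellow = round(n * control_risk) - red
    green = n - red - yellow
    return {"green": green, "red": red, "yellow": yellow}


# 26% would need antibiotics without zanamivir, 20% still do with it.
counts = cates_plot_counts(control_risk=0.26, treated_risk=0.20)
print(counts)  # {'green': 74, 'red': 20, 'yellow': 6}
```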

Appendix 1

Summary of NICE Guidance on the Use of Zanamivir (Relenza) in the Treatment of Influenza

Issue date: November 2000

Review date: June 2002

1. Guidance

1.1 For otherwise healthy adults with influenza, the use of zanamivir is not recommended.

1.2 Zanamivir is recommended, when influenza is circulating in the community, for the treatment of at-risk adults who present within 36 hours of the onset of influenza-like illness (ILI) and who are able to commence treatment within 48 hours of the onset of these symptoms.

1.2.1 Based on the evidence from clinical trials, at-risk adults are individuals falling into one or more of the following categories:

age 65 years or over

chronic respiratory disease (including chronic obstructive pulmonary disease and asthma) requiring regular medication

significant cardiovascular disease (excluding individuals with hypertension)

diabetes mellitus

1.2.2 Community-based virological surveillance schemes should be used to indicate when influenza is circulating in the community (see paragraph 5.4).

1.2.3 Effective targeting of zanamivir for the at-risk adult population with a high incidence of true influenza is essential to maximise both the clinical and cost effectiveness of this therapy.

1.3 The guidance does not cover the circumstances of a pandemic or a widespread epidemic of a new strain of influenza to which there is little or no community resistance. In such circumstances, the Department of Health and the National Assembly for Wales might wish to consult the Institute on the need for supplementary guidance.
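For practices planning telephone triage, the criteria in paragraphs 1.1 to 1.2.1 can be written as a simple checklist. This is a hypothetical sketch for illustration only, not a clinical decision tool: the category labels and field names are my own, it assumes the patient is an adult, and it leaves out the fever and 50/100,000 threshold points made earlier in this section.

```python
from dataclasses import dataclass

# Hypothetical labels for the at-risk categories in paragraph 1.2.1.
AT_RISK_CATEGORIES = {
    "age 65 or over",
    "chronic respiratory disease requiring regular medication",
    "significant cardiovascular disease (not hypertension alone)",
    "diabetes mellitus",
}


@dataclass
class Presentation:
    risk_factors: set
    hours_since_onset: float          # onset of ILI to presentation
    hours_to_start_treatment: float   # onset of ILI to first dose
    influenza_circulating: bool       # per community virological surveillance


def zanamivir_considered(p: Presentation) -> bool:
    """At-risk adult, influenza circulating, presenting within 36 h
    and able to start treatment within 48 h of onset."""
    at_risk = bool(p.risk_factors & AT_RISK_CATEGORIES)
    return (at_risk
            and p.influenza_circulating
            and p.hours_since_onset <= 36
            and p.hours_to_start_treatment <= 48)


print(zanamivir_considered(Presentation({"diabetes mellitus"}, 30, 40, True)))  # True
print(zanamivir_considered(Presentation(set(), 12, 20, True)))  # False: otherwise healthy adult
```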


Choosing controls in non-randomised studies (Lancet 2000 DVT and flying)

I wonder what your views are on the risk of deep vein thrombosis (DVT) on long-haul flights. If you have the opportunity to travel by air, do you take an aspirin before you go and perhaps even wear support stockings for the journey (one of my senior colleagues does), as well as getting up and walking about on the flight?

As far as I know, none of these approaches has been tested in randomised trials on air passengers, so we have to rely on other types of study, such as the research letter recently published in the Lancet (Kraaijenhagen RA, Haverkamp D, Koopman MMW, Prandoni P, Piovella F, Büller HR. Travel and risk of venous thrombosis. Lancet 2000;356:1492).

The authors of this study decided that the ideal control group for patients with deep vein thrombosis confirmed on ultrasound or venography was the 75% of patients who presented to hospital with clinical signs of a DVT but tested negative. Using this control group, they found no association between any of the forms of recent travel they asked the patients about (plane, rail or car) and whether the swollen leg was shown to have a clot in the deep veins.

They concluded that this offered some reassurance that travel is not associated with DVT, but there is a major flaw in their reasoning. They have made an implicit assumption (which is not discussed in their research letter) that there is NO association between travelling and swollen legs that do not contain a clot.

I am not sure that a great deal is known about the aetiology of clinically suspicious leg swelling that is found to be negative on ultrasound or venography, and it is at least plausible that flying could increase the likelihood of this condition occurring. If, for example, flying caused a five-fold increase in both types of swollen leg (those that are venogram positive and those that are negative), the odds ratio for having flown in the previous 4 weeks would still be one when the two groups were compared. For this reason I am not personally reassured by the findings of this study, and I plan to take whatever precautions I can when I am next on a long-haul flight.
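The five-fold example can be checked with a hypothetical 2x2 table. In this minimal sketch all counts are invented; flying is assumed to multiply both clot-positive and clot-negative swollen legs by five, and the 25%/75% split between positive and negative scans follows the study.

```python
def odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls):
    """Odds ratio of exposure (recent flying) in cases versus controls."""
    return (exp_cases / unexp_cases) / (exp_controls / unexp_controls)


# Hypothetical: 10,000 recent flyers and 10,000 non-flyers.
GROUP_SIZE = 10_000
pos_nonflyers, neg_nonflyers = 2, 6      # swollen legs: 25% clot-positive, 75% negative
pos_flyers, neg_flyers = 2 * 5, 6 * 5    # flying multiplies BOTH types five-fold

# The study's design: cases = clot-positive legs, controls = clot-negative legs.
or_study = odds_ratio(pos_flyers, pos_nonflyers, neg_flyers, neg_nonflyers)

# Alternative design: controls = people with no swollen leg at all.
or_alt = odds_ratio(pos_flyers, pos_nonflyers,
                    GROUP_SIZE - pos_flyers - neg_flyers,
                    GROUP_SIZE - pos_nonflyers - neg_nonflyers)

print(f"Odds ratio with clot-negative controls: {or_study:.2f}")
print(f"Odds ratio with controls free of leg swelling: {or_alt:.2f}")
```

With the study's choice of controls the odds ratio is exactly 1, even though flying has been assumed to increase the risk of DVT five-fold; with controls drawn from people without swollen legs, the same invented data give an odds ratio of about 5.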

When randomisation is possible the comparability of controls and cases should be less of a problem, but in non-randomised studies the assumptions arising from the choice of controls have to be examined carefully!


Antibiotics and Ear Infections – Patient Handout (BMJ 1999)

Ear infections in children often get better without antibiotics. The collected evidence from trials performed in several different countries has shown that most children with ear infections who were given paracetamol suspension (such as Calpol) were better within a few days: in fact, 17 out of 20 children got better in this way without an antibiotic. In comparison, if all 20 children took antibiotics, only one extra child got better over the same period, and at present there is no way of knowing which one of the 20 would benefit. Also, if all 20 children were given antibiotics, one was likely to suffer a side-effect as a consequence (such as a rash, diarrhoea or vomiting).
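For readers who want the arithmetic behind the handout, the numbers needed to treat and to harm follow directly from the figures quoted. A minimal sketch (the function name is my own):

```python
def number_needed(risk_a: float, risk_b: float) -> float:
    """Number needed to treat (or harm): 1 / absolute difference in risk."""
    return 1 / abs(risk_a - risk_b)


better_without = 17 / 20      # better within a few days on paracetamol alone
better_with = 18 / 20         # one extra child better when all 20 take antibiotics
side_effects_extra = 1 / 20   # one extra child with a side-effect

nnt = number_needed(better_without, better_with)
nnh = number_needed(0.0, side_effects_extra)
print(f"NNT to get one extra child better: {nnt:.0f}")
print(f"NNH for one extra side-effect: {nnh:.0f}")
```

Both come out at 20, which is why the benefit and the harm are so evenly balanced for an individual child.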

Antibiotics did not reduce pain in the first 24 hours, and there was no difference in the likelihood of a further ear infection or hearing difficulty. In the Netherlands, antibiotics have not been used routinely for ear infections for some years, and antibiotic resistance is less of a problem there than in this country.

Change of Policy

In view of the above evidence, we have changed our policy and no longer give antibiotics routinely for ear infections in children. We recommend treatment with paracetamol suspension, which will reduce pain and fever; it should be given at full dose until the earache is gone. If the ear infection persists, or the child is particularly unwell, then antibiotics may be tried. This will be discussed with you on an individual basis during your consultation with the doctor.

Cates C. An evidence based approach to reducing antibiotic use in children with acute otitis media: controlled before and after study. BMJ 1999;318(7185):715-6 doi: 10.1136/bmj.318.7185.715

A 2000 paper that changed our contraceptive practice

In 2000 my senior partner presented the results of a paper published in the Lancet (1) comparing the standard combined oestrogen and progesterone method (Yuzpe) for post-coital contraception with two doses of progesterone only (levonorgestrel). Until then women had to take large numbers of tablets, but a formulation as two single tablets had become available in the UK (Levonelle-2). The comparison was quite clear cut: less vomiting with the progesterone-only regimen and also fewer pregnancies.

I decided to check this further in the Cochrane Library and found a review covering emergency contraception, which was updated in March 1999 (2). The review found two randomised controlled trials comparing levonorgestrel with Yuzpe (including the WHO study in the Lancet). The pooled results are displayed graphically below as Cates plots using Visual Rx (version 4). Each display compares Yuzpe with levonorgestrel and represents 100 patients who are treated.

Figure 1 shows the pregnancy rates: the green faces are patients who do not become pregnant whichever regimen they receive, and the one red face is a patient who will become pregnant anyway. The single yellow face represents a patient who would become pregnant if given Yuzpe but not if given levonorgestrel. This corresponds to a Number Needed to Treat of 63 (95% CI 45 to 193) with levonorgestrel rather than Yuzpe to prevent one extra pregnancy.

Figure 2 looks at the numbers of patients who will vomit; again the green faces will not be sick with either treatment, and the red ones are sick with both. Here the 14 yellow faces will be patients who do not vomit with levonorgestrel but would have done so with Yuzpe. The Number Needed to Treat is 7 (95% CI 7-8) to prevent one patient vomiting.

We estimated that switching to levonorgestrel should prevent between one and two pregnancies for every hundred patients attending for post-coital contraception. Levonorgestrel was more expensive than Yuzpe in the UK (in France it was already available to patients directly from the chemist), and the extra cost worked out at about £200 per pregnancy prevented. For us, this compared well with the alternative cost and inconvenience of terminations of pregnancy!
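The link between the Cates plots and the cost estimate is simple arithmetic: 100/NNT gives the number of yellow faces per hundred patients, and the NNT multiplied by the extra cost per patient gives the cost per event prevented. The article does not state the per-patient price difference, so in this sketch it is back-calculated from the £200 figure and should be treated as an assumption.

```python
def yellow_faces_per_100(nnt: float) -> float:
    """Patients per 100 treated who benefit = 100 / NNT."""
    return 100 / nnt


def cost_per_event_prevented(extra_cost_per_patient: float, nnt: float) -> float:
    """Extra cost of treating NNT patients in order to prevent one event."""
    return extra_cost_per_patient * nnt


print(f"{yellow_faces_per_100(63):.1f} pregnancies prevented per 100")
print(f"{yellow_faces_per_100(7):.1f} episodes of vomiting prevented per 100")

# Assumption: back-calculated from about £200 per pregnancy prevented with NNT 63.
implied_extra_cost = 200 / 63
print(f"Implied extra cost per patient: £{implied_extra_cost:.2f}")
print(f"Cost per pregnancy prevented: £{cost_per_event_prevented(implied_extra_cost, 63):.0f}")
```

100/63 is about 1.6, which is where "between one and two pregnancies in one hundred patients" comes from, and 100/7 is about 14, matching the 14 yellow faces in Figure 2.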

We abandoned Yuzpe in our practice and switched to levonorgestrel instead. The only unhappy member of the practice team was one of my other partners, who had the topic lined up for her own presentation a few weeks later and had to find a new topic to present!


1. Task Force on Postovulatory Methods of Fertility Regulation. Randomised controlled trial of levonorgestrel versus the Yuzpe regimen of combined oral contraceptives for emergency contraception. Lancet 1998;352:428-33.

2. Cheng L, Gülmezoglu AM, Ezcurra E, Van Look PFA. Interventions for emergency contraception (Cochrane Review). In: The Cochrane Library, Issue 1, 2000. Oxford: Update Software.


Figure 1: Levonorgestrel v Yuzpe – Patients who became pregnant

Figure 2: Levonorgestrel v Yuzpe – Patients who suffered vomiting

Antibiotics ‘no use’ for acute cough: an example of biased reporting (Pulse Article 1999)

In a previous article I explained the advantages of reporting the results of studies as an effect size with Confidence Intervals (usually 95%). The interval shows how precisely the study result estimates the true average effect of a treatment if it were given to everyone in the world with a certain condition. In the same issue of the BMJ in which Simon Chapman eloquently exposed the misuse of the lower confidence limit in data on the risk of passive smoking, a systematic review of the evidence on antibiotics for acute cough was published.

Numbers Needed to Treat (NNT)

The review “Quantitative systematic review of randomised controlled trials comparing antibiotic with placebo for acute cough in adults” (Fahey T, Stocks N, Thomas T. BMJ 1998;316:906-10) carefully collected together the data from trials that addressed this important question. Nine trials were found, but one was excluded because it did not fit the inclusion criteria, leaving eight trials with around 700 patients whose results could be analysed. The results are clearly presented as numbers needed to treat and harm in the Implications section of the paper, and the authors calculated that “for every 100 people treated with antibiotic nine would report an improvement after 7-10 days if they visited their general practitioner but at the expense of seven who would have side-effects from the antibiotic. The resolution of illness in the remaining 84 people would not be affected by treatment with antibiotic.”
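The numbers needed to treat and harm implied by the authors' statement are the reciprocals of their "per 100" figures. A minimal sketch (the function name is my own):

```python
import math


def number_needed(per_100: float) -> float:
    """Convert 'n people per 100 treated' into a number needed to treat or harm."""
    return 100 / per_100


nnt_improve = number_needed(9)   # 9 per 100 report improvement because of the antibiotic
nnh_side = number_needed(7)      # 7 per 100 have side-effects from it
unaffected = 100 - 9 - 7         # the authors' remaining 84

print(f"NNT about {nnt_improve:.1f} (rounded up {math.ceil(nnt_improve)})")
print(f"NNH about {nnh_side:.1f} (rounded up {math.ceil(nnh_side)})")
print(f"Unaffected per 100: {unaffected}")
```

This gives an NNT of about 11 and an NNH of about 14, so the benefits and harms are of similar size; the remaining 84 is simply 100 minus 9 minus 7.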

This information could be extremely useful in discussing with patients whether they need an antibiotic for their acute cough, although it should be noted that most of the trials used doxycycline or co-trimoxazole, which are perhaps not now first-choice antibiotics for this group of patients. Unfortunately, the reporting of the results earlier in the paper is not quite so elegant, and I wonder whether the authors were striving to push the figures into the form they wanted in order to obtain statistical significance. The diagram below shows the results from the meta-analysis for clinical improvement at day 7-11 and side effects of antibiotics.


To my mind these two effects are quite well balanced and fit with the description of the numbers needed to treat above. The authors, however, take a different view. They report the benefit of giving an antibiotic as none (presumably because the 95% Confidence Interval includes the possibility of no difference, as shown), whilst the possibility of side effects is reported as a non-significant increase. In view of the symmetry shown above, this is not exactly even-handed. Moreover, they then adjust the data by removing the only trial that showed an excess of side effects in the placebo group (which might be expected by chance in some small trials), and suddenly the non-significant trend reaches statistical significance!

All this makes me suspect that the authors were keen to deliver the message that antibiotics are not much use in acute cough, and that they may have been a little biased in the way the results are displayed. This may not always be easy to spot in a paper, but it is certainly worth looking at the way results are reported when they take the form of a trend that does not reach significance, as this may give clues about the authors’ views on the data.

Sensitivity Analysis, Sub-group Analysis and Heterogeneity

Sensitivity analysis is an expected part of a meta-analysis; it involves excluding lower-quality data to see whether the overall result changes. It is also possible to carry out sub-group analysis to look for differences between groups of patients or treatments; for example, the data could have been divided into trials that used erythromycin as one sub-group, doxycycline as a second and co-trimoxazole as a third. There are, however, dangers in data dredging, and sub-group analysis is safest when a small number of sub-groups are specified in advance. It should also be pointed out that the sub-groups are not randomised against one another, so the protection against bias is lost in this type of comparison.
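To see how removing a single trial can tip a pooled result into significance, here is a minimal sketch of inverse-variance fixed-effect pooling with a leave-one-out sensitivity analysis. The four trials and their log risk ratios are entirely hypothetical (they are not the data from the acute cough review); trial D is set up to show an excess of side effects in the placebo group.

```python
import math

# Hypothetical trials: (name, log risk ratio for side effects, standard error).
# Trial "D" shows an excess of side effects in the placebo group.
TRIALS = [
    ("A", 0.35, 0.18),
    ("B", 0.30, 0.22),
    ("C", 0.25, 0.30),
    ("D", -0.60, 0.25),
]


def pool_fixed_effect(trials):
    """Inverse-variance fixed-effect pooling of log risk ratios.

    Returns (pooled risk ratio, lower 95% limit, upper 95% limit).
    """
    weights = [1 / se ** 2 for _, _, se in trials]
    pooled = sum(w * y for w, (_, y, _) in zip(weights, trials)) / sum(weights)
    se_pooled = 1 / math.sqrt(sum(weights))
    return (math.exp(pooled),
            math.exp(pooled - 1.96 * se_pooled),
            math.exp(pooled + 1.96 * se_pooled))


def leave_one_out(trials):
    """Sensitivity analysis: re-pool the data with each trial removed in turn."""
    return {name: pool_fixed_effect([t for t in trials if t[0] != name])
            for name, _, _ in trials}


rr, lo, hi = pool_fixed_effect(TRIALS)
print(f"All trials: RR {rr:.2f} ({lo:.2f} to {hi:.2f})")
for dropped, (rr, lo, hi) in leave_one_out(TRIALS).items():
    flag = "significant" if lo > 1 or hi < 1 else "not significant"
    print(f"Without {dropped}: RR {rr:.2f} ({lo:.2f} to {hi:.2f}) {flag}")
```

With all four invented trials the pooled risk ratio is about 1.14 and its interval crosses 1; dropping trial D alone gives about 1.37 with an interval clear of 1, while dropping any other trial leaves the result non-significant. This is why it matters whether such an exclusion was planned in advance.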

A final reason to split up the data is if significant heterogeneity is shown between the trials; this would normally be presented as a chi-squared statistic for each outcome, ideally accompanied by its p value. A simple shortcut when looking at the graphical display of the trials is to see whether their 95% Confidence Intervals all overlap; if they do not, there are probably significant differences between the trials.
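The chi-squared heterogeneity statistic is usually Cochran's Q. A minimal sketch, using a set of hypothetical trials (the same invented figures as a simple pooling example, not real data), computes Q and compares it with the 5% critical value from a small hard-coded table rather than a full chi-squared distribution, alongside the overlapping-intervals shortcut described above.

```python
from itertools import combinations

# Hypothetical trials: (name, log risk ratio, standard error).
TRIALS = [
    ("A", 0.35, 0.18),
    ("B", 0.30, 0.22),
    ("C", 0.25, 0.30),
    ("D", -0.60, 0.25),
]

# 5% critical values of the chi-squared distribution for small degrees of freedom.
CHI2_CRIT_5PC = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def cochran_q(trials):
    """Cochran's Q: weighted squared deviations from the fixed-effect pooled estimate."""
    weights = [1 / se ** 2 for _, _, se in trials]
    pooled = sum(w * y for w, (_, y, _) in zip(weights, trials)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, (_, y, _) in zip(weights, trials))


def non_overlapping_pairs(trials):
    """The eyeball shortcut: pairs of trials whose 95% CIs do not overlap."""
    intervals = {name: (y - 1.96 * se, y + 1.96 * se) for name, y, se in trials}
    return [(a, b) for a, b in combinations(intervals, 2)
            if intervals[a][1] < intervals[b][0] or intervals[b][1] < intervals[a][0]]


q = cochran_q(TRIALS)
df = len(TRIALS) - 1
print(f"Q = {q:.1f} on {df} df; 5% critical value {CHI2_CRIT_5PC[df]}")
print("Non-overlapping CIs:", non_overlapping_pairs(TRIALS))
```

Here Q is about 10.8 on 3 degrees of freedom, above the 5% critical value of 7.815, and the shortcut agrees: the intervals of trials A and D do not overlap.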

Reporting Results of Studies: can passive smoking really be good for you? (Pulse Article 1999)

Passive smoking and health risks

“Passive smoking may be good for you”, or so the tobacco companies would like us to believe! This idea arose from a misrepresentation of the Confidence Interval for data on passive smoking, and provides a good example of why we need a working knowledge of statistics to deal with the propaganda that comes our way in general practice. Sadly, statistics is reported to be one of the subjects least liked by medical students, and those of us who have been in practice for more than a few years may be unfamiliar with some of the ways that results of studies are now reported. Many medical journals have shifted away from p values towards Confidence Intervals (CI), and the British Medical Journal now expects authors of papers to present data in this way.

Don’t forget common sense

Before going into more detail about Confidence Intervals, note that the passive smoking claim above may be swallowed by the public, and in some cases even by journalists, but hopefully most GPs would suspect that such a finding just does not make sense. It does not fit with all the other data that have emerged in the past 20 years, and therefore needs closer examination. Never leave common sense behind when looking at statistical reports!

Confidence Intervals or P values

So what are Confidence Intervals all about, and how did they get misused in this example? In general, when research is undertaken the results are analysed with two separate questions in mind. The first is how big the effect being studied is (in this case, how big is the risk of lung cancer for passive smokers?). The second is how likely it is that the result is due to chance alone. The two issues are connected, because a very large effect is much less likely to have arisen purely by chance, but the statistical approach differs depending on which question you are trying to answer. The p value answers only the question “what is the chance that the study could show a result at least as extreme as this if the true effect were no different from placebo?” The Confidence Interval describes how precisely the trial estimates the true size of the effect.

Both questions relate to the fact that we cannot know what the effect of a treatment or risk factor would be on everyone in the world; any study can only look at a sample of people who are treated or exposed to the risk. We then have to assume that if, say, one hundred identical studies were carried out in the same way on different groups of patients, the results would be normally distributed around the true average effect size of the treatment. The larger the number of patients included in a trial, the closer its result is likely to be to the true effect in the whole population. The result of any particular trial can therefore be presented as an effect of a certain size, and the 95% Confidence Interval describes the range of values within which you can be 95% certain that the true value lies.
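The normal-distribution assumption leads to the familiar "estimate plus or minus 1.96 standard errors". A minimal sketch for a simple proportion, using the basic Wald approximation (which behaves poorly for small samples or proportions near 0% or 100%) and an illustrative 85% recovery rate:

```python
import math


def proportion_ci(successes: int, n: int, z: float = 1.96):
    """Approximate 95% CI for a proportion (Wald method: p +/- z * SE)."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se


# The same 85% recovery rate observed in trials of increasing size (illustrative).
for n in (100, 400, 1600):
    lower, upper = proportion_ci(85 * n // 100, n)
    print(f"n = {n:4d}: 85% (95% CI {lower:.1%} to {upper:.1%})")
```

Quadrupling the sample size halves the width of the interval, which is why larger trials give more precise estimates.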

The data on Passive Smoking

Perhaps this can be illustrated with the passive smoking data. The passive smoking study in seven European countries showed an extra risk of developing lung cancer of around 16% for non-smokers who were exposed to smoke in the workplace or who had a spouse who smoked. This compared 650 lung cancer cases with 1542 controls in Europe and was accompanied by an estimate that 1100 deaths occurred each year in the European Union as a result of passive smoking.

The 95% Confidence Interval associated with these data is shown in the diagram, and the tobacco industry had simply chosen to highlight the lower end of the Confidence Interval, which shows a small chance that passive smoking could be associated with a 7% lower rate of lung cancer! Unsurprisingly, they did not report the equal chance that the risk could be as much as 44% higher in passive smokers, and the Sunday Telegraph swallowed the story whole. More details are provided in the excellent article by Simon Chapman in the BMJ (1998;316:945).
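The "equal chance" of the two extremes makes sense once you remember that Confidence Intervals for ratios are calculated on a log scale. A minimal sketch, taking the estimate as a relative risk of 1.16 with a 95% Confidence Interval of 0.93 to 1.44 (the 16% excess, 7% lower and 44% higher figures quoted above):

```python
import math

rr, lower, upper = 1.16, 0.93, 1.44   # passive smoking estimate and 95% CI

# Ratio CIs are symmetric on the log scale, not on the raw scale.
below = math.log(rr) - math.log(lower)
above = math.log(upper) - math.log(rr)
print(f"Log-scale distance below: {below:.3f}, above: {above:.3f}")
print(f"Raw scale: {1 - lower:.0%} lower risk to {upper - 1:.0%} higher risk")
print("Crosses the no-difference line:", lower < 1 < upper)
```

The distances either side of the estimate are almost identical on the log scale (about 0.22), even though on the ordinary scale the interval looks lopsided.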

Gardner and Altman mention this danger in their book “Statistics with Confidence”, and suggest that results should be presented with the effect size, Confidence Interval and p value to prevent this kind of misunderstanding. The first two chapters are well worth reading if you want a fuller understanding of the rationale behind Confidence Intervals. A final point is that when the Confidence Interval crosses the no-difference line (as shown in the diagram above), the results do not reach significance at the level chosen (usually 5%).

Simon Chapman points out, however, that a meta-analysis in the BMJ issue of 18 October 1997 compared 4626 cases with 477924 controls and showed a 24% excess risk of lung cancer in non-smokers living with smokers. The 95% Confidence Interval was 13% to 36%, which is well clear of the no-difference line and hence highly statistically significant, with a p value of <0.001. Again, these data were conveniently ignored.
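A p value can be recovered approximately from a ratio and its 95% Confidence Interval, assuming the interval was calculated as the log estimate plus or minus 1.96 standard errors. A minimal sketch applying this to both data sets above (the 24% excess with a 13% to 36% interval is entered as a relative risk of 1.24, 1.13 to 1.36):

```python
import math


def p_from_ratio_ci(estimate, lower, upper):
    """Approximate two-sided p value recovered from a ratio and its 95% CI.

    Assumes the CI was built as log(estimate) +/- 1.96 * SE on the log scale.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * 1.96)
    z = abs(math.log(estimate)) / se
    return math.erfc(z / math.sqrt(2))   # two-sided normal tail probability


p_europe = p_from_ratio_ci(1.16, 0.93, 1.44)
p_meta = p_from_ratio_ci(1.24, 1.13, 1.36)
print(f"European study RR 1.16 (0.93 to 1.44): p = {p_europe:.2f}")
print(f"1997 meta-analysis RR 1.24 (1.13 to 1.36): p = {p_meta:.1e}")
```

The European study gives a p value of about 0.18, while the meta-analysis gives one far below 0.001.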

The moral of the story is that you cannot believe something just because you read it in the newspaper. As far as the advantages of passive smoking are concerned, they can join the other myths and misunderstandings documented in one of my favourite books, Follies and Fallacies in Medicine by Skrabanek and McCormick.

Gardner MJ, Altman DG. Statistics with Confidence. BMJ Publishing, 1989.

Skrabanek P, McCormick J. Follies and Fallacies in Medicine. Tarragon, 1998.


Can you trust what you read? Why we need Randomised Trials (Pulse Article 1999)

How can you tell if a paper is reliable? This was the question that many of the registrars wanted to have answered at a recent half-day release session on critical reading.

The Challenge of Archie Cochrane

Before he died, Archie Cochrane expressed his sadness that no-one had gathered together the most reliable data available so that it could be used as a basis for practice and research in health care. In response to this challenge the Cochrane Collaboration has emerged: a group of dedicated doctors and other health care professionals who have set out to collect data from controlled clinical trials and summarise what they find in the form of systematic reviews. The Collaboration is an international organisation and is structured by health problem areas to avoid duplication of effort. Many journals have been hand-searched to identify controlled trials, and the reviews are structured so as to reduce bias at each stage of the process (which includes a ban on drug companies sponsoring individual reviews). In the UK many of the editorial bases are funded through the NHS Research and Development Programme.

The output of the Collaboration, which includes a database of over 250,000 identified controlled trials and over 600 systematic reviews, is published in electronic form in the Cochrane Library, and a future article in this series will give an example of how it can be used.

The place of RCTs

There is considerable misunderstanding about the place of Randomised Controlled Trials (RCTs). It would be unfair to say that you should not bother to read anything that is not an RCT, but it is true that the most reliable way to establish whether an intervention causes an outcome is a systematic review of randomised controlled trials.

The way I like to look at the issue of randomisation is as follows: ask yourself the question “Could this trial have been randomised?” If you decide that it could have been randomised but was not, then a large question mark should be placed over any conclusion that the intervention caused the good or bad outcomes reported.

Evidence for HRT

Hormone replacement therapy (HRT) is a good current example of this. Most of the current evidence on the purported benefits of HRT comes from non-randomised studies, and the results are therefore likely to be biased by differences between the kind of women who opt for HRT and those who do not. An excellent editorial in the British Journal of General Practice in 1998 (2) presents the current state of play in this area and is recommended reading. Randomised Controlled Trials are currently under way to assess the effects of HRT, but these will not report findings for a few years yet.
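The bias described above can be shown with a small simulation. In the Python sketch below, which is entirely hypothetical, HRT has no effect at all on an adverse outcome whose risk depends only on whether a woman is healthier at baseline, and healthier women are assumed to be more likely to opt for HRT. All the proportions are invented for illustration; the point is that the non-randomised comparison makes HRT look protective, while random allocation gives a relative risk close to 1.

```python
import random


def hrt_relative_risk(randomised: bool, n: int = 50_000, seed: int = 7) -> float:
    """Relative risk of a hypothetical outcome, HRT users vs non-users.

    Invented model: HRT has NO effect; risk depends only on baseline health.
    """
    rng = random.Random(seed)
    counts = {True: [0, 0], False: [0, 0]}  # on_hrt -> [events, women]
    for _ in range(n):
        healthier = rng.random() < 0.5
        if randomised:
            on_hrt = rng.random() < 0.5  # a coin toss decides, not the woman
        else:
            # Assumption: healthier women are more likely to choose HRT.
            on_hrt = rng.random() < (0.7 if healthier else 0.3)
        risk = 0.05 if healthier else 0.15  # identical with or without HRT
        counts[on_hrt][0] += int(rng.random() < risk)
        counts[on_hrt][1] += 1
    rate_hrt = counts[True][0] / counts[True][1]
    rate_none = counts[False][0] / counts[False][1]
    return rate_hrt / rate_none


print("non-randomised:", round(hrt_relative_risk(randomised=False), 2))
print("randomised:    ", round(hrt_relative_risk(randomised=True), 2))
```

With these made-up numbers the non-randomised comparison should come out at roughly 0.67, an apparent one-third reduction in risk created purely by who chose HRT.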

Sometimes you cannot randomise

There are, of course, some areas in which randomisation is either impossible or unethical; you could not carry out a trial in which patients were randomised to smoke cigarettes or not! The very strong evidence on the dangers of smoking comes from large, well-conducted cohort studies, which leave little doubt about the size of the dangers involved.

Does it matter how you do it?

Whilst on the subject of randomisation, how it is done matters too. The technical term for keeping the next allocation hidden from the people recruiting patients is “allocation concealment”. In reports of older trials, allocation was often decided by the patient’s hospital number or simply by alternating between treatments, so anyone could predict which treatment the next patient would receive. It has been shown (1) that trials with inadequate allocation concealment of this sort tend to show larger benefits of the intervention under study, and it is not too difficult to imagine why.

Magic cure for warts?

Imagine you have developed a new treatment for removing warts and you arrange a trial to test it against one of the current methods. A patient walks into your surgery with a mass of horrible-looking large warts, which you think no treatment on earth will remove, and because the allocation (whether alternating or random) is not concealed, you can tell that this patient is next in line for your new technique. What will you do? Human nature is such that you will find some reason why this person does not quite fit the trial, and you will wait for a patient with a nice small wart to receive your new treatment instead. In this instance the advantage of randomisation in removing bias from the allocation process has obviously been lost.
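The warts example can be turned into a small simulation. In this hypothetical Python sketch the new treatment is in fact no better than the old one, and cure depends only on whether the warts are severe. When allocation alternates openly, the recruiter leaves out severe cases that are due to get the new treatment; when allocation is concealed (modelled simply as a random choice the recruiter cannot foresee), everyone eligible is entered. The cure rates and the 50% proportion of severe cases are invented.

```python
import random


def cure_probability(severe: bool) -> float:
    # Invented figures: both treatments are equally effective,
    # so only the severity of the warts affects the chance of cure.
    return 0.3 if severe else 0.8


def apparent_benefit(concealed: bool, n_patients: int = 20_000, seed: int = 1) -> float:
    """Difference in cure rate, new minus old, seen in a hypothetical trial."""
    rng = random.Random(seed)
    cured = {"new": [], "old": []}
    next_arm = "new"  # open alternation: new, old, new, old, ...
    for _ in range(n_patients):
        severe = rng.random() < 0.5
        if concealed:
            arm = rng.choice(["new", "old"])  # recruiter cannot foresee this
        else:
            arm = next_arm
            if arm == "new" and severe:
                # The recruiter can see this patient is due the new treatment
                # and finds a reason to leave them out of the trial.
                continue
            next_arm = "old" if arm == "new" else "new"
        cured[arm].append(rng.random() < cure_probability(severe))
    rate = {arm: sum(results) / len(results) for arm, results in cured.items()}
    return rate["new"] - rate["old"]


print("open alternation:", round(apparent_benefit(concealed=False), 3))
print("concealed:       ", round(apparent_benefit(concealed=True), 3))
```

With these made-up figures the open alternation should show the new treatment curing roughly 25 percentage points more patients, a benefit created entirely by which patients were let into the trial, while the concealed version shows a difference close to zero.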

A better way to do it

When assessing allocation concealment in randomised trials, I would look at least for opaque sealed envelopes containing allocations drawn from a random sequence to determine the next patient’s treatment. Even better is a separate centre (such as the hospital pharmacy) that allocates the treatment at random only after the decision has been made to include the patient in the trial.
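The central-allocation approach can be sketched as follows. This hypothetical Python class stands in for a separate centre such as the hospital pharmacy: it generates the sequence in advance, keeps it hidden, and releases a patient’s allocation only once that patient has been recorded as entered into the trial. The use of permuted blocks of four (to keep the two groups similar in size) is an assumption of the sketch rather than something described above; simple randomisation would work in the same way.

```python
import random
from typing import Optional


class CentralAllocator:
    """Hypothetical central randomisation office (e.g. the hospital pharmacy).

    The allocation sequence is generated in advance and kept here; recruiting
    clinicians never see it. An allocation is released only after a patient
    has been registered as entered into the trial, and it cannot be changed.
    """

    def __init__(self, n_blocks: int, block_size: int = 4, seed: Optional[int] = None):
        if block_size % 2:
            raise ValueError("block_size must be even for two equal arms")
        rng = random.Random(seed)
        self._sequence: list[str] = []
        for _ in range(n_blocks):
            # Permuted block: equal numbers of each arm, in random order.
            block = ["new"] * (block_size // 2) + ["old"] * (block_size // 2)
            rng.shuffle(block)
            self._sequence.extend(block)
        self._register: dict[str, str] = {}  # patient id -> allocated arm

    def enrol(self, patient_id: str) -> str:
        """Record the patient as entered, then reveal their allocation."""
        if patient_id in self._register:
            # Entry is irreversible: asking again returns the same arm.
            return self._register[patient_id]
        if len(self._register) >= len(self._sequence):
            raise RuntimeError("allocation sequence exhausted")
        arm = self._sequence[len(self._register)]
        self._register[patient_id] = arm
        return arm


office = CentralAllocator(n_blocks=5, seed=42)
print([office.enrol(f"patient-{i}") for i in range(8)])
```

Because the recruiting clinician only learns the allocation after committing the patient to the trial, the choice described in the warts example never arises. One caveat: with small fixed blocks, a recruiter who knows the block size can sometimes guess the last allocation in a block, which is one reason trials often vary the block size.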

Try it yourself

So next time you are reading a paper, after you have asked yourself what question the paper was trying to answer, just pause to consider whether the trial could have been randomised. If it was randomised, how easy would it have been to tell which treatment the next patient was getting? If you are satisfied on both of these fronts then read on; if not, perhaps move on to another paper.


1) Schulz KF, Chalmers I, Hayes RJ, Altman DG. Empirical evidence of bias: dimensions of methodological quality associated with estimates of treatment effects in controlled trials. JAMA 1995;273:408-12.

2) Hannaford PC. Is there sufficient evidence for us to encourage the widespread use of hormone replacement therapy to prevent disease? Br J Gen Pract 1998;48(427):951-2.

Further Reading

Kleijnen J, Gotzsche P, Kunz RA, Oxman AD, Chalmers I. So what’s so special about randomisation? Chapter 5 in: Maynard A, Chalmers I (eds). Non-random Reflections on Health Services Research. BMJ Publishing Group, 1997: 93-106.