[I really should have discussed this before having launched into reviews of evidence from clinical trials as it is fundamental to the issue of what constitutes “evidence”. You will notice, if you read back, that I have peppered my previous posts with links to this article where appropriate.]
I have mentioned in a number of previous posts that there is some evidence for efficacy for some fairly outlandish alternative medicine treatments. This evidence comes in the form of significant statistical tests in clinical trials. Now, clinical trials (double-blind, placebo-controlled and properly randomised) are the gold standard for evidence-based medicine but (as with all statistics) you have to know how to interpret them for them to be of any use. There are three places where care needs to be exercised in the interpretation of clinical trials:
Limitation 1 – The missing trials
The first I have covered before, and this is the issue of the “file drawer effect” or hidden trial data. If you conduct an experiment 20 times on a treatment that does nothing, then by chance alone you can expect about one of those experiments to achieve your magical p-value of less than 0.05 (an entirely arbitrary number, but that’s another issue) and you can wave it around for all to see. However, if we know that 20 tests were conducted then the probability of at least one of them being statistically significant is not 0.05 but about 0.64 (1 - 0.95^20, assuming the trials are independent); it is the expected number of chance “successes” that comes to 1 (0.05 x 20). The trouble comes when some trials are hidden, such that you only see two of the trials and those two include the one that produced a significant result. This has been documented in a few cases now and has cast a shadow over the reliability of clinical trials.
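The arithmetic behind the file drawer effect can be checked with a short sketch. It assumes the trials are independent and that the treatment truly does nothing; the function names are made up for illustration.

```python
def prob_at_least_one_false_positive(n_trials: int, alpha: float = 0.05) -> float:
    """Chance that at least one of n independent trials of a useless
    treatment reaches p < alpha purely by luck."""
    return 1.0 - (1.0 - alpha) ** n_trials


def expected_false_positives(n_trials: int, alpha: float = 0.05) -> float:
    """Average number of chance 'successes' among n such trials."""
    return n_trials * alpha


# Compare what we would conclude if we believed 1, 2 or 20 trials were run.
for n in (1, 2, 20):
    chance = prob_at_least_one_false_positive(n)
    expected = expected_false_positives(n)
    print(f"{n:2d} trials: P(at least one significant) = {chance:.2f}, "
          f"expected significant results = {expected:.2f}")
```

With 20 trials the chance of at least one spurious “success” is roughly 0.64, which is why a single significant result means very little unless you know how many trials sit unpublished in the drawer.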
Limitation 2 – The arbitrary statistic
Even assuming that there have only been a small number of trials conducted, there can still be significant results that seem strange. An example is Oscillococcinum which appears to reduce the length of flu symptoms despite not containing any active ingredient. Unfortunately, we now have to get into some math/philosophy… The kind of statistics used in clinical trials are called “frequentist statistics”, which attempt to establish the probability that a given observation could result from chance. These statistics are based on testing hypotheses, or statements about the way that the world works.
Let’s take a celebrated example to illustrate the issue. I propose that global warming is caused by a decline in the global pirate population. This is my “alternative hypothesis” – alternative as it constitutes the alternative explanation to the relationship being driven by chance. Chance can be the “null hypothesis” – null in the sense that there is no actual relationship and any small variations are just random noise. What we need to do is see if there is enough evidence to allow us to reject the null hypothesis (chance, in this case). It has been arbitrarily decided over the past century or so that if a result at least as strong as the one observed would turn up less than 5% of the time through chance alone (the probability is less than 0.05, or p<0.05) then we can reject the null hypothesis. Now, the relationship between falling pirate numbers and rising global temperatures is strong enough to be able to reject the hypothesis of chance, so clearly we can accept the alternative hypothesis that pirates cool the world, right? Seems wrong, doesn’t it? This is because rejection of the null hypothesis with a significant test statistic does not mean automatically accepting the alternative hypothesis: something other than pirates could just as easily be driving the pattern.
Limitation 3 – The credulous statistician
The final point that I wish to cover is probably the most important and the most relevant to alternative medicine claims. We have a large body of research into the sciences generally and medicine in particular. We pretty much know some stuff for certain. It is for this reason that some of the claims of alternative medicine can seem kooky – they contravene widely-held scientific beliefs about the way the world functions. Take, for example, two rival treatments: the first is a drug that has been synthesised artificially using drug design technology to interfere with a particular part of a bacterial cell. This drug has been tested in vitro (in petri dishes) and in vivo (using model animal organisms) and has been shown to have an effect as predicted by the design. Now we look at a second treatment for the same disease. This treatment does not involve giving the patient any drugs or herbs. In fact, the practitioner of this particular healing medium doesn’t even need to be in the same room as the patient – s/he can heal from a distance using precisely-directed positive thinking.
We take these two treatments and, in the interest of fairness, test them for action on the same disease using a randomised, placebo-controlled, double-blind clinical trial. Both show evidence for efficacy above placebo (p<0.05) and so we conclude that both can be used in the treatment of the particular disease. In fact, since the drug manufacture is expensive, it is more cost-effective to have the psychic healer do all the work so that is the preferred treatment.
The problem here is that we have relied on frequentist statistics again, which are flawed not only in the application of the arbitrary p-value mentioned above, but also in their lack of ability to consider the relative likelihood of the alternative hypotheses. When comparing the null hypothesis of chance against the alternative hypothesis of “pirates cool the world”, frequentist statistics use a level playing field. However, there is another form of statistics that can incorporate our pre-existing knowledge into the testing of hypotheses. Reverend Thomas Bayes (1701-1761) gave his name to Bayes’ theorem, which gave rise to a whole new way of approaching statistics. Rather than simply comparing hypotheses on a level playing field, the Bayesian approach is to enter a statistical test with a “prior” expectation of whether or not the hypothesis is correct. We then make observations, following which we re-evaluate our assessment of the hypothesis (giving a “posterior” expectation of whether it is correct). We then use this posterior probability as the prior in the next round of testing, evaluating that in light of yet more observations, and so on until the posterior distribution stabilises.
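For readers who like to see it written down, the prior-to-posterior step is Bayes’ theorem. The symbols here are my own shorthand rather than anything from the trials themselves: H is the hypothesis that the treatment works, D is the observed data (for instance, a significant trial result), P(H) is the prior and P(H | D) is the posterior.

```latex
P(H \mid D) = \frac{P(D \mid H)\, P(H)}{P(D \mid H)\, P(H) + P(D \mid \neg H)\, P(\neg H)}
```

When the prior P(H) is tiny, the second term in the denominator dominates, so even data that would be quite likely if the treatment worked leave the posterior small.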
This idea of prior-observation-posterior has great implications for the testing of alternative medicine. Using sound scientific principles, we can attribute a very low prior probability to the hypothesis that “homeopathic remedies help with ‘flu”, while hypotheses such as “HIV-protease inhibitors help with HIV” are granted higher prior probabilities. With a lower prior probability, greater observational evidence (and by “observations” I also mean experiments) is needed to draw a posterior conclusion that the hypothesis is supported. If we apply this criterion to Oscillo’s clinical trials, we would begin by stating that there is a very small prior probability of a treatment containing nothing but sugar having any effect on ‘flu. The observation that there is a statistically significant but clinically unimportant decline in ‘flu symptoms does little to convince us that the treatment actually does what it claims. This is in contrast to the credulous frequentist approach that would consider the drug to be a success.
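To put rough numbers on this, here is a minimal sketch of the prior-observation-posterior cycle. It simplifies heavily: the only “observation” is whether a trial came out significant, the trial power of 0.8 is an assumed figure, and the two priors (0.5 for a rationally designed drug, 0.001 for a sugar pill) are purely illustrative guesses rather than measured values.

```python
def posterior_probability(prior: float, power: float = 0.8,
                          alpha: float = 0.05) -> float:
    """P(treatment works | trial was significant), via Bayes' theorem.

    power: assumed chance of a significant result if the treatment works.
    alpha: chance of a significant result if it does not (the p < 0.05 cut-off).
    """
    true_positive = power * prior
    false_positive = alpha * (1.0 - prior)
    return true_positive / (true_positive + false_positive)


def update_over_trials(prior: float, n_significant_trials: int,
                       power: float = 0.8, alpha: float = 0.05) -> float:
    """Feed each posterior back in as the prior for the next significant trial."""
    belief = prior
    for _ in range(n_significant_trials):
        belief = posterior_probability(belief, power, alpha)
    return belief


# Illustrative priors only: these numbers are assumptions, not data.
for label, prior in (("designed drug", 0.5), ("sugar pill", 0.001)):
    for n in (1, 2, 3):
        belief = update_over_trials(prior, n)
        print(f"{label}: prior {prior}, after {n} significant trial(s): {belief:.3f}")
```

Under these assumptions a single significant trial lifts the designed drug to a posterior of about 0.94, while the sugar pill only reaches about 0.016. The same p<0.05 result leaves us nearly convinced in one case and still deeply sceptical in the other, and it takes several independent positive trials before the implausible treatment becomes believable.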
Clinical trials are the best way to learn whether or not a therapy is effective in treating a condition. However, the experiments and resulting statistics cannot be taken and interpreted without placing them into context. In particular, the number of other trials that (may) have been conducted, the extent of the statistical effect (is it clinically relevant?), and the prior likelihood of the treatment showing an effect are all aspects that are rarely considered when formulating judgements on efficacy.