How well a given treatment may work can greatly influence our lives.
But before deciding whether to take a treatment, we generally want to know how effective it may be. Randomised controlled trials (RCTs), in which people are randomly allocated to treatment and control groups, are a common method for testing whether a treatment is effective.
Researchers in fields like medicine, psychology and economics often claim that this method is the only reliable means to properly inform medical, social and policy decisions;
that it is the ultimate benchmark against which to assess other methods; and that it is exempt (or as exempt as possible) from the strong theoretical assumptions, methodological biases and researcher influence to which non-randomised methods are subject.
This study assesses the hypothesis that randomised experiments estimate the effects of a treatment without strong assumptions, biases and limitations. To do so, it analyses the 10 most cited RCT studies worldwide.
While these trials are related to the fields of general medicine, biology and neurology, the insights outlined here are as useful for researchers and practitioners using RCTs across any field including psychology, neuroscience, economics and, among others, agriculture.
This study shows that all of the 10 most cited RCTs assessed here suffer from at least several commonly known methodological issues that lead to biased results:
- uneven allocation across trial groups of participants’ background characteristics that influence outcomes,
- issues related to partial blinding and unblinding,
- significant shares of participant refusal and
- participants switching between trial groups, among others.
Some of these issues cannot be avoided in trials, but they affect the robustness of trials and constrain their reported outcomes.
This study thereby contributes to the literature on the methodological biases and limits of RCTs. A number of meta-analyses of RCTs similarly indicate that trials at times face various biases, assessed against common criteria including randomisation, double-blinding, and dropouts and withdrawals.
Trial reporting guidelines have been important in helping reduce biases, but they need to be significantly improved.
A critical concern for trial quality is that only some trials report the common methodological problems. Even fewer explain how these problems affect their trial’s results. And no existing trials report all such problems and explain how they influence trial outcomes.
Exacerbating the situation, these are only some of the more commonly reported problems. This study’s main contribution is outlining a larger set of important assumptions, biases and limitations facing RCTs that have not yet all been thoroughly discussed in trial studies.
Better understanding the limits of randomised experiments is very important for research, policy and practice.
Results and discussion
Assumptions, biases and limitations in designing RCTs
To begin, a constraint of RCTs not yet thoroughly discussed in existing studies is that randomisation is only possible for a small set of questions we are interested in – i.e. the simple-treatment-at-the-individual-level limitation of trials.
Randomisation is largely infeasible for many complex scientific questions, e.g. on what drives overall good physical or mental health, high life expectancy, functioning public health institutions or, in general, what shapes any other intricate or large-scale phenomenon (from depression to social anxiety).
Topics that are generally not amenable to randomisation include those related to
- mental states,
- human capacities and
- norms and practices.
Not having a comparable counterfactual for such topics is often the reason for not being able to randomise.
Trials are also restricted in answering questions about how to achieve desired outcomes in another context and policy setting: what type of health practitioners are needed, in which kind of clinics, and within what regulatory, administrative and institutional environment, for health services to deliver the treatment effectively.
But they cannot generally be conducted in cases with multiple, complex treatments or outcomes simultaneously – situations that often reflect the reality of medicine (e.g. understanding how to increase life expectancy or make public health institutions more effective).
Researchers would, if they viewed RCTs as the only reliable research design, thus largely only focus on select questions related to simple treatments at the level of the individual that fit the quantifiable treatment–outcome schema (more to come on this later).
- initial sample selection bias
Another constraint facing RCTs is that a trial’s initial sample – when the aim is to later scale up a treatment – would ideally need to be generated randomly and chosen representatively from the general population. Yet the 10 most cited RCTs at times use, where reported, a selective sample, which can limit scaling up results and lead to an initial sample selection bias.
Some of these leading trials, as Table 1 indicates, do not provide information about how their initial sample was selected before randomisation, while others only state that “patient records” were used or that they “recruited at 29 centers”. Critical information is not provided, such as the quality, diversity or location of such centres and the participating practitioners, how the centres were selected, and the types of individuals they tend to treat.
This means that we do not have details about the representativeness of the data used for these RCTs.
A foundational and strong assumption of RCTs (once the sample is chosen) is the achieving-good-randomisation assumption. Poor randomisation – and thus poor distribution of participants’ background traits that affect outcomes between trial groups – puts into question the degree of robustness of the results from several of these 10 leading RCTs.
Yet all of these 10 RCTs randomised their samples, showing that randomisation by itself does not ensure a balanced distribution of traits – as we always work with finite samples and finite randomisations.
As long as there are important imbalances we cannot interpret the different outcomes between the treatment and control groups as simply reflecting the treatment’s effectiveness.
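This point can be illustrated with a minimal simulation (all values are hypothetical assumptions for the sketch): a single baseline covariate is repeatedly randomised 1:1 into two groups, and the gap between group means varies considerably across draws.

```python
import random
import statistics

random.seed(1)

# Hypothetical baseline covariate: ages of 50 trial participants.
ages = [random.gauss(55, 12) for _ in range(50)]

def mean_age_gap(values):
    """Randomise values 1:1 into two groups; return the gap in group mean age."""
    shuffled = random.sample(values, len(values))
    half = len(shuffled) // 2
    treatment, control = shuffled[:half], shuffled[half:]
    return abs(statistics.mean(treatment) - statistics.mean(control))

# Repeat the randomisation many times: some draws are well balanced, others not.
gaps = [mean_age_gap(ages) for _ in range(1000)]
print(f"most balanced draw:  gap of {min(gaps):.2f} years in mean age")
print(f"least balanced draw: gap of {max(gaps):.2f} years in mean age")
```

With a finite sample, any single randomisation is just one draw from this distribution of possible splits, and nothing guarantees it is one of the balanced ones.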
- incomplete baseline data limitation
Another constraint that can arise in trials is when they do not collect baseline data for all relevant background influencers (but only some) that are known to alternatively influence outcomes – i.e. an incomplete baseline data limitation.
The common claim that “an advantage of RCTs is that nobody needs to know all the factors affecting the outcome, as randomising should ensure any difference is due to the treatment” does not hold: randomisation does not guarantee an even balance of influencing factors.
Some of these 10 trials did not double-blind, while others initially double-blinded but later partially unblinded, or only partially blinded one arm of the trial – which reflects in relevant cases (while often unavoidable) a lack-of-blinding bias.
Beyond randomisation and blinding, a further constraint is that trials often consist of a few hundred individuals – samples that are often too small to produce robust results – i.e. the small sample bias.
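The effect of sample size on the precision of estimates can be sketched with a simple simulation (the effect size and noise level below are illustrative assumptions): the same hypothetical treatment effect is estimated far less precisely with 50 participants per arm than with 5,000.

```python
import random
import statistics

random.seed(3)

def estimated_effect(n, true_effect=2.0, noise_sd=10.0):
    """One simulated trial: difference in group means with n participants per arm."""
    treatment = [true_effect + random.gauss(0, noise_sd) for _ in range(n)]
    control = [random.gauss(0, noise_sd) for _ in range(n)]
    return statistics.mean(treatment) - statistics.mean(control)

# Repeat each trial design many times to see how much its estimates vary.
small = [estimated_effect(50) for _ in range(300)]
large = [estimated_effect(5000) for _ in range(300)]
print(f"spread of estimates, n=50 per arm:   {statistics.stdev(small):.2f}")
print(f"spread of estimates, n=5000 per arm: {statistics.stdev(large):.2f}")
```

A single small trial can thus land far from the true effect purely by chance, even with flawless randomisation and blinding.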
- quantitative variable limitation
Another issue facing RCTs not yet discussed in existing studies is the quantitative variable limitation: that trials are only possible for those specific phenomena for which we can create strictly defined outcome variables that fit within our experimental model and make correlational or causal claims possible.
The 10 most cited RCTs thus all use a rigid quantitative outcome variable. Some use a binary outcome variable (1 or 0) indicating whether participants died.
But this binary variable can neglect the multiple ways in which participants perceive the quality of their life while receiving treatment.
In fact, most medical phenomena (from depression, cancer and overall health, to medical norms and hospital capacity) are not naturally binary or amenable to randomisation and statistical analysis (this issue also affects other statistical methods, and its implications need to be discussed in studies).
Assumptions, biases and limitations in implementing RCTs
- all-preconditions-are-fully-met assumption
An assumption in implementing trials that has not yet been thoroughly discussed in existing studies is the all-preconditions-are-fully-met assumption: a trial treatment can only work if a broad set of influencing factors (beyond the treatment) – which can be difficult to measure and control – are simultaneously present.
A treatment – whether chemotherapy or a cholesterol drug – can only work
- if patients are nourished and healthy enough for the treatment to be effective,
- if compliance is high enough in taking the proper dosage,
- if community clinics administering the treatment are not of low quality,
- if practitioners are trained and experienced in delivering it effectively,
- if institutional capacity of the health services to monitor and evaluate its implementation is sufficient,
- among many other issues.
Variation in the extent to which such preconditions are met leads to variation (bias) in average treatment effects across different groups of people. To increase the effectiveness of treatments and the usefulness of results, researchers need to give greater focus, when designing trials and when extrapolating from them, to this broader context.
Table 1. Research designs of the ten most cited RCTs worldwide
In these 10 leading RCTs, some degree of statistical bias arises during implementation through issues related to people initially recruited who refused to participate, participants switching between trial groups, variations in actual dosage taken, missing data for participants and the like.
Table 1 illustrates that, for the few trials in which the share of people unwilling to participate after being recruited was reported, it accounted at times for a large share of the eligible sample.
This implies a selection bias among those who
- have time,
- are willing,
- find it useful,
- view limited risk in participating and
- possibly have greater demand for treatment.
Among this small share, 88% were then randomised into the trial. During implementation, 42% in the treatment group stopped taking the drug. Among all participants 4% had unknown vital status (missing data) and 3% died.
As a sample shrinks due to people refusing, missing data and the like, it is likely not “average participants” who are lost but those who may differ strongly – issues that intention-to-treat analysis cannot necessarily address.
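How these losses compound can be sketched with a hypothetical attrition ledger; every share below is an illustrative assumption in the spirit of the figures reported above, not data from any specific trial.

```python
# Hypothetical attrition ledger; all shares below are illustrative assumptions.
eligible = 1_000                          # people assessed as eligible
randomised = round(eligible * 0.88)       # 88% consent and are randomised
treatment_arm = randomised // 2           # 1:1 allocation into two arms
stopped = round(treatment_arm * 0.42)     # 42% of the treatment arm stop the drug
missing = round(treatment_arm * 0.04)     # 4% end with unknown vital status
on_protocol = treatment_arm - stopped - missing

print(f"eligible:                     {eligible}")
print(f"treatment arm at baseline:    {treatment_arm}")
print(f"still on protocol at endline: {on_protocol}")
```

Under these assumed shares, only roughly half of the treatment arm remains on protocol at endline – and if those who left differ systematically from those who stayed, the remaining sample no longer resembles the one that was randomised.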
Assumptions, biases and limitations in analysing RCTs
- unique time period assessment bias
In evaluating results after trial implementation, RCTs face a unique time period assessment bias that has not yet been thoroughly discussed in existing studies: that a correlational or causal claim about the outcome is a function of when a researcher chooses to collect baseline and endline data points and thus assesses one average outcome instead of another.
Treatments generally have different levels of decreasing (or at times increasing) returns. Variation in estimated results is thus generally inevitable depending on when we decide to evaluate a treatment – every month, quarter, year or several years.
No two assessment points are identical, and we thus need to evaluate at multiple time points to improve our understanding of the effect trajectory and of lags over time (an issue that also affects other statistical methods).
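A minimal sketch of this bias, assuming a hypothetical effect trajectory that peaks early and then decays exponentially (all parameters are assumptions): the estimated effect depends heavily on the month chosen for endline data collection.

```python
import math

def estimated_effect(months, peak=8.0, decay_scale=6.0):
    """Hypothetical effect trajectory: an early peak that decays exponentially."""
    return peak * math.exp(-months / decay_scale)

# The same trial yields very different "results" depending on assessment time.
for month in (1, 6, 12, 24):
    print(f"effect measured at month {month:>2}: {estimated_effect(month):.2f}")
```

A trial assessed at month 1 and the same trial assessed at month 24 would report very different average effects, though nothing about the treatment itself has changed.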
- background-traits-remain-constant assumption
Another strong assumption made in evaluating RCTs that has not yet been thoroughly discussed is the background-traits-remain-constant assumption. Background traits can change during the trial, so we need to assess them not only at baseline but also at endline, as they can alternatively influence outcomes and bias results.
- average treatment effects limitation
Another constraint is that trials are commonly designed to evaluate only average effects – i.e. the average treatment effects limitation. Yet average effects can be positive even when some or most participants are unaffected, or even negatively affected, by the treatment, so long as a minority experiences large effects.
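A hypothetical distribution of individual effects illustrates the point (all numbers are assumptions for the sketch): 80% of participants are slightly worse off, yet the average effect is positive because a minority benefits strongly.

```python
# Hypothetical individual treatment effects for 100 participants:
# 80 are slightly harmed, 20 benefit strongly.
effects = [-1.0] * 80 + [10.0] * 20

average = sum(effects) / len(effects)
worse_off = sum(1 for e in effects if e < 0)

print(f"average treatment effect: {average:+.2f}")
print(f"participants worse off:   {worse_off} of {len(effects)}")
```

A reported positive average effect therefore says little, by itself, about how the effect is distributed across participants.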
A best results bias can also exist in reporting treatment effects, with funders and journals at times less likely to accept negligible or negative results.
Another constraint in evaluating trials is that funders can have some inherent interest in the published outcomes that can lead to a funder bias.
- placebo-only or conventional-treatment-only limitation
An associated constraint that arises in interpreting a trial’s treatment effects is related to a placebo-only or conventional-treatment-only limitation.
Combining the set of assumptions, biases and limitations facing RCTs
Pulling the range of assumptions and biases together that arise in designing, implementing and analysing trials (Figure 1), we can try to assess how reliable an RCT’s outcomes are.
Figure 1. Overview of assumptions, biases and limitations in RCTs (i.e. improving trials involves reducing these biases and satisfying these assumptions as far as possible). Source: Own illustration. Note: For further details on any assumption, bias or limitation, see the respective section throughout the study. This list is not exhaustive.
Yet is it always feasible to meet this set of assumptions and minimise this set of biases? The answer does not seem positive when assessing these leading RCTs.
The extent of assumptions and biases underlying a trial’s results can increase at each stage: from how we
- choose our research question and objective,
- create our variables,
- select our sample,
- randomise, blind and control,
to how we
- carry out treatments and monitor participants,
- collect our data and conduct our data analysis,
- interpret our results
and do everything else before, in between and after these steps.
We furthermore need to use RCTs together with other methods, which have their own strengths.
When a trial suggests that a new treatment can be effective for some participants in the sample, subsequent observational studies for example can often be important to provide insight into:
- a treatment’s broader range of side effects,
- the distribution of effects on those of different age, location and other traits and,
- among others, whether people in everyday practice, with everyday service providers in everyday facilities, would be able to attain outcomes comparable to those of the average trial participant.
Randomised experiments require much more than just randomising an experiment to identify a treatment’s effectiveness.
They involve many decisions and complex steps that bring their own assumptions and degree of bias before, during and after randomisation.
Seen through this lens, the reproducibility crisis can also be explained by the scientific process being a complex human process involving many actors making many decisions at many levels when designing, implementing and analysing studies, with some degree of bias inevitably arising during this process.
And addressing one bias can at times mean introducing another bias (e.g. making a sample more heterogeneous can help improve how useful results are after the trial but can also reduce reliability in the trial’s estimated results).
Journals must begin requiring that researchers include a standalone section with additional tables in their studies on the “Research assumptions, biases and limitations” they faced in carrying out the trial.
Researchers need to furthermore better combine methods as each can provide insight into different aspects of a treatment. These range from RCTs, observational studies and historically controlled trials, to rich single cases and consensus of experts.
Finally, randomisation does not always even out everything well at the baseline and it cannot control for endline imbalances in background influencers.
No researcher should thus simply generate a single randomisation schedule and use it to run an experiment. Instead, researchers need to run a set of randomisation iterations before conducting a trial, select the one with the most balanced distribution of background influencers between trial groups, and then also control for changes in those background influencers during the trial by collecting endline data.
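This suggestion can be sketched as follows, assuming a single hypothetical baseline covariate (age); a real implementation would score balance across all measured background influencers, and the sample, candidate count and scoring rule here are all assumptions.

```python
import random
import statistics

random.seed(42)

# Hypothetical baseline covariate (age) for 40 trial participants.
ages = [random.gauss(60, 10) for _ in range(40)]

def imbalance(schedule, values):
    """Absolute gap in mean covariate between the two groups of a schedule."""
    treat = [v for v, g in zip(values, schedule) if g == 1]
    ctrl = [v for v, g in zip(values, schedule) if g == 0]
    return abs(statistics.mean(treat) - statistics.mean(ctrl))

def random_schedule(n):
    """One 1:1 allocation of n participants (1 = treatment, 0 = control)."""
    schedule = [1] * (n // 2) + [0] * (n - n // 2)
    random.shuffle(schedule)
    return schedule

# Generate many candidate schedules and keep the most balanced one.
candidates = [random_schedule(len(ages)) for _ in range(200)]
best = min(candidates, key=lambda s: imbalance(s, ages))
print(f"worst candidate gap: {max(imbalance(s, ages) for s in candidates):.2f}")
print(f"chosen schedule gap: {imbalance(best, ages):.2f}")
```

Selecting among iterations in this way reduces baseline imbalance in the measured covariates, though it cannot address unmeasured influencers or changes that occur after baseline.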