The Levels of Evidence and their role in Evidence-Based Medicine – free full-text /PMC3124652/ – Jul 2012
This article explains how different kinds of evidence for different types of studies are graded. It makes even clearer the crime of the CDC to allow a bunch of addiction specialists to issue practice guidelines out of their area of expertise based on low-quality evidence.
As the name suggests, evidence-based medicine (EBM) is about finding evidence and using that evidence to make clinical decisions.
A cornerstone of EBM is the hierarchical system of classifying evidence. This hierarchy is known as the levels of evidence.
It is important to understand the history behind the levels and how they should be interpreted. This paper will focus on the origin of levels of evidence and their relevance to the EBM movement.
History of Levels of Evidence
The levels of evidence were originally described in a report by the Canadian Task Force on the Periodic Health Examination in 1979.
The authors developed a system of rating evidence (Table 1 below) when determining the effectiveness of a particular intervention.
The hierarchies rank studies according to the probability of bias. RCTs are given the highest level because they are designed to be unbiased and have less risk of systematic errors.
A case series or expert opinion is often biased by the author’s experience or opinions and there is no control of confounding factors.
This is exactly what we’re seeing in studies of opioids. Contrary to the agonizing results for so many legitimate pain patients, they all find a way to show that pain isn’t worsened when opioids are tapered, even if the tapering is forced.
Canadian Task Force on the Periodic Health Examination’s Levels of Evidence
| Level | Type of evidence |
|---|---|
| I | At least 1 RCT with proper randomization |
| II.1 | Well designed cohort or case-control study |
| II.2 | Time series comparisons or dramatic results from uncontrolled studies |
| III | Expert opinions |
Modification of levels
Since the introduction of levels of evidence, several other organizations and journals have adopted variations of the classification system.
Diverse specialties are often asking different questions and it was recognized that the type and level of evidence needed to be modified accordingly.
Research questions are divided into the categories:
- treatment,
- prognosis,
- diagnosis, and
- economic/decision analysis.

Table 3 shows the levels of evidence developed by the American Society of Plastic Surgeons (ASPS) for prognosis, and Table 4 shows the levels developed by the Centre for Evidence Based Medicine (CEBM) for treatment.
The two tables highlight the types of studies that are appropriate for the question (prognosis versus treatment) and how quality of data is taken into account when assigning a level.
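The levels of evidence also take into account the quality of the data. For example, in the CEBM levels for therapeutic studies, a low-quality RCT (e.g. <80% follow-up) is placed at level 2B alongside individual cohort studies.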
Levels of Evidence for Prognostic Studies
| Level | Type of evidence |
|---|---|
| I | High quality prospective cohort study with adequate power or systematic review of these studies |
| II | Lesser quality prospective cohort, retrospective cohort study, untreated controls from an RCT, or systematic review of these studies |
| III | Case-control study or systematic review of these studies |
| IV | Case series |
| V | Expert opinion; case report or clinical example; or evidence based on physiology, bench research or “first principles” |
Levels of Evidence for Therapeutic Studies
| Level | Type of evidence |
|---|---|
| 1A | Systematic review (with homogeneity) of RCTs |
| 1B | Individual RCT (with narrow confidence intervals) |
| 1C | All or none study |
| 2A | Systematic review (with homogeneity) of cohort studies |
| 2B | Individual cohort study (including low quality RCT, e.g. <80% follow-up) |
| 2C | “Outcomes” research; ecological studies |
| 3A | Systematic review (with homogeneity) of case-control studies |
| 3B | Individual case-control study |
| 4 | Case series (and poor quality cohort and case-control studies) |
| 5 | Expert opinion without explicit critical appraisal or based on physiology, bench research or “first principles” |
A grading system that provides strength of recommendations based on evidence has also changed over time.
The grading system provides an important component in evidence-based medicine and assists in clinical decision making.
Grade Practice Recommendations
| Grade | Descriptor | Qualifying Evidence | Implications for Practice |
|---|---|---|---|
| A | Strong recommendation | Level I evidence or consistent findings from multiple studies of levels II, III, or IV | Clinicians should follow a strong recommendation unless a clear and compelling rationale for an alternative approach is present |
| B | Recommendation | Levels II, III, or IV evidence and findings are generally consistent | Generally, clinicians should follow a recommendation but should remain alert to new information and sensitive to patient preferences |
| C | Option | Levels II, III, or IV evidence, but findings are inconsistent | Clinicians should be flexible in their decision-making regarding appropriate practice, although they may set bounds on alternatives; patient preference should have a substantial influencing role |
| D | Option | Level V evidence: little or no systematic empirical evidence | Clinicians should consider all options in their decision making and be alert to new published evidence that clarifies the balance of benefit versus harm; patient preference should have a substantial influencing role |
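To make the logic of the grade table easier to follow, here is a minimal Python sketch of how a grade could be derived from the levels of the supporting studies. It is only an illustration under stated assumptions: the function name `grade_recommendation`, the `Consistency` enum, and the reading of "multiple studies" as "two or more" are hypothetical choices made for this example, not part of the ASPS scheme, and real grading involves clinical judgment that no lookup can replace.

```python
from enum import Enum


class Consistency(Enum):
    # Hypothetical labels for how well the studies agree with each other.
    CONSISTENT = "consistent"
    GENERALLY_CONSISTENT = "generally consistent"
    INCONSISTENT = "inconsistent"


def grade_recommendation(levels, consistency):
    """Return a grade 'A'-'D' for a recommendation.

    levels: list of ints 1-5 (levels I-V), one per supporting study.
    consistency: a Consistency value describing agreement among the studies.
    """
    if not levels:
        raise ValueError("at least one study level is required")
    if any(level not in (1, 2, 3, 4, 5) for level in levels):
        raise ValueError("levels must be integers from 1 (I) to 5 (V)")

    # Grade A: any Level I evidence qualifies on its own.
    if 1 in levels:
        return "A"

    mid_levels = [level for level in levels if level in (2, 3, 4)]

    # Grade D: nothing better than Level V (expert opinion, case reports).
    if not mid_levels:
        return "D"

    # Grade A: consistent findings from multiple Level II-IV studies.
    # Assumption: "multiple" is read as two or more.
    if len(mid_levels) >= 2 and consistency is Consistency.CONSISTENT:
        return "A"

    # Grade C: Level II-IV evidence with inconsistent findings.
    if consistency is Consistency.INCONSISTENT:
        return "C"

    # Grade B: Level II-IV evidence with generally consistent findings.
    return "B"
```

Under these assumptions, a recommendation resting only on Level III and IV studies earns an A only if their findings are consistent; if the findings conflict, the table makes it an "option" (grade C), not a strong recommendation.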
Interpretation of levels
Many journals assign a level to the papers they publish and authors often assign a level when submitting an abstract to conference proceedings. This allows the reader to know the level of evidence of the research, but the designated level of evidence does not always guarantee the quality of the research.
Although RCTs are often assigned the highest level of evidence, not all RCTs are conducted properly and the results should be carefully scrutinized.
I wish some “real scientists” would review all those “opioids are evil” studies being churned out these days and point out that almost all of them are overlooking the pain for which the opioids were prescribed.
Most of the damaging effects of opioids they find could as easily be attributable to chronic pain. (See Opioids Blamed for Side-Effects of Chronic Pain)
For example, a study may find that people who take opioids have worse pain than those who don’t:
This sounds bizarre until you realize that only people suffering from severe pain are prescribed opioids in the first place, and even then the opioids won’t come close to relieving it completely.
For example, a poorly conducted RCT may report a negative result due to low power when in fact a real difference exists between treatment groups.
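To see how low power can hide a real difference, the following Python sketch estimates the power of a two-arm trial comparing group means. It uses a textbook normal approximation, not anything taken from the article, and the function name `approx_power`, the standardized effect size of 0.5, and the sample sizes are all assumptions chosen for illustration.

```python
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means.

    effect_size: true difference in means divided by the common standard
                 deviation (a standardized effect, assumed known here).
    n_per_group: number of subjects in each arm (equal arms assumed).
    alpha: two-sided significance level.
    """
    z = NormalDist()  # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic when the effect is real.
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability that the statistic lands in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Same true effect, two different trial sizes (illustrative numbers only).
small_trial = approx_power(0.5, 20)
larger_trial = approx_power(0.5, 64)
print(f"20 per group: power = {small_trial:.2f}")
print(f"64 per group: power = {larger_trial:.2f}")
```

With a moderate true effect, the 20-per-group trial would miss the difference most of the time, while 64 per group reaches roughly the conventional 80% power. A "negative" result from the smaller trial therefore says little about whether the treatment works.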
Although physicians may not have the time or inclination to use a scale to assess quality, there are some basic items that should be taken into account.
Items used for assessing RCTs include:
- a description of the randomization and blinding process;
- a description of the number of subjects who withdrew or dropped out of the study;
- the confidence intervals around study estimates; and
- a description of the power analysis.
The levels of evidence are an important component of EBM.
Understanding the levels and why they are assigned to publications and abstracts helps the reader to prioritize information.
This is not to say that all level 4 evidence should be ignored and all level 1 evidence accepted as fact.
If you’re the CDC, you’ll use almost exclusively level 3 and 4 evidence to issue “strong” recommendations – a serious evidence problem created by cherry-picking the evidence to fit your purpose.
The levels of evidence provide a guide and the reader needs to be cautious when interpreting these results.