Misusing statistics is one of the most powerful ways to lie.
The CDC did exactly this to justify their opioid prescribing guidelines:
- CDC may have over-counted opioid overdoses
- A Reviewer’s Analysis of the Draft CDC Guidelines
- Lies, Damned Lies, and Overdose Statistics
By using bad (miscounted, misused) data, they produced frightening statistics that led to utterly wrong conclusions. This article shows exactly how such illusions of meaning are created.
We’re going to show you how to make data say whatever the hell you want to back up any wrong idea you have.
Gather Sample Data That Adds Bias to Your Findings
The first step in building a statistic is deciding what you want to analyze. Statisticians refer to this as the “population.”
Then you define a subset of that data to collect that, when analyzed, should be representative of the population as a whole. The larger and more accurate the sample, the more precise your conclusions can be.
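To see why sample size matters, here’s a quick simulation (a hypothetical sketch; the population, its average, and the sample sizes are all made up for illustration) showing that averages taken from larger random samples land much closer to the true population average:

```python
import random
import statistics

random.seed(42)

# A made-up "population": 100,000 people with a true average value near 50.
population = [random.gauss(50, 15) for _ in range(100_000)]
true_mean = statistics.mean(population)

def typical_error(sample_size, trials=200):
    """Average distance between a sample's mean and the true mean."""
    errors = [
        abs(statistics.mean(random.sample(population, sample_size)) - true_mean)
        for _ in range(trials)
    ]
    return statistics.mean(errors)

small_error = typical_error(30)     # small samples miss by more
large_error = typical_error(3000)   # large samples land much closer

print(f"typical error, n=30:   {small_error:.2f}")
print(f"typical error, n=3000: {large_error:.2f}")
```

Run it and the n=3,000 samples consistently miss by a fraction of what the n=30 samples do, which is the whole point of gathering more data.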
Of course, there are a few big ways to screw up this type of statistical sampling, either by accident or intentionally. If the sample data you gather is bad, you’ll end up with false conclusions no matter what. There are a lot of ways you can mess up your data, but here are a few of the big ones:
- Self-Selection Bias: This type of bias occurs when the people or data you’re studying voluntarily put themselves into a group that isn’t representative of your whole population. For example, when we ask our readers questions like “What’s your favorite texting app?” we only get responses from people who choose to read Lifehacker. The results of an informal poll like this likely won’t be representative of the population at large because all our readers are smarter, funnier, and more attractive than the average person.
- Convenience Sampling: This bias occurs when a study analyzes whatever data it has available, instead of trying to find representative data. For example, a cable news network might poll its viewers about a political candidate. Without polling people who watch other networks (or don’t watch TV at all), it’s impossible to say that the results of the poll would represent reality.
- Non-Response Bias: This happens when some people in a chosen set don’t respond to a statistical survey, skewing the results. For example, if a survey on sexual activity asked “Have you ever cheated on your spouse?” some people might not want to admit to infidelity, making cheating look rarer than it really is.
- Open-Access Polls: This type of poll allows anyone to submit an answer and, in many cases, doesn’t even verify that each person answers only once. While common, these polls are fundamentally biased because they don’t attempt to control the input in any meaningful way. For example, online polls that just ask you to click your preferred option fall under this bias. While they can be fun and useful, they’re not good at objectively proving a point.
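Non-response bias is easy to simulate. The sketch below uses hypothetical numbers (a true infidelity rate of 20%, with cheaters assumed far less likely to answer the survey) to show how the survey’s estimate ends up well below reality:

```python
import random

random.seed(0)

TRUE_RATE = 0.20          # assumed: 20% of spouses have actually cheated
RESPOND_IF_CHEATED = 0.3  # assumed: cheaters answer only 30% of the time
RESPOND_IF_NOT = 0.9      # assumed: everyone else answers 90% of the time

responses = []
for _ in range(100_000):
    cheated = random.random() < TRUE_RATE
    respond_prob = RESPOND_IF_CHEATED if cheated else RESPOND_IF_NOT
    if random.random() < respond_prob:
        responses.append(cheated)

observed_rate = sum(responses) / len(responses)
print(f"true rate:   {TRUE_RATE:.1%}")
print(f"survey says: {observed_rate:.1%}")  # far below the true rate
```

With these made-up numbers, the survey reports roughly 8% infidelity even though the simulated truth is 20%, purely because the people with something to hide stayed quiet.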
Since statistics use numbers, it’s easy to assume that they’re hard proof of the ideas they claim to support. In reality, the math behind statistics is complex, and analyzing it improperly can yield different or even entirely contradictory conclusions.
If you want to twist a statistic to suit your needs, just fudge the math.
To demonstrate the flaws in analyzing data, statistician Francis Anscombe created Anscombe’s quartet (diagrammed below). It consists of four datasets that share nearly identical summary statistics but, when plotted, show wildly different trends.
For all four of these charts, the following statements are true:
- The average x value is 9 for each dataset
- The average y value is 7.50 for each dataset
- The variance for x is 11 and the variance for y is 4.12
- The correlation between x and y is 0.816 for each dataset
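You can verify those numbers yourself. Here’s a short Python check, using the standard published Anscombe quartet values, confirming that all four datasets share the same summary statistics despite looking nothing alike on a chart:

```python
import statistics
from math import sqrt

# Anscombe's quartet: four (x, y) datasets with matching summary stats.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = [
    (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
     [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
]

def pearson(xs, ys):
    """Pearson correlation coefficient, computed by hand."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = sqrt(sum((a - mx) ** 2 for a in xs) * sum((b - my) ** 2 for b in ys))
    return num / den

for i, (xs, ys) in enumerate(quartet, 1):
    print(f"dataset {i}: mean x={statistics.mean(xs):.2f}, "
          f"mean y={statistics.mean(ys):.2f}, "
          f"var x={statistics.variance(xs):.2f}, "
          f"var y={statistics.variance(ys):.2f}, "
          f"r={pearson(xs, ys):.3f}")
```

Every line of output prints the same averages, variances, and correlation, which is exactly why summary statistics alone can hide what the data actually looks like.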
Obscure Your Sources At All Costs
The easier it is to see your sources, the easier other people can verify or disprove your conclusions.
For proper sourcing, everyone who mentions a piece of data should include a reference to the source.
News sites should link to the studies or research they’re quoting (not articles about the studies). Researchers may not show their entire data set, but the source of a study should answer some basic questions:
- When was the data collected?
- How was the data gathered?
- Who collected the data?
- Who was asked?