Wednesday, September 12, 2007

The Skeptical Approach to Statistics

There are three kinds of lies: lies, damned lies, and statistics.
-- Benjamin Disraeli
Sometimes when I use statistical arguments the person I'm talking with will say something along the lines of "you can prove anything by choosing the right statistics." Yeah, but...

My rules of thumb for which statistics to believe are:

1. Never believe statistics you read in newspapers, popular magazines, or blogs. 87.5% of reporters have liberal arts backgrounds, and they're just not particularly good at math, science, or statistics. Track down the original source, and see what the researchers actually say. If you can, look the paper up in Google Scholar, find later papers that cite it, and see whether they all support the same conclusions.

2. Never believe statistics that come from non-peer-reviewed sources. A statistic from the Quarterly Journal of Economics is 99.98% more trustworthy than a statistic from me.

3. Look closely at how close the statistic is to being "significant." If the statistic is given without any measure of statistical significance, then it's almost certainly bogus.

4. Be very wary of statistics generated from "meta-analyses" of a bunch of other studies. I'm told it's much harder (2.6 times harder) to get the math right for a meta-analysis.

5. Beware of statistics that give you relative differences in risk-- e.g. "doing XYZ decreases your risk of dying by 10%." These kinds of statistics sound impressive until you look at the absolute risk involved-- very often, the risk is insanely tiny to begin with. Who cares if swimming only in fresh water decreases your risk of dying from a shark attack by 99%, if your chances of dying from a shark attack are essentially zero to begin with?

6. Beware of data mining effects. If you generate enough statistics about ANYTHING, some of them (about 1 in 20) will land outside the "95% confidence level" by chance alone. Pick and choose the ones that support your point of view and you can pretend that the randomness in the data is ironclad proof that you're right.
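
Rules 5 and 6 both come down to simple arithmetic, so here is a rough Python sketch of each. The function names are mine, the one-in-a-million baseline risk is a hypothetical number picked purely for illustration, and the second function assumes the tests are independent and that there is no real effect in any of them.

```python
def absolute_risk_reduction(baseline_risk, relative_reduction):
    """Rule 5: convert a relative risk reduction into an absolute one."""
    return baseline_risk * relative_reduction


def chance_of_false_positive(num_tests, alpha=0.05):
    """Rule 6: probability that at least one of num_tests independent tests
    looks 'significant' at level alpha when no real effect exists."""
    return 1 - (1 - alpha) ** num_tests


if __name__ == "__main__":
    # Hypothetical baseline: a 1-in-a-million chance of dying from a shark attack.
    baseline = 1e-6
    saved = absolute_risk_reduction(baseline, 0.99)
    print(f"A 99% relative reduction removes {saved:.2e} of absolute risk")

    # The more statistics you generate, the more spurious "hits" you collect.
    for n in (1, 20, 100):
        expected = n * 0.05
        p_any = chance_of_false_positive(n)
        print(f"{n:>3} tests: about {expected:.1f} expected false positives, "
              f"P(at least one) = {p_any:.2f}")
```

Under those assumptions, 20 tests give roughly a 64% chance of at least one spurious "significant" result, which is plenty to pick and choose from.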

In an ideal world of infinite time and resources, I'd sit down with somebody I disagree with and together we'd decide on a statistic that we both agreed would be an unbiased way of measuring whatever we're arguing about (does the minimum wage cause unemployment, is more immigration good or bad for the economy, does capital punishment prevent crime...).

Then we'd go look up the relevant data and generate the statistic, and see who's right. With the astounding quantity of data and research available on the Internet we're getting closer to this ideal.
DISCLAIMER: All of the statistics in this blog post are 100% made up.
