Despite public pledges, leading scientific journals still allow statistical misconduct and refuse to correct it

A leading form of statistical malpractice in scientific studies is to retroactively comb through the data for “interesting” patterns; while such patterns may provide useful leads for future investigations, simply cherry-picking data that looks significant out of a study that has otherwise failed to prove out the researcher’s initial hypothesis can generate false — but plausible-seeming — conclusions.
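
To see why combing a failed study for significant-looking patterns is risky, here is a minimal sketch. It assumes the comparisons are independent and that no real effect exists, so each p-value is uniform between 0 and 1. The function names, the 0.05 threshold, and the choice of 20 comparisons are illustrative assumptions, not taken from any study discussed here.

```python
import random


def chance_of_false_lead(num_comparisons, alpha=0.05):
    """Probability that at least one of several independent null tests
    comes out 'significant' at level alpha purely by chance."""
    return 1 - (1 - alpha) ** num_comparisons


def simulate_false_leads(num_comparisons, alpha=0.05, trials=20000, seed=0):
    """Monte Carlo check: under the null hypothesis a p-value is uniform
    on [0, 1), so each comparison is drawn with rng.random()."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # One simulated study: did any comparison fall below alpha?
        if any(rng.random() < alpha for _ in range(num_comparisons)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for k in (1, 5, 20):
        print(k, round(chance_of_false_lead(k), 3), round(simulate_false_leads(k), 3))
```

Under these assumptions, a researcher who checks 20 unplanned comparisons is more likely than not to find at least one "significant" result even when nothing is going on.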


Analysis of all the elections since Trump produces no clear answers on the class and suburban/urban correlates of flippability

FiveThirtyEight studied every election since Trump (99 special elections plus regular state elections in New Jersey and Virginia) and checked whether there were any strong predictors of whether Trump voters would support a Democratic candidate.

Automatically generate datasets that teach people how (not) to create statistical mirages

FJ Anscombe’s classic, oft-cited 1973 paper “Graphs in Statistical Analysis” showed that very different datasets could produce “the same summary statistics (mean, standard deviation, and correlation) while producing vastly different plots.” Anscombe’s point was that you can miss important differences if you just look at tables of data, and that these differences leap out when you use graphs to represent the same data.
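
The effect is easy to check by hand. The sketch below uses the first two of the four datasets from Anscombe's paper (one roughly linear, one a smooth curve) and computes the same three summary statistics for each. The variable and function names are my own, and the Pearson correlation is computed directly rather than through a library call.

```python
from statistics import mean, stdev

# Shared x values for Anscombe's datasets I and II.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
# Dataset I: noisy but roughly linear.
y_linear = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
# Dataset II: a smooth curve with no noise.
y_curved = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = sum((a - mx) ** 2 for a in xs) ** 0.5
    sy = sum((b - my) ** 2 for b in ys) ** 0.5
    return cov / (sx * sy)


def summary(xs, ys):
    """Return (mean of y, sample standard deviation of y, correlation)."""
    return mean(ys), stdev(ys), pearson(xs, ys)


if __name__ == "__main__":
    print("linear:", summary(x, y_linear))
    print("curved:", summary(x, y_curved))
```

The two summaries agree closely even though a scatter plot of each dataset looks nothing like the other, which is exactly the trap a table of numbers sets.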

Sampling bias: how a machine-learning beauty contest awarded nearly all prizes to whites

If you’ve read Cathy O’Neil’s Weapons of Math Destruction (you should, right NOW), then you know that machine learning can be a way to apply a deadly, nearly irrefutable veneer of objectivity to our worst, most biased practices.


Proposal: replace Algebra II and Calculus with “Statistics for Citizenship”

Andrew Hacker, a professor of both mathematics and political science at Queens College, has a new book out, The Math Myth: And Other STEM Delusions, which makes the case that including algebra and calculus in the high school curriculum discourages students from learning mathematics and displaces much more practical mathematical instruction in statistical and risk literacy, which he calls “Statistics for Citizenship.”

Why all scientific diet research turns out to be bullshit

The gold standard for researching the effects of diet on health is the self-reported food diary, which is prone to lots of error, underreporting of “bad” food, and changes in diet that result from simply keeping track of what you’re eating. The standard tool for correcting these errors is comparison with more self-reported tests.

Stats-based response to UK Tories’ call for social media terrorism policing

David Cameron wants social media companies to invent a terrorism-detection algorithm and send all the “bad guys” it detects to the police — but this will fall prey to the well-known (to statisticians) “paradox of the false positive,” producing tens of thousands of false leads that will drown the cops.
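
The arithmetic behind the paradox can be sketched in a few lines. Every number below is a made-up assumption for illustration (the user count, the number of real plotters, and the detector's accuracy); none comes from the proposal or from any real system.

```python
def screening_outcome(population, actual_positives, sensitivity, false_positive_rate):
    """Expected results of screening a population with an imperfect detector.

    Returns (true alarms, false alarms, share of alarms that are real).
    """
    true_alarms = actual_positives * sensitivity
    false_alarms = (population - actual_positives) * false_positive_rate
    precision = true_alarms / (true_alarms + false_alarms)
    return true_alarms, false_alarms, precision


if __name__ == "__main__":
    # Hypothetical inputs: 30 million users, 300 real plotters,
    # a detector that catches 99% of them and wrongly flags 0.1% of everyone else.
    true_alarms, false_alarms, precision = screening_outcome(
        population=30_000_000,
        actual_positives=300,
        sensitivity=0.99,
        false_positive_rate=0.001,
    )
    print(f"true alarms:  {true_alarms:,.0f}")
    print(f"false alarms: {false_alarms:,.0f}")
    print(f"share of alarms that are real: {precision:.2%}")
```

Even with a detector far more accurate than anything likely to exist, the rarity of real plotters means that under these assumptions fewer than one flagged account in a hundred would be a genuine lead.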

Piketty’s methods: parsing wealth inequality data and its critique


I’ve been writing about Thomas Piketty’s magisterial economics bestseller Capital in the Twenty-First Century for some time now (previously), and have been taking a close interest in criticisms of his work, especially the Financial Times’s critique of his methods and his long, detailed response. I freely admit that I lack the stats and economics background to make sense of this highly technical part of the debate, but I think that Wonkblog’s analysis looks sound, as does the More or Less play-by-play (MP3).


Spurious correlations: an engine for head-scratching coincidences


The Spurious Correlations engine helps you discover bizarre and delightful spurious correlations, and collects some of the most remarkable ones. For example, Per capita consumption of sour cream (US) correlates with Motorcycle riders killed in noncollision transport accident with an astounding correlation coefficient of 0.916391. Meanwhile, by exploring the engine, I’ve discovered a surprising correlation between the Age of Miss America and Murders by steam, hot vapours and hot objects (a whopping 0.870127!).

Spurious Correlations

(via Waxy)