This repo hosts AI News Audit, the accompanying resource for the paper "AI use in American newspapers is widespread, uneven, and rarely disclosed." The project analyzes 250,000+ news articles to detect and track AI-generated content across different media sources.
Website: https://ainewsaudit.github.io/
Authors: Jenna Russell, Marzena Karpinska, Destiny Akinode, Katherine Thai, Bradley Emi, Max Spero, and Mohit Iyyer
AI is rapidly transforming journalism, but the extent of its use in published U.S. newspaper articles remains unclear. We address this gap by auditing a large-scale dataset of 186K articles from 1.5K American newspapers published in the summer of 2025. Using Pangram, a state-of-the-art AI detector, we discover that approximately 9% of newly-published articles are either partially or fully AI-generated. This AI use is unevenly distributed, appearing more frequently in smaller, local outlets, in specific topics such as weather and technology, and within certain ownership groups. We also analyze 45K opinion pieces from the Washington Post, the New York Times, and the Wall Street Journal, finding that they are 6.4 times more likely to contain AI-generated content than news articles from the same publications, with many AI-flagged op-eds authored by prominent public figures. Despite this prevalence, we find that AI use is rarely disclosed: a manual audit of 100 AI-flagged articles found only five disclosures of AI use. Overall, our audit highlights the immediate need for greater transparency and updated editorial standards regarding the use of AI in journalism to maintain public trust.
Code coming soon!
This platform helps you understand the prevalence of AI-generated content in news media by analyzing articles from our three datasets:
- Recent News: 186,512 articles from about 1.5K American newspapers, published in summer 2025
- Opinions: 44,803 opinion pieces and editorials from WSJ, NYT, and WaPo
- Reporters: 20,131 articles from reporter-specific sources
Our data was collected from publicly accessible newspaper sites, through either RSS feeds or available archives. Because large-scale text collection is sensitive, and to respect the rights of content owners, we do not release the full article texts; instead, we provide article metadata.
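Since only metadata is released, analyses work from per-article records rather than from article text. The sketch below is illustrative only. It assumes a hypothetical CSV export with `dataset` and `label` columns, where `label` takes the Human/Mixed/AI values listed below; the real metadata schema may differ. For each dataset, it computes the share of articles labeled Mixed or AI, i.e. partially or fully AI-generated. The sample rows are toy data, not results from the audit.

```python
import csv
import io
from collections import Counter

# Hypothetical metadata export: the `dataset` and `label` column names
# are assumptions for illustration, not the repo's actual schema.
# These rows are toy data, not audit results.
SAMPLE_CSV = """dataset,label
recent_news,Human
recent_news,Mixed
recent_news,Human
recent_news,Human
opinions,Human
opinions,AI
"""

# Labels counted as "partially or fully AI-generated".
AI_FLAGGED = {"Mixed", "AI"}


def ai_share_by_dataset(rows):
    """Return {dataset: fraction of rows labeled Mixed or AI}."""
    totals = Counter()
    flagged = Counter()
    for row in rows:
        totals[row["dataset"]] += 1
        if row["label"] in AI_FLAGGED:
            flagged[row["dataset"]] += 1
    return {name: flagged[name] / totals[name] for name in totals}


shares = ai_share_by_dataset(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for name, share in sorted(shares.items()):
    print(f"{name}: {share:.1%}")
```

To count only fully AI-generated articles, narrow `AI_FLAGGED` to `{"AI"}`.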
We use Pangram to detect AI use. Each article receives one of three labels:
- Human: Content written entirely by humans
- Mixed: Content with some AI-generated elements
- AI: Content entirely generated by AI
Each article also receives two scores:
- AI Likelihood: Overall probability that the article contains AI-generated content
- Max AI Likelihood: Highest detection score of any segment of the article
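Pangram's internals are not described here, so the following is only a hypothetical sketch of how segment-level scores *could* roll up into the article-level fields above. It is not Pangram's method. It assumes each segment comes with a word count and a score in [0, 1]. It uses a length-weighted mean of segment scores as a stand-in for AI Likelihood (an assumption) and takes Max AI Likelihood as the highest segment score. It then assigns Human, Mixed, or AI using a hypothetical 0.5 cutoff. `Segment`, `summarize`, and `THRESHOLD` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    words: int    # segment length in words (hypothetical input)
    score: float  # per-segment AI score in [0, 1] (hypothetical input)


THRESHOLD = 0.5  # hypothetical cutoff; not Pangram's actual setting


def summarize(segments):
    """Aggregate segment scores into article-level fields (illustrative only)."""
    if not segments:
        raise ValueError("article has no segments")
    total_words = sum(s.words for s in segments)
    # Length-weighted mean of segment scores: an assumed stand-in for AI Likelihood.
    ai_likelihood = sum(s.words * s.score for s in segments) / total_words
    # Highest score from any segment, as in the Max AI Likelihood description.
    max_ai_likelihood = max(s.score for s in segments)
    flagged = [s.score >= THRESHOLD for s in segments]
    if all(flagged):
        label = "AI"
    elif any(flagged):
        label = "Mixed"
    else:
        label = "Human"
    return {
        "label": label,
        "ai_likelihood": ai_likelihood,
        "max_ai_likelihood": max_ai_likelihood,
    }


# Toy article: a long human-like segment plus a short AI-like one.
article = [Segment(words=300, score=0.1), Segment(words=100, score=0.9)]
print(summarize(article))
```

Keeping the maximum alongside the mean matters for Mixed articles. A short AI-generated passage inside a long human-written article barely moves a length-weighted average, but it still shows up in the max.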