
jenna-russell/ai_news

AI Use in American Newspapers

arxiv

This repo hosts AI News Audit (AI use in American newspapers is widespread, uneven, and rarely disclosed), analyzing 250,000+ news articles to detect and track AI-generated content across different media sources.

🌐 Website: https://ainewsaudit.github.io/

Authors: Jenna Russell, Marzena Karpinska, Destiny Akinode, Katherine Thai, Bradley Emi, Max Spero, and Mohit Iyyer

Introduction

AI is rapidly transforming journalism, but the extent of its use in published U.S. newspaper articles remains unclear. We address this gap by auditing a large-scale dataset of 186K articles from 1.5K American newspapers published in the summer of 2025. Using Pangram, a state-of-the-art AI detector, we discover that approximately 9% of newly published articles are either partially or fully AI-generated. This AI use is unevenly distributed, appearing more frequently in smaller, local outlets, in specific topics such as weather and technology, and within certain ownership groups. We also analyze 45K opinion pieces from the Washington Post, New York Times, and Wall Street Journal, finding that they are 6.4 times more likely to contain AI-generated content than news articles from the same publications, with many AI-flagged op-eds authored by prominent public figures. Despite this prevalence, we find that AI use is rarely disclosed: a manual audit of 100 AI-flagged articles found only five disclosures of AI use. Overall, our audit highlights the immediate need for greater transparency and updated editorial standards regarding the use of AI in journalism to maintain public trust.

💻 Code

Code coming soon!

πŸ” What This Site Does

This platform helps you understand the prevalence of AI-generated content in news media by analyzing articles from our three datasets:

  • Recent News: 186,512 articles from various news sources
  • Opinions: 44,803 opinion pieces and editorials from WSJ, NYT, and WaPo
  • Reporters: 20,131 articles from reporter-specific sources
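As a quick sanity check, the three dataset sizes listed above can be summed to confirm the "250,000+" figure quoted at the top of this README. This is only a sketch: the dictionary keys are illustrative labels chosen here, not field names from the released metadata.

```python
# Dataset sizes as listed in this README.
# The keys are illustrative labels, not names from the released metadata.
DATASET_SIZES = {
    "recent_news": 186_512,
    "opinions": 44_803,
    "reporters": 20_131,
}

# Total number of articles across the three datasets.
total_articles = sum(DATASET_SIZES.values())

# Fraction of the total contributed by each dataset.
shares = {name: size / total_articles for name, size in DATASET_SIZES.items()}

if __name__ == "__main__":
    print(f"Total articles: {total_articles:,}")
    for name, share in shares.items():
        print(f"{name}: {share:.1%}")
```

The total comes to 251,446 articles, with the Recent News dataset accounting for roughly three quarters of them.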

Our data was collected from publicly accessible newspaper sites, either through RSS feeds or available archives. Given the sensitivity of large-scale text collection, we do not release the complete article texts, but instead provide metadata to respect the rights of content owners.

📊 Understanding the Data

AI Detection Categories

We use Pangram to detect AI use.

  • Human: Content written entirely by humans
  • Mixed: Content with some AI-generated elements
  • AI: Content entirely generated by AI

Key Metrics

  • AI Likelihood: Overall probability the article contains AI content
  • Max AI Likelihood: Highest detection score from any segment of the article
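To illustrate the Max AI Likelihood definition above, here is a minimal sketch that takes per-segment detection scores and returns the highest one. It assumes scores are probabilities in the range 0 to 1; the function name and the example scores are hypothetical and do not reflect Pangram's API or the released metadata schema. How the overall AI Likelihood is aggregated is not specified here, so it is not modeled.

```python
from typing import Sequence


def max_ai_likelihood(segment_scores: Sequence[float]) -> float:
    """Return the highest per-segment detection score.

    Hypothetical helper for illustration only; this is not Pangram's API.
    Scores are assumed to be probabilities in [0, 1].
    """
    if not segment_scores:
        raise ValueError("an article needs at least one scored segment")
    for score in segment_scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of range: {score}")
    return max(segment_scores)


# Made-up scores for an article split into three segments.
example_scores = [0.02, 0.91, 0.10]
example_max = max_ai_likelihood(example_scores)
```

Under this reading, an article whose segments mostly score low can still have a high Max AI Likelihood if a single segment scores high, which is why the two metrics are reported separately.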
