Description
Issue Type
Bug Fix / SEO Quality
Objective
Fix the wordCount: 0 metadata bug in data/news-articles.json where 225 out of 668 articles (34%) report zero word count. This affects SEO structured data (Schema.org wordCount), quality metrics reporting, and article quality scoring.
Current State - Evidence
Measured: 225 articles in data/news-articles.json have "wordCount": 0
{
  "slug": "2026-02-24-legislative-push",
  "file": "2026-02-24-legislative-push-en.html",
  "lang": "en",
  "headline": "Sweden Bolsters Civilian Defence...",
  "wordCount": 0, // Should be ~1500-2000
  ...
}
All 14 language variants of recent breaking news articles (#485) have wordCount: 0.
The actual articles do have content: news/2026-02-24-committee-reports-en.html, for example, has ~2239 words, but data/news-articles.json reports 0.
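The measurement above can be reproduced with a few lines of TypeScript. This is a minimal sketch, assuming the parsed data is an array of objects with a numeric `wordCount` field; the helper name `zeroWordCountStats` is hypothetical and not part of the repository.

```typescript
// Hypothetical helper: summarise how many index entries report a zero word count.
// Assumes each entry has a numeric `wordCount`, as in the example entry above.
function zeroWordCountStats(entries: readonly { wordCount: number }[]): {
  zero: number;
  total: number;
  percent: number;
} {
  const total = entries.length;
  const zero = entries.filter((entry) => entry.wordCount === 0).length;
  // Guard against division by zero for an empty index.
  const percent = total === 0 ? 0 : Math.round((zero / total) * 100);
  return { zero, total, percent };
}
```

For 225 zero entries out of 668, the rounded share is 34%, matching the figure in the objective.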
Root cause: The word count is calculated correctly in scripts/generate-news-enhanced.ts (line ~505) during article quality scoring, but this value is not persisted to data/news-articles.json. The news index generator (scripts/generate-news-indexes.ts) creates the JSON entries but doesn't read back the word count from the HTML files.
For agentic workflow-generated articles (breaking news from realtime monitor), the article HTML is written directly by the agentic engine without going through generate-news-enhanced.ts at all, so word count is never calculated.
Desired State
- Calculate word count during news index generation (generate-news-indexes.ts) by:
  - Reading each HTML file
  - Stripping HTML tags
  - Counting whitespace-delimited tokens
  - Writing the result to data/news-articles.json
- Propagate word count to Schema.org structured data in the article HTML:
  { "@type": "NewsArticle", "wordCount": 1500 }
- Update quality metrics to use actual word counts for quality scoring.
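The strip-and-count step could look roughly like the sketch below. It is an illustration only: the function name `countWords` is hypothetical, and the regex-based tag stripping is an assumption rather than the approach used in generate-news-enhanced.ts. A real HTML parser would handle edge cases such as entities and malformed markup more reliably.

```typescript
// Hypothetical helper: estimate an article's word count from its raw HTML.
function countWords(html: string): number {
  const text = html
    // Drop script and style blocks so JSON-LD and CSS are not counted as words.
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, " ")
    // Drop HTML comments.
    .replace(/<!--[\s\S]*?-->/g, " ")
    // Replace remaining tags with a space so adjacent elements do not fuse.
    .replace(/<[^>]+>/g, " ")
    // Treat non-breaking space entities as whitespace.
    .replace(/&nbsp;/gi, " ");
  // Count whitespace-delimited tokens, ignoring empty strings at the edges.
  return text.split(/\s+/).filter((token) => token.length > 0).length;
}
```

Replacing tags with a space rather than an empty string avoids merging the last word of one paragraph with the first word of the next.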
Implementation Approach
- Update scripts/generate-news-indexes.ts to read HTML files and calculate word count
- Populate wordCount field in data/news-articles.json entries
- Add word count calculation for agentic-workflow-generated articles
- Run npx vitest run and verify
- Regenerate data/news-articles.json with correct word counts
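One way to structure the populate step is sketched below. It assumes the entry shape shown in the example entry (the real schema may contain more fields) and injects both the HTML loader and the counter, so the same path serves pipeline-generated and agentic-workflow articles. `ArticleEntry`, `HtmlReader`, `WordCounter`, and `populateWordCounts` are hypothetical names, not existing code in the repository.

```typescript
// Assumed entry shape, based only on the fields visible in the example entry.
interface ArticleEntry {
  slug: string;
  file: string;
  lang: string;
  headline: string;
  wordCount: number;
}

// Hypothetical injection points: how HTML is loaded and how words are counted.
type HtmlReader = (file: string) => string | undefined;
type WordCounter = (html: string) => number;

function populateWordCounts(
  entries: readonly ArticleEntry[],
  readHtml: HtmlReader,
  count: WordCounter,
): ArticleEntry[] {
  return entries.map((entry) => {
    const html = readHtml(entry.file);
    // Keep the entry unchanged when its HTML file cannot be found.
    if (html === undefined) return entry;
    return { ...entry, wordCount: count(html) };
  });
}

// Usage with in-memory stand-ins instead of the real file system.
const pages: Record<string, string> = {
  "2026-02-24-legislative-push-en.html": "<p>Sweden bolsters civilian defence</p>",
};
const simpleCount: WordCounter = (html) =>
  html.replace(/<[^>]+>/g, " ").split(/\s+/).filter((t) => t.length > 0).length;

const updated = populateWordCounts(
  [
    {
      slug: "2026-02-24-legislative-push",
      file: "2026-02-24-legislative-push-en.html",
      lang: "en",
      headline: "Sweden Bolsters Civilian Defence...",
      wordCount: 0,
    },
  ],
  (file) => pages[file],
  simpleCount,
);
```

In the index generator, the reader would presumably wrap a file-system read of the news directory. Because the count is derived from the HTML on disk, it does not matter which tool wrote the article.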
Recommended Agent
code-quality-engineer: fix metadata generation pipeline
Acceptance Criteria
- data/news-articles.json has accurate wordCount for all articles (not 0)
- Word count calculation strips HTML tags before counting
- Agentic workflow articles also get word count populated
- Schema.org wordCount in article HTML matches JSON metadata
- Existing tests pass (npx vitest run)
- Quality metrics reporting reflects actual word counts
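The criterion that the Schema.org wordCount matches the JSON metadata could be checked with something like the sketch below. It assumes the article embeds a single top-level NewsArticle object in a `<script type="application/ld+json">` block. Arrays and `@graph` wrappers are not handled, and the names `schemaWordCount` and `wordCountsAgree` are hypothetical.

```typescript
// Hypothetical helper: read wordCount from the first NewsArticle JSON-LD block.
function schemaWordCount(html: string): number | undefined {
  const pattern =
    /<script\b[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    let data: unknown;
    try {
      data = JSON.parse(match[1] ?? "");
    } catch {
      continue; // Skip malformed JSON-LD blocks.
    }
    if (typeof data !== "object" || data === null) continue;
    const record = data as Record<string, unknown>;
    const count = record["wordCount"];
    if (record["@type"] === "NewsArticle" && typeof count === "number") {
      return count;
    }
  }
  return undefined;
}

// True only when the HTML carries a NewsArticle wordCount equal to the JSON value.
function wordCountsAgree(html: string, jsonWordCount: number): boolean {
  return schemaWordCount(html) === jsonWordCount;
}

// Usage with an inline sample document.
const sampleHtml =
  '<html><head><script type="application/ld+json">' +
  '{ "@type": "NewsArticle", "wordCount": 1500 }' +
  "</script></head><body></body></html>";
const agrees = wordCountsAgree(sampleHtml, 1500);
```

A check like this could run inside the existing test suite over every entry in the index, failing when an article has no structured wordCount or when the two values drift apart.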
References
- Word count calc: scripts/generate-news-enhanced.ts lines 503-506
- Index generator: scripts/generate-news-indexes.ts
- Data file: data/news-articles.json (225 entries with wordCount: 0)
- PR evidence: Breaking: Sweden files five propositions on civilian defence, psychological violence - 2026-02-24 #485 (all 14 breaking news articles have wordCount: 0)
- Quality scoring: scripts/generate-news-enhanced.ts lines 487-560