feat: document type/subtype normalization, person-profile guard, CSV CIA context, and full article regen #438

Merged
pethers merged 5 commits into news-generation/weekly-review-2026-02-22-25732d3df5a3d00f from copilot/sub-pr-433
Feb 22, 2026

Copilot AI commented Feb 22, 2026

Articles contained deceased/former MP profile text ("Avliden 2011-09-20", "Tjänstgörande riksdagsledamot…") leaked from the riksdag API's notis/summary fields, and searchDocuments() returned raw doktyp codes with no normalized type classification. Article generation also relied on synthetic JSON schema files instead of real parliamentary data.

Person-profile guard (data-transformers.ts, weekly-review.ts)

isPersonProfileText() — exported predicate blocking all riksdag ledamot profile patterns from appearing as article content:

  • Active/former/resigned MPs, substitutes, ministers, PM
  • Inga uppdrag, Aktuella uppdrag Riksdagsledamot, Avgången
  • Avliden YYYY-MM-DD, @riksdagen.se email addresses
  • Applied in generateEnhancedSummary, generateDocumentIntelligenceAnalysis, and enrichWithFullText
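
A minimal sketch of how such a predicate could look, assuming a simple regular-expression list. The pattern list and the `safeSummary` helper are illustrative assumptions built from the strings named above, not the actual contents of data-transformers.ts; the PR exports the predicate, while the sketch leaves it as a plain function.

```typescript
// Hypothetical pattern list derived from the profile strings listed in this PR.
// The real list in data-transformers.ts may be longer or worded differently.
const PERSON_PROFILE_PATTERNS: RegExp[] = [
  /Tjänstgörande riksdagsledamot/i,
  /Tidigare statsminister/i,
  /Inga uppdrag/i,
  /Aktuella uppdrag\s+Riksdagsledamot/i,
  /Avgången/i,
  /Avliden \d{4}-\d{2}-\d{2}/i, // e.g. "Avliden 2011-09-20"
  /[\w.+-]+@riksdagen\.se/i, // any @riksdagen.se email address
];

// Returns true when the text looks like an MP profile blurb rather than
// document content. Empty or missing text is never treated as a profile.
function isPersonProfileText(text: string | null | undefined): boolean {
  if (!text) return false;
  return PERSON_PROFILE_PATTERNS.some((pattern) => pattern.test(text));
}

// Hypothetical caller: use the candidate summary only if it is not profile text.
function safeSummary(candidate: string | undefined, fallback: string): string {
  return candidate && !isPersonProfileText(candidate) ? candidate : fallback;
}
```

None of the patterns carry the `g` flag, so `test()` keeps no state between calls and the predicate is safe to reuse across documents.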

Document type normalization (mcp-client.ts, types/mcp.ts)

searchDocuments() now stamps every returned document with normalized type + subtype:

// Before
{ dok_id: 'H901mot1', doktyp: 'mot', ... }

// After
{ dok_id: 'H901mot1', doktyp: 'mot', type: 'motion', subtype: 'rskr', ... }
  • normalizeDocumentType(doktyp) — exported, case-insensitive; covers mot→motion, bet→committee-report, prop→proposition, skr→government-communication, ip→interpellation, fr→written-question, kammakt→chamber-action, prot→minutes, sou→government-inquiry, dir→committee-directive, and more; unknown codes pass through unchanged; undefined/empty → 'document'
  • annotateDocumentTypes() stamps in-place without overwriting an existing type field; preserves raw doktyp/subtyp
  • RiksdagDocument gains explicit type, subtype, subtyp fields; SearchDocumentsParams gains subtyp filter
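
The normalization rules above can be sketched as follows. This is an illustration under stated assumptions, not the code in mcp-client.ts: the map holds only the codes named in this description, the `RawDoc` interface is a hypothetical stand-in for `RiksdagDocument`, and copying `subtyp` into `subtype` is a guess at how the subtype is derived.

```typescript
// Only the codes named in this PR description; the real map covers more.
const DOKTYP_TO_TYPE = new Map<string, string>([
  ["mot", "motion"],
  ["bet", "committee-report"],
  ["prop", "proposition"],
  ["skr", "government-communication"],
  ["ip", "interpellation"],
  ["fr", "written-question"],
  ["kammakt", "chamber-action"],
  ["prot", "minutes"],
  ["sou", "government-inquiry"],
  ["dir", "committee-directive"],
]);

// Case-insensitive lookup; unknown codes pass through unchanged,
// undefined or empty input becomes 'document'.
function normalizeDocumentType(doktyp?: string): string {
  if (!doktyp) return "document";
  return DOKTYP_TO_TYPE.get(doktyp.toLowerCase()) ?? doktyp;
}

// Hypothetical stand-in for the RiksdagDocument interface.
interface RawDoc {
  dok_id?: string;
  doktyp?: string;
  subtyp?: string;
  type?: string;
  subtype?: string;
}

// Stamps type/subtype in place. An existing type is never overwritten and
// the raw doktyp/subtyp fields are left untouched.
function annotateDocumentTypes(docs: RawDoc[]): RawDoc[] {
  for (const doc of docs) {
    if (!doc.type) doc.type = normalizeDocumentType(doc.doktyp);
    // Assumption: subtype mirrors the raw subtyp when one is present.
    if (!doc.subtype && doc.subtyp) doc.subtype = doc.subtyp;
  }
  return docs;
}
```

A `Map` is used instead of a plain object so that codes such as `constructor` cannot resolve to inherited properties.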

CIA data source (weekly-review.ts)

loadCIAContext() switched from synthetic data/cia-exports/current/*.json schema files to real cia-data/*.csv files updated weekly:

  • party/view_party_performance_metrics_sample.csv — win rates, rebel rates
  • view_riksdagen_party_summary_sample.csv — seat counts
  • party/distribution_coalition_alignment.csv — coalition alignment rates
  • view_riksdagen_committee_decisions.csv — decision outcomes / denial rate
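
Reading these files needs a CSV parser that copes with quoted fields. The commit notes for this PR mention `parseCsvLine`/`parseCsvFile` helpers; the sketch below shows one plausible shape for the line parser. `parseCsvText` is a hypothetical in-memory stand-in for `parseCsvFile`, and the column names in the usage comment are placeholders rather than the real CSV headers.

```typescript
// Splits one CSV line into fields, honoring double-quoted fields and
// doubled quotes ("") as an escaped quote character.
function parseCsvLine(line: string): string[] {
  const fields: string[] = [];
  let current = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line.charAt(i);
    if (inQuotes) {
      if (ch === '"') {
        if (line.charAt(i + 1) === '"') {
          current += '"'; // escaped quote inside a quoted field
          i++;
        } else {
          inQuotes = false; // closing quote
        }
      } else {
        current += ch;
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ",") {
      fields.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}

// Hypothetical helper: parses CSV text (header row first) into row objects.
// Does not handle quoted fields that span several lines.
function parseCsvText(text: string): Record<string, string>[] {
  const [headerLine, ...dataLines] = text
    .split(/\r?\n/)
    .filter((l) => l.trim() !== "");
  if (headerLine === undefined) return [];
  const header = parseCsvLine(headerLine);
  return dataLines.map((line) => {
    const cells = parseCsvLine(line);
    const row: Record<string, string> = {};
    header.forEach((name, idx) => {
      row[name] = cells[idx] ?? "";
    });
    return row;
  });
}

// Usage with placeholder columns:
// parseCsvText("name,value\nx,1\n") yields [{ name: "x", value: "1" }]
```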

Article regeneration

All 84 2026-02-22 articles regenerated with the fixed pipeline (6 types × 14 languages). monthly-review was the only type contaminated pre-fix (13 person-profile hits/file → 0). News indexes, data/news-articles.json (557 articles), and sitemap.xml (625 URLs) all regenerated.



…ntext

Article generation now reads from cia-data/*.csv (real, weekly-updated data)
instead of data/cia-exports/current/*.json (synthetic schema files).

CSV sources:
- cia-data/party/view_party_performance_metrics_sample.csv (win rates, documents, rebel rate)
- cia-data/view_riksdagen_party_summary_sample.csv (seat counts per party)
- cia-data/party/distribution_coalition_alignment.csv (coalition alignment rates)
- cia-data/view_riksdagen_committee_decisions.csv (decision outcomes for denial rate)

Added parseCsvLine/parseCsvFile helpers and resolveCIADataDir() to weekly-review.ts.

Co-authored-by: pethers <[email protected]>
Copilot AI changed the title from "[WIP] Add weekly review of parliamentary activity from Feb 15-22 2026" to "fix: loadCIAContext reads from real cia-data/*.csv instead of synthetic JSON exports" Feb 22, 2026
Copilot AI requested a review from pethers February 22, 2026 15:50
github-actions bot added labels deployment (Deployment configuration), refactor (Code refactoring), size-l (Large change, 250-1000 lines) Feb 22, 2026
Copilot AI and others added 2 commits February 22, 2026 16:09
…Text guard

- Add normalizeDocumentType() mapping Swedish doktyp codes (mot→motion,
  bet→committee-report, prop→proposition, skr→government-communication,
  ip→interpellation, fr→written-question, kammakt→chamber-action, etc.)
- annotateDocumentTypes() stamps type+subtype on each raw API doc in-place
- searchDocuments() now applies annotation to every returned document
- RiksdagDocument interface gains explicit type/subtype/subtyp fields
- SearchDocumentsParams gains subtyp field
- 10 new tests covering normalization, edge cases, and enriched searchDocuments
- isPersonProfileText() guards generateEnhancedSummary + enrichWithFullText
  against deceased/former MP profile strings ("Avliden", "Tjänstgörande
  riksdagsledamot", "Tidigare statsminister", riksdagen.se email, etc.)
- Fixed typo Tlllgänglig→Tillgänglig in JSDoc; added Tidigare statsminister pattern

Co-authored-by: pethers <[email protected]>
Copilot AI changed the title from "fix: loadCIAContext reads from real cia-data/*.csv instead of synthetic JSON exports" to "feat: document type normalization + deceased MP profile filtering" Feb 22, 2026
github-actions bot added labels testing (Test coverage), size-xl (Extra large change, > 1000 lines) Feb 22, 2026
- monthly-review ×14: removed person-profile contamination (was 13 hits/file)
- weekly-review ×14: refreshed with isPersonProfileText + CSV CIA context
- week-ahead ×14: refreshed with latest code
- committee-reports ×14: refreshed with normalizeDocumentType
- government-propositions ×14: refreshed
- opposition-motions ×14: refreshed
- Regenerated 14 news index files (571 articles total)
- Regenerated data/news-articles.json (557 articles)
- Regenerated sitemap.xml (625 URLs)

Co-authored-by: pethers <[email protected]>
Copilot AI changed the title from "feat: document type normalization + deceased MP profile filtering" to "feat: document type/subtype normalization, person-profile guard, CSV CIA context, and full article regen" Feb 22, 2026
pethers marked this pull request as ready for review February 22, 2026 16:29
pethers merged commit 3b294dd into news-generation/weekly-review-2026-02-22-25732d3df5a3d00f Feb 22, 2026
4 checks passed
pethers deleted the copilot/sub-pr-433 branch February 22, 2026 16:29
github-actions bot added labels html-css (HTML/CSS changes), i18n (Internationalization/localization), translation (Translation updates), rtl (RTL language support: Arabic, Hebrew) Feb 22, 2026
github-actions bot added the label news (News articles and content generation) Feb 22, 2026
