feat: document type/subtype normalization, person-profile guard, CSV CIA context, and full article regen #438
Merged
pethers merged 5 commits into news-generation/weekly-review-2026-02-22-25732d3df5a3d00f on Feb 22, 2026
…ntext

Article generation now reads from cia-data/*.csv (real, weekly-updated data) instead of data/cia-exports/current/*.json (synthetic schema files).

CSV sources:
- cia-data/party/view_party_performance_metrics_sample.csv (win rates, documents, rebel rate)
- cia-data/view_riksdagen_party_summary_sample.csv (seat counts per party)
- cia-data/party/distribution_coalition_alignment.csv (coalition alignment rates)
- cia-data/view_riksdagen_committee_decisions.csv (decision outcomes for denial rate)

Added parseCsvLine/parseCsvFile helpers and resolveCIADataDir() to weekly-review.ts.

Co-authored-by: pethers <[email protected]>
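The commit names the `parseCsvLine`/`parseCsvFile` helpers but this page does not show their bodies. The following is a minimal sketch of how such helpers might work, not the code from `weekly-review.ts`. It assumes comma-separated files with a header row and RFC 4180-style double-quote escaping. `parseCsvText` is a hypothetical stand-in for `parseCsvFile` that takes the file contents as a string, and the column names in the usage note are invented for illustration.

```typescript
// Hypothetical sketch: split one CSV line into fields, honoring double quotes.
// Does not handle newlines embedded inside quoted fields.
function parseCsvLine(line: string): string[] {
  const fields: string[] = [];
  let current = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line.charAt(i);
    if (inQuotes) {
      if (ch === '"') {
        if (line.charAt(i + 1) === '"') {
          current += '"'; // doubled quote inside a quoted field is a literal quote
          i++;
        } else {
          inQuotes = false; // closing quote
        }
      } else {
        current += ch;
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ",") {
      fields.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}

// Hypothetical stand-in for parseCsvFile: maps each data row to an object
// keyed by the header row. Missing cells become empty strings.
function parseCsvText(text: string): Record<string, string>[] {
  const [headerLine, ...dataLines] = text
    .split(/\r?\n/)
    .filter((l) => l.trim() !== "");
  if (headerLine === undefined) return [];
  const header = parseCsvLine(headerLine);
  return dataLines.map((line) => {
    const cells = parseCsvLine(line);
    const row: Record<string, string> = {};
    header.forEach((name, idx) => {
      row[name] = cells[idx] ?? "";
    });
    return row;
  });
}
```

A file-based variant would read the text from one of the `cia-data/*.csv` paths listed above and pass it to `parseCsvText`; a caller could then read a value such as `rows[0]["win_rate"]`, assuming a column of that name exists.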
Copilot AI changed the title from "[WIP] Add weekly review of parliamentary activity from Feb 15-22 2026" to "fix: loadCIAContext reads from real cia-data/*.csv instead of synthetic JSON exports" on Feb 22, 2026
Co-authored-by: pethers <[email protected]>
…Text guard
- Add normalizeDocumentType() mapping Swedish doktyp codes (mot→motion,
bet→committee-report, prop→proposition, skr→government-communication,
ip→interpellation, fr→written-question, kammakt→chamber-action, etc.)
- annotateDocumentTypes() stamps type+subtype on each raw API doc in-place
- searchDocuments() now applies annotation to every returned document
- RiksdagDocument interface gains explicit type/subtype/subtyp fields
- SearchDocumentsParams gains subtyp field
- 10 new tests covering normalization, edge cases, and enriched searchDocuments
- isPersonProfileText() guards generateEnhancedSummary + enrichWithFullText
against deceased/former MP profile strings ("Avliden", "Tjänstgörande
riksdagsledamot", "Tidigare statsminister", riksdagen.se email, etc.)
- Fixed typo Tlllgänglig→Tillgänglig in JSDoc; added Tidigare statsminister pattern
Co-authored-by: pethers <[email protected]>
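The normalization described in this commit can be sketched as follows. This is an illustration under stated assumptions, not the code from `mcp-client.ts`: the mapping table holds only the codes listed in the commit message (the real table covers more), the `RawDoc` interface is a hypothetical reduction of `RiksdagDocument`, and copying `subtyp` into `subtype` is an assumption about how the subtype is derived. The edge-case behavior (case-insensitive lookup, unknown codes passed through unchanged, empty input mapped to `'document'`, existing `type` not overwritten) follows the PR description.

```typescript
// Partial mapping: only the doktyp codes named in the commit message.
const DOC_TYPE_MAP = new Map<string, string>([
  ["mot", "motion"],
  ["bet", "committee-report"],
  ["prop", "proposition"],
  ["skr", "government-communication"],
  ["ip", "interpellation"],
  ["fr", "written-question"],
  ["kammakt", "chamber-action"],
]);

// Case-insensitive lookup; unknown codes are returned as given,
// and undefined/empty input becomes "document".
function normalizeDocumentType(doktyp?: string | null): string {
  const raw = doktyp ?? "";
  const code = raw.trim().toLowerCase();
  if (code === "") return "document";
  return DOC_TYPE_MAP.get(code) ?? raw;
}

// Hypothetical reduced document shape for this sketch.
interface RawDoc {
  doktyp?: string;
  subtyp?: string;
  type?: string;
  subtype?: string;
}

// Stamps type/subtype in place. An existing type is kept, and the raw
// doktyp/subtyp fields are never modified.
function annotateDocumentTypes(docs: RawDoc[]): RawDoc[] {
  for (const doc of docs) {
    if (!doc.type) doc.type = normalizeDocumentType(doc.doktyp);
    // Assumption: subtype mirrors the raw subtyp when one is present.
    if (!doc.subtype && doc.subtyp) doc.subtype = doc.subtyp;
  }
  return docs;
}
```

Using a `Map` rather than a plain object avoids accidental hits on inherited keys such as `constructor` when an unexpected code arrives from the API.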
Copilot AI changed the title from "fix: loadCIAContext reads from real cia-data/*.csv instead of synthetic JSON exports" to "feat: document type normalization + deceased MP profile filtering" on Feb 22, 2026
- monthly-review ×14: removed person-profile contamination (was 13 hits/file)
- weekly-review ×14: refreshed with isPersonProfileText + CSV CIA context
- week-ahead ×14: refreshed with latest code
- committee-reports ×14: refreshed with normalizeDocumentType
- government-propositions ×14: refreshed
- opposition-motions ×14: refreshed
- Regenerated 14 news index files (571 articles total)
- Regenerated data/news-articles.json (557 articles)
- Regenerated sitemap.xml (625 URLs)

Co-authored-by: pethers <[email protected]>
Copilot AI changed the title from "feat: document type normalization + deceased MP profile filtering" to "feat: document type/subtype normalization, person-profile guard, CSV CIA context, and full article regen" on Feb 22, 2026
pethers merged commit 3b294dd into news-generation/weekly-review-2026-02-22-25732d3df5a3d00f (4 checks passed)
Articles contained deceased/former MP profile text ("Avliden 2011-09-20", "Tjänstgörande riksdagsledamot…") leaked from the riksdag API's `notis`/`summary` fields, and `searchDocuments()` returned raw `doktyp` codes with no normalized type classification. Article generation also relied on synthetic JSON schema files instead of real parliamentary data.

Person-profile guard (`data-transformers.ts`, `weekly-review.ts`)

- `isPersonProfileText()`: exported predicate that blocks all riksdag ledamot profile patterns from appearing as article content:
  - `Inga uppdrag`, `Aktuella uppdrag Riksdagsledamot`, `Avgången`
  - `Avliden YYYY-MM-DD`, `@riksdagen.se` email addresses
- Guards `generateEnhancedSummary`, `generateDocumentIntelligenceAnalysis`, and `enrichWithFullText`

Document type normalization (`mcp-client.ts`, `types/mcp.ts`)

`searchDocuments()` now stamps every returned document with normalized `type` + `subtype`:

- `normalizeDocumentType(doktyp)`: exported, case-insensitive; covers `mot`→`motion`, `bet`→`committee-report`, `prop`→`proposition`, `skr`→`government-communication`, `ip`→`interpellation`, `fr`→`written-question`, `kammakt`→`chamber-action`, `prot`→`minutes`, `sou`→`government-inquiry`, `dir`→`committee-directive`, and more; unknown codes pass through unchanged; `undefined`/empty → `'document'`
- `annotateDocumentTypes()` stamps in-place without overwriting an existing `type` field; preserves raw `doktyp`/`subtyp`
- `RiksdagDocument` gains explicit `type`, `subtype`, `subtyp` fields; `SearchDocumentsParams` gains `subtyp` filter

CIA data source (`weekly-review.ts`)

`loadCIAContext()` switched from synthetic `data/cia-exports/current/*.json` schema files to real `cia-data/*.csv` files updated weekly:

- `party/view_party_performance_metrics_sample.csv`: win rates, rebel rates
- `view_riksdagen_party_summary_sample.csv`: seat counts
- `party/distribution_coalition_alignment.csv`: coalition alignment rates
- `view_riksdagen_committee_decisions.csv`: decision outcomes / denial rate

Article regeneration

All 84 `2026-02-22` articles regenerated with the fixed pipeline (6 types × 14 languages). `monthly-review` was the only type contaminated pre-fix (13 person-profile hits/file → 0). News indexes, `data/news-articles.json` (557 articles), and `sitemap.xml` (625 URLs) all regenerated.
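The person-profile guard that removed the `monthly-review` contamination can be sketched roughly as below. This is an assumption-laden illustration, not the predicate from `data-transformers.ts`: the regular expressions are guesses built from the profile strings quoted in this PR, and `pickSummary` is a hypothetical helper showing how a caller might apply the guard to the `notis`/`summary` fields.

```typescript
// Assumed patterns, derived from the profile strings quoted in the PR text.
const PERSON_PROFILE_PATTERNS: RegExp[] = [
  /Avliden\s+\d{4}-\d{2}-\d{2}/i, // "Avliden YYYY-MM-DD"
  /Tjänstgörande riksdagsledamot/i,
  /Tidigare statsminister/i,
  /Inga uppdrag/i,
  /Aktuella uppdrag\s+Riksdagsledamot/i,
  /Avgången/i,
  /[\w.+-]+@riksdagen\.se/i, // riksdagen.se email addresses
];

// Returns true when the text looks like an MP profile rather than
// document content. Empty or missing text is not treated as a profile.
function isPersonProfileText(text?: string | null): boolean {
  if (!text) return false;
  return PERSON_PROFILE_PATTERNS.some((re) => re.test(text));
}

// Hypothetical caller: take the first candidate field that is present
// and is not profile text, falling back to an empty string.
function pickSummary(doc: { notis?: string; summary?: string }): string {
  for (const candidate of [doc.notis, doc.summary]) {
    if (candidate && !isPersonProfileText(candidate)) return candidate;
  }
  return "";
}
```

Broad patterns such as `Avgången` may also match legitimate document text, so a real implementation would likely need tests covering both profile strings and ordinary summaries.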