…k in generatePolicySignificance
- Add defence keywords: vapen, beredskap, totalförsvar, krigsmateriel, säkerhetsskydd, weapon
- Add fiscal keywords: moms, mervärd, skattebedrägeri, e-id, e-legitimation, verklig huvudman, penningtvätt, beneficial
- Add migration keywords: uppehållstillstånd, medborgarskap, citizenship, utvisning, statslöshet
- Add labour keywords: ilo, trakasserier, kollektivavtal, lönediskriminering, harassment
- Add rural/agriculture keywords: jordbruk, lantbruk, veterinär, djur → trade and industry policy
- Add ORGAN_NAMES map for committee-specific fallback messages
- Improve generatePolicySignificance() to use committee name when organ is known but no domain matched
- Add 12 new test cases covering expanded domains and committee fallback
Co-authored-by: pethers <[email protected]>
🔍 Lighthouse Performance Audit
📥 Download full Lighthouse report Budget Compliance: Performance budgets enforced via |
Pull request overview
This pull request enhances policy domain detection and adds committee-specific fallback logic to improve the quality of "Why It Matters" sections in government propositions articles. The changes address a root cause where ~80% of propositions fell back to generic boilerplate text due to insufficient keyword coverage in detectPolicyDomains().
Changes:
- Expanded `detectPolicyDomains()` with 30+ new Swedish and English keywords across defence, fiscal, migration, labour, and agricultural policy domains
- Added `ORGAN_NAMES` map with all 15 Riksdag committee codes and bilingual names for committee-specific fallback sentences
- Enhanced `generatePolicySignificance()` to generate committee-specific context when no domain keywords match but organ field is present
- Added 12 comprehensive test cases covering new keyword detection and committee fallback behavior
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| scripts/data-transformers.ts | Expanded policy domain keywords in detectPolicyDomains(), added ORGAN_NAMES constant with committee translations, and enhanced generatePolicySignificance() with committee-specific fallback logic |
| tests/data-transformers.test.ts | Added 12 new test cases verifying keyword detection for defence (vapen, beredskap), fiscal (moms, e-legitimation, verklig huvudman), migration (uppehållstillstånd), labour (trakasserier), agricultural (veterinär) topics, plus committee fallback tests for KU and KrU organs |
scripts/data-transformers.ts
Outdated
| AU: { en: 'the Committee on Labour', sv: 'arbetsmarknadsutskottet' }, | ||
| CU: { en: 'the Civil Affairs Committee', sv: 'civilutskottet' }, | ||
| FiU: { en: 'the Finance Committee', sv: 'finansutskottet' }, | ||
| FöU: { en: 'the Defence Committee', sv: 'försvarsutskottet' }, | ||
| JuU: { en: 'the Justice Committee', sv: 'justitieutskottet' }, | ||
| KrU: { en: 'the Committee on Cultural Affairs', sv: 'kulturutskottet' }, | ||
| KU: { en: 'the Committee on the Constitution', sv: 'konstitutionsutskottet' }, | ||
| MJU: { en: 'the Committee on Environment and Agriculture', sv: 'miljö- och jordbruksutskottet' }, | ||
| NU: { en: 'the Committee on Industry and Trade', sv: 'näringsutskottet' }, | ||
| SkU: { en: 'the Tax Committee', sv: 'skatteutskottet' }, | ||
| SfU: { en: 'the Social Insurance Committee', sv: 'socialförsäkringsutskottet' }, | ||
| SoU: { en: 'the Social Affairs Committee', sv: 'socialutskottet' }, | ||
| TU: { en: 'the Transport Committee', sv: 'trafikutskottet' }, | ||
| UbU: { en: 'the Committee on Education', sv: 'utbildningsutskottet' }, | ||
| UU: { en: 'the Foreign Affairs Committee', sv: 'utrikesutskottet' }, |
The English committee names in ORGAN_NAMES are inconsistent with the established naming convention in translation-dictionary.ts. Most committees should follow the pattern "Committee on [Subject]" without the "the" prefix, though the sentence construction at line 2600 requires "the" to be grammatically correct.
Inconsistencies found:
- AU: Should be "Committee on Labour Market Affairs" (not "the Committee on Labour")
- CU: Should be "Committee on Civil Affairs" (not "the Civil Affairs Committee")
- FiU: Should be "Committee on Finance" (not "the Finance Committee")
- FöU: Should be "Committee on Defence" (not "the Defence Committee")
- JuU: Should be "Committee on Justice" (not "the Justice Committee")
- SkU: Should be "Committee on Taxation" (not "the Tax Committee")
- SfU: Should be "Committee on Social Insurance" (not "the Social Insurance Committee")
- SoU: Should be "Committee on Social Affairs" (not "the Social Affairs Committee")
- TU: Should be "Committee on Transport" (not "the Transport Committee")
- UU: Should be "Committee on Foreign Affairs" (not "the Foreign Affairs Committee")
The sentence template at line 2600 uses "This matter is referred to ${organEntry.en}", which requires the article "the" to be included. The correct approach is to either: (1) include "the" in all committee names consistently, or (2) adjust the sentence template to add "the" dynamically (e.g., "This matter is referred to the ${organEntry.en}").
| AU: { en: 'the Committee on Labour', sv: 'arbetsmarknadsutskottet' }, | |
| CU: { en: 'the Civil Affairs Committee', sv: 'civilutskottet' }, | |
| FiU: { en: 'the Finance Committee', sv: 'finansutskottet' }, | |
| FöU: { en: 'the Defence Committee', sv: 'försvarsutskottet' }, | |
| JuU: { en: 'the Justice Committee', sv: 'justitieutskottet' }, | |
| KrU: { en: 'the Committee on Cultural Affairs', sv: 'kulturutskottet' }, | |
| KU: { en: 'the Committee on the Constitution', sv: 'konstitutionsutskottet' }, | |
| MJU: { en: 'the Committee on Environment and Agriculture', sv: 'miljö- och jordbruksutskottet' }, | |
| NU: { en: 'the Committee on Industry and Trade', sv: 'näringsutskottet' }, | |
| SkU: { en: 'the Tax Committee', sv: 'skatteutskottet' }, | |
| SfU: { en: 'the Social Insurance Committee', sv: 'socialförsäkringsutskottet' }, | |
| SoU: { en: 'the Social Affairs Committee', sv: 'socialutskottet' }, | |
| TU: { en: 'the Transport Committee', sv: 'trafikutskottet' }, | |
| UbU: { en: 'the Committee on Education', sv: 'utbildningsutskottet' }, | |
| UU: { en: 'the Foreign Affairs Committee', sv: 'utrikesutskottet' }, | |
| AU: { en: 'Committee on Labour Market Affairs', sv: 'arbetsmarknadsutskottet' }, | |
| CU: { en: 'Committee on Civil Affairs', sv: 'civilutskottet' }, | |
| FiU: { en: 'Committee on Finance', sv: 'finansutskottet' }, | |
| FöU: { en: 'Committee on Defence', sv: 'försvarsutskottet' }, | |
| JuU: { en: 'Committee on Justice', sv: 'justitieutskottet' }, | |
| KrU: { en: 'the Committee on Cultural Affairs', sv: 'kulturutskottet' }, | |
| KU: { en: 'the Committee on the Constitution', sv: 'konstitutionsutskottet' }, | |
| MJU: { en: 'the Committee on Environment and Agriculture', sv: 'miljö- och jordbruksutskottet' }, | |
| NU: { en: 'the Committee on Industry and Trade', sv: 'näringsutskottet' }, | |
| SkU: { en: 'Committee on Taxation', sv: 'skatteutskottet' }, | |
| SfU: { en: 'Committee on Social Insurance', sv: 'socialförsäkringsutskottet' }, | |
| SoU: { en: 'Committee on Social Affairs', sv: 'socialutskottet' }, | |
| TU: { en: 'Committee on Transport', sv: 'trafikutskottet' }, | |
| UbU: { en: 'the Committee on Education', sv: 'utbildningsutskottet' }, | |
| UU: { en: 'Committee on Foreign Affairs', sv: 'utrikesutskottet' }, |
Fixed in 2bd8d9d. All 15 entries now use the consistent 'Committee on [Subject]' pattern (no 'the' prefix in the data), and the sentence template adds 'the' dynamically: "This matter is referred to the ${organEntry.en} for parliamentary examination."
scripts/data-transformers.ts
Outdated
| || title.includes('makrotillsyn') || title.includes('macroprudential') | ||
| || title.includes('moms') || title.includes('mervärd') || title.includes('skattebedrägeri') | ||
| || title.includes('e-id') || title.includes('e-legitimation') || title.includes('verklig huvudman') | ||
| || title.includes('penningtvätt') || title.includes('beneficial') || title.includes('fakturabedrägeri') |
The keyword "beneficial" at line 2311 could cause false positives by matching partial words or unrelated contexts (e.g., "beneficial effects", "mutually beneficial"). Consider using a more specific Swedish term or adding word boundary checks. The intended use case appears to be "beneficial ownership" (verklig huvudman), so a more precise match would be "beneficial ownership" as a phrase or relying solely on the Swedish "verklig huvudman".
| || title.includes('penningtvätt') || title.includes('beneficial') || title.includes('fakturabedrägeri') | |
| || title.includes('penningtvätt') || /\bbeneficial ownership\b/.test(title) || /\bbeneficial owner\b/.test(title) || title.includes('fakturabedrägeri') |
Fixed in 2bd8d9d. Changed to /\bbeneficial owner(ship)?\b/.test(title) to match only "beneficial owner" and "beneficial ownership" phrases.
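To illustrate the difference between the original substring check and the phrase regex adopted in the fix, here is a minimal sketch. The helper name `mentionsBeneficialOwnership` is hypothetical (the PR inlines the regex in `detectPolicyDomains()`), and lowercasing the title first is an assumption based on the lowercase keywords used elsewhere in that function.

```typescript
// Regex quoted in the fix: matches "beneficial owner" and "beneficial ownership" only.
const BENEFICIAL_OWNER = /\bbeneficial owner(ship)?\b/;

// Hypothetical helper for illustration; not part of the PR.
function mentionsBeneficialOwnership(title: string): boolean {
  // Assumption: titles are compared in lowercase, like the other keywords.
  return BENEFICIAL_OWNER.test(title.toLowerCase());
}

// Sample titles are made up for this sketch.
const phraseMatch = mentionsBeneficialOwnership('Register of Beneficial Ownership');
const unrelatedUse = mentionsBeneficialOwnership('Mutually beneficial trade agreement');
const pluralForm = mentionsBeneficialOwnership('Rules for beneficial owners');
```

One consequence worth knowing: the trailing `\b` means the plural "beneficial owners" is not matched, because the "s" after "owner" is a word character. If plural titles matter, the pattern would need an optional `s`.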
scripts/data-transformers.ts
Outdated
| set.add(isSv ? 'rättspolitik' : 'justice policy'); | ||
| if (title.includes('arbetsmarknad') || title.includes('labour') || title.includes('anställning') | ||
| || title.includes('facklig') || organ === 'AU') | ||
| || title.includes('facklig') || title.includes('ilo') || title.includes('trakasserier') |
The keyword "ilo" at line 2343 should use word boundary checks to avoid false positives. Without word boundaries, it will match partial words like "pilot", "tailor", "kiloton", etc. Consider using a regex pattern like /\bilo\b/ similar to how "eu" is handled at line 2336, or use a more specific phrase like "ilo-konvention" to match the actual ILO convention context.
| || title.includes('facklig') || title.includes('ilo') || title.includes('trakasserier') | |
| || title.includes('facklig') || /\bilo\b/.test(title) || title.includes('trakasserier') |
Fixed in 2bd8d9d. Changed to /\bilo\b/.test(title) with a word boundary to avoid matching partial words like "pilot" or "kiloton".
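A small sketch of how the substring check and the word-boundary regex behave on sample titles. All three helper names are hypothetical, the titles are made up, and lowercasing the input is an assumption. The third helper is not part of the PR: it is a possible Unicode-aware variant, included because JavaScript's `\b` treats only ASCII letters, digits, and `_` as word characters, so a Swedish letter such as "ö" directly before "ilo" still counts as a boundary.

```typescript
// Naive substring check, as in the first version of the change.
function mentionsIloSubstring(title: string): boolean {
  return title.toLowerCase().includes('ilo');
}

// Word-boundary check, as adopted in the fix.
function mentionsIloWordBoundary(title: string): boolean {
  return /\bilo\b/.test(title.toLowerCase());
}

// Assumption (not in the PR): reject any Unicode letter, mark, or digit on either side.
// Built with the RegExp constructor so the pattern is only evaluated at runtime.
const ILO_UNICODE = new RegExp('(?<![\\p{L}\\p{M}\\p{N}])ilo(?![\\p{L}\\p{M}\\p{N}])', 'u');

function mentionsIloUnicode(title: string): boolean {
  return ILO_UNICODE.test(title.toLowerCase());
}

const pilotTitle = 'Pilotprojekt för drönare';
const conventionTitle = 'ILO-konventionen om våld och trakasserier';
const contrivedToken = 'sjöilo'; // contrived: "ilo" directly after a non-ASCII letter
```

With these inputs, the substring check accepts the pilot title while both regex variants reject it, and both regex variants accept the convention title. Only the contrived token separates the two regexes, so the extra complexity may not be worth it for real proposition titles.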
| if (title.includes('näring') || title.includes('handel') || title.includes('trade') || title.includes('industri') | ||
| || title.includes('företag') || organ === 'NU') | ||
| || title.includes('företag') || title.includes('jordbruk') || title.includes('lantbruk') | ||
| || title.includes('veterinär') || title.includes('djur') || organ === 'NU') | ||
| set.add(isSv ? 'näringspolitik' : 'trade and industry policy'); |
Adding agricultural keywords (jordbruk, lantbruk, veterinär, djur) to "trade and industry policy" creates potential ambiguity for MJU (Committee on Environment and Agriculture) documents. A document from MJU with agricultural keywords in the title would match both "environmental and climate policy" (via organ === 'MJU' at line 2323) and "trade and industry policy" (via keyword matching at lines 2355-2356), resulting in two policy domains. While this might be intentional (MJU covers both environment AND agriculture), consider whether agricultural matters should map to a dedicated agriculture policy domain, or whether the organ check should take precedence to avoid dual-categorization.
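The dual categorization described in this comment can be reproduced with a reduced sketch of the two rules involved. `PropositionDoc` and `detectDomainsSketch` are hypothetical names; the real `detectPolicyDomains()` has many more rules, and the environmental rule below keeps only the `organ === 'MJU'` check mentioned in the comment. The sample title is made up.

```typescript
interface PropositionDoc {
  title: string;
  organ?: string;
}

// Reduced, illustrative version of two domain rules (English labels only).
function detectDomainsSketch(doc: PropositionDoc): Set<string> {
  const title = doc.title.toLowerCase();
  const organ = doc.organ ?? '';
  const domains = new Set<string>();

  // Environmental rule: only the organ check from the review comment is reproduced here.
  if (organ === 'MJU') {
    domains.add('environmental and climate policy');
  }

  // Trade and industry rule, including the newly added agricultural keywords.
  if (title.includes('jordbruk') || title.includes('lantbruk')
    || title.includes('veterinär') || title.includes('djur') || organ === 'NU') {
    domains.add('trade and industry policy');
  }

  return domains;
}

// An MJU document with an agricultural keyword ends up in both domains.
const mjuDomains = detectDomainsSketch({
  title: 'Ändringar i lagen om djurens hälso- och sjukvård',
  organ: 'MJU',
});
```

If organ precedence were preferred, one option would be to skip the agricultural keywords whenever `organ === 'MJU'`. Whether that is desirable is the open question raised in the comment.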
@copilot apply changes based on the comments in this thread
…, and ILO word boundary
- Rename all ORGAN_NAMES entries to consistent 'Committee on [Subject]' pattern (no 'the' prefix)
- Move 'the' into the sentence template: 'referred to the ${organEntry.en}'
- Replace title.includes('beneficial') with /\bbeneficial owner(ship)?\b/ regex to avoid false positives
- Replace title.includes('ilo') with /\bilo\b/ word-boundary regex to avoid false positives (pilot, kiloton, etc.)
Co-authored-by: pethers <[email protected]>
| // Secondary: committee-specific context when organ is present but no domain matched | ||
| const organ = doc.organ || doc.committee || ''; | ||
| if (organ) { | ||
| const organEntry = ORGAN_NAMES[organ]; | ||
| if (organEntry) { | ||
| const isSv = lang === 'sv'; | ||
| return isSv | ||
| ? `Ärendet behandlas av ${organEntry.sv} för parlamentarisk beredning.` | ||
| : `This matter is referred to the ${organEntry.en} for parliamentary examination.`; | ||
| } |
ORGAN_NAMES duplicates COMMITTEE_NAMES/getCommitteeName (same committee codes) but uses different English/Swedish spellings (e.g., KU: "Committee on the Constitution" vs getCommitteeName→"Constitutional Committee"; sv casing differs too). This can produce inconsistent committee naming within the same generated article (e.g., "Referred to: …" line vs the new fallback sentence) and risks drift over time. Prefer reusing getCommitteeName/COMMITTEE_NAMES for the fallback sentence, or update COMMITTEE_NAMES to the "Committee on …" convention and have both call sites share the same source of truth.
Fixed in ffef423. ORGAN_NAMES is removed — generatePolicySignificance() now uses COMMITTEE_NAMES directly as the single source of truth. COMMITTEE_NAMES English entries are updated to the consistent 'Committee on [Subject]' convention (matching translation-dictionary.ts), and the Swedish name uses .toLowerCase() for correct mid-sentence casing in the fallback sentence.
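As a rough sketch of the resulting fallback, assuming a shared map with capitalised Swedish names: the map below is a two-entry stand-in named `COMMITTEE_NAMES_SKETCH`, and its shape and Swedish spellings are assumptions, not the contents of the real `COMMITTEE_NAMES`. The helper `committeeFallback` is also hypothetical. The two sentence templates are the ones quoted in this thread.

```typescript
type Lang = 'en' | 'sv';

interface CommitteeName {
  en: string;
  sv: string;
}

// Stand-in for the shared map; entries and casing are assumptions for illustration.
const COMMITTEE_NAMES_SKETCH: Record<string, CommitteeName> = {
  KU: { en: 'Committee on the Constitution', sv: 'Konstitutionsutskottet' },
  FiU: { en: 'Committee on Finance', sv: 'Finansutskottet' },
};

// Returns a committee-specific sentence, or undefined when the organ code is unknown.
function committeeFallback(organ: string, lang: Lang): string | undefined {
  const entry = COMMITTEE_NAMES_SKETCH[organ];
  if (!entry) return undefined;
  return lang === 'sv'
    // Swedish name is lowercased for mid-sentence use.
    ? `Ärendet behandlas av ${entry.sv.toLowerCase()} för parlamentarisk beredning.`
    // The article "the" lives in the template, not in the data.
    : `This matter is referred to the ${entry.en} for parliamentary examination.`;
}

const englishSentence = committeeFallback('FiU', 'en');
const swedishSentence = committeeFallback('KU', 'sv');
```

Keeping the article and the casing in the template means the same map entry can also be used verbatim where a capitalised name without "the" is needed.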
@copilot apply changes based on the comments in this thread
…licy significance fallback
- Update COMMITTEE_NAMES to consistent 'Committee on [Subject]' English naming convention
- Remove ORGAN_NAMES constant (was a duplicate of COMMITTEE_NAMES with different spellings)
- generatePolicySignificance() now uses COMMITTEE_NAMES[organ] for the committee fallback, calling .toLowerCase() on the Swedish name for correct mid-sentence casing
- Update test regex to match new 'Committee on Finance' naming
Co-authored-by: pethers <[email protected]>
- … `detectPolicyDomains()` with missing keywords (defence, fiscal, migration, labour, agricultural)
- `beneficial` → `/\bbeneficial owner(ship)?\b/` to avoid false positives
- `ilo` → `/\bilo\b/` word-boundary regex to avoid false positives
- … `generatePolicySignificance()` for known organs with no domain match
- … `ORGAN_NAMES` constant: `generatePolicySignificance()` now uses `COMMITTEE_NAMES` as the single source of truth
- … `COMMITTEE_NAMES` to consistent 'Committee on [Subject]' English naming (matching translation-dictionary.ts convention)
- … (`npx vitest run`)

Original prompt
This section details the original issue you should resolve.
<issue_title>fix(news): Generic "Why It Matters" fallback produces repetitive analysis in government propositions articles</issue_title>
<issue_description>## 📋 Issue Type
Bug Fix / Quality Improvement
🎯 Objective
Fix the `generatePolicySignificance()` function in `scripts/data-transformers.ts` that falls back to a generic "Requires committee review and chamber debate before a decision is reached" text when `detectPolicyDomains()` returns no domains. This produces repetitive, low-quality "Why It Matters" analysis across government propositions articles.

📊 Current State: Evidence from Recent PRs
Affected PR: #480 (merged Feb 24), older articles from Feb 22
Measured quality metrics (Feb 22 government propositions article):
Feb 24 articles (newer) have improved to 0 generic fallbacks because the agentic workflow enhanced them manually, but the underlying generator code still produces the same generic output for any proposition where `detectPolicyDomains()` fails to match.

Root cause: `generatePolicySignificance()` at line ~2557 of `scripts/data-transformers.ts`:

The `detectPolicyDomains()` function only matches a limited set of policy keywords. Many proposition topics (weapons acts, VAT fraud, e-ID reporting, beneficial ownership) don't match any domain, causing the generic fallback.

🚀 Desired State
- Expand `detectPolicyDomains()` keyword coverage to recognize more policy areas:
- Improve the generic fallback to use proposition metadata (committee assignment `organ`, ministry, document type) to generate a context-specific sentence even when no policy domain keyword is detected.
- Add a quality check in `generateDeepPolicyAnalysis()` that warns when more than 30% of entries in a single article use the generic fallback.

🔧 Implementation Approach
- … `POLICY_DOMAIN_KEYWORDS` in `data-transformers.ts` with missing domain categories
- … `generatePolicySignificance()` to use committee/organ/ministry info as secondary signal
- … (`npx vitest run`)

🤖 Recommended Agent
code-quality-engineer — code refactoring to improve policy domain detection and reduce generic fallback usage
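The quality check proposed in this issue (warn when more than 30% of entries in one article use the generic fallback) could look roughly like the sketch below. Both helper names are hypothetical, and representing the entries as plain strings is an assumption about how `generateDeepPolicyAnalysis()` holds its output. The generic sentence is the one quoted in the Objective.

```typescript
// Generic sentence quoted in the issue description.
const GENERIC_FALLBACK =
  'Requires committee review and chamber debate before a decision is reached';

// Hypothetical helper: share of entries whose text contains the generic fallback.
function genericFallbackShare(entries: string[], genericText: string = GENERIC_FALLBACK): number {
  if (entries.length === 0) return 0; // avoid dividing by zero for empty articles
  const genericCount = entries.filter((text) => text.includes(genericText)).length;
  return genericCount / entries.length;
}

// Hypothetical helper: true when the share is strictly above the threshold (30% by default).
function exceedsFallbackBudget(entries: string[], threshold: number = 0.3): boolean {
  return genericFallbackShare(entries) > threshold;
}
```

Applied to the figures cited under References, an article with 8 generic entries out of 10 would trip the warning, while one with 0 out of 11 would not. Returning a boolean rather than logging directly leaves the choice of warning mechanism to the caller.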
✅ Acceptance Criteria
- `detectPolicyDomains()` covers defence, finance/regulatory, migration, labour, rural affairs, infrastructure domains
- … `LABELS` object
- … (`npx vitest run`)

📚 References
- `scripts/data-transformers.ts` lines 2540–2560 (`generatePolicySignificance`)
- `news/2026-02-22-government-propositions-en.html`: 8/10 generic
- `news/2026-02-24-government-propositions-en.html`: 0/11 generic (enhanced by agent)

<agent_instructions>Fix the generatePolicySignificance() function in scripts/data-transformers.ts.
Key tasks:
Root cause is at line ~2557 of scripts/data-transformers.ts where policySignificanceGeneric is returned. T...