fix(news): expand policy domain detection and add committee-name fallback in generatePolicySignificance #492

Merged
pethers merged 5 commits into main from copilot/fix-generic-fallback-analysis on Feb 24, 2026

Copilot AI commented Feb 24, 2026

  • Expand detectPolicyDomains() with missing keywords (defence, fiscal, migration, labour, agricultural)
  • Fix beneficial → /\bbeneficial owner(ship)?\b/ phrase regex to avoid false positives
  • Fix ilo → /\bilo\b/ word-boundary regex to avoid false positives
  • Add committee-name fallback in generatePolicySignificance() for known organs with no domain match
  • Remove duplicate ORGAN_NAMES constant — generatePolicySignificance() now uses COMMITTEE_NAMES as the single source of truth
  • Update COMMITTEE_NAMES to consistent 'Committee on [Subject]' English naming (matching translation-dictionary.ts convention)
  • Add 12 new test cases for expanded domains and committee fallback
  • All 536 data-transformer tests pass (npx vitest run)
Original prompt

This section details the original issue you should resolve

<issue_title>fix(news): Generic "Why It Matters" fallback produces repetitive analysis in government propositions articles</issue_title>
<issue_description>## 📋 Issue Type
Bug Fix / Quality Improvement

🎯 Objective

Fix the generatePolicySignificance() function in scripts/data-transformers.ts that falls back to a generic "Requires committee review and chamber debate before a decision is reached" text when detectPolicyDomains() returns no domains. This produces repetitive, low-quality "Why It Matters" analysis across government propositions articles.

📊 Current State — Evidence from Recent PRs

Affected PR: #480 (merged Feb 24), older articles from Feb 22

Measured quality metrics (Feb 22 government propositions article):

  • 10 total "Why It Matters" entries
  • 8 out of 10 use the identical generic fallback text: "Requires committee review and chamber debate before a decision is reached."
  • Only 2 entries had domain-specific analysis (those touching "environmental and climate policy")

Feb 24 articles (newer) have improved to 0 generic fallbacks — the agentic workflow enhanced them manually — but the underlying generator code still produces the same generic output for any proposition where detectPolicyDomains() fails to match.

Root cause: generatePolicySignificance() at line ~2557 of scripts/data-transformers.ts:

// Generic significance when no domain detected
const genericVal = L(lang, 'policySignificanceGeneric');
return typeof genericVal === 'string' ? genericVal : 'Requires committee review and chamber debate before a decision is reached.';

The detectPolicyDomains() function only matches a limited set of policy keywords. Many proposition topics (weapons acts, VAT fraud, e-ID reporting, beneficial ownership) don't match any domain, causing the generic fallback.

🚀 Desired State

  1. Expand detectPolicyDomains() keyword coverage to recognize more policy areas:

    • Defence/security policy (weapons, military, preparedness, vapen, beredskap)
    • Financial regulation (VAT, moms, beneficial ownership, e-ID, kris)
    • Migration/immigration policy (migration, uppehållstillstånd, medborgarskap)
    • Labour/employment policy (arbetsmarknad, ILO, trakasserier)
    • Rural affairs/agriculture (veterinär, läkemedel, djur)
    • Infrastructure/transport
  2. Improve the generic fallback to use proposition metadata (committee assignment organ, ministry, document type) to generate a context-specific sentence even when no policy domain keyword is detected.

  3. Add a quality check in generateDeepPolicyAnalysis() that warns when more than 30% of entries in a single article use the generic fallback.
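
The quality check in item 3 could look roughly like the sketch below. It is only an illustration under stated assumptions: `genericFallbackRatio` and `checkFallbackQuality` are hypothetical helper names that do not come from scripts/data-transformers.ts, and only the fallback sentence and the 30% threshold are taken from the issue text.

```typescript
// Hypothetical helper names; only the fallback sentence and the 30% threshold come from the issue.
const GENERIC_FALLBACK =
  'Requires committee review and chamber debate before a decision is reached.';

/** Fraction (0 to 1) of entries whose text is exactly the generic fallback sentence. */
function genericFallbackRatio(entries: string[]): number {
  if (entries.length === 0) return 0;
  const genericCount = entries.filter((text) => text.trim() === GENERIC_FALLBACK).length;
  return genericCount / entries.length;
}

/** Returns a warning string when the ratio exceeds the threshold, otherwise null. */
function checkFallbackQuality(entries: string[], threshold = 0.3): string | null {
  const ratio = genericFallbackRatio(entries);
  if (ratio <= threshold) return null;
  const pct = Math.round(ratio * 100);
  return `Generic fallback used in ${pct}% of "Why It Matters" entries`;
}

// Reproduces the Feb 22 measurement from the issue: 8 generic entries out of 10.
const feb22Entries: string[] = [];
for (let i = 0; i < 8; i++) feb22Entries.push(GENERIC_FALLBACK);
feb22Entries.push('Domain-specific analysis A', 'Domain-specific analysis B');

const qualityWarning = checkFallbackQuality(feb22Entries);
if (qualityWarning !== null) console.warn(qualityWarning);
```

Returning the message instead of logging inside the helper keeps the check easy to assert on in a vitest case; whether generateDeepPolicyAnalysis() should log, throw, or collect such warnings is left open here.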

🔧 Implementation Approach

  1. Extend POLICY_DOMAIN_KEYWORDS in data-transformers.ts with missing domain categories
  2. Update generatePolicySignificance() to use committee/organ/ministry info as secondary signal
  3. Add test cases for the new domains
  4. Verify existing tests still pass (npx vitest run)

🤖 Recommended Agent

code-quality-engineer — code refactoring to improve policy domain detection and reduce generic fallback usage

✅ Acceptance Criteria

  • detectPolicyDomains() covers defence, finance/regulatory, migration, labour, rural affairs, infrastructure domains
  • Generic "Requires committee review..." fallback triggers for less than 10% of entries in generated articles
  • All 14 language translations for new domain labels exist in the LABELS object
  • Existing tests pass (npx vitest run)
  • New test cases cover expanded domain detection

📚 References

<agent_instructions>Fix the generatePolicySignificance() function in scripts/data-transformers.ts.

Key tasks:

  1. Expand detectPolicyDomains() keyword coverage to recognize defence/security, financial regulation, migration, labour, rural affairs, and infrastructure domains. Add Swedish keywords too (vapen, beredskap, moms, migration, arbetsmarknad, veterinär, etc.).
  2. As a fallback when no domain keyword matches, use the document's committee/organ field to infer the domain.
  3. Add tests for the new domains in tests/data-transformers.test.ts.
  4. Run npx vitest run to verify all tests pass.

Root cause is at line ~2557 of scripts/data-transformers.ts where policySignificanceGeneric is returned. T...



…k in generatePolicySignificance

- Add defence keywords: vapen, beredskap, totalförsvar, krigsmateriel, säkerhetsskydd, weapon
- Add fiscal keywords: moms, mervärd, skattebedrägeri, e-id, e-legitimation, verklig huvudman, penningtvätt, beneficial
- Add migration keywords: uppehållstillstånd, medborgarskap, citizenship, utvisning, statslöshet
- Add labour keywords: ilo, trakasserier, kollektivavtal, lönediskriminering, harassment
- Add rural/agriculture keywords: jordbruk, lantbruk, veterinär, djur → trade and industry policy
- Add ORGAN_NAMES map for committee-specific fallback messages
- Improve generatePolicySignificance() to use committee name when organ is known but no domain matched
- Add 12 new test cases covering expanded domains and committee fallback

Co-authored-by: pethers <[email protected]>
Copilot AI changed the title from "[WIP] Fix repetitive fallback in why it matters analysis" to "fix(news): expand policy domain detection and add committee-name fallback in generatePolicySignificance" on Feb 24, 2026
pethers requested a review from Copilot on February 24, 2026 11:48
github-actions bot added the labels deployment (Deployment configuration), testing (Test coverage), refactor (Code refactoring), and size-xl (Extra large change, > 1000 lines) on Feb 24, 2026
github-actions bot commented

🔍 Lighthouse Performance Audit

Category | Score | Status
Performance | 85/100 | 🟡
Accessibility | 95/100 | 🟢
Best Practices | 90/100 | 🟢
SEO | 95/100 | 🟢

Budget Compliance: Performance budgets enforced via budget.json

Copilot AI left a comment

Pull request overview

This pull request enhances policy domain detection and adds committee-specific fallback logic to improve the quality of "Why It Matters" sections in government propositions articles. The changes address a root cause where ~80% of propositions fell back to generic boilerplate text due to insufficient keyword coverage in detectPolicyDomains().

Changes:

  • Expanded detectPolicyDomains() with 30+ new Swedish and English keywords across defence, fiscal, migration, labour, and agricultural policy domains
  • Added ORGAN_NAMES map with all 15 Riksdag committee codes and bilingual names for committee-specific fallback sentences
  • Enhanced generatePolicySignificance() to generate committee-specific context when no domain keywords match but organ field is present
  • Added 12 comprehensive test cases covering new keyword detection and committee fallback behavior

Reviewed changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated 4 comments.

File | Description
scripts/data-transformers.ts | Expanded policy domain keywords in detectPolicyDomains(), added ORGAN_NAMES constant with committee translations, and enhanced generatePolicySignificance() with committee-specific fallback logic
tests/data-transformers.test.ts | Added 12 new test cases verifying keyword detection for defence (vapen, beredskap), fiscal (moms, e-legitimation, verklig huvudman), migration (uppehållstillstånd), labour (trakasserier), agricultural (veterinär) topics, plus committee fallback tests for KU and KrU organs

Comment on lines 2551 to 2565
AU: { en: 'the Committee on Labour', sv: 'arbetsmarknadsutskottet' },
CU: { en: 'the Civil Affairs Committee', sv: 'civilutskottet' },
FiU: { en: 'the Finance Committee', sv: 'finansutskottet' },
FöU: { en: 'the Defence Committee', sv: 'försvarsutskottet' },
JuU: { en: 'the Justice Committee', sv: 'justitieutskottet' },
KrU: { en: 'the Committee on Cultural Affairs', sv: 'kulturutskottet' },
KU: { en: 'the Committee on the Constitution', sv: 'konstitutionsutskottet' },
MJU: { en: 'the Committee on Environment and Agriculture', sv: 'miljö- och jordbruksutskottet' },
NU: { en: 'the Committee on Industry and Trade', sv: 'näringsutskottet' },
SkU: { en: 'the Tax Committee', sv: 'skatteutskottet' },
SfU: { en: 'the Social Insurance Committee', sv: 'socialförsäkringsutskottet' },
SoU: { en: 'the Social Affairs Committee', sv: 'socialutskottet' },
TU: { en: 'the Transport Committee', sv: 'trafikutskottet' },
UbU: { en: 'the Committee on Education', sv: 'utbildningsutskottet' },
UU: { en: 'the Foreign Affairs Committee', sv: 'utrikesutskottet' },

Copilot AI Feb 24, 2026

The English committee names in ORGAN_NAMES are inconsistent with the established naming convention in translation-dictionary.ts. Most committees should follow the pattern "Committee on [Subject]" without the "the" prefix, though the sentence construction at line 2600 requires "the" to be grammatically correct.

Inconsistencies found:

  • AU: Should be "Committee on Labour Market Affairs" (not "the Committee on Labour")
  • CU: Should be "Committee on Civil Affairs" (not "the Civil Affairs Committee")
  • FiU: Should be "Committee on Finance" (not "the Finance Committee")
  • FöU: Should be "Committee on Defence" (not "the Defence Committee")
  • JuU: Should be "Committee on Justice" (not "the Justice Committee")
  • SkU: Should be "Committee on Taxation" (not "the Tax Committee")
  • SfU: Should be "Committee on Social Insurance" (not "the Social Insurance Committee")
  • SoU: Should be "Committee on Social Affairs" (not "the Social Affairs Committee")
  • TU: Should be "Committee on Transport" (not "the Transport Committee")
  • UU: Should be "Committee on Foreign Affairs" (not "the Foreign Affairs Committee")

The sentence template at line 2600 uses "This matter is referred to ${organEntry.en}", which requires the article "the" to be included. The correct approach is to either: (1) include "the" in all committee names consistently, or (2) adjust the sentence template to add "the" dynamically (e.g., "This matter is referred to the ${organEntry.en}").

Suggested change
- AU: { en: 'the Committee on Labour', sv: 'arbetsmarknadsutskottet' },
- CU: { en: 'the Civil Affairs Committee', sv: 'civilutskottet' },
- FiU: { en: 'the Finance Committee', sv: 'finansutskottet' },
- FöU: { en: 'the Defence Committee', sv: 'försvarsutskottet' },
- JuU: { en: 'the Justice Committee', sv: 'justitieutskottet' },
- KrU: { en: 'the Committee on Cultural Affairs', sv: 'kulturutskottet' },
- KU: { en: 'the Committee on the Constitution', sv: 'konstitutionsutskottet' },
- MJU: { en: 'the Committee on Environment and Agriculture', sv: 'miljö- och jordbruksutskottet' },
- NU: { en: 'the Committee on Industry and Trade', sv: 'näringsutskottet' },
- SkU: { en: 'the Tax Committee', sv: 'skatteutskottet' },
- SfU: { en: 'the Social Insurance Committee', sv: 'socialförsäkringsutskottet' },
- SoU: { en: 'the Social Affairs Committee', sv: 'socialutskottet' },
- TU: { en: 'the Transport Committee', sv: 'trafikutskottet' },
- UbU: { en: 'the Committee on Education', sv: 'utbildningsutskottet' },
- UU: { en: 'the Foreign Affairs Committee', sv: 'utrikesutskottet' },
+ AU: { en: 'Committee on Labour Market Affairs', sv: 'arbetsmarknadsutskottet' },
+ CU: { en: 'Committee on Civil Affairs', sv: 'civilutskottet' },
+ FiU: { en: 'Committee on Finance', sv: 'finansutskottet' },
+ FöU: { en: 'Committee on Defence', sv: 'försvarsutskottet' },
+ JuU: { en: 'Committee on Justice', sv: 'justitieutskottet' },
+ KrU: { en: 'the Committee on Cultural Affairs', sv: 'kulturutskottet' },
+ KU: { en: 'the Committee on the Constitution', sv: 'konstitutionsutskottet' },
+ MJU: { en: 'the Committee on Environment and Agriculture', sv: 'miljö- och jordbruksutskottet' },
+ NU: { en: 'the Committee on Industry and Trade', sv: 'näringsutskottet' },
+ SkU: { en: 'Committee on Taxation', sv: 'skatteutskottet' },
+ SfU: { en: 'Committee on Social Insurance', sv: 'socialförsäkringsutskottet' },
+ SoU: { en: 'Committee on Social Affairs', sv: 'socialutskottet' },
+ TU: { en: 'Committee on Transport', sv: 'trafikutskottet' },
+ UbU: { en: 'the Committee on Education', sv: 'utbildningsutskottet' },
+ UU: { en: 'Committee on Foreign Affairs', sv: 'utrikesutskottet' },

Reply from the PR author:

Fixed in 2bd8d9d. All 15 entries now use the consistent 'Committee on [Subject]' pattern (no 'the' prefix in the data), and the sentence template adds 'the' dynamically: "This matter is referred to the ${organEntry.en} for parliamentary examination."

|| title.includes('makrotillsyn') || title.includes('macroprudential')
|| title.includes('moms') || title.includes('mervärd') || title.includes('skattebedrägeri')
|| title.includes('e-id') || title.includes('e-legitimation') || title.includes('verklig huvudman')
|| title.includes('penningtvätt') || title.includes('beneficial') || title.includes('fakturabedrägeri')

Copilot AI Feb 24, 2026

The keyword "beneficial" at line 2311 could cause false positives by matching partial words or unrelated contexts (e.g., "beneficial effects", "mutually beneficial"). Consider using a more specific Swedish term or adding word boundary checks. The intended use case appears to be "beneficial ownership" (verklig huvudman), so a more precise match would be "beneficial ownership" as a phrase or relying solely on the Swedish "verklig huvudman".

Suggested change
- || title.includes('penningtvätt') || title.includes('beneficial') || title.includes('fakturabedrägeri')
+ || title.includes('penningtvätt') || /\bbeneficial ownership\b/.test(title) || /\bbeneficial owner\b/.test(title) || title.includes('fakturabedrägeri')

Reply from the PR author:

Fixed in 2bd8d9d. Changed to /\bbeneficial owner(ship)?\b/.test(title) to match only "beneficial owner" and "beneficial ownership" phrases.
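
The effect of the phrase regex can be checked with a few sample titles. This is a minimal sketch, assuming `title` has already been lower-cased (as the neighbouring includes() checks suggest); `mentionsBeneficialOwnership` and the sample titles are hypothetical and not taken from the repository.

```typescript
// Assumption: `title` is lower-cased before matching, as the includes() checks imply.
const BENEFICIAL_OWNER = /\bbeneficial owner(ship)?\b/;

/** True when the title refers to beneficial ownership in English or Swedish. */
function mentionsBeneficialOwnership(title: string): boolean {
  return BENEFICIAL_OWNER.test(title) || title.includes('verklig huvudman');
}

// Illustrative titles only; none are taken from real propositions.
const beneficialSamples: string[] = [
  'register of beneficial ownership',
  'reporting duties for the beneficial owner',
  'mutually beneficial trade agreement',
  'verklig huvudman i aktiebolag',
];
for (const sample of beneficialSamples) {
  console.log(`${mentionsBeneficialOwnership(sample)}  ${sample}`);
}
```

One consequence of the trailing `\b`: the plural "beneficial owners" does not match, because the "s" directly follows "owner" with no word boundary in between. If plural titles matter, the pattern would need an optional "s".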

  set.add(isSv ? 'rättspolitik' : 'justice policy');
  if (title.includes('arbetsmarknad') || title.includes('labour') || title.includes('anställning')
- || title.includes('facklig') || organ === 'AU')
+ || title.includes('facklig') || title.includes('ilo') || title.includes('trakasserier')

Copilot AI Feb 24, 2026

The keyword "ilo" at line 2343 should use word boundary checks to avoid false positives. Without word boundaries, it will match partial words like "pilot", "tailor", "billion", "kiloton", etc. Consider using a regex pattern like /\bilo\b/ similar to how "eu" is handled at line 2336, or use a more specific phrase like "ilo-konvention" to match the actual ILO convention context.

Suggested change
- || title.includes('facklig') || title.includes('ilo') || title.includes('trakasserier')
+ || title.includes('facklig') || /\bilo\b/.test(title) || title.includes('trakasserier')

Reply from the PR author:

Fixed in 2bd8d9d. Changed to /\bilo\b/.test(title) with a word boundary to avoid matching partial words like "pilot" or "billion".
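
The difference between the substring check and the word-boundary check can be shown side by side. The sketch below assumes lower-cased titles; `naiveIloMatch`, `boundedIloMatch`, and the sample titles are hypothetical names used only for illustration.

```typescript
// Assumption: titles are lower-cased before matching.
const ILO_WORD = /\bilo\b/;

/** Original behaviour: plain substring search, which also hits words such as "pilot". */
function naiveIloMatch(title: string): boolean {
  return title.includes('ilo');
}

/** Behaviour after the fix: "ilo" must stand alone as a word. */
function boundedIloMatch(title: string): boolean {
  return ILO_WORD.test(title);
}

// Illustrative titles only.
const iloSamples: string[] = [
  'ilo-konvention om trakasserier',
  'pilot scheme for e-id',
  'billion-krona budget',
  'kiloton limits',
];
for (const sample of iloSamples) {
  console.log(`${naiveIloMatch(sample)} -> ${boundedIloMatch(sample)}  ${sample}`);
}
```

A caveat: JavaScript's `\b` is defined over ASCII word characters, so å, ä and ö count as non-word characters. That is harmless for "ilo" next to a hyphen or colon, but it means a boundary is also found directly after one of those letters.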

Comment on lines 2354 to 2357
  if (title.includes('näring') || title.includes('handel') || title.includes('trade') || title.includes('industri')
- || title.includes('företag') || organ === 'NU')
+ || title.includes('företag') || title.includes('jordbruk') || title.includes('lantbruk')
+ || title.includes('veterinär') || title.includes('djur') || organ === 'NU')
  set.add(isSv ? 'näringspolitik' : 'trade and industry policy');

Copilot AI Feb 24, 2026

Adding agricultural keywords (jordbruk, lantbruk, veterinär, djur) to "trade and industry policy" creates potential ambiguity for MJU (Committee on Environment and Agriculture) documents. A document from MJU with agricultural keywords in the title would match both "environmental and climate policy" (via organ === 'MJU' at line 2323) and "trade and industry policy" (via keyword matching at lines 2355-2356), resulting in two policy domains. While this might be intentional (MJU covers both environment AND agriculture), consider whether agricultural matters should map to a dedicated agriculture policy domain, or whether the organ check should take precedence to avoid dual-categorization.
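
The dual categorization described here can be reproduced with a two-branch reduction of the detector. This is a simplified sketch, not the real detectPolicyDomains(): `detectDomainsSketch` is a hypothetical name, the 'miljö' and 'klimat' keywords in the environment branch are assumed for illustration, and only the English labels are modelled.

```typescript
// Simplified two-branch sketch; the real detectPolicyDomains() has many more branches.
function detectDomainsSketch(title: string, organ: string): string[] {
  const set = new Set<string>();

  // Environment branch: the 'miljö' and 'klimat' keywords are assumptions; organ === 'MJU' is from the review.
  if (title.includes('miljö') || title.includes('klimat') || organ === 'MJU') {
    set.add('environmental and climate policy');
  }

  // Trade and industry branch, including the agricultural keywords added in this PR.
  if (title.includes('jordbruk') || title.includes('lantbruk')
    || title.includes('veterinär') || title.includes('djur') || organ === 'NU') {
    set.add('trade and industry policy');
  }

  return Array.from(set);
}

// An MJU document with an agricultural keyword ends up in both domains.
const mjuDomains = detectDomainsSketch('ny lag om veterinär verksamhet', 'MJU');
console.log(mjuDomains);
```

Because the branches are independent `if` statements feeding one Set, any document can collect several domains; giving the organ check precedence would require an explicit early return or an `else` chain instead.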

pethers commented Feb 24, 2026

@copilot apply changes based on the comments in this thread

github-actions bot added the size-m (Medium change, 50-250 lines) label on Feb 24, 2026

…, and ILO word boundary

- Rename all ORGAN_NAMES entries to consistent 'Committee on [Subject]' pattern (no 'the' prefix)
- Move 'the' into the sentence template: 'referred to the ${organEntry.en}'
- Replace title.includes('beneficial') with /\bbeneficial owner(ship)?\b/ regex to avoid false positives
- Replace title.includes('ilo') with /\bilo\b/ word-boundary regex to avoid false positives (pilot, billion, etc.)

Co-authored-by: pethers <[email protected]>

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comment on lines 2592 to 2601
// Secondary: committee-specific context when organ is present but no domain matched
const organ = doc.organ || doc.committee || '';
if (organ) {
  const organEntry = ORGAN_NAMES[organ];
  if (organEntry) {
    const isSv = lang === 'sv';
    return isSv
      ? `Ärendet behandlas av ${organEntry.sv} för parlamentarisk beredning.`
      : `This matter is referred to the ${organEntry.en} for parliamentary examination.`;
  }

Copilot AI Feb 24, 2026

ORGAN_NAMES duplicates COMMITTEE_NAMES/getCommitteeName (same committee codes) but uses different English/Swedish spellings (e.g., KU: "Committee on the Constitution" vs getCommitteeName→"Constitutional Committee"; sv casing differs too). This can produce inconsistent committee naming within the same generated article (e.g., "Referred to: …" line vs the new fallback sentence) and risks drift over time. Prefer reusing getCommitteeName/COMMITTEE_NAMES for the fallback sentence, or update COMMITTEE_NAMES to the "Committee on …" convention and have both call sites share the same source of truth.

Reply from the PR author:

Fixed in ffef423. ORGAN_NAMES is removed — generatePolicySignificance() now uses COMMITTEE_NAMES directly as the single source of truth. COMMITTEE_NAMES English entries are updated to the consistent 'Committee on [Subject]' convention (matching translation-dictionary.ts), and the Swedish name uses .toLowerCase() for correct mid-sentence casing in the fallback sentence.
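
The resulting fallback could be sketched as follows. The shape of COMMITTEE_NAMES is an assumption (an `{ en, sv }` record with capitalized Swedish names, which would explain the .toLowerCase() call); the real constant in scripts/data-transformers.ts may be structured differently, and `committeeFallback` is a hypothetical name for the relevant part of generatePolicySignificance().

```typescript
// Assumed shape and two sample entries; the real COMMITTEE_NAMES may differ.
const COMMITTEE_NAMES: Record<string, { en: string; sv: string }> = {
  FiU: { en: 'Committee on Finance', sv: 'Finansutskottet' },
  KU: { en: 'Committee on the Constitution', sv: 'Konstitutionsutskottet' },
};

const GENERIC_SIGNIFICANCE =
  'Requires committee review and chamber debate before a decision is reached.';

/** Committee-specific sentence for a known organ code, otherwise the generic text. */
function committeeFallback(organ: string, lang: string): string {
  // Own-property check so inherited keys such as 'constructor' are not treated as committees.
  const known = Object.prototype.hasOwnProperty.call(COMMITTEE_NAMES, organ);
  if (!known) return GENERIC_SIGNIFICANCE;

  const entry = COMMITTEE_NAMES[organ];
  return lang === 'sv'
    // Lower-cased so the committee name reads correctly in the middle of a Swedish sentence.
    ? `Ärendet behandlas av ${entry.sv.toLowerCase()} för parlamentarisk beredning.`
    // The article 'the' lives in the template, not in the data.
    : `This matter is referred to the ${entry.en} for parliamentary examination.`;
}

console.log(committeeFallback('FiU', 'en'));
console.log(committeeFallback('FiU', 'sv'));
```

Keeping the article and the casing in the sentence template means the same COMMITTEE_NAMES entries can also be used unchanged wherever the committee name stands alone, such as the "Referred to" line mentioned in the review.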

pethers commented Feb 24, 2026

@copilot apply changes based on the comments in this thread

…licy significance fallback

- Update COMMITTEE_NAMES to consistent 'Committee on [Subject]' English naming convention
- Remove ORGAN_NAMES constant (was a duplicate of COMMITTEE_NAMES with different spellings)
- generatePolicySignificance() now uses COMMITTEE_NAMES[organ] for the committee fallback,
  calling .toLowerCase() on the Swedish name for correct mid-sentence casing
- Update test regex to match new 'Committee on Finance' naming

Co-authored-by: pethers <[email protected]>

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

pethers marked this pull request as ready for review on February 24, 2026 13:10
pethers merged commit 3fce140 into main on Feb 24, 2026 (19 checks passed)
pethers deleted the copilot/fix-generic-fallback-analysis branch on February 24, 2026 13:10
