fix: Unknown author sentinel, data-translate span leakage, motion proposition grouping #457

Merged
pethers merged 5 commits into main from copilot/generate-dynamic-article-titles
Feb 23, 2026

Copilot AI commented Feb 23, 2026

Fixes three related content-quality bugs in scripts/data-transformers.ts that caused "Filed by: Unknown (Unknown)", false validation failures on 131 non-Swedish articles, and repetitive motion section headers.

#454 — Unknown author/party sentinel bypass

enrichDocumentsWithContent sets intressent_namn='Unknown' / parti='Unknown' as sentinels on enrichment failure. Because 'Unknown' is truthy, the old if (!authorName) guard never fired, so parseMotionAuthorParty was never invoked.

  • generateMotionsContent: guard now checks !authorName || authorName === 'Unknown'; sentinel values stripped before the fallback parse runs; motion.titel/rubrik added as raw-text sources
  • generateEnhancedSummary: same sentinel guard for the motion author/party metadata line
  • 'Unknown' parti now buckets into 'other' in the party-breakdown section
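The guard described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: MotionLike, AuthorParty, isMissing, resolveAuthorParty, and parseAuthorPartySketch are hypothetical names, and the stand-in parser assumes raw text of the form "av Name (Party)", which may not match what parseMotionAuthorParty actually accepts.

```typescript
// Sentinel written on enrichment failure (value taken from the PR description).
const UNKNOWN = 'Unknown';

interface MotionLike {
  intressent_namn?: string;
  parti?: string;
  titel?: string;
  rubrik?: string;
}

interface AuthorParty {
  author: string;
  party: string;
}

// Hypothetical stand-in for parseMotionAuthorParty: assumes raw text such as
// "av Anna Andersson (S)". The real parser's signature and rules may differ.
function parseAuthorPartySketch(raw: string): AuthorParty | null {
  const match = /av\s+([^()]+?)\s*\(([^)]+)\)/i.exec(raw);
  return match ? { author: match[1]!.trim(), party: match[2]!.trim() } : null;
}

// Treat empty values and the sentinel alike, so the fallback parse can run.
function isMissing(value: string | undefined): boolean {
  return !value || value === UNKNOWN;
}

function resolveAuthorParty(motion: MotionLike): AuthorParty {
  // Strip sentinel values before deciding whether a fallback is needed.
  const author = isMissing(motion.intressent_namn) ? '' : motion.intressent_namn!;
  const party = isMissing(motion.parti) ? '' : motion.parti!;
  if (author && party) return { author, party };

  // Try the raw-text sources before giving up.
  for (const raw of [motion.titel, motion.rubrik]) {
    const parsed = raw ? parseAuthorPartySketch(raw) : null;
    if (parsed) {
      return { author: author || parsed.author, party: party || parsed.party };
    }
  }
  // Only now fall back to the sentinel for display.
  return { author: author || UNKNOWN, party: party || UNKNOWN };
}
```

Centralising the check in one predicate keeps the sentinel comparison out of the individual call sites, so a second sentinel value could be added in one place.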

#458 - data-translate="true" leaking into non-Swedish articles

All Swedish API content was unconditionally wrapped in <span data-translate="true" lang="sv">. No client-side translator exists, so this marker appears 20+ times per article in non-Swedish files and fails validation.

Added svSpan(escapedText, lang) helper:

function svSpan(escapedText: string, lang: Language | string): string {
  if (lang === 'sv') {
    return `<span data-translate="true" lang="sv">${escapedText}</span>`;
  }
  return `<span lang="sv">${escapedText}</span>`;
}

Replaced all 14 call sites. Swedish articles retain the marker; non-Swedish articles get lang="sv" only (screen-reader accessibility preserved).
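The parameter name escapedText suggests callers escape content before wrapping it. A short usage sketch under that assumption: escapeHtml is a hypothetical helper, not a function confirmed by this PR, and svSpan is copied from above with its language type simplified to string.

```typescript
// Hypothetical helper: minimal HTML escaping for text nodes and attribute values.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Copy of the helper from this PR, with the language type simplified to string.
function svSpan(escapedText: string, lang: string): string {
  if (lang === 'sv') {
    return `<span data-translate="true" lang="sv">${escapedText}</span>`;
  }
  return `<span lang="sv">${escapedText}</span>`;
}

const rawTitle = 'Skatt & avgifter <2026>';

// Swedish article: marker retained.
const inSwedishArticle = svSpan(escapeHtml(rawTitle), 'sv');
// English article: lang="sv" only, no data-translate marker.
const inEnglishArticle = svSpan(escapeHtml(rawTitle), 'en');

console.log(inSwedishArticle);
console.log(inEnglishArticle);
```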

#462 — Motions referencing the same proposition rendered under separate headers

Multiple motions responding to the same government proposition each rendered their own section, duplicating the proposition title repeatedly.

Added extractPropRef(title) (extracts YYYY/YY:NNN from "med anledning av prop. …" prefixes) and groupMotionsByProposition(motions) which partitions into a Map<propRef, motions[]> and an independent[] list. generateMotionsContent now renders:

  • Responses to Government Propositions — one <h3>Prop. 2025/26:118: [title]</h3> per proposition grouping all related motions beneath it
  • Independent Motions — only rendered when both grouped and independent motions are present

Section headers are translated for all 14 languages.
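A rough sketch of the grouping step, assuming the regex quoted in the review of this PR and a { grouped, independent } return shape. The Motion interface and the exact type signatures here are illustrative assumptions rather than the code that was merged.

```typescript
// Simplified motion shape; the real RawDocument has more fields.
interface Motion {
  titel?: string;
  title?: string;
}

// Pattern quoted in the review of this PR: captures the token after "prop.".
const PROP_REFERENCE = /med anledning av prop\.\s+(\S+)/i;

// Returns the proposition reference (e.g. "2025/26:118") or null when absent.
function extractPropRef(title: string): string | null {
  const match = PROP_REFERENCE.exec(title);
  return match ? match[1]! : null;
}

interface GroupedMotions {
  grouped: Map<string, Motion[]>;
  independent: Motion[];
}

// Partition motions into per-proposition groups and a list of independent ones.
function groupMotionsByProposition(motions: Motion[]): GroupedMotions {
  const grouped = new Map<string, Motion[]>();
  const independent: Motion[] = [];
  for (const motion of motions) {
    const ref = extractPropRef(motion.titel || motion.title || '');
    if (ref) {
      const bucket = grouped.get(ref) ?? [];
      bucket.push(motion);
      grouped.set(ref, bucket);
    } else {
      independent.push(motion);
    }
  }
  return { grouped, independent };
}

const result = groupMotionsByProposition([
  { titel: 'med anledning av prop. 2025/26:118 Ny lag om avfall' },
  { titel: 'med anledning av prop. 2025/26:118 Ny lag om avfall' },
  { titel: 'Bättre villkor för småföretag' },
]);
console.log(result.grouped.get('2025/26:118')?.length, result.independent.length);
```

A Map preserves insertion order, so proposition groups render in the order their first motion appears.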

Original prompt

This section details the original issue you should resolve

<issue_title>Generate content-based article titles instead of static repeated headers</issue_title>
<issue_description>## 📋 Issue Type
Feature / Content Quality

🎯 Objective

Replace the static, generic article titles with content-aware, dynamic titles that reflect the actual substance of each article. Currently, every motions article uses "Opposition Motions: Battle Lines This Week", every propositions article uses "Government Propositions: Policy Priorities This Week", and every committee-reports article uses "Committee Reports: Parliamentary Priorities This Week" — regardless of the actual content.

📊 Current State

In scripts/data-transformers.ts title generation (around lines 539-814):

  • Hardcoded static titles per article type and language
  • Same title used across all dates: "Battle Lines This Week" appears 15 times across articles
  • No connection between title and actual article content
  • The agentic workflow prompt says to generate content-based titles, but the TypeScript pipeline uses hardcoded ones

Evidence: grep -oh 'Battle Lines This Week' news/*.html | wc -l returns 15

🚀 Desired State

  • Titles should summarize the actual content themes (e.g., "Opposition Targets Renewable Energy Permits and Waste Reform" instead of "Battle Lines This Week")
  • Generate titles from the top 2-3 policy domains covered by the documents
  • Include date context where relevant (e.g., "Week of February 22")
  • Maintain translations for all 14 languages but base them on actual content

🔧 Implementation Approach

Target file: scripts/data-transformers.ts (title generation sections)

  1. After processing documents, extract the top policy domains/themes
  2. Build a content-based title from the most prominent themes
  3. Keep language-specific title templates but make them parameterized
  4. Example: generateContentTitle(docs, lang) → "Opposition Challenges Energy and Housing Reforms"
  5. Fallback to generic titles only if no content themes can be extracted

✅ Acceptance Criteria

  • Article titles reflect actual document content themes
  • No two consecutive articles have identical titles (unless content is identical)
  • Titles are properly translated for all 14 languages
  • Generic fallback titles still work when content analysis fails
  • Unit tests verify content-based title generation

📚 References

  • scripts/data-transformers.ts lines 539-555 (week-ahead titles)
  • scripts/data-transformers.ts lines 714-729 (proposition titles)
  • scripts/data-transformers.ts lines 798-814 (motion titles)
  • Workflow prompt: "An analytical lede paragraph about opposition strategy, not just a motion count"

🤖 Recommended Agent

Copilot coding agent — modify title generation in data-transformers.ts</issue_description>

<agent_instructions>Focus ONLY on the title generation functions in scripts/data-transformers.ts. Look for the hardcoded TitleSet objects (around lines 539-555 for week-ahead, 714-729 for propositions, 798-814 for motions). Create a new helper function generateContentTitle(documents, lang, articleType) that extracts top policy domains from document content and builds a dynamic title. Keep the existing static titles as fallbacks. Update tests in tests/data-transformers.test.ts. Do NOT modify mcp-client.ts, generate-news-enhanced.ts, or any workflow files.</agent_instructions>

Comments on the Issue (you are @copilot in this section)


- Add extractPolicyDomains() (private) to data-transformers.ts
- Add generateContentTitle() (exported) to data-transformers.ts
- Wire generateContentTitle into getTitles() in motions.ts,
  propositions.ts, and committee-reports.ts with static fallback
- Add 35 unit tests in tests/data-transformers.test.ts covering
  all 14 languages, all 3 article types, fallback, and frequency ranking

Co-authored-by: pethers <[email protected]>
Copilot AI changed the title from "[WIP] Update article titles to be content-aware and dynamic" to "feat: generate content-based article titles from top policy domains" Feb 23, 2026
Copilot AI requested a review from pethers February 23, 2026 08:20
…ouping

- Fix #454: Skip 'Unknown' sentinel in generateMotionsContent and
  generateEnhancedSummary; fall back to parseMotionAuthorParty on
  motion.titel/rubrik before displaying 'Unknown'. Also treat 'Unknown'
  parti as ungrouped in the party-breakdown section.

- Fix #458: Add svSpan(text, lang) helper — emits lang="sv" only for
  non-Swedish articles (no data-translate="true" marker), preventing
  validation failures in the 131 non-Swedish articles. Swedish articles
  still carry the data-translate marker. Replace all 14 call sites.
  Add undertitel field to RawDocument interface.

- Fix #462: Add extractPropRef() + groupMotionsByProposition() helpers.
  generateMotionsContent now groups motions by parent proposition under
  'Responses to Government Propositions' section header, eliminates
  repetitive duplication of the same prop title.

- Update 6 existing tests and add 10 new tests (1630 total, all green)

Co-authored-by: pethers <[email protected]>
Copilot AI changed the title from "feat: generate content-based article titles from top policy domains" to "fix: Unknown author sentinel, data-translate span leakage, motion proposition grouping" Feb 23, 2026
pethers requested a review from Copilot February 23, 2026 10:46
github-actions bot added the deployment (Deployment configuration), testing (Test coverage), refactor (Code refactoring), and size-xl (Extra large change, > 1000 lines) labels Feb 23, 2026

Copilot AI left a comment

Pull request overview

This pull request fixes three content-quality bugs in the article generation pipeline that degraded output across all 14 language versions, and adds content-based article titles.

Changes:

  • Fixed #454: Unknown author sentinel bypass where 'Unknown' string sentinels from enrichDocumentsWithContent were treated as valid data, preventing fallback author/party parsing
  • Fixed #458: Eliminated data-translate="true" spans from non-Swedish articles by introducing svSpan() helper that conditionally adds the marker only for Swedish articles while preserving lang="sv" accessibility attribute for all languages
  • Fixed #462: Grouped motions by parent proposition reference to eliminate repetitive section headers where multiple motions responding to the same proposition each rendered separate h3 headers
  • Added #456: Implemented generateContentTitle() for content-based article titles derived from policy domain keywords, replacing generic static titles with dynamic summaries of actual document themes

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| scripts/data-transformers.ts | Core implementation: svSpan() helper (14 call sites updated), Unknown sentinel checks in generateMotionsContent/generateEnhancedSummary, extractPropRef()/groupMotionsByProposition() functions, generateContentTitle() with policy domain extraction and 14-language translation tables |
| scripts/news-types/motions.ts | Integrated generateContentTitle() into getTitles() with fallback to static titles |
| scripts/news-types/propositions.ts | Integrated generateContentTitle() into getTitles() with fallback to static titles |
| scripts/news-types/committee-reports.ts | Integrated generateContentTitle() into getTitles() with fallback to static titles |
| tests/data-transformers.test.ts | Updated 11 existing tests to expect no data-translate in non-SV articles, added 3 new test suites: generateContentTitle (78 tests), Unknown sentinel fix (2 tests), motion grouping (3 tests) |
Comments suppressed due to low confidence (3)

scripts/data-transformers.ts:2844

  • In extractPolicyDomains(), the function scans document text for Swedish keywords to determine policy domains. However, it uses simple string inclusion checks (text.includes(kw)) without word boundaries. This means keywords like "energi" will match "energiminister", "förnybar" will match "förnybartdirektivet", and so on. This is acceptable for broad domain detection, but could lead to false positives if a keyword appears as a substring in an unrelated term. Consider whether this fuzziness is intentional for catching variations, or if word-boundary matching would be more precise.
function extractPolicyDomains(documents: RawDocument[]): string[] {
  const counts: Record<string, number> = {};
  for (const doc of documents) {
    const text = `${doc.titel ?? ''} ${doc.rubrik ?? ''} ${doc.summary ?? ''} ${doc.notis ?? ''}`.toLowerCase();
    for (const [domain, keywords] of Object.entries(POLICY_DOMAIN_KEYWORDS)) {
      if (keywords.some(kw => text.includes(kw))) {
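To make the trade-off in this comment concrete, here is a hedged sketch of three matching modes. MatchMode, buildKeywordMatcher, and detectDomains are hypothetical names, and the keyword table in the usage lines is invented for illustration. Unicode property escapes are used because JavaScript's \b does not treat å, ä, and ö as word characters.

```typescript
type MatchMode = 'substring' | 'wordStart' | 'wholeWord';

// Escape regex metacharacters so keywords are matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build a Unicode-aware matcher for one keyword.
function buildKeywordMatcher(keyword: string, mode: MatchMode): RegExp {
  const kw = escapeRegExp(keyword.toLowerCase());
  const noWordCharBefore = '(?<![\\p{L}\\p{N}])';
  const noWordCharAfter = '(?![\\p{L}\\p{N}])';
  switch (mode) {
    case 'substring':
      // Equivalent to text.includes(kw): matches anywhere.
      return new RegExp(kw, 'u');
    case 'wordStart':
      // Keeps Swedish compounds such as "energiminister", rejects "kärnenergi".
      return new RegExp(noWordCharBefore + kw, 'u');
    case 'wholeWord':
      // Strictest: the keyword must stand alone.
      return new RegExp(noWordCharBefore + kw + noWordCharAfter, 'u');
  }
}

// Return the domains whose keywords occur in the text under the given mode.
function detectDomains(
  text: string,
  keywordsByDomain: Record<string, string[]>,
  mode: MatchMode,
): string[] {
  const lower = text.toLowerCase();
  return Object.entries(keywordsByDomain)
    .filter(([, keywords]) => keywords.some(kw => buildKeywordMatcher(kw, mode).test(lower)))
    .map(([domain]) => domain);
}

// Illustrative keyword table, not the project's POLICY_DOMAIN_KEYWORDS.
const sampleKeywords = { energy: ['energi'], housing: ['bostad'] };
console.log(detectDomains('Ny energiminister utsedd', sampleKeywords, 'wordStart'));
console.log(detectDomains('Ny energiminister utsedd', sampleKeywords, 'wholeWord'));
```

Because Swedish forms compounds freely, whole-word matching would miss many legitimate hits; matching at the start of a word is one possible middle ground.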

scripts/data-transformers.ts:2881

  • In generateContentTitle(), when fewer than 2 domains are detected, the function returns null and callers fall back to static titles. However, there's no fallback when exactly 2 domains are found but the translation lookup fails (lines 2877-2881). If either DOMAIN_TRANSLATIONS[domains[0]] or DOMAIN_TRANSLATIONS[domains[1]] is undefined, or if the language key doesn't exist in those translations, d1 or d2 will be undefined, causing the null check at line 2881 to return null. While this is safe, it means translation table completeness is critical. Consider adding a comment warning that all domains in POLICY_DOMAIN_KEYWORDS must have complete translations in DOMAIN_TRANSLATIONS for all 14 languages.
  const domains = extractPolicyDomains(documents);
  if (domains.length < 2) return null;

  const langKey = lang as string;
  const trans0 = DOMAIN_TRANSLATIONS[domains[0]!];
  const trans1 = DOMAIN_TRANSLATIONS[domains[1]!];
  const d1 = trans0?.[langKey] ?? trans0?.['en'];
  const d2 = trans1?.[langKey] ?? trans1?.['en'];
  if (!d1 || !d2) return null;
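One way to guard the completeness requirement raised here is a check that runs in tests. The sketch below is an assumption about how that could look: findMissingTranslations, TranslationTable, and the sample data are hypothetical, and the real DOMAIN_TRANSLATIONS table may be shaped differently.

```typescript
// Assumed shape: domain -> language code -> translated label.
type TranslationTable = Record<string, Record<string, string>>;

// List every "domain:lang" pair that lacks a non-empty translation.
function findMissingTranslations(
  domains: readonly string[],
  translations: TranslationTable,
  languages: readonly string[],
): string[] {
  const missing: string[] = [];
  for (const domain of domains) {
    for (const lang of languages) {
      if (!translations[domain]?.[lang]) missing.push(`${domain}:${lang}`);
    }
  }
  return missing;
}

// Illustrative data only; three languages instead of the project's 14.
const sampleLanguages = ['en', 'sv', 'de'];
const sampleTranslations: TranslationTable = {
  energy: { en: 'Energy', sv: 'Energi' },
};

const gaps = findMissingTranslations(['energy', 'housing'], sampleTranslations, sampleLanguages);
console.log(gaps);
```

A unit test asserting that this list is empty for the real tables would surface a gap at build time rather than as a silent fallback to English or to the static title.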

scripts/data-transformers.ts:2003

  • In groupMotionsByProposition(), a motion is added to the grouped map whenever extractPropRef() returns a truthy ref (line 1998). If extractPropRef() ever returned an empty string (which is falsy in JavaScript), the motion would be treated as independent. The regex pattern /med anledning av prop\.\s+(\S+)/i always captures at least one non-whitespace character when it matches, so this should not happen in practice, but consider explicitly checking for empty strings if defensive programming is desired, or document that extractPropRef() returns either a non-empty string or null.
  for (const motion of motions) {
    const title = motion.titel || motion.title || '';
    const ref = extractPropRef(title);
    if (ref) {
      if (!grouped.has(ref)) grouped.set(ref, []);
      grouped.get(ref)!.push(motion);
    } else {
      independent.push(motion);
    }

pethers marked this pull request as ready for review February 23, 2026 11:46
pethers merged commit c6a7fbe into main Feb 23, 2026
10 of 11 checks passed
pethers deleted the copilot/generate-dynamic-article-titles branch February 23, 2026 11:47
github-actions bot added the size-l (Large change, 250-1000 lines) label Feb 23, 2026
@github-actions

🔍 Lighthouse Performance Audit

| Category | Score | Status |
| --- | --- | --- |
| Performance | 85/100 | 🟡 |
| Accessibility | 95/100 | 🟢 |
| Best Practices | 90/100 | 🟢 |
| SEO | 95/100 | 🟢 |

Budget Compliance: Performance budgets enforced via budget.json

Copilot AI added a commit that referenced this pull request Feb 23, 2026
- Take origin/main as base for data-transformers.ts, types/content.ts and
  tests/data-transformers.test.ts (which had landed PRs #453, #457, etc.)
- Apply our PR changes on top:
  * extractPropRef: tighten regex to \d{4}\/\d{2}:\d+ (strict ID format)
  * groupMotionsByProposition: add export keyword; keep {grouped,independent}
    return type from main; replace inline lang-switches with L() labels
  * generateMotionsContent: use L(lang,'responsesToProp') and
    L(lang,'independentMotions') instead of 14-branch ternary chains
  * groupPropositionsByCommittee: new exported helper (Map<committeeKey,[...]>)
  * generatePropositionsContent: committee-grouped rendering (h3 per committee,
    h4 per prop when multi-committee; h3 per prop + Referred-to line otherwise)
    reuses byCommitteeGroup.size for policyImplicationsContext domainCount
  * ContentLabelSet: add responsesToProp and independentMotions fields
  * CONTENT_LABELS: add both new fields to all 14 language objects
- Tests: add unit tests for exported helpers ({grouped,independent} API),
  label presence tests (28 tests for 2 keys × 14 langs), proposition grouping
  rendering tests; all 1674 tests passing across 39 test files

Co-authored-by: pethers <[email protected]>
Copilot AI added a commit that referenced this pull request Feb 23, 2026
… take-main + apply)

Take origin/main as clean base for data-transformers.ts, types/content.ts,
tests/data-transformers.test.ts and sitemap.xml (main had landed PRs #453,
#457, #469 etc.), then re-apply every PR-specific change on top:

data-transformers.ts:
  - svSpan(): fix data-translate direction per translation workflow
    (lang!='sv' → data-translate="true"; lang='sv' → no marker)
  - PROP_REFERENCE_REGEX: strict \d{4}\/\d{2}:\d+ (was \S+)
  - PROP_FULL_REF_REGEX: non-greedy [^<]+? with HTML-tag lookahead
  - export groupMotionsByProposition (was private)
  - generateMotionsContent: replace 14-branch ternaries with
    L(lang,'responsesToProp') / L(lang,'independentMotions')
  - use PROP_FULL_REF_REGEX + svSpan() for proposition group heading
  - export groupPropositionsByCommittee (new)
  - generatePropositionsContent: committee-grouped rendering
    (h3+h4 when multi-committee; h3+referredLine when single-committee)
  - Add responsesToProp + independentMotions to all 14 language objects

scripts/types/content.ts:
  - Add responsesToProp: string; independentMotions: string to ContentLabelSet

tests/data-transformers.test.ts:
  - Import groupMotionsByProposition and groupPropositionsByCommittee
  - Fix 8 existing svSpan tests: flip data-translate expectations to match
    corrected svSpan() semantics
  - Add 5 unit tests for groupMotionsByProposition (strict-ID regex, fallback
    to title field, empty input, total-count invariant)
  - Add 5 unit tests for groupPropositionsByCommittee (organ/committee fallback,
    empty-string key, empty input, total-count invariant)
  - Add 28 label-presence tests (2 keys × 14 langs)
  - Add 4 rendering tests (single-committee h3, multi-committee h4,
    referredLine shown/hidden)

All 1672 tests pass across 39 test files.

Co-authored-by: pethers <[email protected]>
Labels

deployment (Deployment configuration), refactor (Code refactoring), size-l (Large change, 250-1000 lines), size-xl (Extra large change, > 1000 lines), testing (Test coverage)

Development

Successfully merging this pull request may close these issues.

Generate content-based article titles instead of static repeated headers

2 participants