Group duplicate motions by parent proposition; group propositions by committee#463
Group duplicate motions by parent proposition; group propositions by committee#463
Conversation
- Add groupMotionsByProposition() exported helper that groups motions by their parent proposition reference (pattern: med anledning av prop. YYYY/YY:NNN) - Modify generateMotionsContent() to render proposition-grouped sections: - Proposition groups get h3 heading (In response to: Prop. YYYY/YY:NNN ...) - Individual motions within groups get h4 headings showing their document ID - Independent motions use h3 with their title (unchanged behavior) - Independent Motions section header added when mixed with grouped motions - Add responsesToProp and independentMotions labels to all 14 languages - Add responsesToProp and independentMotions to ContentLabelSet type - Add 40 new unit tests for groupMotionsByProposition and grouped rendering Co-authored-by: pethers <[email protected]>
…cing - Extract PROP_REFERENCE_REGEX and PROP_FULL_REF_REGEX as named constants to avoid duplication between groupMotionsByProposition() and the rendering loop - Replace non-null assertion (!) with ?? [] and optional chaining for safer access Co-authored-by: pethers <[email protected]>
…ndering - Add groupPropositionsByCommittee() exported helper: groups propositions by organ/committee field into Map<string, RawDocument[]>; key '' for uncategorised - Modify generatePropositionsContent() to render committee-grouped sections: - Multiple committees → h3 section per committee, h4 per proposition - Single committee (no heading shown) → h3 per proposition + 'Referred to' line preserved - Reuse byCommitteeGroup.size for domain count in Policy Implications (removes separate counter) - Add 9 unit tests for groupPropositionsByCommittee and committee-grouped rendering - 468 total tests passing; CodeQL: 0 alerts Co-authored-by: pethers <[email protected]>
🔍 Lighthouse Performance Audit
📥 Download full Lighthouse report Budget Compliance: Performance budgets enforced via |
There was a problem hiding this comment.
Pull request overview
This pull request introduces motion and proposition grouping logic to eliminate visual repetition in generated articles. Motions referencing the same government proposition are now grouped under a single heading, and propositions are grouped by their referred committee.
Changes:
- Adds
groupMotionsByProposition()helper that extracts proposition references using regex patterns and returns a Map grouping motions by proposition ID - Adds
groupPropositionsByCommittee()helper that groups propositions by theirorgan/committeefield - Refactors motion rendering to use h3 for group headings and h4 for individual motion entries within groups
- Refactors proposition rendering to show committee section headings when multiple committees are present
- Adds
responsesToPropandindependentMotionslabels to all 14 language localization files
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| scripts/data-transformers.ts | Core changes: adds two new exported grouping functions, two private regex constants, refactors generateMotionsContent and generatePropositionsContent to render grouped entries with conditional heading levels (h3 vs h4) |
| scripts/types/content.ts | Extends ContentLabelSet interface with two new string properties for localized labels |
| tests/data-transformers.test.ts | Comprehensive test coverage: 49 new tests covering grouping logic, label presence across 14 languages, and rendering behavior for grouped vs ungrouped scenarios |
scripts/data-transformers.ts
Outdated
| /** Regex to capture the full proposition reference text (including description) */ | ||
| const PROP_FULL_REF_REGEX = /med anledning av (prop\.\s*\S+(?:\s+.+)?)/i; |
There was a problem hiding this comment.
The PROP_FULL_REF_REGEX pattern uses a greedy (?:\s+.+)? suffix which will capture everything until the end of the string, potentially including unwanted trailing content. This could lead to excessively long proposition headings that include motion-specific text rather than just the proposition reference.
Consider using a non-greedy match or limiting the capture to a reasonable length. For example, the pattern could be:
/med anledning av (prop\.\s*\S+(?:\s+[^<]+?)?(?=\s*$|<))/ito stop at HTML tags- Or simply capturing up to a certain number of words after the proposition ID
This would ensure that only the proposition title is included in the group heading, not the entire motion title.
| /** Regex to capture the full proposition reference text (including description) */ | |
| const PROP_FULL_REF_REGEX = /med anledning av (prop\.\s*\S+(?:\s+.+)?)/i; | |
| /** Regex to capture the full proposition reference text (including description, but not trailing HTML or motion text) */ | |
| const PROP_FULL_REF_REGEX = /med anledning av (prop\.\s*\S+(?:\s+[^<]+?)?(?=\s*$|<))/i; |
scripts/data-transformers.ts
Outdated
| const PROP_REFERENCE_REGEX = /med anledning av prop\.\s+(\S+)/i; | ||
|
|
||
| /** Regex to capture the full proposition reference text (including description) */ | ||
| const PROP_FULL_REF_REGEX = /med anledning av (prop\.\s*\S+(?:\s+.+)?)/i; |
There was a problem hiding this comment.
The regex pattern PROP_REFERENCE_REGEX uses \S+ to capture the proposition ID, which will match any sequence of non-whitespace characters. This is too permissive and could capture additional text beyond the proposition ID (e.g., "2025/26:118Tillståndsprövning" if there's no space).
Consider using a more specific pattern that matches the expected format: /med anledning av prop(?:ositionen?)?\.\s+(\d{4}\/\d{2}:\d+)/i
This would ensure only valid proposition IDs like "2025/26:118" are captured, matching the format documented in the repository memories.
| const PROP_REFERENCE_REGEX = /med anledning av prop\.\s+(\S+)/i; | |
| /** Regex to capture the full proposition reference text (including description) */ | |
| const PROP_FULL_REF_REGEX = /med anledning av (prop\.\s*\S+(?:\s+.+)?)/i; | |
| const PROP_REFERENCE_REGEX = /med anledning av prop(?:itionen?)?\.\s+(\d{4}\/\d{2}:\d+)/i; | |
| /** Regex to capture the full proposition reference text (including description) */ | |
| const PROP_FULL_REF_REGEX = /med anledning av (prop(?:itionen?)?\.\s*\d{4}\/\d{2}:\d+(?:\s+.+)?)/i; |
|
@copilot apply changes based on the comments in this thread |
- PROP_REFERENCE_REGEX: replace \S+ with \d{4}\/\d{2}:\d+ to only match
valid proposition ID format (e.g. 2025/26:118), rejecting malformed input
- PROP_FULL_REF_REGEX: replace greedy (?:\s+.+)? with non-greedy [^<]+?
anchored to end-of-string/HTML-tag boundary to prevent capturing excess
trailing content when motion titles contain additional motion-specific text
Co-authored-by: pethers <[email protected]>
Done in 94b137e. Both patterns tightened:
|
🔍 Lighthouse Performance Audit
📥 Download full Lighthouse report Budget Compliance: Performance budgets enforced via |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 4 changed files in this pull request and generated 26 comments.
Comments suppressed due to low confidence (3)
scripts/data-transformers.ts:2365
- Missing closing brace and duplicate conflicting motion rendering logic. Lines 2311-2337 implement proposition-based grouping (using groupMotionsByProposition), but line 2337 is missing a closing brace for the independent.forEach. Then lines 2338-2365 implement a completely different thematic grouping approach that conflicts with the proposition grouping. This appears to be an incomplete merge conflict resolution - only one grouping strategy should be used.
independent.forEach(motion => { content += renderMotion(motion, 'h3'); });
// Group motions by primary policy theme for thematic analysis
const byTheme: Record<string, RawDocument[]> = {};
motions.forEach(motion => {
const domains = detectPolicyDomains(motion, lang);
const theme = domains[0] || String(L(lang, 'generalMatters'));
if (!byTheme[theme]) byTheme[theme] = [];
byTheme[theme].push(motion);
});
const themeCount = Object.keys(byTheme).length;
if (themeCount > 1) {
// Thematic sections: group motions by policy area
content += `\n <h2>${L(lang, 'thematicAnalysis')}</h2>\n`;
Object.entries(byTheme).forEach(([theme, themeMotions]) => {
content += `\n <h3>${escapeHtml(theme)} (${themeMotions.length})</h3>\n`;
themeMotions.forEach(motion => {
// Demote motion entry headings one level when inside a themed section
const entryHtml = renderMotionEntry(motion, lang);
const demotedHtml = entryHtml
.replace(/<h3(\b[^>]*)?>/g, '<h4$1>')
.replace(/<\/h3>/g, '</h4>');
content += demotedHtml;
});
});
} else {
// Single theme or no detection: flat list
motions.forEach(motion => { content += renderMotionEntry(motion, lang); });
}
scripts/data-transformers.ts:2195
- Duplicate function definition. The extractPropRef function is defined twice (lines 2145-2148 and 2192-2195) with different regex patterns. The first uses PROP_REFERENCE_REGEX (strict YYYY/YY:NNN pattern) while the second uses a looser /\S+/ pattern. This will cause a redeclaration error. The duplicate definition on lines 2187-2195 should be removed since the exported function already calls the first definition.
/**
* Extract the parent proposition reference (e.g. "2025/26:118") from a motion title.
* Motions responding to a government proposition have titles like
* "med anledning av prop. 2025/26:118 Tillståndsprövning enligt förnybartdirektivet".
*/
function extractPropRef(title: string): string | null {
const m = title.match(/med anledning av prop\.\s+(\S+)/i);
return m?.[1] || null;
}
scripts/data-transformers.ts:237
- The svSpan function's JSDoc comment contains conflicting documentation. Lines 218-227 correctly describe the new behavior (sv→no marker, non-sv→with marker), but lines 228-237 describe the old incorrect behavior. The outdated documentation in lines 228-237 should be removed to match the actual implementation.
* For Swedish articles (`lang === 'sv'`) the span carries both the
* `lang="sv"` accessibility attribute AND `data-translate="true"` so
* quality-validation tooling can verify that Swedish articles contain the
* original text.
*
* For **all other** languages the span carries only `lang="sv"` (screen
* readers still know the text is Swedish) but the `data-translate` marker is
* intentionally omitted — it signals "this text should be translated" but no
* client-side translation mechanism exists, so the marker only causes false
* validation failures in non-Swedish articles.
| policySignificanceTouches: (domains: string): string => `Raakt aan ${domains}.`, | ||
| policySignificanceGeneric: 'Vereist commissiebehandeling en plenair debat voor een besluit wordt genomen.', |
There was a problem hiding this comment.
Duplicate property definitions in Dutch labels. The 'policySignificanceTouches' and 'policySignificanceGeneric' properties are defined twice with different implementations. The second definitions will override the first. Remove the duplicate definitions on lines 1047-1048.
| policySignificanceTouches: (domains: string): string => `Raakt aan ${domains}.`, | |
| policySignificanceGeneric: 'Vereist commissiebehandeling en plenair debat voor een besluit wordt genomen.', |
scripts/data-transformers.ts
Outdated
| // Group motions by party for strategic analysis | ||
| const byParty: Record<string, RawDocument[]> = {}; | ||
| motions.forEach(motion => { | ||
| const party = (motion.parti && motion.parti !== 'Unknown') ? motion.parti : 'other'; |
There was a problem hiding this comment.
Duplicate variable declaration for 'party'. Line 2243 declares it with Unknown-handling logic, but line 2244 immediately redeclares it with normalizePartyKey logic. This will cause a syntax error or the first declaration will be ignored. Only one declaration should remain - likely the normalizePartyKey version on line 2244.
| const party = (motion.parti && motion.parti !== 'Unknown') ? motion.parti : 'other'; |
| policySignificanceTouches: (domains: string): string => `Touches on ${domains}.`, | ||
| policySignificanceGeneric: 'Requires committee review and chamber debate before a decision is reached.', |
There was a problem hiding this comment.
Duplicate property definitions. The 'policySignificanceTouches' and 'policySignificanceGeneric' properties are defined twice (lines 443-444 and 447-448) with different implementations. The second definition will override the first. Only one definition of each property should remain - likely the more detailed versions on lines 443-444 that mention "Committee review and potential chamber vote".
| policySignificanceTouches: (domains: string): string => `Touches on ${domains}.`, | |
| policySignificanceGeneric: 'Requires committee review and chamber debate before a decision is reached.', |
| policySignificanceTouches: (domains: string): string => `${domains} 분야에 관련됩니다.`, | ||
| policySignificanceGeneric: '결정 전에 위원회 심사와 본회의 토론이 필요합니다.', |
There was a problem hiding this comment.
Duplicate property definitions in Korean labels. The 'policySignificanceTouches' and 'policySignificanceGeneric' properties are defined twice with different implementations. The second definitions will override the first. Remove the duplicate definitions on lines 1347-1348.
| policySignificanceTouches: (domains: string): string => `${domains} 분야에 관련됩니다.`, | |
| policySignificanceGeneric: '결정 전에 위원회 심사와 본회의 토론이 필요합니다.', |
tests/data-transformers.test.ts
Outdated
| expect(content).toContain('data-translate="true"'); | ||
| // Non-Swedish: lang="sv" for accessibility but no data-translate marker | ||
| expect(content).not.toContain('data-translate="true"'); |
There was a problem hiding this comment.
Contradictory test assertions. Line 625 expects data-translate="true" to be present, but line 627 expects it NOT to be present. Both cannot be true simultaneously. Based on the svSpan fix described in previous feedback, non-Swedish articles should contain the data-translate marker, so line 627 should be removed.
tests/data-transformers.test.ts
Outdated
| expect(content).toContain('data-translate="true"'); | ||
| // Non-Swedish: lang="sv" for accessibility but no data-translate marker | ||
| expect(content).not.toContain('data-translate="true"'); |
There was a problem hiding this comment.
Contradictory test assertions. Line 764 expects data-translate="true" to be present, but line 766 expects it NOT to be present. Both cannot be true simultaneously. Based on the svSpan fix, non-Swedish articles should contain the data-translate marker, so line 766 should be removed.
scripts/data-transformers.ts
Outdated
| // Group motions by primary policy theme for thematic analysis | ||
| const byTheme: Record<string, RawDocument[]> = {}; | ||
| motions.forEach(motion => { |
There was a problem hiding this comment.
Reference to undefined function 'renderMotionEntry'. Lines 2355 and 2364 call renderMotionEntry(motion, lang), but this function is defined outside the changed regions (line 2389). However, within the changed region there's already a 'renderMotion' function defined (line 2263) that serves the same purpose. The code in lines 2338-2365 appears to be leftover from a merge conflict that should have been removed - it references functions and uses a thematic grouping approach that conflicts with the proposition-based grouping implemented in lines 2311-2337.
| return `<span data-translate="true" lang="sv">${escapedText}</span>`; | ||
| } | ||
| return `<span lang="sv">${escapedText}</span>`; |
There was a problem hiding this comment.
The svSpan function contains unreachable duplicate code. Lines 243-246 implement the correct logic, but lines 247-249 contain old conflicting code that will never execute (unreachable after the return statement on line 246). The duplicate lines 247-249 should be removed to avoid confusion and maintain code clarity.
| it('should use data-translate="true" for Swedish (sv) articles with Swedish title', () => { | ||
| const content = generateArticleContent( | ||
| { motions: [{ titel: 'Djurskydd', url: '#', parti: 'MP', intressent_namn: 'Test' }] } as MockArticlePayload, | ||
| 'motions', | ||
| 'sv' | ||
| ) as string; | ||
| // Swedish articles: text already in target language, no data-translate marker needed | ||
| expect(content).not.toContain('data-translate="true"'); | ||
| expect(content).toContain('data-translate="true"'); |
There was a problem hiding this comment.
Duplicate and contradictory test name and assertions. Line 680 declares a test that should NOT use data-translate for Swedish articles, but line 681 immediately redeclares the same test saying it SHOULD use data-translate. Then line 688 expects it NOT to be present while line 689 expects it to be present. Based on the svSpan fix, Swedish articles should NOT have the marker, so lines 681 and 689 should be removed.
tests/data-transformers.test.ts
Outdated
| expect(contentWithoutMetadata).toContain('data-translate="true"'); | ||
| // Non-Swedish: lang="sv" span for title only, no data-translate markers | ||
| expect(contentWithoutMetadata).not.toContain('data-translate="true"'); |
There was a problem hiding this comment.
Contradictory test assertions. Line 744 expects data-translate="true" to be present, but line 746 expects it NOT to be present. Both cannot be true simultaneously. Based on the svSpan fix, non-Swedish articles should contain the data-translate marker, so line 746 should be removed.
|
@copilot apply changes based on the comments in this thread |
🔍 Lighthouse Performance Audit
📥 Download full Lighthouse report Budget Compliance: Performance budgets enforced via |
|
@copilot try again |
|
@pethers Unfortunately I hit an unexpected error while processing your comment. I've automatically reported this to GitHub. You can ask me to try again later by mentioning me in a new comment. If you want to contact GitHub about this error, please mention the following identifier so they can better serve you: Sorry for the inconvenience! |
Main advanced 20+ commits (including PR #461 thematic grouping in generateMotionsContent). Takes FETCH_HEAD as clean base and applies every PR-specific change on top, combining proposition grouping with thematic analysis rather than replacing it. scripts/types/content.ts: - Add responsesToProp: string; independentMotions: string to ContentLabelSet scripts/data-transformers.ts: - Add PROP_REFERENCE_REGEX (strict \d{4}\/\d{2}:\d+), PROP_FULL_REF_REGEX, PROP_TITLE_SUFFIX_REGEX constants - extractPropRef: use PROP_REFERENCE_REGEX (was loose \S+) - Export groupMotionsByProposition (was private) - Add exported groupPropositionsByCommittee - Add responsesToProp + independentMotions to all 14 language blocks - generatePropositionsContent: committee-grouped rendering via groupPropositionsByCommittee (h3 committee + h4 props when multi-committee; h3 props + referredLine when single-committee) - generateMotionsContent: proposition grouping runs first (h2→h3 prop→h4 motions for "med anledning av" motions); independent motions fall through to existing thematic analysis; "Thematic Analysis" h2 suppressed when already under "Independent Motions" h2 to avoid consecutive h2 headers; thematic grouping fully preserved for pure-independent-motion articles tests/data-transformers.test.ts: - Add groupMotionsByProposition + groupPropositionsByCommittee to imports - Update two test suites that expected "Responses to Government Propositions" NOT to appear: now correctly expect it (and h3 Prop. ref, MOT_A/MOT_B) - Keep "renders without grouping header when none reference a proposition" assertion unchanged (no grouped → no header) All 1744 tests pass across 42 test files; CodeQL: 0 alerts Co-authored-by: pethers <[email protected]>
🔍 Lighthouse Performance Audit
📥 Download full Lighthouse report Budget Compliance: Performance budgets enforced via |
scripts/data-transformers.ts
Outdated
| /** Captures the full proposition reference text (non-greedy, stops at HTML tags). */ | ||
| const PROP_FULL_REF_REGEX = /med anledning av (prop\.\s*\d{4}\/\d{2}:\d+(?:\s+[^<]+?)?(?=\s*$|<))/i; | ||
|
|
There was a problem hiding this comment.
The PROP_FULL_REF_REGEX constant is defined but never used in the codebase. Consider removing it to reduce code maintenance burden, or add a comment explaining its purpose if it's intended for future use.
| /** Captures the full proposition reference text (non-greedy, stops at HTML tags). */ | |
| const PROP_FULL_REF_REGEX = /med anledning av (prop\.\s*\d{4}\/\d{2}:\d+(?:\s+[^<]+?)?(?=\s*$|<))/i; |
There was a problem hiding this comment.
Removed in 3d8e84c. PROP_FULL_REF_REGEX was defined but never referenced — PROP_REFERENCE_REGEX (strict ID match) and PROP_TITLE_SUFFIX_REGEX (title suffix capture) are the only two regex constants now needed.
|
@copilot apply changes based on the comments in this thread |
The constant was defined at line 2066 but never referenced anywhere in the codebase. PROP_REFERENCE_REGEX (strict ID match) and PROP_TITLE_SUFFIX_REGEX (title suffix capture) are the only regex constants needed. Removing the dead code reduces maintenance surface as flagged by the automated reviewer. All 522 data-transformers tests pass. Co-authored-by: pethers <[email protected]>
Done in 3d8e84c — removed the unused |
🔍 Lighthouse Performance Audit
📥 Download full Lighthouse report Budget Compliance: Performance budgets enforced via |
groupMotionsByPropositionhelper function (exported)responsesToProp/independentMotionslabels (14 langs + ContentLabelSet type)generateMotionsContent()with proposition grouping (h2→h3 prop→h4 motions; independent fall through to existing thematic analysis)PROP_REFERENCE_REGEX,PROP_TITLE_SUFFIX_REGEXconstants;svSpanon prop titlegroupPropositionsByCommitteeexported helpergeneratePropositionsContent()with committee groupingsvSpan(): non-sv →data-translate="true"; sv → no markerotherCommitteefallback heading for empty committeeKey in multi-committee viewPROP_FULL_REF_REGEXconstant (reviewer feedback, comment 2843769403)Original prompt
💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.