Add article quality validation gate to news generation pipeline #467
Co-authored-by: pethers <[email protected]>
🔍 Lighthouse Performance Audit
📥 Download full Lighthouse report Budget Compliance: Performance budgets enforced via |
Pull request overview
Adds an enforceable article-quality gate to the news generation pipeline and strengthens the agentic workflow prompts so low-signal “list-only” articles get detected before being committed.
Changes:
- Introduces `validateArticleQuality()` + `--quality-threshold` (default 40) in `scripts/generate-news-enhanced.ts`, collects per-article quality metrics, and uses exit code 2 when all generated articles are below threshold.
- Extends shared article typings with `ArticleQualityScore` and adds `qualityScores` to `GenerationStats`.
- Adds a dedicated Vitest suite covering scoring, bounds, exemptions, and threshold behavior; updates the three agentic workflow prompt files with a “MANDATORY Quality Validation” checklist and bash checks.
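The soft-fail rule in the first bullet (exit code 2 only when every generated article is below threshold) can be sketched as follows. This is a minimal illustration, not the PR's code: `exitCodeForScores` is a hypothetical name, and returning 0 for an empty run is an assumption the overview does not confirm.

```typescript
// Hypothetical helper illustrating the soft-fail rule; not the PR's actual code.
function exitCodeForScores(scores: number[], threshold: number): number {
  // Assumption: a run that generated no articles is not a quality failure.
  if (scores.length === 0) return 0;
  // Exit code 2 is reserved for the case where ALL articles are below threshold.
  const allBelow = scores.every((score) => score < threshold);
  return allBelow ? 2 : 0;
}

console.log(exitCodeForScores([30, 35], 40)); // 2
console.log(exitCodeForScores([30, 55], 40)); // 0
```

A single passing article is enough to keep the run green under this rule, which is what makes the gate "soft".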
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `scripts/generate-news-enhanced.ts` | Adds programmatic quality scoring, reporting, run summary, and soft-fail exit behavior. |
| `scripts/types/article.ts` | Adds `ArticleQualityScore` type and threads it through `GenerationStats`. |
| `tests/article-quality-validation.test.ts` | Adds unit tests for the new quality scoring logic and exports. |
| `.github/workflows/news-motions.md` | Adds mandatory quality validation checklist + bash checks for motion articles. |
| `.github/workflows/news-propositions.md` | Adds mandatory quality validation checklist + bash checks for proposition articles. |
| `.github/workflows/news-committee-reports.md` | Adds mandatory quality validation checklist + bash checks for committee report articles. |
scripts/generate-news-enhanced.ts
Outdated
| const QUALITY_THRESHOLD: number = qualityThresholdArg | ||
| ? parseInt(qualityThresholdArg.split('=')[1] ?? '40', 10) | ||
| : 40; | ||
|
|
--quality-threshold parsing can produce NaN (e.g., --quality-threshold=foo or an empty value). When QUALITY_THRESHOLD is NaN, score >= QUALITY_THRESHOLD will always be false, causing every article to fail and potentially forcing exit code 2 unexpectedly. Consider validating the parsed value (e.g., Number.isFinite) and falling back to the default (and/or clamping to 0–100) with a warning when invalid.
| const QUALITY_THRESHOLD: number = qualityThresholdArg | |
| ? parseInt(qualityThresholdArg.split('=')[1] ?? '40', 10) | |
| : 40; | |
| const DEFAULT_QUALITY_THRESHOLD = 40; | |
| let parsedQualityThreshold: number = DEFAULT_QUALITY_THRESHOLD; | |
| if (qualityThresholdArg) { | |
| const rawValue = qualityThresholdArg.split('=')[1]; | |
| const numericValue = rawValue === undefined || rawValue === '' ? NaN : Number(rawValue); | |
| if (Number.isFinite(numericValue)) { | |
| // Clamp to a sensible range 0–100 | |
| parsedQualityThreshold = Math.min(100, Math.max(0, numericValue)); | |
| } else { | |
| console.warn( | |
| `Invalid --quality-threshold value "${rawValue}", falling back to default ${DEFAULT_QUALITY_THRESHOLD}.`, | |
| ); | |
| } | |
| } | |
| const QUALITY_THRESHOLD: number = parsedQualityThreshold; |
Fixed in 66ea1af. --quality-threshold now uses Number.isFinite validation, clamps the parsed value to 0–100, and falls back to the default (40) with a console warning when the input is invalid (e.g. --quality-threshold=foo or empty).
scripts/types/article.ts
Outdated
| lang: string; | ||
| /** Article type (e.g. "motions") */ | ||
| articleType: string; | ||
| /** Approximate word count (includes HTML tag tokens) */ |
The wordCount field documentation says it "includes HTML tag tokens", but validateArticleQuality() strips HTML tags before counting words. Update this comment to match the actual behavior (word count based on text content after tag stripping).
| /** Approximate word count (includes HTML tag tokens) */ | |
| /** Approximate word count based on text content after stripping HTML tags */ |
Fixed in 66ea1af. Updated the JSDoc to "Approximate word count based on text content after stripping HTML tags" to match the actual tag-stripping behaviour in validateArticleQuality.
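The tag-stripping word count that the JSDoc now describes can be illustrated with a short sketch. `countTextWords` is a hypothetical name and the regex is an assumption; the actual implementation inside `validateArticleQuality` may differ.

```typescript
// Hypothetical helper: counts words in the text content of an HTML string.
function countTextWords(html: string): number {
  // Replace each tag with a space so words in adjacent elements do not fuse.
  const text = html.replace(/<[^>]*>/g, " ");
  // Split on whitespace and drop the empty strings left at the edges.
  return text.split(/\s+/).filter((word) => word.length > 0).length;
}

const sample = '<p class="lead">Two words</p><p>and three more</p>';
console.log(countTextWords(sample)); // 5
```

Counting this way keeps attribute values such as `class="lead"` out of the total, which is the discrepancy the original comment text introduced.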
@copilot apply changes based on the comments in this thread |
Signed-off-by: James Pether Sörling <[email protected]>
@copilot Analyse the current state: I did a bad merge. Analyse and fix any issues.
| // wordCount ~750 → wordScore=37; h2Count=0 → sectionScore=0; no spans → translationScore=20 → 57 (too high) | ||
| // Try: wordCount=0, h2Count=1 (sectionScore=10), untranslatedSpans=0 → 0+10+20 = 30 (below 40) | ||
| const html = buildHtml({ wordCount: 0, h2Count: 1, untranslatedSpans: 0 }); | ||
| const result = qualityModule.validateArticleQuality(html, 'en', 'motions', 'test.html'); | ||
| expect(result.passed).toBe(result.score >= 40); | ||
| }); | ||
|
|
||
| it('should pass when score equals threshold (40)', () => { | ||
| if (!qualityModule) return; | ||
| // wordCount=1000 → wordScore=50; h2Count=0; untranslatedSpans=15 → translationScore=20-30=-10→0 | ||
| // = 50 → passes | ||
| // Simpler: h2Count=3 → sectionScore=30; no words, no spans → 0+30+20=50 | ||
| const html = buildHtml({ wordCount: 0, h2Count: 3, untranslatedSpans: 0 }); | ||
| const result = qualityModule.validateArticleQuality(html, 'en', 'motions', 'test.html'); | ||
| expect(result.score).toBeGreaterThanOrEqual(40); |
This threshold test is effectively tautological (expect(result.passed).toBe(result.score >= 40)) and does not verify the intended boundary behavior (e.g., score=39 should fail). Update the test to construct deterministic HTML that produces a known score (like 39) and assert passed === false, and similarly add an exact 40-point case if you want to test equality semantics.
| // wordCount ~750 → wordScore=37; h2Count=0 → sectionScore=0; no spans → translationScore=20 → 57 (too high) | |
| // Try: wordCount=0, h2Count=1 (sectionScore=10), untranslatedSpans=0 → 0+10+20 = 30 (below 40) | |
| const html = buildHtml({ wordCount: 0, h2Count: 1, untranslatedSpans: 0 }); | |
| const result = qualityModule.validateArticleQuality(html, 'en', 'motions', 'test.html'); | |
| expect(result.passed).toBe(result.score >= 40); | |
| }); | |
| it('should pass when score equals threshold (40)', () => { | |
| if (!qualityModule) return; | |
| // wordCount=1000 → wordScore=50; h2Count=0; untranslatedSpans=15 → translationScore=20-30=-10→0 | |
| // = 50 → passes | |
| // Simpler: h2Count=3 → sectionScore=30; no words, no spans → 0+30+20=50 | |
| const html = buildHtml({ wordCount: 0, h2Count: 3, untranslatedSpans: 0 }); | |
| const result = qualityModule.validateArticleQuality(html, 'en', 'motions', 'test.html'); | |
| expect(result.score).toBeGreaterThanOrEqual(40); | |
| // Scoring model (from other tests and comments): | |
| // - wordScore: up to 50 points, ~0.05 points per word (1000 words → 50) | |
| // - sectionScore: 10 points per <h2>, up to 30 | |
| // - translationScore: 20 points when fully translated (0 untranslated spans) | |
| // Choose: wordCount=380 → wordScore=19; h2Count=0 → sectionScore=0; untranslatedSpans=0 → translationScore=20 | |
| // Total: 19 + 0 + 20 = 39 → should NOT pass the 40-point threshold | |
| const html = buildHtml({ wordCount: 380, h2Count: 0, untranslatedSpans: 0 }); | |
| const result = qualityModule.validateArticleQuality(html, 'en', 'motions', 'test.html'); | |
| expect(result.score).toBe(39); | |
| expect(result.passed).toBe(false); | |
| }); | |
| it('should pass when score equals threshold (40)', () => { | |
| if (!qualityModule) return; | |
| // Construct an article that scores exactly 40: | |
| // Use the same scoring assumptions as above: | |
| // - wordCount=400 → wordScore=20 | |
| // - h2Count=0 → sectionScore=0 | |
| // - untranslatedSpans=0 → translationScore=20 | |
| // Total: 20 + 0 + 20 = 40 → should pass the 40-point threshold | |
| const html = buildHtml({ wordCount: 400, h2Count: 0, untranslatedSpans: 0 }); | |
| const result = qualityModule.validateArticleQuality(html, 'en', 'motions', 'test.html'); | |
| expect(result.score).toBe(40); |
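The scoring assumptions stated in the test comments above can be collected into one small function for checking the arithmetic. This is a sketch reconstructed from those comments only: `sketchQualityScore` is a hypothetical name, the floor rounding is inferred from "~750 → 37", and the 2-point deduction per untranslated span is inferred from "15 spans → 20-30". The real `validateArticleQuality` may apply further rules (for example the Swedish exemption).

```typescript
// Inputs mirror the buildHtml() options used in the tests.
interface ScoreInputs {
  wordCount: number;
  h2Count: number;
  untranslatedSpans: number;
}

// Hypothetical reconstruction of the scoring model from the test comments.
function sketchQualityScore(input: ScoreInputs): number {
  // One point per 20 words (0.05 per word), rounded down, capped at 50.
  const wordScore = Math.min(50, Math.floor(input.wordCount / 20));
  // 10 points per <h2>, capped at 30.
  const sectionScore = Math.min(30, input.h2Count * 10);
  // 20 points when fully translated; assumed 2-point deduction per span, floored at 0.
  const translationScore = Math.max(0, 20 - 2 * input.untranslatedSpans);
  return wordScore + sectionScore + translationScore;
}

console.log(sketchQualityScore({ wordCount: 380, h2Count: 0, untranslatedSpans: 0 })); // 39
console.log(sketchQualityScore({ wordCount: 400, h2Count: 0, untranslatedSpans: 0 })); // 40
```

Under these assumptions the two boundary fixtures land on 39 and 40, one point either side of the default threshold.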
.github/workflows/news-motions.md
Outdated
| # Check word count of English article (must be >= 500; note: includes HTML tags) | ||
| wc -w "news/$(date +%Y-%m-%d)-opposition-motions-en.html" 2>/dev/null || true | ||
|
|
The word-count validation uses wc -w on the raw HTML, which counts markup and can overstate content length (and will also print 0 if the file is missing due to 2>/dev/null || true). To avoid false passes, count words after stripping tags (similar to validateArticleQuality) and/or explicitly fail when the expected file does not exist.
| # Check word count of English article (must be >= 500; note: includes HTML tags) | |
| wc -w "news/$(date +%Y-%m-%d)-opposition-motions-en.html" 2>/dev/null || true | |
| # Check word count of English article (must be >= 500; strips HTML tags and fails if file is missing) | |
| FILE="news/$(date +%Y-%m-%d)-opposition-motions-en.html" | |
| if [ ! -f "$FILE" ]; then | |
| echo "Expected article file not found: $FILE" >&2 | |
| exit 1 | |
| fi | |
| python - "$FILE" <<'PY' | |
| import re | |
| import sys | |
| path = sys.argv[1] | |
| with open(path, encoding="utf-8") as f: | |
|     html = f.read() | |
| # Strip HTML tags | |
| text = re.sub(r"<[^>]+>", " ", html) | |
| # Split on whitespace to count words | |
| words = [w for w in re.split(r"\s+", text) if w] | |
| count = len(words) | |
| print(count) | |
| # Enforce minimum word count | |
| if count < 500: | |
|     raise SystemExit(1) | |
| PY |
| ### Bash Validation Commands: | ||
| ```bash | ||
| # Check for unknown authors (should return 0) | ||
| grep -l "Filed by: Unknown" news/*-opposition-motions-*.html 2>/dev/null | wc -l || true |
This section uses the opposition-motions slug/filename pattern in its validation commands, but later in the workflow the "Article Naming Convention" still states YYYY-MM-DD-motions-{lang}.html. Update that naming convention section to the actual generator output (YYYY-MM-DD-opposition-motions-{lang}.html) so readers don’t validate the wrong filenames.
| # Check word count of English article (must be >= 500; note: includes HTML tags) | ||
| wc -w "news/$(date +%Y-%m-%d)-government-propositions-en.html" 2>/dev/null || true |
The word-count validation uses wc -w on the raw HTML, which counts markup and can overstate content length (and will also print 0 if the file is missing due to 2>/dev/null || true). To avoid false passes, count words after stripping tags (similar to validateArticleQuality) and/or explicitly fail when the expected file does not exist.
| # Check word count of English article (must be >= 500; note: includes HTML tags) | |
| wc -w "news/$(date +%Y-%m-%d)-government-propositions-en.html" 2>/dev/null || true | |
| # Check word count of English article text content (must be >= 500; HTML tags stripped) | |
| FILE="news/$(date +%Y-%m-%d)-government-propositions-en.html" | |
| if [ ! -f "$FILE" ]; then | |
| echo "ERROR: Expected article file not found: $FILE" >&2 | |
| exit 1 | |
| fi | |
| sed 's/<[^>]*>/ /g' "$FILE" | tr -s '[:space:]' '\n' | grep -c '[[:alnum:]]' || exit 1 |
| # Check for unknown authors (should return 0) | ||
| grep -l "Filed by: Unknown" news/*-government-propositions-*.html 2>/dev/null | wc -l || true | ||
|
|
||
| # Check for untranslated spans in English article (should return 0) | ||
| grep -c 'data-translate="true"' "news/$(date +%Y-%m-%d)-government-propositions-en.html" 2>/dev/null || true |
This section uses the government-propositions slug/filename pattern in its validation commands, but later in the workflow the "Article Naming Convention" still states YYYY-MM-DD-propositions-{lang}.html. Update that naming convention section to the actual generator output (YYYY-MM-DD-government-propositions-{lang}.html) so readers don’t validate the wrong filenames.
| # Check for unknown authors (should return 0) | ||
| grep -rl "Filed by: Unknown" news/ | grep "committee-reports" | wc -l || true | ||
|
|
||
| # Check for untranslated spans in English article (should return 0) | ||
| grep -c 'data-translate="true"' "news/$(date +%Y-%m-%d)-committee-reports-en.html" 2>/dev/null || true | ||
|
|
||
| # Check word count of English article (must be >= 500; note: includes HTML tags) | ||
| wc -w "news/$(date +%Y-%m-%d)-committee-reports-en.html" 2>/dev/null || true | ||
|
|
||
| # Check for duplicate "Why It Matters" content (should return empty) | ||
| grep -o 'Why It Matters[^<]*' "news/$(date +%Y-%m-%d)-committee-reports-en.html" 2>/dev/null | sort | uniq -d || true |
The word-count validation uses wc -w on the raw HTML, which counts markup and can overstate content length (and will also print 0 if the file is missing due to 2>/dev/null || true). To avoid false passes, count words after stripping tags (similar to validateArticleQuality) and/or explicitly fail when the expected file does not exist.
| # Check for unknown authors (should return 0) | |
| grep -rl "Filed by: Unknown" news/ | grep "committee-reports" | wc -l || true | |
| # Check for untranslated spans in English article (should return 0) | |
| grep -c 'data-translate="true"' "news/$(date +%Y-%m-%d)-committee-reports-en.html" 2>/dev/null || true | |
| # Check word count of English article (must be >= 500; note: includes HTML tags) | |
| wc -w "news/$(date +%Y-%m-%d)-committee-reports-en.html" 2>/dev/null || true | |
| # Check for duplicate "Why It Matters" content (should return empty) | |
| grep -o 'Why It Matters[^<]*' "news/$(date +%Y-%m-%d)-committee-reports-en.html" 2>/dev/null | sort | uniq -d || true | |
| FILE="news/$(date +%Y-%m-%d)-committee-reports-en.html" | |
| # Ensure the expected English article exists | |
| if [ ! -f "$FILE" ]; then | |
| echo "ERROR: Expected article file not found: $FILE" >&2 | |
| exit 1 | |
| fi | |
| # Check for unknown authors (should return 0) | |
| grep -rl "Filed by: Unknown" news/ | grep "committee-reports" | wc -l | |
| # Check for untranslated spans in English article (should return 0) | |
| grep -c 'data-translate="true"' "$FILE" | |
| # Check word count of English article content (must be >= 500; HTML tags stripped) | |
| WORD_COUNT="$(sed 's/<[^>]*>//g' "$FILE" | wc -w)" | |
| echo "Content word count (HTML tags stripped): $WORD_COUNT" | |
| if [ "$WORD_COUNT" -lt 500 ]; then | |
| echo "ERROR: Article content too short (must be at least 500 words)." >&2 | |
| exit 1 | |
| fi | |
| # Check for duplicate "Why It Matters" content (should return empty) | |
| grep -o 'Why It Matters[^<]*' "$FILE" | sort | uniq -d |
| console.log(` Untranslated spans: ${untranslatedSpans} (score: ${translationScore}/20)`); | ||
| console.log(` Unknown authors: ${unknownAuthors} ⚠️`); | ||
| console.log(` Quality Score: ${score}/100 — ${passed ? 'PASSED' : 'BELOW THRESHOLD'} ${scoreLabel}`); |
The quality report always prints a warning symbol for unknown authors (${unknownAuthors} ⚠️) even when the count is 0, which makes the report misleading. Consider rendering ✅/⚠️ conditionally (e.g., ⚠️ only when unknownAuthors > 0).
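One way to render the indicator conditionally is sketched below. `unknownAuthorsLine` is a hypothetical helper introduced for illustration; the PR may simply inline the ternary inside the existing `console.log` call.

```typescript
// Hypothetical helper: builds the "Unknown authors" report line.
function unknownAuthorsLine(unknownAuthors: number): string {
  // Show the warning symbol only when at least one unknown author was found.
  const indicator = unknownAuthors > 0 ? "⚠️" : "✅";
  return `   Unknown authors: ${unknownAuthors} ${indicator}`;
}

console.log(unknownAuthorsLine(0));
console.log(unknownAuthorsLine(3));
```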
scripts/generate-news-enhanced.ts
Outdated
| console.log(`\n📊 Article Quality Report: ${articleType}-${lang}`); | ||
| console.log(` Word count: ${wordCount} (score: ${wordScore}/50)`); |
The report header uses ${articleType}-${lang} but the generators pass articleType values like motions/propositions while the actual slugs/filenames are opposition-motions and government-propositions. This makes the printed report id differ from the generated slug patterns described in this PR/workflows; consider using the inferred slug type (from slug) or the filename base for the report id.
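Deriving the report id from the filename base, as suggested above, could look like the following sketch. `reportIdFromFilename` is a hypothetical name, and the example path only follows the `YYYY-MM-DD-opposition-motions-{lang}.html` pattern discussed in this review.

```typescript
// Hypothetical helper: derives a report id from an article filename or path.
function reportIdFromFilename(filename: string): string {
  // Keep only the last path segment, then drop the .html extension.
  const base = filename.split("/").pop() ?? filename;
  return base.replace(/\.html$/, "");
}

console.log(reportIdFromFilename("news/2026-02-23-opposition-motions-en.html"));
// 2026-02-23-opposition-motions-en
```

Because the id comes from the file that is actually written, it cannot drift from the slug the way an `articleType`-based label can.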
@copilot apply changes based on the comments in this thread |
…ndicator, word count bash commands, naming conventions, and deterministic threshold tests Co-authored-by: pethers <[email protected]>
Addressed in 4d13b78 and 58e18d6:
…cleanly

origin/main had advanced ~20 commits (PRs #467, #469, #473, #474, #475) since last conflict resolution. Take FETCH_HEAD as clean base and re-apply every PR-specific change on top:

data-transformers.ts:
- svSpan(): fix data-translate direction (lang!='sv' → marker; lang='sv' → none)
- Add PROP_REFERENCE_REGEX (strict \d{4}\/\d{2}:\d+), PROP_FULL_REF_REGEX (non-greedy HTML-safe), PROP_TITLE_SUFFIX_REGEX (title suffix capture)
- extractPropRef: use PROP_REFERENCE_REGEX (was \S+)
- Export groupMotionsByProposition (was private)
- Add exported groupPropositionsByCommittee
- generatePropositionsContent: committee-grouped rendering via groupPropositionsByCommittee (h3 committee + h4 props when multi-committee; h3 props + referredLine when single-committee; domainCount from map.size)
- generateMotionsContent: parameterize renderMotion(motion, headingTag='h3'); grouped motions → h4; independent → h3; replace 14-branch ternary chains with L(lang,'responsesToProp') / L(lang,'independentMotions'); use PROP_TITLE_SUFFIX_REGEX for propTitle extraction; svSpan on prop title
- Add responsesToProp + independentMotions to all 14 language objects

scripts/types/content.ts:
- Add responsesToProp: string; independentMotions: string to ContentLabelSet

tests/data-transformers.test.ts:
- Import groupMotionsByProposition and groupPropositionsByCommittee
- Fix 8 svSpan assertions: non-sv → toContain; sv article → not.toContain
- Add h3/h4 hierarchy assertions to grouped motions rendering tests
- Add 5 unit tests for groupMotionsByProposition exported helper
- Add 5 unit tests for groupPropositionsByCommittee exported helper
- Add 28 label-presence tests (2 keys × 14 langs)
- Add 4 rendering tests (single/multi-committee h3/h4, referredLine)

sitemap.xml: take origin/main version

All 1672 tests pass across 39 test files

Co-authored-by: pethers <[email protected]>
List-style articles with "Filed by: Unknown", untranslated `data-translate` spans, and no analytical depth were being committed silently. Two complementary enforcement layers are added.

Agentic workflow prompts (`news-motions.md`, `news-propositions.md`, `news-committee-reports.md`)

Each file gets a `## MANDATORY Quality Validation` section after Step 4 with:
- Bash validation commands that use the actual slug patterns (`opposition-motions`, `government-propositions`, `committee-reports`)
- Word-count checks that strip HTML tags (matching `validateArticleQuality` behaviour) and explicitly fail with `exit 1` when the expected article file does not exist
- Naming conventions updated to `YYYY-MM-DD-opposition-motions-{lang}.html` and `YYYY-MM-DD-government-propositions-{lang}.html`

Pipeline quality gate (`scripts/generate-news-enhanced.ts`)

`validateArticleQuality(html, lang, articleType, filename)` scores every article 0–100 before write, based on word count, `<h2>` sections, and untranslated `data-translate` spans.
- The report header uses the filename base (e.g. `2026-02-23-opposition-motions-en`) instead of `articleType-lang`, matching the actual generated slug
- The unknown-authors indicator shows ✅ when count is 0 and ⚠️ only when count is non-zero
- `--quality-threshold=N` (default 40): validated with `Number.isFinite`, clamped 0–100, warns on invalid input
- `stats.qualityScores: ArticleQualityScore[]` accumulates per-article metrics; summary logged post-run

Types (`scripts/types/article.ts`)

Added `ArticleQualityScore` interface; `GenerationStats` extended with `qualityScores`; `wordCount` JSDoc updated to reflect tag-stripping behaviour.

Tests (`tests/article-quality-validation.test.ts`)

19 unit tests covering all scoring dimensions, edge cases (empty HTML, 1000+ words, capped deductions), Swedish exemption, and score bounds. Threshold boundary tests use deterministic HTML that produces known scores (score=39 asserts `passed === false`, score=40 asserts `passed === true`), replacing the previously tautological assertion. Mock MCP client includes `searchDocuments`, `fetchWrittenQuestions`, and `fetchInterpellations` for consistency with the main test suite.