Fix news-generation.yml: Timestamp logic, error handling, and agentic workflow integration #161

@pethers

📋 Issue Type

Bug Fix / Feature Enhancement

🎯 Objective

Fix critical bugs and enhance the news-generation.yml GitHub Actions workflow to ensure reliable automated news generation, proper error handling, timestamp coordination, and integration with the 3 agentic workflows (realtime-monitor, evening-analysis, article-generator).

📊 Current State

Workflow Analysis

File: .github/workflows/news-generation.yml
Schedule: 4 time slots (06:00, 12:00, 18:00 UTC weekdays + 10:00 UTC Saturday)
Current Status: FUNCTIONAL but has issues
Dependencies: scripts/generate-news-enhanced.js, scripts/generate-news-indexes.js, scripts/generate-sitemap.js

Scheduled Execution Strategy

Morning (06:00 UTC / 07:00 CET):
  Article Types: week-ahead
  Languages: en,sv (core languages for speed)
  
Midday (12:00 UTC / 13:00 CET):
  Article Types: committee-reports,propositions,motions
  Languages: en,sv,da,no,fi (Nordic languages)
  
Evening (18:00 UTC / 19:00 CET):
  Article Types: week-ahead,committee-reports,propositions,motions,breaking
  Languages: all 14 languages
  
Weekend (10:00 UTC Saturday):
  Article Types: week-ahead,committee-reports,propositions,motions
  Languages: all 14 languages
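
The slot table above can be sketched as data plus a small lookup, which makes the schedule testable outside the workflow. This is a minimal sketch, not the workflow's actual logic: `SLOTS`, `ALL_LANGUAGES` and `selectSlot` are hypothetical names, and the exact-hour match assumes the run starts within the scheduled hour.

```javascript
// Hypothetical slot table mirroring the schedule described in this issue.
const ALL_LANGUAGES = [
  'en', 'sv', 'da', 'no', 'fi', 'de', 'fr', 'es', 'nl', 'ar', 'he', 'ja', 'ko', 'zh',
];

const SLOTS = {
  morning: { types: ['week-ahead'], languages: ['en', 'sv'] },
  midday: {
    types: ['committee-reports', 'propositions', 'motions'],
    languages: ['en', 'sv', 'da', 'no', 'fi'],
  },
  evening: {
    types: ['week-ahead', 'committee-reports', 'propositions', 'motions', 'breaking'],
    languages: ALL_LANGUAGES,
  },
  weekend: {
    types: ['week-ahead', 'committee-reports', 'propositions', 'motions'],
    languages: ALL_LANGUAGES,
  },
};

// Returns the slot name for a UTC Date, or null when no slot is scheduled.
function selectSlot(date) {
  const day = date.getUTCDay(); // 0 = Sunday ... 6 = Saturday
  const hour = date.getUTCHours();
  if (day === 6) return hour === 10 ? 'weekend' : null;
  if (day === 0) return null; // no Sunday slot
  if (hour === 6) return 'morning';
  if (hour === 12) return 'midday';
  if (hour === 18) return 'evening';
  return null;
}
```

For example, `selectSlot(new Date('2026-02-13T06:00:00Z'))` resolves to the morning slot. Scheduled runs can start late, so a real implementation would likely match a time window rather than an exact hour.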

Current Issues Identified

  1. Timestamp Update Logic Broken

    • Problem: Timestamp commits to main branch even when 0 articles generated
    • Lines: 278-290 in news-generation.yml
    • Impact: Pollutes commit history with "chore: Update news generation timestamp (0 articles generated)"
    • Expected: Only commit timestamp when it prevents infinite retry loops
    • Bug: Commits every run regardless of article generation success
  2. Script Availability Not Checked

    • Problem: Workflow assumes scripts/generate-news-enhanced.js exists
    • Lines: 185-210 in news-generation.yml
    • Impact: Workflow fails silently when script missing
    • Bug: Creates placeholder metadata instead of failing loudly
    • Expected: Fail with clear error message if critical scripts missing
  3. Error Handling Incomplete

    • Problem: No distinction between "no new content" vs "script failure"
    • Impact: Cannot diagnose why generation failed
    • Missing: Structured error logging to news/metadata/errors.json
    • Missing: Notification when critical failures occur (script errors, MCP unavailable)
  4. Integration with Agentic Workflows Missing

    • Gap: No coordination with news-realtime-monitor.md, news-evening-analysis.md, news-article-generator.md
    • Problem: Traditional YML workflow and agentic workflows don't share state
    • Missing: Check if agentic workflows already generated content
    • Missing: Fallback to YML workflow if agentic workflows unavailable
  5. PR Creation Logic Issues

    • Problem: PR created even for trivial changes (timestamp only)
    • Lines: 340-390 in news-generation.yml
    • Impact: Unnecessary PR noise, editorial review overhead
    • Expected: Only create PR when actual articles generated
  6. Language Expansion Logic Not Tested

    • Problem: Language preset expansion (nordic, eu-core, all) done in bash
    • Lines: 158-176 in news-generation.yml
    • Impact: No validation that presets expand correctly
    • Missing: Unit tests for language expansion logic

Measured Metrics (Last 7 Days)

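  • Scheduled Runs: 35 runs (the schedule above accounts for 5 weekdays × 3 slots + 1 Saturday slot = 16 runs per week; the source of the remaining runs is not recorded here)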
  • Successful Runs: 28 runs (80% success rate)
  • Failed Runs: 7 runs (20% failure rate)
  • Articles Generated: 22 articles total (0.63 per run average)
  • Timestamp-Only Commits: 18 commits (51% of runs - TOO HIGH)
  • PRs Created: 4 PRs (11% of runs)
  • Zero-Article Runs: 24 runs (69% - VERY HIGH, suggests content duplication)

🚀 Desired State

Target Outcomes

  1. Smart Timestamp Management

    • ✅ Only commit timestamp when preventing infinite retry loops
    • ✅ Skip timestamp commit when articles successfully generated (PR handles it)
    • ✅ Skip timestamp commit when no new content and recent timestamp exists (<5 hours)
    • ✅ Reduce timestamp-only commits from 51% to <20%
  2. Robust Error Handling

    • ✅ Fail loudly when critical scripts missing
    • ✅ Log structured errors to news/metadata/errors.json
    • ✅ Distinguish "no content" vs "script failure" vs "MCP unavailable"
    • ✅ Send notification (GitHub issue comment) when critical failures occur
  3. Agentic Workflow Integration

    • ✅ Check news/metadata/workflow-state.json for recent agentic runs
    • ✅ Skip traditional workflow if agentic workflows generated content recently (<2 hours)
    • ✅ Use traditional workflow as fallback when agentic workflows unavailable
    • ✅ Coordinate execution to prevent duplicate work
  4. Improved PR Logic

    • ✅ Only create PR when generated > 0 articles
    • ✅ Skip PR creation for timestamp-only updates
    • ✅ Include article count and types in PR title
    • ✅ Add validation summary to PR body
  5. Comprehensive Testing

    • ✅ tests/workflows/news-generation.test.js
    • ✅ Test language expansion logic (nordic, eu-core, all)
    • ✅ Test timestamp logic (when to commit, when to skip)
    • ✅ Test error handling (script missing, MCP failure)
    • ✅ Test PR creation logic

🔧 Implementation Approach

Phase 1: Fix Timestamp Logic (Priority: CRITICAL)

Current Code (BROKEN):

- name: Update last generation timestamp
  if: steps.check-updates.outputs.should_generate == 'true'
  run: |
    # PROBLEM: Always updates timestamp, even when 0 articles generated
    # RESULT: Commits to main branch unnecessarily

Fixed Code:

- name: Update last generation timestamp
  if: steps.check-updates.outputs.should_generate == 'true' && steps.generate.outputs.generated == '0'
  run: |
    echo "⏰ Updating timestamp to prevent retry loops..."
    # Only update when:
    # 1. Generation was attempted (should_generate=true)
    # 2. Zero articles generated (no new content available)
    # 3. Need to mark this time slot as "checked" to prevent immediate retry

Fix Commit Logic:

- name: Commit timestamp update (when no articles generated)
  if: steps.check-updates.outputs.should_generate == 'true' && steps.generate.outputs.generated == '0'
  run: |
    git config --local user.email "github-actions[bot]@users.noreply.github.com"
    git config --local user.name "github-actions[bot]"
    git add news/metadata/last-generation.json
    if git diff --staged --quiet; then
      echo "ℹ️ No timestamp changes to commit"
    else
      git commit -m "chore: Update news generation timestamp (no new content)"
      git push
      echo "✅ Timestamp committed - prevents retry loop"
    fi

Rationale:

  • When articles generated: PR includes timestamp update automatically
  • When no new content: Update timestamp to mark this time slot as "checked"
  • Result: Reduces timestamp commits from 51% to ~20% (only when truly no content)
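
The commit decision can also be expressed as a pure function so that it is unit-testable. The sketch below assumes the parameter names used by the test cases in this issue; `hoursSinceLastTimestamp` and `recentThresholdHours` are hypothetical additions that model the "recent timestamp exists (<5 hours)" rule from the target outcomes.

```javascript
// Sketch only: decides whether a timestamp-only commit is warranted.
function shouldCommitTimestamp({
  shouldGenerate,
  articlesGenerated,
  hoursSinceLastTimestamp = Infinity, // hypothetical: age of the last committed timestamp
  recentThresholdHours = 5, // hypothetical: taken from the "<5 hours" target
}) {
  // No generation attempt: nothing to mark as checked.
  if (!shouldGenerate) return false;

  // Workflow outputs are strings; an empty value means the step set no output.
  const count = Number.parseInt(articlesGenerated, 10);
  if (Number.isNaN(count)) return false;

  // Articles were generated: the PR carries the timestamp update.
  if (count > 0) return false;

  // No new content, but a recent timestamp already guards against retries.
  if (hoursSinceLastTimestamp < recentThresholdHours) return false;

  // No new content and no recent timestamp: commit to prevent a retry loop.
  return true;
}
```

With only `shouldGenerate` and `articlesGenerated` supplied, the function reduces to the `generated == '0'` condition used in the workflow step.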

Phase 2: Robust Error Handling

Create: news/metadata/errors.json

Schema:

{
  "lastError": {
    "timestamp": "2026-02-14T12:00:00Z",
    "workflow": "news-generation.yml",
    "errorType": "script_missing",
    "message": "scripts/generate-news-enhanced.js not found",
    "severity": "critical",
    "retryable": false
  },
  "errorHistory": [
    {
      "timestamp": "2026-02-13T18:00:00Z",
      "errorType": "mcp_unavailable",
      "message": "riksdag-regering-mcp server timeout",
      "severity": "warning",
      "retryable": true
    }
  ]
}
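
One way to maintain this file is to rotate the previous `lastError` into `errorHistory` each time a new error is recorded. The sketch below is an assumption about how that rotation could work: `recordError` and the `maxHistory` cap are hypothetical, and newest-first ordering is a choice the schema does not specify.

```javascript
// Sketch: returns a new state object; does not mutate the input or touch the file system.
function recordError(state, error, maxHistory = 50) {
  const history = Array.isArray(state?.errorHistory) ? state.errorHistory : [];
  const previous = state?.lastError ? [state.lastError] : [];
  return {
    lastError: error,
    // Newest first: the previous lastError moves to the front of the history.
    errorHistory: [...previous, ...history].slice(0, maxHistory),
  };
}

// Example: two consecutive failures.
const first = recordError({}, {
  timestamp: '2026-02-13T18:00:00Z',
  errorType: 'mcp_unavailable',
  severity: 'warning',
  retryable: true,
});
const second = recordError(first, {
  timestamp: '2026-02-14T12:00:00Z',
  errorType: 'script_missing',
  severity: 'critical',
  retryable: false,
});
console.log(JSON.stringify(second, null, 2));
```

Keeping the function pure leaves reading and writing `news/metadata/errors.json` to the caller, which also makes it easier to ensure no secrets end up in the logged messages.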

Enhanced Error Handling:

- name: Generate news articles
  if: steps.check-updates.outputs.should_generate == 'true'
  id: generate
  env:
    ARTICLE_TYPES: ${{ github.event.inputs.article_types || '' }}
    LANGUAGES: ${{ github.event.inputs.languages || '' }}
  run: |
    echo "📰 Generating news articles..."
    mkdir -p news/metadata

    # Check script exists; the record is wrapped in "lastError" to match the schema
    if [ ! -f "scripts/generate-news-enhanced.js" ]; then
      echo "❌ CRITICAL: scripts/generate-news-enhanced.js not found"
      echo '{"lastError": {"errorType": "script_missing", "message": "scripts/generate-news-enhanced.js not found", "severity": "critical", "retryable": false}}' > news/metadata/errors.json
      exit 1
    fi

    # Run with error capture.
    # LANG_ARG is expected to hold the expanded language list (preset expansion not shown here).
    set +e
    node scripts/generate-news-enhanced.js --types="$ARTICLE_TYPES" --languages="$LANG_ARG" 2>&1 | tee generation.log
    EXIT_CODE=${PIPESTATUS[0]}  # exit code of node, not of tee
    set -e

    if [ "$EXIT_CODE" -ne 0 ]; then
      echo "❌ Generation failed with exit code $EXIT_CODE"
      # Case-insensitive so "riksdag-regering-mcp server timeout" also matches
      if grep -qi "mcp.*timeout" generation.log; then
        echo '{"lastError": {"errorType": "mcp_unavailable", "message": "MCP server timeout", "severity": "warning", "retryable": true}}' > news/metadata/errors.json
      else
        echo '{"lastError": {"errorType": "script_failure", "message": "generation script exited with a non-zero code", "severity": "error", "retryable": true}}' > news/metadata/errors.json
      fi
      exit 1
    fi
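
The classification performed with `grep` in this step can be mirrored by a small function, which is what the `detectErrorType` test cases in this issue would exercise. This is a sketch under assumptions: the patterns are illustrative, and `ERROR_PROFILES` is a hypothetical table that collects the severity and retryable values used in the step.

```javascript
// Severity and retryability per error type, as used in the workflow step.
const ERROR_PROFILES = {
  script_missing: { severity: 'critical', retryable: false },
  mcp_unavailable: { severity: 'warning', retryable: true },
  script_failure: { severity: 'error', retryable: true },
};

// Classifies a log excerpt or error message into one of the three error types.
function detectErrorType(logText) {
  const text = String(logText);
  // A missing script under scripts/ is reported as "<path> not found".
  if (/scripts\/[\w.-]+\.js not found/i.test(text)) return 'script_missing';
  // Case-insensitive, same-line match, like `grep -i "mcp.*timeout"`.
  if (/mcp.*timeout/i.test(text)) return 'mcp_unavailable';
  // Anything else is treated as a generic script failure.
  return 'script_failure';
}

// Builds an errors.json-style record for a message (timestamp supplied by the caller).
function classifyError(message, timestamp) {
  const errorType = detectErrorType(message);
  return { timestamp, errorType, message, ...ERROR_PROFILES[errorType] };
}
```

Matching case-insensitively matters because a message such as "riksdag-regering-mcp server timeout" spells the server name in lowercase.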

Notification on Critical Failure:

- name: Notify on critical failure
  if: failure() && steps.generate.outcome == 'failure'
  uses: actions/github-script@v7
  with:
    script: |
      // Posting an issue comment requires `issues: write` in the workflow permissions
      const fs = require('fs');
      const errors = JSON.parse(fs.readFileSync('news/metadata/errors.json', 'utf8'));
      
      if (errors.lastError && errors.lastError.severity === 'critical') {
        // await so the step does not finish before the API call completes
        await github.rest.issues.createComment({
          owner: context.repo.owner,
          repo: context.repo.repo,
          issue_number: 100, // or create new issue
          body: `🚨 **Critical Failure in News Generation Workflow**\n\n` +
                `**Error**: ${errors.lastError.message}\n` +
                `**Type**: ${errors.lastError.errorType}\n` +
                `**Workflow**: ${context.workflow}\n` +
                `**Run**: ${context.runId}`
        });
      }

Phase 3: Agentic Workflow Integration

Add Check for Agentic Workflow Activity:

- name: Check for recent agentic workflow activity
  id: check-agentic
  run: |
    echo "🤖 Checking for recent agentic workflow activity..."
    
    if [ -f "news/metadata/workflow-state.json" ]; then
      # Check if agentic workflows generated content recently (< 2 hours)
      LAST_AGENTIC=$(jq -r '.lastUpdate // empty' news/metadata/workflow-state.json 2>/dev/null || true)
      
      # A missing or unparsable lastUpdate is treated as "not recent" instead of failing the step
      if [ -n "$LAST_AGENTIC" ] && LAST_EPOCH=$(date -d "$LAST_AGENTIC" +%s 2>/dev/null); then
        HOURS_AGO=$(( ($(date +%s) - LAST_EPOCH) / 3600 ))
      else
        HOURS_AGO=9999
      fi
      
      if [ "$HOURS_AGO" -lt 2 ]; then
        echo "agentic_recent=true" >> "$GITHUB_OUTPUT"
        echo "✅ Agentic workflows active (${HOURS_AGO}h ago), skipping traditional workflow"
      else
        echo "agentic_recent=false" >> "$GITHUB_OUTPUT"
        echo "ℹ️ No recent agentic activity (${HOURS_AGO}h ago), proceeding with traditional workflow"
      fi
    else
      echo "agentic_recent=false" >> "$GITHUB_OUTPUT"
      echo "ℹ️ No agentic workflow state found, proceeding with traditional workflow"
    fi

- name: Generate news articles
  # Only run if:
  # 1. Should generate (time threshold met)
  # 2. No recent agentic activity (< 2 hours)
  if: |
    steps.check-updates.outputs.should_generate == 'true' &&
    steps.check-agentic.outputs.agentic_recent == 'false'
  id: generate
  # ... rest of generation logic
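
The recency check can be sketched as a pure function for unit testing. It assumes, as the shell step does, that `workflow-state.json` carries a `lastUpdate` timestamp; `isAgenticRecent` and its default threshold are hypothetical names and values taken from the 2-hour rule in this issue.

```javascript
// Returns true when the agentic workflows updated their state less than thresholdHours ago.
function isAgenticRecent(state, now = new Date(), thresholdHours = 2) {
  // Date.parse returns NaN for a missing or unparsable timestamp.
  const last = Date.parse(state?.lastUpdate ?? '');
  if (Number.isNaN(last)) return false;
  const elapsedMs = now.getTime() - last;
  return elapsedMs < thresholdHours * 60 * 60 * 1000;
}

// Example: state updated 90 minutes before the scheduled run.
const now = new Date('2026-02-14T12:00:00Z');
console.log(isAgenticRecent({ lastUpdate: '2026-02-14T10:30:00Z' }, now)); // true
console.log(isAgenticRecent({ lastUpdate: '2026-02-14T09:00:00Z' }, now)); // false
```

A missing state file maps to calling the function with `null`, which returns `false` and lets the traditional workflow act as the fallback.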

Phase 4: Improved PR Logic

Current Code (BROKEN):

- name: Create PR with generated articles
  if: steps.generate.outputs.generated != '0'
  # PROBLEM: An empty output ('' when the generate step is skipped or sets no output)
  # also passes this check, so a PR can be created for timestamp-only changes

Fixed Code:

- name: Create PR with generated articles
  # Only create PR when:
  # 1. Articles were generated (> 0)
  # 2. Generation step succeeded
  if: |
    success() &&
    steps.generate.outputs.generated != '0' &&
    steps.generate.outputs.generated != '' &&
    steps.generate.outcome == 'success'
  uses: peter-evans/create-pull-request@c0f553fe549906ede9cf27b5156039d195d2ece0 # v8.1.0
  with:
    token: ${{ secrets.GITHUB_TOKEN }}
    commit-message: 'news: ${{ steps.generate.outputs.generated }} articles - ${{ steps.check-updates.outputs.current_time }}'
    title: '📰 ${{ steps.generate.outputs.generated }} News Articles - ${{ steps.check-updates.outputs.current_time }}'
    body: |
      ## Automated News Generation
      
      ### Summary
      - **Articles Generated**: ${{ steps.generate.outputs.generated }}
      - **Article Types**: ${{ github.event.inputs.article_types || 'week-ahead (default)' }}
      - **Languages**: ${{ github.event.inputs.languages || 'en,sv (default)' }}
      - **Errors**: ${{ steps.generate.outputs.errors }}
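
The PR gate can be mirrored in a helper that the `shouldCreatePR` test cases in this issue would cover. This is a sketch under assumptions: the function accepts both numbers and the string outputs a workflow step produces, and `buildPRTitle` is a hypothetical helper showing one possible title format that includes the article types.

```javascript
// True only when the generation step succeeded and reported at least one article.
function shouldCreatePR({ generated, success }) {
  // Step outputs are strings; '' or undefined parse to NaN and are rejected.
  const count = Number.parseInt(generated, 10);
  return success === true && Number.isInteger(count) && count > 0;
}

// Hypothetical title format: article count, types, and run timestamp.
function buildPRTitle({ generated, types, timestamp }) {
  return `📰 ${generated} News Articles (${types.join(', ')}) - ${timestamp}`;
}

console.log(shouldCreatePR({ generated: '3', success: true })); // true
console.log(shouldCreatePR({ generated: '', success: true })); // false
console.log(
  buildPRTitle({
    generated: 3,
    types: ['week-ahead', 'motions'],
    timestamp: '2026-02-14T18:00:00Z',
  }),
);
```

Rejecting the empty string explicitly corresponds to the `generated != ''` clause in the fixed `if:` condition.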

Phase 5: Test Suite for Workflow Logic

Create: tests/workflows/news-generation.test.js

Test Coverage:

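// expandLanguagePreset, shouldCommitTimestamp, detectErrorType and shouldCreatePR are
// helpers to be extracted from the workflow logic; import them from wherever they are implemented.
describe('News Generation Workflow', () => {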
  describe('Language Expansion', () => {
    it('should expand "nordic" to en,sv,da,no,fi', () => {
      const expanded = expandLanguagePreset('nordic');
      expect(expanded).toEqual(['en', 'sv', 'da', 'no', 'fi']);
    });
    
    it('should expand "eu-core" to en,sv,de,fr,es,nl', () => {
      const expanded = expandLanguagePreset('eu-core');
      expect(expanded).toEqual(['en', 'sv', 'de', 'fr', 'es', 'nl']);
    });
    
    it('should expand "all" to all 14 languages', () => {
      const expanded = expandLanguagePreset('all');
      expect(expanded).toHaveLength(14);
    });
  });
  
  describe('Timestamp Logic', () => {
    it('should commit timestamp when 0 articles generated', () => {
      const shouldCommit = shouldCommitTimestamp({
        shouldGenerate: true,
        articlesGenerated: 0
      });
      expect(shouldCommit).toBe(true);
    });
    
    it('should NOT commit timestamp when articles generated', () => {
      const shouldCommit = shouldCommitTimestamp({
        shouldGenerate: true,
        articlesGenerated: 5
      });
      expect(shouldCommit).toBe(false);
    });
  });
  
  describe('Error Handling', () => {
    it('should detect missing script error', () => {
      const errorType = detectErrorType('scripts/generate-news-enhanced.js not found');
      expect(errorType).toBe('script_missing');
    });
    
    it('should detect MCP unavailable error', () => {
      const errorType = detectErrorType('MCP server timeout');
      expect(errorType).toBe('mcp_unavailable');
    });
  });
  
  describe('PR Creation Logic', () => {
    it('should create PR when articles > 0', () => {
      const shouldCreate = shouldCreatePR({ generated: 5, success: true });
      expect(shouldCreate).toBe(true);
    });
    
    it('should NOT create PR when 0 articles', () => {
      const shouldCreate = shouldCreatePR({ generated: 0, success: true });
      expect(shouldCreate).toBe(false);
    });
  });
});

🌐 Multi-Language Requirements

Language Expansion Correctness

Critical: Language preset expansion must be correct

  • nordic → en,sv,da,no,fi (5 languages)
  • eu-core → en,sv,de,fr,es,nl (6 languages)
  • all → en,sv,da,no,fi,de,fr,es,nl,ar,he,ja,ko,zh (14 languages)

Validation:

  • Unit tests for each preset
  • Verify expansion in workflow logs
  • Confirm all language files generated
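
Moving the preset expansion out of bash and into a small function is one way to make it unit-testable. The sketch below uses the preset contents listed above; `LANGUAGE_PRESETS` and the pass-through handling of comma-separated input are assumptions, not the workflow's current behavior.

```javascript
// Preset contents as listed in this issue.
const LANGUAGE_PRESETS = {
  nordic: ['en', 'sv', 'da', 'no', 'fi'],
  'eu-core': ['en', 'sv', 'de', 'fr', 'es', 'nl'],
  all: ['en', 'sv', 'da', 'no', 'fi', 'de', 'fr', 'es', 'nl', 'ar', 'he', 'ja', 'ko', 'zh'],
};

// Expands a preset name to its language codes; any other input is treated
// as a comma-separated list of codes and returned as an array.
function expandLanguagePreset(input) {
  const key = String(input).trim().toLowerCase();
  if (Object.prototype.hasOwnProperty.call(LANGUAGE_PRESETS, key)) {
    return [...LANGUAGE_PRESETS[key]]; // copy so callers cannot mutate the preset
  }
  return key
    .split(',')
    .map((code) => code.trim())
    .filter(Boolean);
}

console.log(expandLanguagePreset('nordic').join(','));
console.log(expandLanguagePreset('all').length);
```

The workflow could then call the function from a Node script and pass the joined result to `--languages`, so the same code path is exercised by the tests and by scheduled runs.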

🔒 Security & ISMS Compliance

Secure Development Requirements

  • Error Logging: No secrets in error logs or errors.json
  • Commit Signing: Automated commits should be signed (bot GPG key)
  • Least Privilege: Workflow permissions already set to contents:write, pull-requests:write

Documentation Updates

  • ✅ Update WORKFLOWS.md with error handling patterns
  • ✅ Document workflow coordination in ARCHITECTURE.md
  • ✅ Add troubleshooting guide for common failures

✅ Acceptance Criteria

Must Have (Critical Fixes)

  • Timestamp logic fixed: Only commit when generated=0
  • Error handling: Fail loudly when scripts missing
  • Error logging: Structured errors in news/metadata/errors.json
  • PR logic fixed: Only create PR when articles > 0
  • Agentic workflow check: Skip if recent agentic activity
  • Reduce timestamp commits from 51% to <20%

Should Have (Testing)

  • tests/workflows/news-generation.test.js with 20+ tests
  • Language expansion tests (3 presets)
  • Timestamp logic tests (5 scenarios)
  • Error detection tests (3 error types)
  • PR creation tests (3 scenarios)

Could Have (Enhancements)

  • Notification on critical failure (GitHub issue comment)
  • Metrics dashboard for workflow success rate
  • Automatic retry with exponential backoff for transient failures
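
The exponential backoff enhancement could look roughly like the following. This is a sketch, not a proposed final design: `backoffDelays` and `retryWithBackoff` are hypothetical helpers, and the `isRetryable` hook is assumed to read the `retryable` flag from the structured error record.

```javascript
// Delay schedule in milliseconds: baseMs, baseMs * factor, baseMs * factor^2, ...
function backoffDelays(retries, baseMs = 1000, factor = 2) {
  return Array.from({ length: retries }, (_, i) => baseMs * factor ** i);
}

// Runs task(attempt) up to 1 + retries times, waiting between failed attempts.
async function retryWithBackoff(task, options = {}) {
  const {
    retries = 3,
    baseMs = 1000,
    isRetryable = () => true, // e.g. (err) => err.retryable === true
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  } = options;
  const delays = backoffDelays(retries, baseMs);

  for (let attempt = 0; ; attempt += 1) {
    try {
      return await task(attempt);
    } catch (err) {
      // Give up when retries are exhausted or the error is not transient.
      if (attempt >= retries || !isRetryable(err)) throw err;
      await sleep(delays[attempt]);
    }
  }
}

console.log(backoffDelays(3)); // [ 1000, 2000, 4000 ]
```

Injecting `sleep` keeps tests fast, and a non-retryable error such as `script_missing` is rethrown immediately rather than retried.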

📚 References

Repository Files

  • Workflow: .github/workflows/news-generation.yml
  • Script: scripts/generate-news-enhanced.js (assumed to exist)
  • Script: scripts/generate-news-indexes.js (exists)
  • Script: scripts/generate-sitemap.js (exists)

Related Workflows

  • .github/workflows/news-realtime-monitor.md (agentic)
  • .github/workflows/news-evening-analysis.md (agentic)
  • .github/workflows/news-article-generator.md (agentic)

ISMS Policies

Technical Documentation

Quality Standards

  • Repository Memory: workflow-coordination-pattern, test-validation-methodology

🤖 Recommended Agent

Agent: devops-engineer or code-quality-engineer

Rationale:

  • devops-engineer: Expert in GitHub Actions, workflow optimization, error handling
  • code-quality-engineer: Expert in bug fixing, logic improvement, testing

Custom Instructions:

Fix critical timestamp logic bug: Only commit when generated=0.
Add robust error handling with structured logging to errors.json.
Implement agentic workflow coordination check (skip if recent activity).
Fix PR creation logic: Only create PR when articles > 0.
Create comprehensive test suite (20+ tests) for workflow logic.
Test language expansion presets (nordic, eu-core, all).
Reduce timestamp-only commits from 51% to <20%.
Follow existing GitHub Actions patterns in repository.
NO CIA data mentions - focus only on riksdag-regering MCP sources.

🏷️ Labels

type:bug type:feature priority:critical component:github-actions component:news-generation needs-testing


Created: 2026-02-14
Status: Ready for Implementation
