CIA Data Consumption Pipeline (JSON Export Integration) #18

@pethers

📋 Issue Type

Feature - Data Infrastructure & Integration

🎯 Objective

IMPORTANT: The CIA project is already the comprehensive digital twin of Swedish Parliament data. This issue covers only the consumption pipeline that fetches, caches, and integrates CIA's JSON exports for riksdagsmonitor's static site visualization and analysis.

Implement a robust data consumption pipeline that retrieves the CIA platform's JSON exports (349 MPs, 8 parties, 50+ years of data), with automated caching, validation, and updates, so that riksdagsmonitor can render from local data when the CIA platform is offline.

📊 Current State

  • ✅ External links to CIA platform (www.hack23.com/cia)
  • ✅ CIA platform maintains complete digital twin
  • ❌ No local caching of CIA JSON exports
  • ❌ No automated data fetching workflow
  • ❌ Direct dependency on CIA platform availability
  • ❌ No offline data access for static site generation

Measured Metrics:

  • Cached CIA exports: 0 MB
  • Fetch frequency: None
  • Data completeness: 0% (external links only)
  • Offline capability: None

🚀 Desired State

  • ✅ Automated CIA JSON export consumption pipeline
  • ✅ Nightly data fetch from CIA platform (02:00 CET)
  • ✅ Local caching of 19 CIA visualization products
  • ✅ Validation against CIA-provided schemas
  • ✅ Offline static site generation capability
  • ✅ Version tracking of CIA data updates
  • ✅ Graceful fallback if CIA unavailable

📊 CIA Data Integration Context

CIA Platform Role:
🏭 CIA IS THE DIGITAL TWIN - Complete data aggregation, processing, and OSINT analysis
📊 CIA Provides: JSON exports of 19 visualization products
🌐 Riksdagsmonitor Consumes: CIA's pre-processed JSON for static site rendering

CIA Product Exports (19 products; 17 are itemized below):

  • Overview Dashboard JSON
  • Party Performance JSON
  • Government Cabinet Scorecard JSON
  • Election Cycle Analysis JSON
  • Top 10 Rankings (10 products) JSON
  • Committee Network Analysis JSON
  • Politician Career Analysis JSON
  • Party Longitudinal Analysis JSON

Data Source:

  • CIA Platform API/Exports: https://www.hack23.com/cia/api/export/
  • CIA JSON Export Specs: json-export-specs/schemas/
  • CIA Sample Data: service.data.impl/sample-data/

CIA Provides:

✅ Complete digital twin (1971-2024)
✅ Real-time data aggregation from Riksdag APIs
✅ OSINT analysis and risk scoring
✅ Data quality assurance
✅ JSON schema definitions
✅ Validated export files
✅ Historical time series

Riksdagsmonitor Consumes:

📥 JSON exports from CIA
📥 Pre-processed visualizations
📥 OSINT-analyzed data
📥 Schema definitions for validation

Implementation Notes:

🔧 Implementation Approach

Phase 1: CIA Export Client

Fetch Strategy:

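// scripts/fetch-cia-exports.js
class CIAExportClient {
  // Fetch all 19 visualization products, one at a time
  async fetchAllExports() {
    const products = [
      'overview-dashboard',
      'party-performance',
      'cabinet-scorecard',
      'election-analysis',
      'top10-influential-mps',
      // ... all 19 products
    ];

    for (const product of products) {
      await this.fetchExport(product);
    }
  }

  // Fetch a single export, validate it, then cache it
  async fetchExport(productName) {
    const url = `https://www.hack23.com/cia/api/export/${productName}.json`;
    const response = await fetch(url);
    if (!response.ok) {
      throw new Error(`Fetch failed for ${productName}: HTTP ${response.status}`);
    }
    // fetch() resolves to a Response object; parse the body before validating
    const data = await response.json();
    await this.validateAgainstSchema(data, productName);
    await this.cacheExport(productName, data);
  }

  // validateAgainstSchema() and cacheExport() are not shown here; they are
  // expected to delegate to the Phase 3 validator and cache manager.
}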
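
A failed request in `fetchExport` aborts the whole nightly run, even when the cause is a transient network error. One way to soften that is to retry with an increasing delay before giving up. The sketch below is only an illustration: the helper name `withRetry`, the retry count, and the delays are assumptions, not part of the CIA platform or of this repository.

```javascript
// Hypothetical helper; name, retry count and delays are illustrative assumptions.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Call `task` until it succeeds or `retries` extra attempts are used up.
// The delay doubles after each failed attempt (baseDelayMs, 2x, 4x, ...).
async function withRetry(task, { retries = 2, baseDelayMs = 1000 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      if (attempt < retries) {
        await sleep(baseDelayMs * 2 ** attempt);
      }
    }
  }
  // Every attempt failed: surface the most recent error to the caller.
  throw lastError;
}
```

Inside `fetchAllExports`, the loop body could then become `await withRetry(() => this.fetchExport(product));`. Spacing the attempts out also fits the requirement to respect CIA rate limits.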

Storage Structure:

data/
  cia-exports/
    current/                    # Latest CIA exports
      overview-dashboard.json
      party-performance.json
      cabinet-scorecard.json
      election-analysis.json
      top10-*.json (10 files)
      committee-network.json
      politician-career.json
      party-longitudinal.json
    archive/                    # Historical versions
      2024-02-04/
      2024-02-03/
    metadata/
      last-fetch.json          # Fetch timestamps
      export-versions.json     # Version tracking
      validation-status.json   # Schema validation results
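
Version tracking in `export-versions.json` needs some way to tell whether an export changed between fetches. One possible approach, sketched here, stores a content fingerprint per product and compares it with the previous run. The function names, the record shape (`hash`, `updatedAt`), and the 32-bit FNV-1a hash are all assumptions made for illustration. FNV-1a is used only to keep the sketch dependency-free; SHA-256 from `node:crypto` would be the sturdier choice in the real script.

```javascript
// Non-cryptographic 32-bit FNV-1a over UTF-16 code units (illustrative choice).
function fingerprint(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, '0');
}

// Build the next export-versions record from freshly fetched exports.
// `previous` maps product name -> { hash, updatedAt } from the last run;
// `exportsByProduct` maps product name -> parsed JSON from this run.
function trackVersions(previous, exportsByProduct, fetchedAt) {
  const versions = {};
  const changed = [];
  for (const [product, data] of Object.entries(exportsByProduct)) {
    const hash = fingerprint(JSON.stringify(data));
    const before = previous[product];
    if (before && before.hash === hash) {
      versions[product] = before; // unchanged: keep the earlier timestamp
    } else {
      versions[product] = { hash, updatedAt: fetchedAt };
      changed.push(product);
    }
  }
  return { versions, changed };
}
```

Because the fingerprint is taken over `JSON.stringify` output, a change in key order alone counts as a new version. That is acceptable only if CIA serializes its exports deterministically.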

Phase 2: Automated Fetch Workflow

Nightly CIA Export Fetch (02:00 CET):

# .github/workflows/fetch-cia-exports.yml
name: Fetch CIA Exports

on:
  schedule:
    - cron: '0 1 * * *'  # 01:00 UTC = 02:00 CET (cron runs in UTC; 03:00 local during CEST)
  workflow_dispatch:

jobs:
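  fetch-cia-data:
    runs-on: ubuntu-latest
    permissions:
      contents: write   # needed to push the refreshed exports
      actions: write    # needed to dispatch deploy.yml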
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
      
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
      
      - name: Fetch CIA JSON exports
        run: node scripts/fetch-cia-exports.js
      
      - name: Validate against CIA schemas
        run: node scripts/validate-cia-data.js
      
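      - name: Check for updates
        id: check
        run: |
          # git status also reports new, untracked export files; git diff alone would miss them
          if [ -z "$(git status --porcelain data/cia-exports/current/)" ]; then
            echo "changed=false" >> "$GITHUB_OUTPUT"
          else
            echo "changed=true" >> "$GITHUB_OUTPUT"
          fi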
      
      - name: Commit updated exports
        if: steps.check.outputs.changed == 'true'
        run: |
          git config user.name "CIA Export Bot"
          git config user.email "[email protected]"
          git add data/cia-exports/
          git commit -m "Update CIA exports $(date +'%Y-%m-%d %H:%M')"
          git push
      
      - name: Trigger site rebuild
        if: steps.check.outputs.changed == 'true'
        run: gh workflow run deploy.yml
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
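
GitHub Actions evaluates `schedule` cron expressions in UTC, and Stockholm switches between CET (UTC+1) and CEST (UTC+2), so a fixed cron hour lands at a different local time in summer. The following sketch checks which Stockholm wall-clock time a given UTC instant corresponds to. The helper name `stockholmTime` is hypothetical.

```javascript
// Hypothetical helper: format a UTC instant as HH:MM in Europe/Stockholm.
function stockholmTime(isoUtc) {
  const parts = new Intl.DateTimeFormat('en-GB', {
    timeZone: 'Europe/Stockholm',
    hour: '2-digit',
    minute: '2-digit',
    hourCycle: 'h23',
  }).formatToParts(new Date(isoUtc));
  const get = (type) => parts.find((part) => part.type === type).value;
  return `${get('hour')}:${get('minute')}`;
}

console.log(stockholmTime('2024-02-04T01:00:00Z')); // a winter date (CET)
console.log(stockholmTime('2024-07-04T01:00:00Z')); // a summer date (CEST)
```

A cron hour of `1` therefore runs at 02:00 local time in winter and at 03:00 in summer. If the fetch must stay at 02:00 all year, the schedule needs two entries or a guard step that checks the local hour.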

Phase 3: Validation & Caching

Schema Validation:

// scripts/validate-cia-data.js
import Ajv from 'ajv';

class CIADataValidator {
  async validateExport(productName, data) {
    // Fetch schema from CIA
    const schema = await this.fetchSchema(productName);

    // Validate data (a fresh Ajv instance per call avoids duplicate-schema-id errors)
    const ajv = new Ajv();
    const valid = ajv.validate(schema, data);

    if (!valid) {
      throw new Error(`Validation failed for ${productName}: ${ajv.errorsText()}`);
    }

    return true;
  }

  async fetchSchema(productName) {
    const url = `https://github.com/Hack23/cia/raw/master/json-export-specs/schemas/${productName}.schema.json`;
    const response = await fetch(url);
    if (!response.ok) {
      throw new Error(`Schema fetch failed for ${productName}: HTTP ${response.status}`);
    }
    return response.json();
  }
}
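
`validateExport` throws on the first invalid export, whereas `validation-status.json` is meant to hold the validation results for all products. A sketch of one possible aggregation step follows. The function name `summarizeValidation` and the field names are assumptions, and `validate` stands in for a call such as `validator.validateExport(productName, data)`.

```javascript
// Hypothetical aggregator: run `validate` for every product and collect the
// outcomes instead of stopping at the first failure.
async function summarizeValidation(exportsByProduct, validate, checkedAt) {
  const results = {};
  for (const [product, data] of Object.entries(exportsByProduct)) {
    try {
      await validate(product, data);
      results[product] = { valid: true };
    } catch (error) {
      results[product] = { valid: false, error: error.message };
    }
  }
  const invalid = Object.keys(results).filter((product) => !results[product].valid);
  return { checkedAt, allValid: invalid.length === 0, invalid, results };
}
```

The returned object can be written as-is to `data/cia-exports/metadata/validation-status.json`, and the workflow step can exit non-zero when `allValid` is false so that invalid data is never committed.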

Caching Layer:

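// scripts/cache-manager.js
import path from 'node:path';
import fs from 'fs-extra';

const CURRENT_DIR = 'data/cia-exports/current';
const ARCHIVE_DIR = 'data/cia-exports/archive';

class CIAExportCache {
  // Archive previous version under its fetch date (YYYY-MM-DD, UTC)
  async archiveCurrent() {
    if (!(await fs.pathExists(CURRENT_DIR))) return; // first run: nothing to archive
    const timestamp = new Date().toISOString().split('T')[0];
    // Copy instead of rename so current/ still exists while new files are written
    await fs.copy(CURRENT_DIR, path.join(ARCHIVE_DIR, timestamp));
  }

  // Update current with new data.
  // Assumption: `exports` maps product name -> parsed JSON.
  async updateCurrent(exports) {
    await this.archiveCurrent();
    for (const [productName, data] of Object.entries(exports)) {
      // Write one file per product; a directory path is not a valid JSON target
      const file = path.join(CURRENT_DIR, `${productName}.json`);
      await fs.outputJson(file, data, { spaces: 2 });
    }
  }
}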

Phase 4: Static Site Integration

Data Loading for Visualizations:

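// assets/js/cia-data-loader.js
class CIADataLoader {
  // Fetch a URL and parse it as JSON, rejecting on HTTP errors
  async fetchJSON(url) {
    const response = await fetch(url);
    if (!response.ok) throw new Error(`HTTP ${response.status} for ${url}`);
    return response.json();
  }

  // Load cached CIA export
  async loadExport(productName) {
    return this.fetchJSON(`/data/cia-exports/current/${productName}.json`);
  }

  // Load the live export from the CIA platform
  async loadFromCIA(productName) {
    return this.fetchJSON(`https://www.hack23.com/cia/api/export/${productName}.json`);
  }

  // Load the fetch metadata written by the nightly workflow
  async loadMetadata() {
    return this.fetchJSON('/data/cia-exports/metadata/last-fetch.json');
  }

  // Check if data is fresh (< 24h old); an invalid timestamp counts as stale
  async isDataFresh() {
    const metadata = await this.loadMetadata();
    const lastFetch = new Date(metadata.lastFetch);
    return (Date.now() - lastFetch.getTime()) < 24 * 60 * 60 * 1000;
  }

  // Prefer a fresh cache; if the cache is stale try the live CIA API first.
  // Whichever source is tried first, the other one is the fallback.
  async loadWithFallback(productName) {
    let fresh = false;
    try {
      fresh = await this.isDataFresh();
    } catch (e) {
      // Missing or unreadable metadata: treat the cache as stale
    }
    const fromCache = () => this.loadExport(productName);
    const fromCIA = () => this.loadFromCIA(productName);
    const [primary, fallback] = fresh ? [fromCache, fromCIA] : [fromCIA, fromCache];
    try {
      return await primary();
    } catch (e) {
      return fallback();
    }
  }
}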
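
The 24-hour rule in `isDataFresh` is easier to unit-test when it is separated from network access and from the system clock. A minimal sketch follows, assuming hypothetical helper names `isFresh` and `sourceOrder`, with the current time passed in as a parameter:

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// True when lastFetchIso is a valid timestamp less than maxAgeMs before nowMs.
function isFresh(lastFetchIso, nowMs, maxAgeMs = DAY_MS) {
  const fetchedMs = Date.parse(lastFetchIso);
  if (Number.isNaN(fetchedMs)) return false; // missing or malformed timestamp
  return nowMs - fetchedMs < maxAgeMs;
}

// Order in which the loader should try its two sources.
function sourceOrder(lastFetchIso, nowMs) {
  return isFresh(lastFetchIso, nowMs) ? ['cache', 'cia'] : ['cia', 'cache'];
}
```

In the browser, `sourceOrder(metadata.lastFetch, Date.now())` would decide whether the cached file or the live CIA API is tried first, with the other source as the fallback.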

Files to Create/Modify

.github/workflows/
  fetch-cia-exports.yml        # Nightly fetch workflow
data/
  cia-exports/
    current/                   # Latest CIA exports (19 files)
    archive/                   # Historical versions
    metadata/                  # Fetch metadata
scripts/
  fetch-cia-exports.js         # CIA export client
  validate-cia-data.js         # Schema validation
  cache-manager.js             # Cache management
assets/
  js/
    cia-data-loader.js         # Browser-side data loader
docs/
  CIA_INTEGRATION.md           # Integration documentation

🤖 Recommended Agent

devops-engineer - Best suited for:

  • Data fetching automation
  • GitHub Actions workflows
  • Caching strategies
  • API integration
  • Monitoring and alerting

Secondary: performance-engineer for caching optimization

✅ Acceptance Criteria

  • Automated CIA export fetch workflow operational
  • All 19 CIA visualization products fetched nightly
  • Schema validation against CIA schemas passing
  • Local caching structure established
  • Version tracking for CIA data updates
  • Offline data access working
  • Static site uses cached CIA exports
  • Graceful fallback to live CIA API
  • Monitoring and alerting configured
  • Documentation (CIA_INTEGRATION.md) complete
  • No duplication of CIA's digital twin functionality
  • Manual fetch trigger available
  • Comprehensive logging
  • Error handling and recovery

🏷️ Labels

enhancement, github_actions, dependencies

🔒 Security & Compliance

  • Read-only API access to CIA platform
  • Secure credential storage
  • Rate limiting respect
  • Data validation before caching
  • Audit logging
  • No PII storage (CIA handles this)
  • ISMS compliance
  • GDPR alignment (consume CIA's compliant data)

💡 Key Principle

CIA = Source of Truth | Riksdagsmonitor = Consumer

CIA maintains the comprehensive digital twin with all data processing, OSINT analysis, and quality assurance. Riksdagsmonitor fetches and caches CIA's JSON exports for static site rendering and offline visualization.


Priority: High
Estimated Effort: 5-7 days
Dependencies: Issue #13 (CIA schema integration)
Related Issues: Issue #17 (news generation), Issue #19 (OSINT)
