Integrate CIA JSON Schemas for Validation Framework #13

@pethers

📋 Issue Type

Feature - Data Integration & Validation

🎯 Objective

IMPORTANT: The CIA project provides authoritative JSON schemas for all 19 visualization products. This issue covers integrating those schemas into riksdagsmonitor for validation and type safety.

Integrate the CIA platform's JSON schemas to enable automated validation, type checking, and seamless consumption of CIA's data exports across all 19 visualization products in riksdagsmonitor.

📊 Current State

  • ✅ Static HTML pages for 14 languages exist
  • ✅ External links to CIA platform (www.hack23.com/cia)
  • ✅ CIA maintains and publishes JSON schemas
  • ❌ No local schema integration for validation
  • ❌ No automated schema updates
  • ❌ No type definitions for CIA exports
  • ❌ Manual validation required

Measured Metrics:

  • CIA schemas integrated: 0/19
  • Schema validation: None
  • Type definitions: None
  • Schema update frequency: Manual

🚀 Desired State

  • ✅ All 19 CIA JSON schemas integrated from CIA repo
  • ✅ Automated schema validation in CI/CD
  • ✅ TypeScript/JSDoc type definitions generated from CIA schemas
  • ✅ Automated schema update detection
  • ✅ Schema-driven visualization rendering
  • ✅ CI/CD validates data against CIA schemas

📊 CIA Data Integration Context

CIA Platform Role:
🏭 CIA Creates & Maintains: JSON schemas for all data exports
📋 CIA Provides: Schema definitions in json-export-specs/
🔍 CIA Validates: All exports against schemas before publishing

Riksdagsmonitor Role:
📥 Consumes: CIA's schemas for local validation
✅ Validates: Cached CIA exports against schemas
🔧 Generates: Type definitions from CIA schemas

CIA Schema Products (19 schemas):

  • Overview Dashboard Schema
  • Party Performance Schema
  • Government Cabinet Scorecard Schema
  • Election Cycle Analysis Schema
  • Top 10 Rankings Schemas (10 schemas)
  • Committee Network Analysis Schema
  • Politician Career Analysis Schema
  • Party Longitudinal Analysis Schema

Data Source:

  • CIA JSON Schemas: https://github.com/Hack23/cia/tree/master/json-export-specs/schemas/
  • CIA Sample Data: https://github.com/Hack23/cia/tree/master/service.data.impl/sample-data/
  • CIA Visualization Specs: https://github.com/Hack23/cia/tree/master/json-export-specs/visualizations/

Schema Files:

json-export-specs/schemas/
  overview-dashboard.schema.json
  party-performance.schema.json
  cabinet-scorecard.schema.json
  election-analysis.schema.json
  top10-influential-mps.schema.json
  top10-productive-mps.schema.json
  top10-controversial-mps.schema.json
  top10-absent-mps.schema.json
  top10-rebels.schema.json
  top10-coalition-brokers.schema.json
  top10-rising-stars.schema.json
  top10-electoral-risk.schema.json
  top10-ethics-concerns.schema.json
  top10-media-presence.schema.json
  committee-network.schema.json
  politician-career.schema.json
  party-longitudinal.schema.json

Methodology:

  • CIA OSINT methodologies from DATA_ANALYSIS_INTOP_OSINT.md (451.4 KB)
  • Schema-first design approach
  • JSON Schema standard (Draft 7 or 2020-12)
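
The methodology list names two JSON Schema drafts, and this matters for the validator: Ajv's default export targets Draft 7, while 2020-12 schemas need the `Ajv2020` class from `ajv/dist/2020`. A minimal sketch of how a script could decide per schema, assuming each CIA schema declares `$schema` (not confirmed in this issue); `detectDraft` is a hypothetical helper name:

```javascript
// Hypothetical helper: map a schema's "$schema" URI to the draft it declares.
// Assumption: schemas without "$schema" are treated as Draft 7.
function detectDraft(schema) {
  const uri = typeof schema.$schema === 'string' ? schema.$schema : '';
  if (uri.includes('draft/2020-12')) return '2020-12';
  if (uri.includes('draft-07') || uri === '') return 'draft-07';
  // Any other draft is rejected rather than validated with the wrong rules.
  throw new Error(`Unsupported JSON Schema draft: ${uri}`);
}

console.log(detectDraft({ $schema: 'http://json-schema.org/draft-07/schema#' }));
console.log(detectDraft({ $schema: 'https://json-schema.org/draft/2020-12/schema' }));
```

The returned label could then select between `new Ajv(...)` and `new Ajv2020(...)` before compiling the schema.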

Implementation Notes:

🔧 Implementation Approach

Phase 1: CIA Schema Integration

Schema Sync Strategy:

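// scripts/sync-cia-schemas.js
// Requires "type": "module" in package.json and Node.js 18+ (global fetch).
import fs from 'fs-extra';

const BASE_URL = 'https://raw.githubusercontent.com/Hack23/cia/master/json-export-specs/schemas/';
const SCHEMAS = [
  'overview-dashboard',
  'party-performance',
  'cabinet-scorecard',
  // ... all 19 schemas
];

class CIASchemaSync {
  // Fetch all schemas from CIA repository
  async syncAllSchemas() {
    for (const schemaName of SCHEMAS) {
      await this.fetchSchema(schemaName);
    }
  }

  // Fetch an individual schema and store it under schemas/cia/
  async fetchSchema(schemaName) {
    const url = `${BASE_URL}${schemaName}.schema.json`;
    const response = await fetch(url);
    if (!response.ok) {
      throw new Error(`Failed to fetch ${url}: HTTP ${response.status}`);
    }
    const schema = await response.json();
    // outputJson creates schemas/cia/ if it does not exist yet
    await fs.outputJson(`schemas/cia/${schemaName}.schema.json`, schema, { spaces: 2 });
  }

  // Check for schema updates
  // (loadLocalSchemas, loadRemoteSchemas and compareSchemas are not yet implemented)
  async checkForUpdates() {
    const localSchemas = await this.loadLocalSchemas();
    const remoteSchemas = await this.loadRemoteSchemas();
    return this.compareSchemas(localSchemas, remoteSchemas);
  }
}

// Run the sync when the script is executed by the workflow
await new CIASchemaSync().syncAllSchemas();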
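
The `checkForUpdates` method relies on a `compareSchemas` step that the issue does not spell out. One possible sketch, assuming both arguments are plain objects mapping a schema name to its parsed JSON; the return shape (`added`, `changed`, `removed`, `hasUpdates`) and the `canonicalize` helper are illustrative choices, not part of the CIA project:

```javascript
// Sort object keys recursively so that key order alone never counts as a change.
function canonicalize(value) {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.keys(value).sort().map((key) => [key, canonicalize(value[key])])
    );
  }
  return value;
}

// Hypothetical compareSchemas: reports which schema names differ between the two sets.
function compareSchemas(localSchemas, remoteSchemas) {
  const added = [];
  const changed = [];
  for (const [name, remote] of Object.entries(remoteSchemas)) {
    if (!(name in localSchemas)) {
      added.push(name);
    } else if (
      JSON.stringify(canonicalize(remote)) !== JSON.stringify(canonicalize(localSchemas[name]))
    ) {
      changed.push(name);
    }
  }
  const removed = Object.keys(localSchemas).filter((name) => !(name in remoteSchemas));
  const hasUpdates = added.length + changed.length + removed.length > 0;
  return { added, changed, removed, hasUpdates };
}

// Example with made-up schema bodies: only the "type" of party-performance differs.
const diff = compareSchemas(
  { 'party-performance': { type: 'object' } },
  { 'party-performance': { type: 'array' } }
);
console.log(diff);
```

Comparing canonical JSON keeps the check independent of formatting, so a re-indented schema file is not reported as an update.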

Directory Structure:

schemas/
  cia/                        # CIA-provided schemas
    overview-dashboard.schema.json
    party-performance.schema.json
    cabinet-scorecard.schema.json
    election-analysis.schema.json
    top10-*.schema.json (10 files)
    committee-network.schema.json
    politician-career.schema.json
    party-longitudinal.schema.json
  metadata/
    schema-versions.json      # Track CIA schema versions
    last-sync.json           # Last sync timestamp

Phase 2: Schema Validation

Validation Engine:

// scripts/validate-against-cia-schemas.js
import path from 'node:path';
import fs from 'fs-extra';
import Ajv from 'ajv';
import addFormats from 'ajv-formats';

const EXPORT_DIR = 'data/cia-exports/current/';

class CIASchemaValidator {
  constructor() {
    // The default Ajv export targets JSON Schema Draft 7
    this.ajv = new Ajv({ allErrors: true, verbose: true });
    addFormats(this.ajv);
  }

  // Validate CIA export against CIA schema
  async validateExport(productName, data) {
    const schema = await this.loadCIASchema(productName);
    const validate = this.ajv.compile(schema);
    const valid = validate(data);

    if (!valid) {
      console.error(`Validation failed for ${productName}:`);
      console.error(validate.errors);
      throw new Error(
        `Schema validation failed for ${productName}: ${this.ajv.errorsText(validate.errors)}`
      );
    }

    return true;
  }

  // Load CIA schema
  async loadCIASchema(productName) {
    return await fs.readJSON(`schemas/cia/${productName}.schema.json`);
  }

  // Validate all cached CIA exports
  async validateAllExports() {
    // Ignore anything in the directory that is not a JSON export
    const exports = (await fs.readdir(EXPORT_DIR)).filter((file) => file.endsWith('.json'));
    const results = [];

    for (const exportFile of exports) {
      const productName = path.basename(exportFile, '.json');
      const data = await fs.readJSON(path.join(EXPORT_DIR, exportFile));

      try {
        await this.validateExport(productName, data);
        results.push({ product: productName, valid: true });
      } catch (e) {
        results.push({ product: productName, valid: false, error: e.message });
      }
    }

    return results;
  }
}

// Write the report read by the workflow and fail the job if any export is invalid
const results = await new CIASchemaValidator().validateAllExports();
await fs.writeJSON('validation-report.json', results, { spaces: 2 });
if (results.some((result) => !result.valid)) {
  process.exitCode = 1;
}
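
The array returned by `validateAllExports()` is what the validation report is built from. A rough sketch of turning it into a Markdown summary suitable for a GitHub step summary; `formatReport` and its return shape are hypothetical, and the sample entries are invented for illustration:

```javascript
// Hypothetical helper: render validation results as a Markdown table.
// Each entry has the shape { product, valid, error? } used by validateAllExports().
function formatReport(results) {
  const failed = results.filter((result) => !result.valid);
  const rows = results.map((result) => {
    // Escape pipes so an error message cannot break the table layout.
    const error = String(result.error ?? '').replace(/\|/g, '\\|');
    return `| ${result.product} | ${result.valid ? 'pass' : 'FAIL'} | ${error} |`;
  });
  const markdown = [
    `Validated ${results.length} exports: ${results.length - failed.length} passed, ${failed.length} failed.`,
    '',
    '| Product | Status | Error |',
    '| --- | --- | --- |',
    ...rows,
  ].join('\n');
  return { markdown, ok: failed.length === 0 };
}

// Example with invented results.
const { markdown, ok } = formatReport([
  { product: 'overview-dashboard', valid: true },
  { product: 'party-performance', valid: false, error: 'Schema validation failed' },
]);
console.log(markdown);
console.log(ok);
```

Returning `ok` alongside the text lets the caller set the process exit code without re-scanning the results.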

Phase 3: Type Generation (Optional)

TypeScript Type Generation:

// scripts/generate-types-from-cia-schemas.js
import fs from 'fs-extra';
import { compile } from 'json-schema-to-typescript';

class CIATypeGenerator {
  // Generate TypeScript types from CIA schemas
  async generateAllTypes() {
    const schemaFiles = (await fs.readdir('schemas/cia/'))
      .filter((file) => file.endsWith('.schema.json'));

    for (const schemaFile of schemaFiles) {
      const name = schemaFile.replace('.schema.json', '');
      const schema = await fs.readJSON(`schemas/cia/${schemaFile}`);
      const types = await compile(schema, name);
      // outputFile creates types/ if it does not exist yet
      await fs.outputFile(`types/${name}.d.ts`, types);
    }
  }
}

// Run the generator when the script is executed directly
await new CIATypeGenerator().generateAllTypes();

Phase 4: CI/CD Integration

Schema Validation Workflow:

# .github/workflows/validate-cia-data.yml
name: Validate CIA Data

on:
  push:
    paths:
      - 'data/cia-exports/**'
  schedule:
    - cron: '0 3 * * *'  # Daily validation
  workflow_dispatch:

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
      
      - name: Install dependencies
        run: npm install ajv ajv-formats fs-extra
      
      - name: Sync CIA schemas
        run: node scripts/sync-cia-schemas.js
      
      - name: Validate all CIA exports
        run: node scripts/validate-against-cia-schemas.js
      
      - name: Generate validation report
        if: always()
        run: |
          echo "## CIA Data Validation Report" >> $GITHUB_STEP_SUMMARY
          cat validation-report.json >> $GITHUB_STEP_SUMMARY

Schema Update Detection:

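# .github/workflows/check-cia-schema-updates.yml
name: Check CIA Schema Updates

on:
  schedule:
    - cron: '0 6 * * 1'  # Weekly Monday 06:00 UTC
  workflow_dispatch:

# Needed so the job can push the update branch and open the PR
permissions:
  contents: write
  pull-requests: write

jobs:
  check-updates:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'

      - name: Install dependencies
        run: npm install

      - name: Check for CIA schema updates
        id: check  # the script must write "updates=true" to $GITHUB_OUTPUT
        run: node scripts/check-cia-schema-updates.js

      - name: Create PR if updates found
        if: steps.check.outputs.updates == 'true'
        uses: peter-evans/create-pull-request@v5
        with:
          title: "Update CIA schemas to latest version"
          body: "Automated update of CIA JSON schemas"
          branch: "update-cia-schemas"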
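
The PR step reads `steps.check.outputs.updates`, so `check-cia-schema-updates.js` has to publish that value. GitHub Actions picks up step outputs from `name=value` lines appended to the file named by the `GITHUB_OUTPUT` environment variable. A small sketch of that hand-off; `setStepOutput` is a hypothetical helper, and the snippet uses `require` only so it runs standalone (inside the ES module scripts above, `import fs from 'node:fs'` would be used instead):

```javascript
const fs = require('node:fs');

// Hypothetical helper: expose a step output to later workflow steps.
function setStepOutput(name, value, outputPath = process.env.GITHUB_OUTPUT) {
  const line = `${name}=${value}\n`;
  if (outputPath) {
    // Inside GitHub Actions: append to the file the runner provides.
    fs.appendFileSync(outputPath, line);
  } else {
    // Local run without GITHUB_OUTPUT: just print the value.
    process.stdout.write(line);
  }
  return line;
}

// Placeholder flag for illustration; in the real script it would come from checkForUpdates().
const hasUpdates = true;
setStepOutput('updates', String(hasUpdates));
```

The step needs an `id` (here `check`) for the `steps.check.outputs.updates` expression to resolve.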

Files to Create/Modify

schemas/
  cia/                          # CIA schemas (19 files)
  metadata/
    schema-versions.json
    last-sync.json
types/                          # Generated TypeScript types (optional)
  overview-dashboard.d.ts
  party-performance.d.ts
  # ... (19 files)
.github/workflows/
  validate-cia-data.yml         # Validation workflow
  check-cia-schema-updates.yml  # Schema update checker
  sync-cia-schemas.yml          # Schema sync workflow
scripts/
  sync-cia-schemas.js           # Schema sync
  validate-against-cia-schemas.js  # Validation
  generate-types-from-cia-schemas.js  # Type generation (optional)
  check-cia-schema-updates.js   # Update detection
package.json                    # Add ajv, ajv-formats, fs-extra (json-schema-to-typescript if types are generated)
docs/
  CIA_SCHEMA_INTEGRATION.md     # Documentation

🤖 Recommended Agent

devops-engineer - Best suited for:

  • CI/CD pipeline automation
  • Schema synchronization workflows
  • Validation infrastructure
  • GitHub Actions workflow design

Alternative: code-quality-engineer for validation logic and type generation

✅ Acceptance Criteria

  • All 19 CIA JSON schemas synced from CIA repository
  • Automated schema sync workflow operational
  • Schema validation integrated in CI/CD
  • All CIA exports validate against schemas
  • Schema version tracking implemented
  • TypeScript type definitions generated (optional)
  • Weekly schema update detection working
  • Validation failures trigger alerts
  • Documentation (CIA_SCHEMA_INTEGRATION.md) complete
  • No schema duplication (CIA is source)
  • Manual schema sync trigger available
  • Comprehensive validation reporting

🏷️ Labels

enhancement, documentation, dependencies

🔒 Security Considerations

  • Validate all CIA exports against schemas
  • Monitor for schema tampering
  • Use checksums for schema integrity
  • Read-only access to CIA schemas
  • Audit logging for validation failures
  • Automated alerting for schema violations
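
The checksum and tamper-monitoring points above could be covered by pinning a SHA-256 digest per schema file at sync time and re-checking it before validation. A minimal sketch under that assumption; `findTamperedSchemas` and the idea of storing the pinned digests under `schemas/metadata/` are illustrative, and `require` is used only so the snippet runs standalone:

```javascript
const { createHash } = require('node:crypto');

// SHA-256 hex digest of a file's text content.
function sha256(text) {
  return createHash('sha256').update(text, 'utf8').digest('hex');
}

// Hypothetical check: `files` maps file name -> current content,
// `pinned` maps file name -> digest recorded at the last trusted sync.
// A file with no pinned digest is reported as well.
function findTamperedSchemas(files, pinned) {
  return Object.entries(files)
    .filter(([name, content]) => pinned[name] !== sha256(content))
    .map(([name]) => name);
}

// Example with an invented schema body.
const trusted = { 'party-performance.schema.json': '{"type":"object"}' };
const pinned = Object.fromEntries(
  Object.entries(trusted).map(([name, content]) => [name, sha256(content)])
);
console.log(findTamperedSchemas(trusted, pinned));
console.log(findTamperedSchemas({ 'party-performance.schema.json': '{"type":"string"}' }, pinned));
```

Hashing the raw file text rather than parsed JSON means any byte-level change is flagged, which suits integrity checks better than a semantic comparison.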

💡 Key Principle

CIA = Schema Authority | Riksdagsmonitor = Schema Consumer

CIA creates, maintains, and versions all JSON schemas. Riksdagsmonitor syncs and validates against CIA's authoritative schemas.


Priority: High
Estimated Effort: 3-5 days
Dependencies: None (foundation for other issues)
Related Issues: Enables #14-#20
