Description
📋 Issue Type
Feature - Data Infrastructure & Integration
🎯 Objective
IMPORTANT: The CIA project is already the comprehensive digital twin of Swedish Parliament data. This issue focuses on building a consumption pipeline that fetches, caches, and integrates CIA's JSON exports for riksdagsmonitor's static-site visualization and analysis.
Implement a robust data consumption pipeline that retrieves CIA platform's JSON exports (349 MPs, 8 parties, 50+ years of data) with automated caching, validation, and updates for offline rendering on riksdagsmonitor.
📊 Current State
- ✅ External links to CIA platform (www.hack23.com/cia)
- ✅ CIA platform maintains complete digital twin
- ❌ No local caching of CIA JSON exports
- ❌ No automated data fetching workflow
- ❌ Direct dependency on CIA platform availability
- ❌ No offline data access for static site generation
Measured Metrics:
- Cached CIA exports: 0 MB
- Fetch frequency: None
- Data completeness: 0% (external links only)
- Offline capability: None
🚀 Desired State
- ✅ Automated CIA JSON export consumption pipeline
- ✅ Nightly data fetch from CIA platform (02:00 CET)
- ✅ Local caching of 19 CIA visualization products
- ✅ Validation against CIA-provided schemas
- ✅ Offline static site generation capability
- ✅ Version tracking of CIA data updates
- ✅ Graceful fallback if CIA is unavailable
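For version tracking, `metadata/export-versions.json` could record a content hash per product, so that re-downloading unchanged data does not count as a new version. A minimal sketch, assuming Node.js and a hypothetical file shape of `{ productName: { sha256, updatedAt } }`; the helper names `hashExport` and `updateVersions` are illustrative, not part of CIA or riksdagsmonitor.

```javascript
// Hypothetical helpers for metadata/export-versions.json (CommonJS, Node.js)
const { createHash } = require('node:crypto');

// Sort object keys recursively so the same data serialized with a
// different key order produces the same hash. Array order is preserved.
function canonicalize(value) {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.keys(value).sort().map((key) => [key, canonicalize(value[key])])
    );
  }
  return value;
}

// SHA-256 hex digest of a JSON-serializable export
function hashExport(data) {
  return createHash('sha256').update(JSON.stringify(canonicalize(data))).digest('hex');
}

// Return an updated versions record plus the products whose content changed
function updateVersions(previous, exportsByProduct, now = new Date()) {
  const versions = { ...previous };
  const changed = [];
  for (const [product, data] of Object.entries(exportsByProduct)) {
    const sha256 = hashExport(data);
    if (previous[product]?.sha256 !== sha256) {
      versions[product] = { sha256, updatedAt: now.toISOString() };
      changed.push(product);
    }
  }
  return { versions, changed };
}
```

Only products listed in `changed` would need a new archive entry or a site rebuild.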
📊 CIA Data Integration Context
CIA Platform Role:
🏭 CIA IS THE DIGITAL TWIN - Complete data aggregation, processing, and OSINT analysis
📊 CIA Provides: JSON exports of 19 visualization products
🌐 Riksdagsmonitor Consumes: CIA's pre-processed JSON for static site rendering
CIA Product Exports (19 products):
- Overview Dashboard JSON
- Party Performance JSON
- Government Cabinet Scorecard JSON
- Election Cycle Analysis JSON
- Top 10 Rankings (10 products) JSON
- Committee Network Analysis JSON
- Politician Career Analysis JSON
- Party Longitudinal Analysis JSON
Data Source:
- CIA Platform API/Exports: https://www.hack23.com/cia/api/export/
- CIA JSON Export Specs: json-export-specs/schemas/ (CIA repository)
- CIA Sample Data: service.data.impl/sample-data/ (CIA repository)
CIA Provides:
✅ Complete digital twin (1971-2024)
✅ Real-time data aggregation from Riksdag APIs
✅ OSINT analysis and risk scoring
✅ Data quality assurance
✅ JSON schema definitions
✅ Validated export files
✅ Historical time series
Riksdagsmonitor Consumes:
📥 JSON exports from CIA
📥 Pre-processed visualizations
📥 OSINT-analyzed data
📥 Schema definitions for validation
Implementation Notes:
- CIA maintains the source of truth
- Riksdagsmonitor caches for offline/static rendering
- No duplication of CIA's digital twin functionality
- Review CIA export endpoints: https://www.hack23.com/cia/api/
- Schema validation: https://github.com/Hack23/cia/tree/master/json-export-specs/schemas
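Both the export endpoint and the schema location are built from a product name, so it helps to validate that name once before it reaches a URL or a file path. A minimal sketch: the URL patterns are the ones used in Phases 1 and 3 of this issue (not a documented CIA API), `raw.githubusercontent.com` is the host that `github.com/.../raw/...` links redirect to, and the slug rule and function names are assumptions.

```javascript
// Hypothetical URL helpers for CIA exports and schemas
const EXPORT_BASE = 'https://www.hack23.com/cia/api/export/';
const SCHEMA_BASE =
  'https://raw.githubusercontent.com/Hack23/cia/master/json-export-specs/schemas/';

// Assumed naming rule: lowercase slugs such as 'overview-dashboard'.
// Rejecting anything else keeps names from injecting path segments.
const PRODUCT_NAME = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

function assertProductName(productName) {
  if (!PRODUCT_NAME.test(productName)) {
    throw new Error(`Invalid CIA product name: ${JSON.stringify(productName)}`);
  }
  return productName;
}

function exportUrl(productName) {
  return new URL(`${assertProductName(productName)}.json`, EXPORT_BASE).href;
}

function schemaUrl(productName) {
  return new URL(`${assertProductName(productName)}.schema.json`, SCHEMA_BASE).href;
}
```

For example, `exportUrl('overview-dashboard')` yields `https://www.hack23.com/cia/api/export/overview-dashboard.json`, while `exportUrl('../secrets')` throws.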
🔧 Implementation Approach
Phase 1: CIA Export Client
Fetch Strategy:
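// scripts/fetch-cia-exports.js (Node.js 18+ provides a global fetch)
class CIAExportClient {
  // Fetch all 19 visualization products sequentially
  async fetchAllExports() {
    const products = [
      'overview-dashboard',
      'party-performance',
      'cabinet-scorecard',
      'election-analysis',
      'top10-influential-mps',
      // ... all 19 products
    ];
    for (const product of products) {
      await this.fetchExport(product);
    }
  }

  // Fetch single export with validation
  async fetchExport(productName) {
    const url = `https://www.hack23.com/cia/api/export/${productName}.json`;
    const response = await fetch(url);
    if (!response.ok) {
      throw new Error(`Fetching ${productName} failed: HTTP ${response.status}`);
    }
    // fetch() resolves to a Response; the JSON body has to be parsed explicitly
    const data = await response.json();
    // validateAgainstSchema() and cacheExport() delegate to the Phase 3
    // validator and cache manager (not shown here)
    await this.validateAgainstSchema(data, productName);
    await this.cacheExport(productName, data);
    return data;
  }
}
Storage Structure: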
data/
  cia-exports/
    current/                    # Latest CIA exports
      overview-dashboard.json
      party-performance.json
      cabinet-scorecard.json
      election-analysis.json
      top10-*.json              # 10 files
      committee-network.json
      politician-career.json
      party-longitudinal.json
    archive/                    # Historical versions
      2024-02-04/
      2024-02-03/
    metadata/
      last-fetch.json           # Fetch timestamps
      export-versions.json      # Version tracking
      validation-status.json    # Schema validation results
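The tree above implies a simple completeness check for `current/`: seven single-file products plus ten `top10-*.json` files. That adds up to 17 files, while the issue elsewhere counts 19 products; the two unnamed products are not guessed here, so this sketch only checks what the tree names. The function name, the `top10-` filename pattern, and the report shape are assumptions.

```javascript
// Hypothetical completeness check for data/cia-exports/current/.
// File names come from the storage structure above.
const SINGLE_FILES = [
  'overview-dashboard.json',
  'party-performance.json',
  'cabinet-scorecard.json',
  'election-analysis.json',
  'committee-network.json',
  'politician-career.json',
  'party-longitudinal.json',
];
const TOP10_PATTERN = /^top10-[a-z0-9-]+\.json$/;
const TOP10_EXPECTED = 10;

// fileNames: entries of current/, e.g. from fs.readdirSync()
function checkCompleteness(fileNames) {
  const present = new Set(fileNames);
  const missing = SINGLE_FILES.filter((name) => !present.has(name));
  const top10Count = fileNames.filter((name) => TOP10_PATTERN.test(name)).length;
  const expected = SINGLE_FILES.length + TOP10_EXPECTED;
  const found =
    SINGLE_FILES.length - missing.length + Math.min(top10Count, TOP10_EXPECTED);
  return {
    missing,
    top10Count,
    completeness: Math.round((found / expected) * 100), // percent of named files
    complete: missing.length === 0 && top10Count === TOP10_EXPECTED,
  };
}
```

The `completeness` value maps directly onto the "Data completeness" metric listed under Current State.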
Phase 2: Automated Fetch Workflow
Nightly CIA Export Fetch (02:00 CET):
# .github/workflows/fetch-cia-exports.yml
name: Fetch CIA Exports
on:
  schedule:
    # GitHub Actions cron runs in UTC: 01:00 UTC = 02:00 CET (03:00 CEST in summer)
    - cron: '0 1 * * *'
  workflow_dispatch:
permissions:
  contents: write # push updated exports
  actions: write # trigger deploy.yml via gh workflow run
jobs:
  fetch-cia-data:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
      - name: Install dependencies
        run: npm ci # assumes package-lock.json lists ajv
      - name: Fetch CIA JSON exports
        run: node scripts/fetch-cia-exports.js
      - name: Validate against CIA schemas
        run: node scripts/validate-cia-data.js
      - name: Check for updates
        id: check
        run: |
          # git status also reports new, untracked files, which git diff misses
          if [ -z "$(git status --porcelain data/cia-exports/current/)" ]; then
            echo "changed=false" >> "$GITHUB_OUTPUT"
          else
            echo "changed=true" >> "$GITHUB_OUTPUT"
          fi
      - name: Commit updated exports
        if: steps.check.outputs.changed == 'true'
        run: |
          git config user.name "CIA Export Bot"
          git config user.email "[email protected]"
          git add data/cia-exports/
          git commit -m "Update CIA exports $(date +'%Y-%m-%d %H:%M')"
          git push
      - name: Trigger site rebuild
        if: steps.check.outputs.changed == 'true'
        # deploy.yml must declare a workflow_dispatch trigger
        run: gh workflow run deploy.yml
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
Phase 3: Validation & Caching
Schema Validation:
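// scripts/validate-cia-data.js
import Ajv from 'ajv';
// If the CIA schemas use JSON Schema draft 2020-12, import Ajv2020 from 'ajv/dist/2020.js' instead

class CIADataValidator {
  async validateExport(productName, data) {
    // Fetch schema from CIA
    const schema = await this.fetchSchema(productName);
    // Compile and validate; allErrors reports every violation, not only the first
    const ajv = new Ajv({ allErrors: true });
    const validate = ajv.compile(schema);
    if (!validate(data)) {
      throw new Error(`Validation failed for ${productName}: ${ajv.errorsText(validate.errors)}`);
    }
    return true;
  }

  async fetchSchema(productName) {
    const url = `https://raw.githubusercontent.com/Hack23/cia/master/json-export-specs/schemas/${productName}.schema.json`;
    const response = await fetch(url);
    if (!response.ok) {
      throw new Error(`Fetching schema for ${productName} failed: HTTP ${response.status}`);
    }
    return response.json();
  }
}
Caching Layer: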
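// scripts/cache-manager.js
import { mkdir, rename, rm, writeFile } from 'node:fs/promises';
import path from 'node:path';

const CURRENT_DIR = 'data/cia-exports/current';
const ARCHIVE_DIR = 'data/cia-exports/archive';

class CIAExportCache {
  // Archive previous version as archive/YYYY-MM-DD/
  async archiveCurrent() {
    const timestamp = new Date().toISOString().split('T')[0];
    const target = path.join(ARCHIVE_DIR, timestamp);
    await mkdir(ARCHIVE_DIR, { recursive: true });
    await rm(target, { recursive: true, force: true }); // replace a same-day archive
    await rename(CURRENT_DIR, target).catch((err) => {
      if (err.code !== 'ENOENT') throw err; // no current/ yet on the first run
    });
  }
  // Update current with new data; exportsByProduct maps product name -> parsed JSON
  async updateCurrent(exportsByProduct) {
    await this.archiveCurrent();
    await mkdir(CURRENT_DIR, { recursive: true });
    for (const [productName, data] of Object.entries(exportsByProduct)) {
      const file = path.join(CURRENT_DIR, `${productName}.json`);
      await writeFile(file, `${JSON.stringify(data, null, 2)}\n`);
    }
  }
}
Phase 4: Static Site Integration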
Data Loading for Visualizations:
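// assets/js/cia-data-loader.js
const CACHE_BASE = '/data/cia-exports';
const CIA_API_BASE = 'https://www.hack23.com/cia/api/export';
const MAX_AGE_MS = 24 * 60 * 60 * 1000; // cached data counts as fresh for 24h

// Fetch and parse JSON, failing on HTTP errors instead of parsing an error page
async function fetchJSON(url) {
  const response = await fetch(url);
  if (!response.ok) throw new Error(`${url}: HTTP ${response.status}`);
  return response.json();
}

class CIADataLoader {
  // Load cached CIA export
  async loadExport(productName) {
    return fetchJSON(`${CACHE_BASE}/current/${productName}.json`);
  }

  // Read fetch timestamps written by the nightly workflow
  async loadMetadata() {
    return fetchJSON(`${CACHE_BASE}/metadata/last-fetch.json`);
  }

  // Load from the live CIA API (works only if CIA sends CORS headers for this origin)
  async loadFromCIA(productName) {
    return fetchJSON(`${CIA_API_BASE}/${productName}.json`);
  }

  // Check if data is fresh (< 24h old)
  async isDataFresh() {
    const metadata = await this.loadMetadata();
    const lastFetch = new Date(metadata.lastFetch);
    return Date.now() - lastFetch.getTime() < MAX_AGE_MS;
  }

  // Prefer fresh cache; if stale or metadata is missing, try live CIA;
  // if CIA is unreachable, serve the stale cache rather than failing
  async loadWithFallback(productName) {
    const fresh = await this.isDataFresh().catch(() => false);
    if (fresh) return this.loadExport(productName);
    try {
      return await this.loadFromCIA(productName);
    } catch (e) {
      return this.loadExport(productName);
    }
  }
}
Files to Create/Modify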
.github/workflows/
  fetch-cia-exports.yml      # Nightly fetch workflow
data/
  cia-exports/
    current/                 # Latest CIA exports (19 files)
    archive/                 # Historical versions
    metadata/                # Fetch metadata
scripts/
  fetch-cia-exports.js       # CIA export client
  validate-cia-data.js       # Schema validation
  cache-manager.js           # Cache management
assets/
  js/
    cia-data-loader.js       # Browser-side data loader
docs/
  CIA_INTEGRATION.md         # Integration documentation
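The `metadata/` directory is meant to hold fetch timestamps and validation results. A minimal sketch of building the `last-fetch.json` and `validation-status.json` records from per-product results; the record shapes and the `buildFetchMetadata` name are hypothetical, with `lastFetch` chosen to match the field the browser loader reads.

```javascript
// Hypothetical builder for metadata/last-fetch.json and
// metadata/validation-status.json. `results` holds one entry per product:
// { product: string, valid: boolean, error?: string }
function buildFetchMetadata(results, now = new Date()) {
  const failed = results.filter((r) => !r.valid);
  const lastFetchRecord = {
    lastFetch: now.toISOString(),
    products: results.length,
    failed: failed.length,
  };
  const validationRecord = {
    checkedAt: now.toISOString(),
    valid: failed.length === 0,
    results: Object.fromEntries(
      results.map((r) => [r.product, r.valid ? 'valid' : `invalid: ${r.error ?? 'unknown error'}`])
    ),
  };
  return { lastFetchRecord, validationRecord };
}
```

Writing both records with `JSON.stringify(record, null, 2)` keeps them diff-friendly in the committed `data/` tree.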
🤖 Recommended Agent
devops-engineer - Best suited for:
- Data fetching automation
- GitHub Actions workflows
- Caching strategies
- API integration
- Monitoring and alerting
Secondary: performance-engineer for caching optimization
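For monitoring and alerting, one simple signal is the age of the last successful fetch. A sketch, assuming the hypothetical `lastFetch` ISO timestamp in `metadata/last-fetch.json` and an illustrative 36-hour threshold, which tolerates a delayed nightly run but flags a missed one; how the alert is delivered is left open.

```javascript
// Hypothetical staleness check for monitoring the nightly fetch
const ALERT_AFTER_HOURS = 36;

function checkStaleness(lastFetchIso, now = new Date(), alertAfterHours = ALERT_AFTER_HOURS) {
  const lastFetch = new Date(lastFetchIso);
  // A missing or unparsable timestamp is treated as stale
  if (Number.isNaN(lastFetch.getTime())) {
    return { stale: true, ageHours: null, reason: 'missing or invalid lastFetch timestamp' };
  }
  const ageHours = (now.getTime() - lastFetch.getTime()) / 3_600_000;
  const stale = ageHours > alertAfterHours;
  return {
    stale,
    ageHours: Math.round(ageHours * 10) / 10,
    reason: stale ? `cache is ${ageHours.toFixed(1)}h old` : null,
  };
}
```

Calling `process.exit(1)` when `stale` is true would make a scheduled check fail visibly in GitHub Actions.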
✅ Acceptance Criteria
- Automated CIA export fetch workflow operational
- All 19 CIA visualization products fetched nightly
- Schema validation against CIA schemas passing
- Local caching structure established
- Version tracking for CIA data updates
- Offline data access working
- Static site uses cached CIA exports
- Graceful fallback to live CIA API
- Monitoring and alerting configured
- Documentation (CIA_INTEGRATION.md) complete
- No duplication of CIA's digital twin functionality
- Manual fetch trigger available
- Comprehensive logging
- Error handling and recovery
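For error handling and recovery, transient failures (network errors, HTTP 5xx) can be retried with exponential backoff before a fetch is declared failed. A minimal sketch; `withRetry`, `backoffDelay`, and the default limits are hypothetical choices, not values taken from CIA.

```javascript
// Hypothetical retry helper: delay doubles per attempt and is capped,
// e.g. 1s, 2s, 4s, ... up to maxDelayMs
function backoffDelay(attempt, baseDelayMs = 1000, maxDelayMs = 30000) {
  return Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run task up to retries + 1 times; rethrow the last error if all attempts fail
async function withRetry(task, { retries = 3, baseDelayMs = 1000, maxDelayMs = 30000 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt += 1) {
    try {
      return await task(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < retries) await sleep(backoffDelay(attempt, baseDelayMs, maxDelayMs));
    }
  }
  throw lastError;
}
```

Wrapping a whole export fetch, e.g. `withRetry(() => client.fetchExport(name))`, also retries schema validation; limiting retries to network or 5xx errors would be a sensible refinement, since a schema mismatch will not fix itself.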
📚 References
- CIA Platform: https://www.hack23.com/cia
- CIA Repository: https://github.com/Hack23/cia
- CIA Export Specs: https://github.com/Hack23/cia/tree/master/json-export-specs
- CIA Sample Data: https://github.com/Hack23/cia/tree/master/service.data.impl/sample-data
- CIA Data Model: https://github.com/Hack23/cia/blob/master/DATA_MODEL.md
- OSINT Methodology: https://github.com/Hack23/cia/blob/master/DATA_ANALYSIS_INTOP_OSINT.md
- JSON Schema: https://json-schema.org/
- GitHub Actions: https://docs.github.com/en/actions
- Security Policy: https://github.com/Hack23/ISMS-PUBLIC/blob/main/Secure_Development_Policy.md
🏷️ Labels
enhancement, github_actions, dependencies
🔒 Security & Compliance
- Read-only API access to CIA platform
- Secure credential storage
- Respect CIA rate limits
- Data validation before caching
- Audit logging
- No PII storage (CIA handles this)
- ISMS compliance
- GDPR alignment (consume CIA's compliant data)
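Respecting rate limits can be as simple as spacing out requests to the CIA platform. A minimal client-side throttle sketch; `createThrottle` is hypothetical, and any interval used with it is illustrative, since this issue does not state CIA's actual limits.

```javascript
// Hypothetical throttle: tasks run one at a time, and each starts at least
// minIntervalMs after the previous one started
function createThrottle(minIntervalMs, now = () => Date.now()) {
  let nextSlot = 0;
  let queue = Promise.resolve();
  return function throttle(task) {
    const run = queue.then(async () => {
      const wait = Math.max(0, nextSlot - now());
      if (wait > 0) await new Promise((resolve) => setTimeout(resolve, wait));
      nextSlot = now() + minIntervalMs;
      return task();
    });
    // Keep the chain going even if a task fails; the caller still sees the error
    queue = run.catch(() => {});
    return run;
  };
}
```

In the Phase 1 loop, this would look like `const throttle = createThrottle(1000);` followed by `await throttle(() => this.fetchExport(product));`.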
💡 Key Principle
CIA = Source of Truth | Riksdagsmonitor = Consumer
CIA maintains the comprehensive digital twin with all data processing, OSINT analysis, and quality assurance. Riksdagsmonitor fetches and caches CIA's JSON exports for static site rendering and offline visualization.
Priority: High
Estimated Effort: 5-7 days
Dependencies: Issue #13 (CIA schema integration)
Related Issues: Issue #17 (news generation), Issue #19 (OSINT)