Dashboard Data Quality: Download and Integrate Missing CIA CSV Files #286

@pethers

📋 Issue Type

Performance / Data Integration

🎯 Objective

Download and integrate all missing CIA Platform CSV files to eliminate empty/sparse data in dashboards and replace mock data fallbacks with real intelligence data from the CIA Platform PostgreSQL views.

📊 Current State

Data Quality Analysis

Measured: 2026-02-18

Empty/Sparse CSV Files (< 1KB):

  • distribution_crisis_resilience.csv - 441 bytes (15 lines)
  • distribution_experience_levels.csv - 218 bytes (9 lines)
  • distribution_influence_buckets.csv - 188 bytes (7 lines)
  • distribution_ministry_risk_levels.csv - 91 bytes (2 lines) ⚠️
  • distribution_ministry_risk_quarterly.csv - 405 bytes (17 lines)
  • distribution_politician_risk_levels.csv - 83 bytes (3 lines) ⚠️
  • distribution_risk_by_party.csv - 466 bytes (27 lines)
  • distribution_risk_score_buckets.csv - 164 bytes (8 lines)
  • distribution_voting_anomaly_classification.csv - 66 bytes (2 lines) ⚠️
  • percentile_risk_score_evolution.csv - 519 bytes (30 lines)
  • percentile_voting_anomaly_detection.csv - 278 bytes (13 lines)

Total: 11 files with insufficient data for meaningful visualization

Mock Data Fallback Active:

  • js/risk-dashboard.js - Uses generated mock data when CSV is empty
  • Risk assessment dashboard shows synthetic data instead of real CIA intelligence

Available in CIA Repository:

  • 50+ CSV files in https://github.com/Hack23/cia/tree/master/service.data.impl/sample-data/
  • Real PostgreSQL view exports with complete data
  • Includes: party performance, election cycles, committee activity, ministry data, seasonal patterns

🚀 Desired State

Complete Data Coverage:

  • ✅ All distribution_* CSV files populated with real data
  • ✅ All percentile_* CSV files populated with real data
  • ✅ Risk dashboard uses real CIA intelligence (no mock data)
  • ✅ Ministry dashboards show actual government performance
  • ✅ Anomaly detection based on real voting patterns
  • ✅ Election cycle predictions use historical data (1994-2024)

File Size Targets:

  • distribution_politician_risk_levels.csv: > 10KB (400+ politicians)
  • distribution_ministry_risk_levels.csv: > 5KB (all ministries)
  • distribution_voting_anomaly_classification.csv: > 10KB (all anomaly types)
  • All other distribution files: > 5KB minimum
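The size targets above could be checked mechanically. The following is a minimal sketch, not part of the existing tooling: `findFilesBelowTarget` is a hypothetical helper, 1KB is assumed to mean 1024 bytes, and the sample sizes are the ones measured in the Current State list.

```javascript
// Size targets from the list above, in bytes (assumption: 1KB = 1024 bytes).
const KB = 1024;
const SIZE_TARGETS = {
  'distribution_politician_risk_levels.csv': 10 * KB,
  'distribution_ministry_risk_levels.csv': 5 * KB,
  'distribution_voting_anomaly_classification.csv': 10 * KB,
};
const DEFAULT_TARGET = 5 * KB; // "All other distribution files"

// Hypothetical helper: returns the files whose size does not exceed their target.
function findFilesBelowTarget(files) {
  return files
    .map(({ name, sizeBytes }) => ({
      name,
      sizeBytes,
      targetBytes: SIZE_TARGETS[name] ?? DEFAULT_TARGET,
    }))
    .filter(({ sizeBytes, targetBytes }) => sizeBytes <= targetBytes);
}

// Sizes measured on 2026-02-18 (from the Current State list).
const measured = [
  { name: 'distribution_politician_risk_levels.csv', sizeBytes: 83 },
  { name: 'distribution_ministry_risk_levels.csv', sizeBytes: 91 },
  { name: 'distribution_risk_by_party.csv', sizeBytes: 466 },
];
console.log(findFilesBelowTarget(measured).map((file) => file.name));
```

In practice the `sizeBytes` values would come from the file system (for example via `fs.statSync`) rather than a hard-coded list.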

📊 CIA Data Integration Context

CIA Product(s):

  • Party Performance Dashboard
  • Risk Assessment Intelligence
  • Ministry Effectiveness Scorecard
  • Committee Productivity Matrix
  • Election Cycle Analysis
  • Seasonal Activity Patterns
  • Anomaly Detection Intelligence

Data Source:

  • Repository: https://github.com/Hack23/cia/tree/master/service.data.impl/sample-data/
  • PostgreSQL Views: Real production database exports
  • Format: CSV with UTF-8 encoding, comma-delimited

Sample Data Examples:

  • view_riksdagen_politician_sample.csv - 2081 lines (2.1MB)
  • view_riksdagen_politician_experience_summary_sample.csv - 2095 lines
  • distribution_annual_committee_documents.csv - 820 lines
  • distribution_party_momentum.csv - 451 lines
  • view_riksdagen_committee_decisions.csv - 503 lines

Methodology:

  • CIA Platform OSINT methodology from DATA_ANALYSIS_INTOP_OSINT.md
  • 45 risk rules across 349 MPs (15,705 assessment points)
  • Historical data: 2494 politicians (1971-2024)
  • Real-time parliamentary data integration

Implementation Notes:

  • Review data manifest: /cia-data/data-manifest.json
  • Existing download script: /cia-data/download-csv.sh
  • Validate against schemas in CIA repo
  • Check for new files added since last sync

🔧 Implementation Approach

1. Data Discovery & Inventory (2 hours)

# Clone CIA repo sample-data directory
git clone --depth 1 --filter=blob:none --sparse https://github.com/Hack23/cia.git /tmp/cia-repo
cd /tmp/cia-repo
git sparse-checkout set service.data.impl/sample-data

# Generate a sorted inventory of CSV file names (names only, so it can be compared with local files)
find service.data.impl/sample-data -name "*.csv" -type f -exec basename {} \; | sort -u > /tmp/cia-csv-inventory.txt

# List files present in the CIA repo but missing from riksdagsmonitor
comm -23 /tmp/cia-csv-inventory.txt <(cd /path/to/riksdagsmonitor/cia-data && ls *.csv | sort)
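The same comparison could also be done in Node, which may be convenient if the result feeds into an npm script. This is only a sketch: `findMissingFiles` is a hypothetical helper, and the local file list below is illustrative rather than the real repository state.

```javascript
// Hypothetical helper: names present upstream but absent locally, sorted.
function findMissingFiles(upstreamNames, localNames) {
  const local = new Set(localNames);
  return upstreamNames.filter((name) => !local.has(name)).sort();
}

// Illustrative inputs; the upstream names are examples mentioned in this issue.
const upstreamFiles = [
  'distribution_party_momentum.csv',
  'distribution_annual_committee_documents.csv',
  'distribution_risk_by_party.csv',
];
const localFiles = ['distribution_risk_by_party.csv'];

console.log(findMissingFiles(upstreamFiles, localFiles));
```

This mirrors `comm -23`: only names unique to the first (upstream) list are reported.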

2. Download Missing/Updated Files (3 hours)

# Update download-csv.sh script to include all missing files
cd /path/to/riksdagsmonitor

# Add new file mappings to download-csv.sh:
MISSING_FILES=(
  "distribution_party_performance.csv"
  "distribution_annual_ballots.csv"
  "distribution_annual_party_members.csv"
  "distribution_gender_by_party.csv"
  "distribution_behavioral_patterns_by_party.csv"
  "distribution_decision_patterns_by_party.csv"
  "top10_*.csv"  # All Top 10 rankings
  # ... (complete list from inventory comparison)
)

# Download with validation
./cia-data/download-csv.sh --validate --update-manifest

3. Data Validation & Quality Assurance (2 hours)

# Validate CSV structure
npm run validate:csv

# Check for:
# - UTF-8 encoding
# - Valid CSV format (no truncated rows)
# - Required columns present
# - Data types match expected schema
# - File sizes > minimum threshold
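The contents of `npm run validate:csv` are not shown in this issue, so the following is only a sketch of how the encoding, column, and truncated-row checks might look. `splitCsvLine` and `validateCsv` are hypothetical names, the sample rows are invented, and the parser assumes no field contains an embedded newline.

```javascript
// Splits one CSV line into fields, honouring double-quoted fields and "" escapes.
// Assumption: no field contains an embedded newline.
function splitCsvLine(line) {
  const fields = [];
  let current = '';
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        current += '"';
        i++; // skip the second quote of the escape pair
      } else if (ch === '"') {
        inQuotes = false;
      } else {
        current += ch;
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ',') {
      fields.push(current);
      current = '';
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}

// Hypothetical validator: returns a list of problems (empty when the file looks fine).
function validateCsv(bytes, requiredColumns) {
  let text;
  try {
    // fatal: true makes the decoder throw on invalid UTF-8 instead of substituting U+FFFD.
    text = new TextDecoder('utf-8', { fatal: true }).decode(bytes);
  } catch {
    return ['not valid UTF-8'];
  }
  // Blank lines are dropped, so reported line numbers assume the file has none.
  const lines = text.split(/\r?\n/).filter((line) => line.length > 0);
  if (lines.length === 0) return ['file is empty'];

  const problems = [];
  const header = splitCsvLine(lines[0]);
  for (const column of requiredColumns) {
    if (!header.includes(column)) problems.push(`missing column: ${column}`);
  }
  lines.slice(1).forEach((line, index) => {
    const count = splitCsvLine(line).length;
    if (count !== header.length) {
      // Data row N sits on line N + 1 because of the header row.
      problems.push(`line ${index + 2}: expected ${header.length} fields, got ${count}`);
    }
  });
  return problems;
}

// Invented sample with a missing column and a truncated row.
const sampleCsv = new TextEncoder().encode('risk_level,politician_count\nHIGH,12\nLOW\n');
console.log(validateCsv(sampleCsv, ['risk_level', 'percentage']));
```

Data type checks against the CIA schemas are left out here because the schema format is not described in this issue.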

4. Update Data Manifest (1 hour)

// cia-data/data-manifest.json
{
  "files": [
    {
      "name": "distribution_politician_risk_levels.csv",
      "source": "view_riksdagen_politician_risk_summary",
      "description": "Risk level distribution for 400+ politicians",
      "fields": ["risk_level", "politician_count", "percentage", "avg_risk_score"],
      "dashboards": ["risk-dashboard"],
      "last_updated": "2026-02-18",
      "size_bytes": 12458,
      "record_count": 402
    }
    // ... all files
  ]
}
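The computed parts of a manifest entry (`fields`, `size_bytes`, `record_count`) could be derived from the file itself instead of being typed by hand. A minimal sketch follows; `describeCsv` is a hypothetical helper, and it assumes that `record_count` excludes the header row and that header names contain no quoted commas.

```javascript
// Hypothetical helper: derives the computed manifest fields from raw CSV bytes.
// Assumptions: record_count excludes the header row; header names contain no quoted commas.
function describeCsv(name, bytes, lastUpdated) {
  const text = new TextDecoder('utf-8').decode(bytes);
  const lines = text.split(/\r?\n/).filter((line) => line.length > 0);
  return {
    name,
    fields: lines.length > 0 ? lines[0].split(',') : [],
    last_updated: lastUpdated,
    size_bytes: bytes.length,
    record_count: Math.max(lines.length - 1, 0),
  };
}

// Invented two-row sample, for illustration only.
const exampleBytes = new TextEncoder().encode('risk_level,politician_count\nA,1\nB,2\n');
console.log(describeCsv('example.csv', exampleBytes, '2026-02-18'));
```

The `source`, `description`, and `dashboards` values cannot be inferred from the CSV and would still need to be maintained manually.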

5. Remove Mock Data Fallback (1 hour)

// js/risk-dashboard.js - Remove mock data generation
// Delete functions:
// - generateMockPoliticians()
// - generateMockAnomalyData()

// Update to fail gracefully with clear error:
if (!data || data.length === 0) {
  showDataQualityWarning('Risk data unavailable. Contact support.');
  return;
}
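To keep the empty-data path testable once the mock generators are gone, the check could be separated from the rendering. This is a sketch under assumptions: `assessRiskData` and `renderRiskDashboard` are hypothetical names, and `showDataQualityWarning` is assumed to accept a single message string as in the snippet above.

```javascript
// Hypothetical helper: classifies loaded rows so the caller can warn
// instead of silently substituting mock data.
function assessRiskData(rows) {
  if (!Array.isArray(rows) || rows.length === 0) {
    return { ok: false, message: 'Risk data unavailable. Contact support.' };
  }
  return { ok: true, rows };
}

// The warning and draw functions are passed in so the logic can run without a DOM.
function renderRiskDashboard(rows, showDataQualityWarning, draw) {
  const result = assessRiskData(rows);
  if (!result.ok) {
    showDataQualityWarning(result.message);
    return false;
  }
  draw(result.rows);
  return true;
}

// Usage with console stand-ins for the real dashboard functions.
renderRiskDashboard([], console.warn, console.log);
```

Passing the callbacks as parameters is a design choice for this sketch only; the real dashboard may call its own functions directly.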

6. Update Documentation (1 hour)

  • Update cia-data/README.md with complete file inventory
  • Document field descriptions for new files
  • Update dashboard documentation with data sources
  • Create data dictionary mapping CSV → Dashboard → Visualization
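The CSV to dashboard part of the data dictionary could be generated from `data-manifest.json`, assuming every entry carries a `dashboards` array as in the example manifest entry. A minimal sketch, with `mapDashboardsToFiles` as a hypothetical helper and the second manifest entry invented for illustration:

```javascript
// Hypothetical helper: inverts manifest entries into a dashboard -> CSV files map.
function mapDashboardsToFiles(manifest) {
  const mapping = {};
  for (const file of manifest.files) {
    for (const dashboard of file.dashboards || []) {
      if (!mapping[dashboard]) mapping[dashboard] = [];
      mapping[dashboard].push(file.name);
    }
  }
  return mapping;
}

const exampleManifest = {
  files: [
    { name: 'distribution_politician_risk_levels.csv', dashboards: ['risk-dashboard'] },
    // Assumption for illustration: this file is also consumed by the risk dashboard.
    { name: 'distribution_risk_by_party.csv', dashboards: ['risk-dashboard'] },
  ],
};
console.log(mapDashboardsToFiles(exampleManifest));
```

The visualization level is not present in the manifest example, so that column of the data dictionary would still have to be documented by hand.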

🤖 Recommended Agent

Agent: data-pipeline-specialist

Rationale:

  • Expert in CIA data consumption and ETL workflows
  • Knows CSV validation and data quality checks
  • Familiar with CIA Platform PostgreSQL views and export formats
  • Can implement automated data refresh workflows
  • Understands caching strategies and data versioning

Custom Instructions:

1. Download all missing CSV files from CIA Platform sample-data
2. Validate data quality (encoding, structure, completeness)
3. Update data-manifest.json with complete metadata
4. Remove mock data fallback from risk-dashboard.js
5. Ensure all dashboards use real CIA intelligence data
6. Document data pipeline and refresh process
7. Create validation tests for CSV data quality

✅ Acceptance Criteria

Data Coverage:

  • All 11 sparse CSV files now have > 5KB data
  • distribution_politician_risk_levels.csv has 400+ rows
  • No CSV files < 1KB in cia-data/
  • All distribution_* files downloaded from CIA repo
  • All percentile_* files downloaded from CIA repo
  • All view_riksdagen_* sample files present

Code Quality:

  • Mock data generation removed from risk-dashboard.js
  • All dashboards render with real CIA data
  • No "using mock data" console warnings
  • Data validation tests pass for all CSV files
  • download-csv.sh script updated with all files

Documentation:

  • data-manifest.json updated with all files
  • Field descriptions documented for each file
  • Dashboard → CSV mapping documented
  • cia-data/README.md updated with complete inventory

Validation:

  • All CSV files validated (UTF-8, proper format)
  • File sizes verified (no empty/truncated files)
  • npm run validate:csv passes
  • No git-tracked CSV files in cia-data/ smaller than 1KB

📚 References

Repository:

CIA Documentation:

ISMS Policy:

Architecture:

  • Data Pipeline: FUTURE_ARCHITECTURE.md § Data Pipeline & CIA Integration
  • Data Model: DATA_MODEL.md § CIA Data Exports

🏷️ Labels

type:performance, type:data-integration, priority:high, component:data-pipeline, component:cia-integration, agent:data-pipeline-specialist, status:ready, dashboard-quality

🔗 Related Issues

📈 Success Metrics

Before: 11 CSV files < 1KB, mock data in dashboards
After: All CSV files > 5KB, real CIA intelligence data in all dashboards

Performance Impact: Local CSV loading ~10x faster than remote fetch
Data Quality: Real intelligence instead of synthetic mock data
User Impact: Accurate political analysis based on real parliamentary data


Estimated Effort: 10 hours
Complexity: Medium
Priority: High (blocks dashboard quality improvements)
Dependencies: None (can start immediately)
