Description
📋 Issue Type
Performance / Data Integration
🎯 Objective
Download and integrate all missing CIA Platform CSV files to eliminate empty/sparse data in dashboards and replace mock data fallbacks with real intelligence data from the CIA Platform PostgreSQL views.
📊 Current State
Data Quality Analysis
Measured: 2026-02-18
Empty/Sparse CSV Files (< 1KB):
- distribution_crisis_resilience.csv - 441 bytes (15 lines)
- distribution_experience_levels.csv - 218 bytes (9 lines)
- distribution_influence_buckets.csv - 188 bytes (7 lines)
- distribution_ministry_risk_levels.csv - 91 bytes (2 lines) ⚠️
- distribution_ministry_risk_quarterly.csv - 405 bytes (17 lines)
- distribution_politician_risk_levels.csv - 83 bytes (3 lines) ⚠️
- distribution_risk_by_party.csv - 466 bytes (27 lines)
- distribution_risk_score_buckets.csv - 164 bytes (8 lines)
- distribution_voting_anomaly_classification.csv - 66 bytes (2 lines) ⚠️
- percentile_risk_score_evolution.csv - 519 bytes (30 lines)
- percentile_voting_anomaly_detection.csv - 278 bytes (13 lines)
Total: 11 files with insufficient data for meaningful visualization
Mock Data Fallback Active:
- js/risk-dashboard.js - Uses generated mock data when CSV is empty
- Risk assessment dashboard showing synthetic data instead of real CIA intelligence
Available in CIA Repository:
- 50+ CSV files in https://github.com/Hack23/cia/tree/master/service.data.impl/sample-data/
- Real PostgreSQL view exports with complete data
- Includes: party performance, election cycles, committee activity, ministry data, seasonal patterns
🚀 Desired State
Complete Data Coverage:
- ✅ All distribution_* CSV files populated with real data
- ✅ All percentile_* CSV files populated with real data
- ✅ Risk dashboard uses real CIA intelligence (no mock data)
- ✅ Ministry dashboards show actual government performance
- ✅ Anomaly detection based on real voting patterns
- ✅ Election cycle predictions use historical data (1994-2024)
File Size Targets:
- distribution_politician_risk_levels.csv: > 10KB (400+ politicians)
- distribution_ministry_risk_levels.csv: > 5KB (all ministries)
- distribution_voting_anomaly_classification.csv: > 10KB (all anomaly types)
- All other distribution files: > 5KB minimum
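These targets could be encoded as data and checked automatically. The sketch below assumes that "KB" means 1024 bytes and, for simplicity, applies the 5KB default to any file without an explicit target (the issue states it only for distribution files). The names `SIZE_TARGETS`, `DEFAULT_MIN`, and `findBelowTarget` are hypothetical.

```javascript
const KB = 1024; // Assumption: the issue's "KB" means 1024 bytes.

// Per-file minimum sizes taken from the targets above.
const SIZE_TARGETS = new Map([
  ['distribution_politician_risk_levels.csv', 10 * KB],
  ['distribution_ministry_risk_levels.csv', 5 * KB],
  ['distribution_voting_anomaly_classification.csv', 10 * KB],
]);
const DEFAULT_MIN = 5 * KB;

// `sizes` maps file name -> size in bytes. Returns the files that miss their target.
function findBelowTarget(sizes) {
  return Object.entries(sizes)
    .map(([name, bytes]) => ({
      name,
      bytes,
      target: SIZE_TARGETS.get(name) ?? DEFAULT_MIN,
    }))
    .filter((file) => file.bytes <= file.target); // targets are strict "greater than"
}

// Two sizes from the 2026-02-18 measurement, plus one hypothetical healthy file.
console.log(findBelowTarget({
  'distribution_politician_risk_levels.csv': 83,
  'distribution_risk_by_party.csv': 466,
  'example_large_file.csv': 20 * KB, // hypothetical file name
}));
```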
📊 CIA Data Integration Context
CIA Product(s):
- Party Performance Dashboard
- Risk Assessment Intelligence
- Ministry Effectiveness Scorecard
- Committee Productivity Matrix
- Election Cycle Analysis
- Seasonal Activity Patterns
- Anomaly Detection Intelligence
Data Source:
- Repository: https://github.com/Hack23/cia/tree/master/service.data.impl/sample-data/
- PostgreSQL Views: Real production database exports
- Format: CSV with UTF-8 encoding, comma-delimited
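The stated format (UTF-8, comma-delimited) can be checked before a file is accepted. This is a minimal sketch using the standard `TextDecoder` in fatal mode; the helper names are hypothetical, the sample row is made up for illustration, and the delimiter check is only a heuristic.

```javascript
// Decodes bytes as strict UTF-8; returns the text, or null if the bytes are invalid.
function decodeUtf8Strict(bytes) {
  try {
    return new TextDecoder('utf-8', { fatal: true }).decode(bytes);
  } catch {
    return null;
  }
}

// Rough delimiter check: the header line should contain at least one comma.
// This heuristic would wrongly reject a valid single-column file.
function looksCommaDelimited(text) {
  const header = text.split(/\r?\n/, 1)[0];
  return header.includes(',');
}

// Illustrative input only; the data row is invented for this example.
const good = Buffer.from('risk_level,politician_count\nLOW,12\n', 'utf8');
const bad = Buffer.from([0x72, 0xff, 0x6b]); // 0xff never appears in valid UTF-8

console.log(decodeUtf8Strict(good) !== null); // valid UTF-8 decodes to a string
console.log(decodeUtf8Strict(bad) === null);  // invalid bytes are rejected
console.log(looksCommaDelimited(decodeUtf8Strict(good)));
```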
Sample Data Examples:
- view_riksdagen_politician_sample.csv - 2081 lines (2.1MB)
- view_riksdagen_politician_experience_summary_sample.csv - 2095 lines
- distribution_annual_committee_documents.csv - 820 lines
- distribution_party_momentum.csv - 451 lines
- view_riksdagen_committee_decisions.csv - 503 lines
Methodology:
- CIA Platform OSINT methodology from DATA_ANALYSIS_INTOP_OSINT.md
- 45 risk rules across 349 MPs (15,705 assessment points)
- Historical data: 2494 politicians (1971-2024)
- Real-time parliamentary data integration
Implementation Notes:
- Review data manifest: /cia-data/data-manifest.json
- Existing download script: /cia-data/download-csv.sh
- Validate against schemas in CIA repo
- Check for new files added since last sync
🔧 Implementation Approach
1. Data Discovery & Inventory (2 hours)
# Clone CIA repo sample-data directory
git clone --depth 1 --filter=blob:none --sparse https://github.com/Hack23/cia.git /tmp/cia-repo
cd /tmp/cia-repo
git sparse-checkout set service.data.impl/sample-data
# Generate file inventory (base names only, so it can be compared with local file names)
find service.data.impl/sample-data -name "*.csv" -type f -exec basename {} \; | sort -u > /tmp/cia-csv-inventory.txt
# List CIA files that are missing from riksdagsmonitor (requires bash for process substitution)
comm -23 /tmp/cia-csv-inventory.txt <(cd /path/to/riksdagsmonitor/cia-data && ls *.csv | sort -u)
2. Download Missing/Updated Files (3 hours)
# Update download-csv.sh script to include all missing files
cd /path/to/riksdagsmonitor
# Add new file mappings to download-csv.sh:
MISSING_FILES=(
"distribution_party_performance.csv"
"distribution_annual_ballots.csv"
"distribution_annual_party_members.csv"
"distribution_gender_by_party.csv"
"distribution_behavioral_patterns_by_party.csv"
"distribution_decision_patterns_by_party.csv"
"top10_*.csv" # All Top 10 rankings
# ... (complete list from inventory comparison)
)
# Download with validation
./cia-data/download-csv.sh --validate --update-manifest
3. Data Validation & Quality Assurance (2 hours)
# Validate CSV structure
npm run validate:csv
# Check for:
# - UTF-8 encoding
# - Valid CSV format (no truncated rows)
# - Required columns present
# - Data types match expected schema
# - File sizes > minimum threshold
4. Update Data Manifest (1 hour)
// cia-data/data-manifest.json
{
"files": [
{
"name": "distribution_politician_risk_levels.csv",
"source": "view_riksdagen_politician_risk_summary",
"description": "Risk level distribution for 400+ politicians",
"fields": ["risk_level", "politician_count", "percentage", "avg_risk_score"],
"dashboards": ["risk-dashboard"],
"last_updated": "2026-02-18",
"size_bytes": 12458,
"record_count": 402
}
// ... all files
]
}
5. Remove Mock Data Fallback (1 hour)
// js/risk-dashboard.js - Remove mock data generation
// Delete functions:
// - generateMockPoliticians()
// - generateMockAnomalyData()
// Update to fail gracefully with clear error:
if (!data || data.length === 0) {
showDataQualityWarning('Risk data unavailable. Contact support.');
return;
}
6. Update Documentation (1 hour)
- Update cia-data/README.md with complete file inventory
- Document field descriptions for new files
- Update dashboard documentation with data sources
- Create data dictionary mapping CSV → Dashboard → Visualization
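Part of the data dictionary could be derived from the manifest rather than written by hand. The sketch below assumes the manifest has the shape shown in step 4 (a `files` array whose entries carry `name`, `fields`, and `dashboards`) and inverts it into a dashboard-to-CSV lookup. The manifest does not record visualizations, so that level would still need to be documented manually. The function name `buildDashboardIndex` is hypothetical.

```javascript
// Inverts manifest entries into a dashboard -> [{ file, fields }] lookup.
function buildDashboardIndex(manifest) {
  const index = {};
  for (const file of manifest.files) {
    // Entries without a `dashboards` array are skipped.
    for (const dashboard of file.dashboards ?? []) {
      (index[dashboard] ??= []).push({
        file: file.name,
        fields: file.fields ?? [],
      });
    }
  }
  return index;
}

// Example using the manifest entry from step 4.
const manifest = {
  files: [
    {
      name: 'distribution_politician_risk_levels.csv',
      fields: ['risk_level', 'politician_count', 'percentage', 'avg_risk_score'],
      dashboards: ['risk-dashboard'],
    },
  ],
};
console.log(JSON.stringify(buildDashboardIndex(manifest), null, 2));
```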
🤖 Recommended Agent
Agent: data-pipeline-specialist
Rationale:
- Expert in CIA data consumption and ETL workflows
- Knows CSV validation and data quality checks
- Familiar with CIA Platform PostgreSQL views and export formats
- Can implement automated data refresh workflows
- Understands caching strategies and data versioning
Custom Instructions:
1. Download all missing CSV files from CIA Platform sample-data
2. Validate data quality (encoding, structure, completeness)
3. Update data-manifest.json with complete metadata
4. Remove mock data fallback from risk-dashboard.js
5. Ensure all dashboards use real CIA intelligence data
6. Document data pipeline and refresh process
7. Create validation tests for CSV data quality
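For instruction 7, a validation test might start from something like the following. It is a sketch only: it checks required columns and truncated rows, assumes no field contains an embedded line break, and does not represent what `npm run validate:csv` currently does. The names `splitCsvLine` and `validateCsv` are hypothetical, and the sample rows are invented.

```javascript
// Splits one CSV line into fields, honouring double-quoted fields and "" escapes.
// Assumption: no field contains an embedded line break.
function splitCsvLine(line) {
  const fields = [];
  let current = '';
  let inQuotes = false;
  for (let i = 0; i < line.length; i += 1) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        current += '"'; // escaped quote inside a quoted field
        i += 1;
      } else if (ch === '"') {
        inQuotes = false;
      } else {
        current += ch;
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ',') {
      fields.push(current);
      current = '';
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}

// Returns a list of problems; an empty list means the text passed these checks.
function validateCsv(text, requiredColumns) {
  const lines = text.split(/\r?\n/);
  if (lines[lines.length - 1] === '') lines.pop(); // drop the trailing newline
  if (lines.length === 0) return ['file is empty'];
  const header = splitCsvLine(lines[0]);
  const problems = requiredColumns
    .filter((col) => !header.includes(col))
    .map((col) => `missing column: ${col}`);
  lines.slice(1).forEach((line, i) => {
    const width = splitCsvLine(line).length;
    if (width !== header.length) {
      problems.push(`line ${i + 2}: expected ${header.length} fields, found ${width}`);
    }
  });
  return problems;
}

// Invented sample: the last row is truncated and one required column is absent.
console.log(validateCsv('risk_level,politician_count\nLOW,12\nHIGH\n',
  ['risk_level', 'percentage']));
```

Returning a list of problems instead of throwing lets a test runner report every defect in a file at once.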
✅ Acceptance Criteria
Data Coverage:
- All 11 sparse CSV files now have > 5KB data
- distribution_politician_risk_levels.csv has 400+ rows
- No CSV files < 1KB in cia-data/
- All distribution_* files downloaded from CIA repo
- All percentile_* files downloaded from CIA repo
- All view_riksdagen_* sample files present
Code Quality:
- Mock data generation removed from risk-dashboard.js
- All dashboards render with real CIA data
- No "using mock data" console warnings
- Data validation tests pass for all CSV files
- download-csv.sh script updated with all files
Documentation:
- data-manifest.json updated with all files
- Field descriptions documented for each file
- Dashboard → CSV mapping documented
- cia-data/README.md updated with complete inventory
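The measurable manifest fields (`fields`, `size_bytes`, `record_count`, `last_updated`) could be generated from each file instead of maintained by hand. This is a sketch under assumptions: `record_count` is taken to mean data rows excluding the header, which the issue does not specify, and the header is split naively on commas. The function name `buildManifestEntry` is hypothetical.

```javascript
// Builds a manifest entry for one CSV file from its raw bytes.
// `meta` carries values that cannot be derived from the file itself
// (for example source, description, dashboards).
function buildManifestEntry(name, bytes, meta = {}) {
  const text = bytes.toString('utf8').replace(/^\uFEFF/, ''); // strip a BOM if present
  const lines = text.split(/\r?\n/).filter((line) => line.trim() !== '');
  return {
    name,
    ...meta,
    // Naive header split: assumes column names contain no commas or quotes.
    fields: lines.length > 0 ? lines[0].split(',') : [],
    size_bytes: bytes.length,
    // Assumption: record_count means data rows, so the header is not counted.
    record_count: Math.max(lines.length - 1, 0),
    last_updated: new Date().toISOString().slice(0, 10), // YYYY-MM-DD (UTC)
  };
}

// Invented sample content, for illustration only.
const sample = Buffer.from('risk_level,politician_count\nLOW,12\nHIGH,3\n', 'utf8');
console.log(buildManifestEntry('distribution_politician_risk_levels.csv', sample, {
  dashboards: ['risk-dashboard'],
}));
```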
Validation:
- All CSV files validated (UTF-8, proper format)
- File sizes verified (no empty/truncated files)
- npm run validate:csv passes
- No git-tracked CSV files in cia-data/ smaller than 1KB
📚 References
Repository:
- Main: https://github.com/Hack23/riksdagsmonitor
- CIA Platform: https://github.com/Hack23/cia/tree/master/service.data.impl/sample-data
CIA Documentation:
- OSINT Methodology: https://github.com/Hack23/cia/blob/master/DATA_ANALYSIS_INTOP_OSINT.md
- Business Products: https://github.com/Hack23/cia/blob/master/BUSINESS_PRODUCT_DOCUMENT.md
- Sample Data README: https://github.com/Hack23/cia/blob/master/service.data.impl/sample-data/README.md
ISMS Policy:
- Secure Development: https://github.com/Hack23/ISMS-PUBLIC/blob/main/Secure_Development_Policy.md
- Data Classification: https://github.com/Hack23/ISMS-PUBLIC/blob/main/CLASSIFICATION.md
Architecture:
- Data Pipeline: FUTURE_ARCHITECTURE.md § Data Pipeline & CIA Integration
- Data Model: DATA_MODEL.md § CIA Data Exports
🏷️ Labels
type:performance, type:data-integration, priority:high, component:data-pipeline, component:cia-integration, agent:data-pipeline-specialist, status:ready, dashboard-quality
🔗 Related Issues
- Issue #18, CIA Data Consumption Pipeline (JSON Export Integration): CIA Data Pipeline Foundation (if exists)
- Issue #19, Visualize CIA OSINT Intelligence Exports: CIA Intelligence Dashboard (if exists)
- Issue #20, Visualize CIA Election 2026 Forecasting Models: CIA Data Visualization (if exists)
📈 Success Metrics
Before: 11 CSV files < 1KB, mock data in dashboards
After: All CSV files > 5KB, real CIA intelligence data in all dashboards
Performance Impact: Local CSV loading ~10x faster than remote fetch
Data Quality: Real intelligence instead of synthetic mock data
User Impact: Accurate political analysis based on real parliamentary data
Estimated Effort: 10 hours
Complexity: Medium
Priority: High (blocks dashboard quality improvements)
Dependencies: None (can start immediately)