Skip to content

Fix SQL syntax error in extract-sample-data.sql Phase 2 preventing view extraction #8356

@pethers

Description

@pethers

🎯 Objective

Fix SQL syntax error in extract-sample-data.sql Phase 2 that prevents ANY view extraction from occurring.

📋 Background

PR #8348 extraction log shows that 0 views were extracted despite 28 materialized views being refreshed successfully. The root cause is a SQL syntax error at line ~1487 in the extraction command generation SQL.

Error from log (line 2243-2246):

psql:../extract-sample-data.sql:1618: ERROR:  42601: syntax error at or near "viewname"
LINE 301:                 file_prefix
                                ^

This error occurs during Phase 2 ("Generating extraction commands") which builds the dynamic SQL commands to extract each view. Because this phase fails, Phase 3 executes zero extraction commands.

📊 Current State (Measured from Log)

  • Phase 0: ✅ 28 materialized views refreshed successfully (all populated)
  • Phase 1: ✅ 55+ distinct value CSV files extracted
  • Phase 2: ❌ SQL SYNTAX ERROR in extraction command generation
  • Phase 3: ⚠️ 0 view extraction commands executed (empty due to Phase 2 failure)
  • Phase 4: ✅ 94 tables extracted (93 with data, 1 empty)
  • Phase 5-6: ✅ Statistics and distributions generated
  • Overall Result: 0 of 109 views extracted, but all other phases successful

Misleading File: The problematic_views.csv file showing "materialized view not populated" errors is from a PREVIOUS validation run, not from this extraction attempt. This caused initial misanalysis.

✅ Acceptance Criteria

  • Fix SQL syntax error around line 1487 in extract-sample-data.sql
  • Verify SQL compiles without errors: psql -f extract-sample-data.sql --dry-run
  • Re-run full extraction and verify all 109 views are processed
  • Confirm extraction summary shows: "VIEWS: Total processed: 109"
  • All view sample CSV files created in sample-data/ directory
  • Zero SQL syntax errors in extraction log
  • Update extraction documentation if needed

🛠️ Implementation Guidance

Files to Modify:

  • service.data.impl/src/main/resources/extract-sample-data.sql - Fix syntax error around line 1487

Root Cause:
Line 1487 in the WEEKLY views section is missing the E'\n' string continuation:

-- INCORRECT (line 1487):
'INSERT INTO cia_extraction_tracking (object_type, object_name, status, row_count) VALUES (''view'', ''' || '%s' || ''', CASE WHEN %s = 0 THEN ''empty'' ELSE ''success'' END, %s);' || E''

-- CORRECT (should be):
'INSERT INTO cia_extraction_tracking (object_type, object_name, status, row_count) VALUES (''view'', ''' || '%s' || ''', CASE WHEN %s = 0 THEN ''empty'' ELSE ''success'' END, %s);' || E'\n' ||

Fix:

-- Around line 1487, in the WEEKLY views section
-- Find this line:
                'INSERT INTO cia_extraction_tracking (object_type, object_name, status, row_count) VALUES (''view'', ''' || '%s' || ''', CASE WHEN %s = 0 THEN ''empty'' ELSE ''success'' END, %s);' || E''

-- Replace with:
                'INSERT INTO cia_extraction_tracking (object_type, object_name, status, row_count) VALUES (''view'', ''' || '%s' || ''', CASE WHEN %s = 0 THEN ''empty'' ELSE ''success'' END, %s);' || E'\n' ||

Testing Strategy:

# 1. Syntax validation
cd service.data.impl/src/main/resources
psql -U postgres -d cia_dev -f extract-sample-data.sql 2>&1 | tee test_run.log

# 2. Check for errors
grep -i "ERROR" test_run.log

# 3. Verify view count
grep "VIEWS: Total processed:" test_run.log

# 4. Check sample files created
ls -1 service.data.impl/sample-data/view_*_sample.csv | wc -l  # Should be ~109

🔍 Verification Steps

  1. Before Fix: Run extraction, see "ERROR: syntax error at or near viewname"
  2. After Fix: Run extraction, see "VIEWS: Total processed: 109"
  3. Confirm: 109 view_*_sample.csv files exist in sample-data/
  4. Validate: extraction_summary_report.csv shows 109 view entries

🤖 Recommended Agent

Agent: @stack-specialist

Rationale: This issue requires PostgreSQL SQL expertise, understanding of dynamic SQL generation with format() functions, and knowledge of the CIA database schema. The Stack Specialist has the SQL and database skills to fix this syntax error correctly.

📚 Related Resources

  • PR sample data update and log #8348 - Contains extract-sample-data.log showing the error
  • service.data.impl/sample-data/extract-sample-data.log - Full extraction log (2843 lines)
  • Line 2243-2246 of log shows the syntax error
  • Line ~1487 of extract-sample-data.sql contains the bug

📊 Impact

Critical - Without this fix, zero views are extracted, making sample data incomplete for:

  • View debugging and validation
  • Query performance testing
  • Intelligence product development
  • Temporal analysis and trends

📊 Metadata

Priority: Critical | Effort: S (15-30 min) | Impact: Enables extraction of all 109 views

Labels: type: bug, priority: critical, domain: database, size: small

Metadata

Metadata

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions