Skip to content

Comments

Fix extract-sample-data.sql Phase 1 abort on view timeout with comprehensive error handling, user interrupt support, accurate progress tracking, and complete documentation#8361

Merged
pethers merged 11 commits intomasterfrom
copilot/handle-view-timeouts-during-extraction
Feb 8, 2026

Conversation

Copy link
Contributor

Copilot AI commented Feb 7, 2026

Description

Phase 1 row counting aborted on statement timeout (SQLSTATE 57014), preventing views 25-108 from processing and blocking Phase 2/3 execution. Added exception handler in DO block to catch query_canceled, log timeout, and continue. Additionally improved error handling to properly capture and track complete error details for all row count failures, fixed materialized view handling for unanalyzed objects, preserved user interrupt capability, corrected progress tracking counts to 108 views, ensured complete error diagnostics storage, updated all documentation to reflect accurate view counts with search-based instructions that avoid line number drift, clarified temporary table lifecycle, and documented all output files for comprehensive user guidance.

Code Review Fixes Applied:

  • Fixed object_type consistency: Changed from view_record.object_type (values 'VIEW'/'MATERIALIZED VIEW') to lowercase 'view' to match summary query filters
  • Fixed cia_tmp_rowcount exception handling: Changed to re-raise all exceptions (both query_canceled and OTHERS) for proper error propagation
  • Fixed error detail capture: Removed sentinel value pattern to allow full error messages via SQLERRM in tracking table
  • Fixed timeout message: Changed from hard-coded "120s" to dynamic current_setting('statement_timeout')
  • Fixed README line references: Changed all hard-coded line numbers to search-based instructions to avoid line number drift (e.g., "search for SET statement_timeout", "search for the view exclusion list")
  • Fixed README timeout examples: Updated hard-coded "120s" examples to "{timeout_value}" placeholder
  • Fixed materialized view handling: Use GREATEST(COALESCE(reltuples, 0), 0) to normalize negative reltuples to 0 for unanalyzed matviews
  • Fixed user interrupt support: Distinguish statement timeout from user cancel (Ctrl+C) via SQLERRM check, re-raise user cancels to preserve interactive abort capability
  • Fixed progress tracking: Added view_riksdagen_coalition_alignment_matrix exclusion to total_views query to match loop exclusions (108 views, not 109)
  • Fixed timeout diagnostics: Changed timeout tracking to store actual SQLERRM instead of fixed string for complete error context
  • Fixed documentation consistency: Updated all README examples from "24/109" and "25-109" to "24/108" and "25-108" to match corrected view count
  • Fixed exclusion notice: Updated Phase 1 NOTICE to mention both excluded views (view_riksdagen_coalition_alignment_matrix, view_riksdagen_intelligence_dashboard)
  • Fixed temp table references: Changed troubleshooting guidance from referencing temp table cia_extraction_tracking to persistent files (extraction_summary_report.csv, extract-sample-data.log)
  • Clarified query_canceled comment: Updated to reflect exception covers both statement timeout and user cancel scenarios
  • Documented temp table lifecycle: Added clarification that cia_extraction_tracking is temporary (available only during active psql session) and that data is persisted in CSV/log files
  • Added output files documentation: Added extraction_summary_report.csv and extract-sample-data.log to documented output artifacts for user discovery

Technical Change:

-- Phase 1 DO block (lines 1257-1327)
FOR view_record IN ... LOOP
    BEGIN
        row_count := cia_tmp_rowcount(schema, view);
        INSERT INTO cia_view_row_counts VALUES (..., row_count);
        
        IF row_count > 0 THEN
            RAISE NOTICE '✓ Contains % rows', row_count;
        ELSE
            RAISE NOTICE '⚠️  EMPTY (0 rows)';
        END IF;
    EXCEPTION 
        WHEN query_canceled THEN
            -- Check if statement timeout or user cancel
            IF SQLERRM LIKE '%statement timeout%' OR 
               SQLERRM LIKE '%canceling statement due to statement timeout%' THEN
                -- Statement timeout: mark as -1, track with SQLERRM, continue loop
            ELSE
                -- User cancel (Ctrl+C): re-raise to stop script
                RAISE;
            END IF;
        WHEN OTHERS THEN
            -- Mark as error (-2), capture full SQLERRM, continue loop
    END;
END LOOP;

-- cia_tmp_rowcount function re-raises ALL exceptions for proper error capture
EXCEPTION 
    WHEN query_canceled THEN
        RAISE;  -- Re-raise query_canceled (timeout or user cancel) so caller can handle it
    WHEN OTHERS THEN
        RAISE;  -- Re-raise errors so caller can capture SQLERRM

-- Materialized view handling
SELECT GREATEST(COALESCE(c.reltuples, 0), 0)::BIGINT INTO row_count
-- Normalizes negative reltuples (-1 for unanalyzed) to 0

-- total_views query excludes both special views
WHERE viewname != 'view_riksdagen_coalition_alignment_matrix'
  AND viewname != 'view_riksdagen_intelligence_dashboard'

-- Phase 1 exclusion notice mentions both views
RAISE NOTICE 'Excluding: view_riksdagen_coalition_alignment_matrix, view_riksdagen_intelligence_dashboard';

Phase 2 Filtering:

WHERE row_count >= 0  -- Skip timeout (-1) and error (-2) views

Status Codes:

  • > 0: Success, extract data
  • 0: Empty, header-only CSV
  • -1: Timeout, skip extraction
  • -2: Error, skip extraction

Type of Change

Primary Changes

  • 🐛 Bug Fix
  • 🔄 Enhancement

Technical Changes

  • 🏗️ Infrastructure
    • Database Changes
    • Performance Optimization
  • 📝 Documentation
    • Technical Documentation

Impact Analysis

Political Analysis Impact

  • Impact on data quality: Positive - complete error diagnostics captured with full PostgreSQL error messages including timeout context; unanalyzed materialized views now properly extracted instead of skipped; accurate progress tracking with correct view counts; complete documentation clarity on temp table lifecycle and output files
  • Impact on analysis accuracy: Positive - now extracts all successful views instead of aborting at first timeout, properly skips errored views, correctly handles unanalyzed materialized views, provides accurate progress indicators (108 views), and offers clear guidance on where to find diagnostic information
  • Impact on transparency features: Positive - timeout and error tracking in cia_extraction_tracking table with consistent object_type='view', full SQLERRM for both timeouts and errors, accurate view counts, complete documentation alignment with search-based instructions, clear temp table lifecycle documentation, and comprehensive output file documentation for user guidance

Technical Impact

  • Performance impact: Neutral - same per-view timeout (120s), but script completes instead of aborting
  • Security implications: None
  • Dependency changes: None

Testing

  • Political data validation completed
  • Performance testing completed

Test 1 (Simulated Timeout): 3 views with 5s timeout, middle view times out

  • Result: All 3 views attempted, timeout logged with dynamic "after 5s" message, script completed
  • Verified: object_type='view' in cia_extraction_tracking, timeout properly caught

Test 2 (Production): 108 views with 120s timeout

  • Result: All 108 views processed including view_election_cycle_network_analysis (position 24)

Test 3 (Exception Handling): Verified cia_tmp_rowcount re-raises query_canceled

  • Result: Timeout exception properly propagates to caller, classified as row_count=-1

Test 4 (Error Handling): 3 views with nonexistent view causing error

  • Result: All 3 views attempted, error exception propagated, script continued
  • Verified: Full error message captured: relation "public.nonexistent_view_xyz_123" does not exist

Test 5 (Error Detail Capture): Verified complete error messages in tracking table

  • Result: Full PostgreSQL error details stored in cia_extraction_tracking.error_message
  • Verified: Error diagnostics include complete context (relation names, specific error types)

Test 6 (Materialized View Handling): Verified unanalyzed matviews with reltuples=-1

  • Result: GREATEST(COALESCE(reltuples, 0), 0) normalizes -1 to 0
  • Verified: Unanalyzed materialized views extracted with 0 rows instead of being skipped

Test 7 (User Interrupt Support): Verified SQLERRM distinguishes timeout from user cancel

  • Result: Statement timeout message contains "canceling statement due to statement timeout"
  • Verified: Timeout handled gracefully with continuation; user cancel would re-raise and stop script

Test 8 (Progress Count Accuracy): Verified total_views matches loop iterations

  • Result: total_views query = 108, actual loop count = 108
  • Verified: Progress indicator shows correct denominator [24/108] instead of incorrect [24/109]

Test 9 (Timeout SQLERRM Storage): Verified timeout tracking stores complete error message

  • Result: SQLERRM "canceling statement due to statement timeout" stored in cia_extraction_tracking
  • Verified: Complete timeout diagnostics available for troubleshooting

Test 10 (Documentation Consistency): Verified README examples match script output

  • Result: All examples updated to show "24/108" and "25-108"
  • Verified: Documentation accurately reflects 108 view count throughout with search-based instructions

Test 11 (Output Files): Verified extraction_summary_report.csv and extract-sample-data.log documentation

  • Result: Both files documented in Output Files section
  • Verified: Users can discover and verify all output artifacts; troubleshooting guidance is consistent

Documentation

  • README updated
  • Package/module documentation updated

Updated: README-EXTRACT-SAMPLE-DATA.md

  • Added "Graceful Timeout Handling" section with temp table lifecycle clarification
  • Updated troubleshooting guide with timeout behavior and persistent file references
  • Fixed all line number references to use search-based instructions instead
  • Updated timeout message examples to use dynamic placeholder
  • Documented status codes and continuation logic
  • Updated all progress examples from "24/109" to "24/108" for accuracy
  • Added Output Files section documenting extraction_summary_report.csv and extract-sample-data.log
  • Clarified that cia_extraction_tracking is temporary (available only during active psql session)
  • Documented that timeout/error data is persisted in CSV and log files for later analysis

Checklist

  • Code follows project coding standards
  • Comments are clear and helpful
  • Documentation is updated
  • Tests are passing
  • Security compliance is maintained
  • Performance impact is acceptable
  • Breaking changes are documented
  • Changes are backward compatible

Additional Notes

Exception Handling Pattern:

  • Exception handler must be inside DO block loop (not in cia_tmp_rowcount function) because timeout exception propagates from EXECUTE statement and terminates function context
  • cia_tmp_rowcount re-raises ALL exceptions (both query_canceled and OTHERS) to allow caller's handler to properly classify errors and capture full error details
  • Comment clarified to reflect that query_canceled covers both statement timeout and user-initiated cancel (Ctrl+C)
  • Exception propagation preferred over sentinel return values for complete error diagnostics
  • Block-level exception handling allows per-view isolation

Tracking Consistency:

  • cia_extraction_tracking.object_type must use lowercase 'view' (not 'VIEW'/'MATERIALIZED VIEW') to match summary query filters at end of script
  • Ensures timeout/error counts appear correctly in final reports

Error Detail Capture:

  • All exceptions re-raised from cia_tmp_rowcount to preserve complete error context
  • Phase 1 WHEN OTHERS handler captures full SQLERRM with detailed PostgreSQL error messages
  • Phase 1 timeout handler now stores actual SQLERRM instead of fixed string for complete timeout context
  • Error messages include complete diagnostics: relation names, permission issues, specific failure reasons, timeout details
  • Enables better troubleshooting via extraction_summary_report.csv with full error context for all failure types

Materialized View Handling:

  • pg_class.reltuples can be -1 for newly created or unanalyzed materialized views
  • Use GREATEST(COALESCE(reltuples, 0), 0) to normalize negative values to 0
  • Prevents unanalyzed materialized views from being skipped by Phase 2 filter (row_count >= 0)
  • Unanalyzed matviews treated as empty (0 rows) and included in extraction with header-only CSV

User Interrupt Support:

  • query_canceled exception catches both statement timeout AND user-initiated cancels (Ctrl+C)
  • SQLERRM check distinguishes timeout (contains "statement timeout") from user cancel
  • Statement timeouts handled gracefully with continuation to next view
  • User cancels re-raised to stop script immediately, preserving interactive abort capability

Progress Tracking Accuracy:

  • total_views query now excludes both view_riksdagen_coalition_alignment_matrix and view_riksdagen_intelligence_dashboard to match loop exclusions
  • Progress indicator shows correct denominator: 108 views total
  • Phase 1 NOTICE lists both excluded views for user clarity
  • All documentation examples updated to reflect 108 view count

Documentation Stability:

  • All hard-coded line number references replaced with search-based instructions
  • Examples: "search for SET statement_timeout", "search for the view exclusion list"
  • Prevents documentation drift as script evolves
  • Dynamic timeout value used in message examples via "{timeout_value}" placeholder

Temporary Table Lifecycle:

  • cia_extraction_tracking is created as a TEMP TABLE that exists only during the psql session
  • Table is automatically dropped when the session ends (e.g., after psql -f extract-sample-data.sql exits)
  • Users running via shell script cannot query the temp table after script completion
  • Documentation clarifies temp table is available only during active psql session
  • Timeout and error data is persisted for later analysis in extraction_summary_report.csv and extract-sample-data.log

Output Files Documentation:

  • Added extraction_summary_report.csv to documented output artifacts
  • Added extract-sample-data.log to documented output artifacts
  • Users can now discover all output files in Output Files section
  • Troubleshooting guidance is consistent with documented output files
  • Clear path for users to find diagnostic information after script completion

Dynamic Timeout Display:

  • Timeout warning uses current_setting('statement_timeout') to display actual configured value
  • Message remains accurate if users modify statement_timeout setting
  • Documentation updated to reflect dynamic behavior

Complex views (e.g., view_election_cycle_network_analysis with CTEs, window functions, cross joins) may timeout during query planning even with 0 rows due to planner overhead. Graceful handling ensures maximum data extraction without manual exclusion list maintenance.

Security Considerations

  • No sensitive data exposed
  • Security best practices followed
  • Compliance requirements met

Release Notes

Fixed: extract-sample-data.sql now continues processing all views after timeout or error instead of aborting. Timeouts logged in cia_extraction_tracking table with status code -1, errors with status code -2, both with consistent object_type='view'. Phase 2/3 automatically skip timed-out and errored views. Timeout messages display actual configured timeout value dynamically. Complete PostgreSQL error messages captured in tracking table for better diagnostics (e.g., "relation does not exist", "permission denied"). Unanalyzed materialized views (reltuples=-1) now properly extracted as empty views instead of being skipped. User interrupt (Ctrl+C) capability preserved - statement timeouts continue gracefully while user cancels stop the script immediately. Progress tracking corrected to show accurate 108 view count. Documentation updated with search-based instructions to avoid line number drift, temp table lifecycle clarification, and complete output files documentation (extraction_summary_report.csv, extract-sample-data.log) for comprehensive user guidance.

Original prompt

extract sample data sql stills fails.extract-sample-data.sql

As expected some advanced views will timeout but should continue with next view instead aborting whole phase.

stops at [24/109] after timeout (should timeout som long queries but continue)

ql:../extract-sample-data.sql:1287: NOTICE: 00000: → [24/109] Analyzing: public.view_election_cycle_network_analysis (VIEW)
LOCATION: exec_stmt_raise, pl_exec.c:3911
psql:../extract-sample-data.sql:1287: ERROR: 57014: canceling statement due to statement timeout
CONTEXT: SQL statement "SELECT COUNT(*) FROM public.view_election_cycle_network_analysis"
PL/pgSQL function cia_tmp_rowcount(text,text) line 5 at EXECUTE
PL/pgSQL function inline_code_block line 57 at assignment
LOCATION: ProcessInterrupts, postgres.c:3405
Time: 120006,097 ms (02:00,006)

Phase 2: Generating extraction commands (using cached row counts)...

Output format is unaligned.
Time: 3,483 ms
Output format is aligned.

Phase 3: Executing view extractions...

phase 3 does nothing. Analyse and improve how to handle timeout to avoid stalling beyond timeout and still continue process all other views

copilot have access to cia_dev with full-schema.sql installed but without data. Analyse and improve so all works.

Custom agent used: intelligence-operative
Expert in political science, intelligence analysis, OSINT, behavioral analysis, and Swedish politics with focus on exposing high risk national entities


💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.

@pull-request-size pull-request-size bot added size/L and removed size/XS labels Feb 7, 2026
Copilot AI changed the title [WIP] Fix handling of statement timeouts in SQL data extraction Fix extract-sample-data.sql Phase 1 abort on view timeout Feb 7, 2026
Copilot AI requested a review from pethers February 7, 2026 18:41
@pethers pethers requested a review from Copilot February 7, 2026 20:09
@github-actions
Copy link

github-actions bot commented Feb 7, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to make extract-sample-data.sql resilient to per-view statement_timeout failures during Phase 1 row counting, so a single slow/complex view won’t abort the entire extraction pipeline (and Phase 2/3 can still proceed for the remaining views).

Changes:

  • Wrap Phase 1 per-view row counting in a BEGIN … EXCEPTION … END block, caching timeout/error sentinel values (-1, -2) instead of aborting.
  • Update Phase 2 to skip views with negative cached row counts (row_count >= 0).
  • Expand README-EXTRACT-SAMPLE-DATA.md to document the new graceful timeout behavior and troubleshooting guidance.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

File Description
service.data.impl/src/main/resources/extract-sample-data.sql Adds per-view exception handling in Phase 1 and filters out timed-out/errored views in Phase 2.
service.data.impl/src/main/resources/README-EXTRACT-SAMPLE-DATA.md Documents Phase 1 graceful timeout behavior and updates timeout troubleshooting steps.

Comment on lines 1293 to 1296
-- Track timeout in extraction tracking
INSERT INTO cia_extraction_tracking(object_type, object_name, status, error_message, row_count)
VALUES (view_record.object_type, view_record.object_name, 'timeout', 'Statement timeout during row count', -1);

Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cia_extraction_tracking summary/reporting later filters object_type = 'view', but this insert uses view_record.object_type (values like VIEW / MATERIALIZED VIEW). As a result, Phase 1 timeouts won't be counted in the final view summary and will be mixed into reports under a different object_type. Use the same object_type value as the rest of the script (e.g., 'view') and, if needed, store the specific view kind in a separate field/message.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit b06d0ea. Changed object_type from view_record.object_type to lowercase 'view' to match summary query filters. Verified with test showing timeout tracked with object_type = 'view'.

Comment on lines 1305 to 1307
-- Track error in extraction tracking
INSERT INTO cia_extraction_tracking(object_type, object_name, status, error_message, row_count)
VALUES (view_record.object_type, view_record.object_name, 'error', SQLERRM, -2);
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same object_type inconsistency as the timeout path: inserting view_record.object_type here means these Phase 1 errors won’t be included in the later object_type='view' error/timeout counts. Use consistent values (e.g., 'view') so the end-of-script summary and CSV export remain accurate.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit b06d0ea. Changed object_type to lowercase 'view' for consistency with end-of-script summary queries.

Comment on lines +1267 to +1270
ELSE
-- Regular views need actual COUNT (no cached stats)
row_count := cia_tmp_rowcount(view_record.schemaname, view_record.object_name);
END IF;
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The timeout/error handler here relies on exceptions propagating out of cia_tmp_rowcount(...), but that function currently has EXCEPTION WHEN OTHERS THEN ... RETURN 0, which will typically swallow query_canceled and other errors and turn them into row_count = 0. That undermines the new -1/-2 status caching and can lead Phase 2/3 to attempt extraction for views that actually timed out/failed. Consider changing cia_tmp_rowcount to re-raise query_canceled (and possibly other errors) so this block can classify them correctly.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit b06d0ea. Added explicit WHEN query_canceled THEN RAISE; before WHEN OTHERS in cia_tmp_rowcount function. Now re-raises timeout exception so caller's handler can classify it as -1 instead of 0. Tested with 5s timeout on slow view - timeout properly propagated and caught.

EXCEPTION
WHEN query_canceled THEN
-- Handle statement timeout (SQLSTATE 57014)
RAISE WARNING ' ⏱️ TIMEOUT after 120s - skipping view and continuing with next';
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The timeout warning hard-codes "after 120s", but the script’s statement_timeout can be changed (and the README even instructs users to do so). This message can become misleading; consider printing current_setting('statement_timeout') or omitting the specific duration.

Suggested change
RAISE WARNING ' ⏱️ TIMEOUT after 120s - skipping view and continuing with next';
RAISE WARNING ' ⏱️ TIMEOUT after % - skipping view and continuing with next',
current_setting('statement_timeout');

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit b06d0ea. Changed message to use current_setting('statement_timeout') for dynamic timeout value. Test shows "⏱️ TIMEOUT after 5s" with 5s timeout setting.

Comment on lines +1258 to +1260
-- Wrap in exception handler to gracefully handle timeouts
BEGIN
IF view_record.object_type = 'MATERIALIZED VIEW' THEN
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description claims integration tests were added/updated, but this change set only modifies SQL/README and doesn’t include any test updates. Either include the relevant test changes/artifacts in the PR or adjust the PR description/testing checklist to reflect what actually changed.

Copilot uses AI. Check for mistakes.
Comment on lines 286 to 289
- **Solution 2**: Increase statement timeout if specific view data is critical
- Edit `extract-sample-data.sql` line ~134
- Change `SET statement_timeout = '120s';` to `SET statement_timeout = '300s';`
- **Solution 2**: Optimize the slow view
- Caution: May increase total extraction time significantly
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This troubleshooting step says to edit extract-sample-data.sql around line ~134 to change statement_timeout, but the SET statement_timeout line in the current script is at ~109. Consider avoiding hard-coded line numbers (or update to the correct approximate location) so the instructions don’t drift as the script evolves.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit b06d0ea. Changed instruction from "line ~134" to "search for SET statement_timeout" to avoid line number drift.

@pethers
Copy link
Member

pethers commented Feb 8, 2026

@copilot apply changes based on the comments in this thread

…age, function exception handling

Co-authored-by: pethers <[email protected]>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

Comment on lines 194 to 196
-- Log other errors and return 0
RAISE NOTICE 'ERROR counting rows in %.%: %', schema_name, rel_name, SQLERRM;
RETURN 0;
Copy link

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cia_tmp_rowcount still swallows non-timeout failures (WHEN OTHERS ... RETURN 0), which means Phase 1 can silently misclassify row-count errors as 0 rows and Phase 2/3 will still attempt extraction. If the intent is to have Phase 1 classify errors as -2 and skip them, re-raise non-timeout exceptions (or return a distinct sentinel) so the caller’s WHEN OTHERS handler can track/skip correctly.

Suggested change
-- Log other errors and return 0
RAISE NOTICE 'ERROR counting rows in %.%: %', schema_name, rel_name, SQLERRM;
RETURN 0;
-- Log other errors and return error sentinel (-2) so caller can classify/skip
RAISE NOTICE 'ERROR counting rows in %.%: %', schema_name, rel_name, SQLERRM;
RETURN -2;

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit b10b727. Changed function to return -2 sentinel for non-timeout errors instead of 0. Added check in Phase 1 for row_count = -2 to properly classify, log, and track errors. Tested with nonexistent view - error properly returned as -2, tracked with status='error', and script continued to next view.

**Graceful Timeout Handling (Phase 1):**
- Individual view timeouts do NOT abort the entire script
- When a view times out during row counting:
- Script logs: "⏱️ TIMEOUT after 120s - skipping view and continuing with next"
Copy link

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docs hard-code the Phase 1 timeout log text as “TIMEOUT after 120s…”, but the script now prints the configured statement_timeout dynamically. Update the example message to avoid becoming misleading when users change SET statement_timeout (e.g., show “after <statement_timeout>”).

Suggested change
- Script logs: "⏱️ TIMEOUT after 120s - skipping view and continuing with next"
- Script logs: "⏱️ TIMEOUT after <statement_timeout> - skipping view and continuing with next"

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit b10b727. Changed hard-coded "120s" to "{timeout_value}" placeholder with note that message shows actual statement_timeout setting.

- **Continuing with next view** instead of aborting
- Skipping timed-out views in Phase 2 and Phase 3 extraction
- **Symptoms** (still occur but no longer block progress):
- Warning: "⏱️ TIMEOUT after 120s - skipping view and continuing with next"
Copy link

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section still shows the warning example as “TIMEOUT after 120s…”, but the script now prints the configured statement_timeout dynamically. Consider changing the example to reflect that the value may differ (e.g., “after <statement_timeout>”) to avoid confusing users who increase the timeout.

Suggested change
- Warning: "⏱️ TIMEOUT after 120s - skipping view and continuing with next"
- Warning: "⏱️ TIMEOUT after <statement_timeout> - skipping view and continuing with next"

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit b10b727. Updated example message from "120s" to "{timeout_value}" with clarification that actual configured timeout is displayed.

@pethers
Copy link
Member

pethers commented Feb 8, 2026

@copilot apply changes based on the comments in this thread

Copilot AI changed the title Fix extract-sample-data.sql Phase 1 abort on view timeout Fix extract-sample-data.sql Phase 1 abort on view timeout and error handling Feb 8, 2026
@pethers pethers requested a review from Copilot February 8, 2026 01:41
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

```
- **Solution 3**: Skip problematic views temporarily
- **Solution 4**: Skip problematic views from extraction entirely (legacy approach)
- Add view to exclusion list in script (around line 1224)
Copy link

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The troubleshooting step references a hard-coded line number (“around line 1224”) for the exclusion list. This is likely to drift as the SQL script evolves; consider changing it to a stable instruction (e.g., “search for the view exclusion list / view_riksdagen_intelligence_dashboard filter”).

Suggested change
- Add view to exclusion list in script (around line 1224)
- In `extract-sample-data.sql`, search for the view exclusion list that filters out `view_riksdagen_intelligence_dashboard` and add this view to that list.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit 81ed3ac. Changed from hard-coded "around line 1224" to search instruction: "search for the view exclusion list that filters out view_riksdagen_intelligence_dashboard".

@pethers
Copy link
Member

pethers commented Feb 8, 2026

@copilot apply changes based on the comments in this thread

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

- Statistics calculation overhead for analytical views
- **Solution 1** (Recommended): Accept graceful timeout handling - script continues processing
- No action needed - timed-out views are safely skipped
- Check `cia_extraction_tracking` table for timeout summary
Copy link

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This troubleshooting guidance recommends checking cia_extraction_tracking for a timeout summary, but the script defines it as a TEMP table, so it disappears when the psql -f ... session ends (as used by extract-sample-data.sh). Suggest updating the instruction to reference the exported extraction_summary_report.csv (or the log) instead.

Suggested change
- Check `cia_extraction_tracking` table for timeout summary
- Check the exported `extraction_summary_report.csv` (and `extract-sample-data.log`) for a timeout summary

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit 97b4fac. Changed troubleshooting guidance from referencing temp table cia_extraction_tracking to persistent files: extraction_summary_report.csv and extract-sample-data.log. These files contain the timeout summary and are available after script completion.

@pethers
Copy link
Member

pethers commented Feb 8, 2026

@copilot apply changes based on the comments in this thread

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

RETURN 0;
EXCEPTION
WHEN query_canceled THEN
-- Re-raise query_canceled (timeout) so caller can handle it
Copy link

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

query_canceled is raised for both statement timeouts and user-initiated cancels (e.g., Ctrl+C). The comment here says it is "(timeout)", which can mislead future maintainers into thinking only timeouts propagate from this helper. Update the comment to reflect that query_canceled covers both timeout and user cancel (or remove the parenthetical).

Suggested change
-- Re-raise query_canceled (timeout) so caller can handle it
-- Re-raise query_canceled (timeout or user cancel) so caller can handle it

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit b538e1f. Updated comment from "(timeout)" to "(timeout or user cancel)" to accurately reflect that query_canceled covers both statement timeouts and user-initiated cancels (Ctrl+C).

- When a view times out during row counting:
- Script logs: "⏱️ TIMEOUT after {timeout_value} - skipping view and continuing with next" (shows actual statement_timeout setting)
- View is marked with row_count=-1 (timeout status)
- Timeout is tracked in `cia_extraction_tracking` table
Copy link

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section says timeouts are tracked in cia_extraction_tracking, but that table is created as a TEMP table and won’t be queryable after the psql session ends (e.g., when running via extract-sample-data.sh). Consider clarifying that the timeout info is persisted via extraction_summary_report.csv/extract-sample-data.log, and that cia_extraction_tracking is only available during the session.

Suggested change
- Timeout is tracked in `cia_extraction_tracking` table
- Timeout is tracked in the temporary `cia_extraction_tracking` table (available only during the active `psql` session)
- Timeout and extraction status are persisted for later analysis in `extraction_summary_report.csv` and `extract-sample-data.log`

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit b538e1f. Added clarification that cia_extraction_tracking is temporary (available only during active psql session) and that timeout/error data is persisted in extraction_summary_report.csv and extract-sample-data.log for later analysis.

Comment on lines +285 to +286
- Check the exported `extraction_summary_report.csv` (and `extract-sample-data.log`) for timeout summary
- **Solution 2**: Increase statement timeout if specific view data is critical
Copy link

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This troubleshooting step points users to extraction_summary_report.csv, but the earlier "Output Files" section doesn’t list that file, which can make it harder to discover/verify. Add extraction_summary_report.csv (and optionally extract-sample-data.log) to the documented output artifacts so the guidance is consistent.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit b538e1f. Added extraction_summary_report.csv and extract-sample-data.log to the Output Files section (lines 177-178) so users can discover all available output artifacts.

@pethers
Copy link
Member

pethers commented Feb 8, 2026

@copilot apply changes based on the comments in this thread

@sonarqubecloud
Copy link

sonarqubecloud bot commented Feb 8, 2026

Copilot AI changed the title Fix extract-sample-data.sql Phase 1 abort on view timeout with comprehensive error handling, user interrupt support, and accurate progress tracking Fix extract-sample-data.sql Phase 1 abort on view timeout with comprehensive error handling, user interrupt support, accurate progress tracking, and complete documentation Feb 8, 2026
@pethers pethers requested a review from Copilot February 8, 2026 17:09
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comment on lines +1297 to +1299
RAISE WARNING ' ⏱️ TIMEOUT after % - skipping view and continuing with next',
current_setting('statement_timeout');

Copy link

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The timeout warning correctly prints the effective timeout via current_setting('statement_timeout'), but the script header earlier still echoes a hard-coded "Statement timeout: 120s per query". If users follow the README and change SET statement_timeout, the startup banner becomes misleading; consider printing the actual setting (e.g., via SHOW statement_timeout / SELECT current_setting) so all user-facing output stays consistent.

Copilot uses AI. Check for mistakes.
@pethers pethers marked this pull request as ready for review February 8, 2026 17:15
@pethers pethers merged commit 6170ea8 into master Feb 8, 2026
16 checks passed
@pethers pethers deleted the copilot/handle-view-timeouts-during-extraction branch February 8, 2026 17:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants