Allow cancellation of scalar subqueries#98163
Allow cancellation of scalar subqueries#98163YjyJeff wants to merge 15 commits intoClickHouse:masterfrom
Conversation
Scalar subqueries are executed during the query analysis phase, where the interactive cancel callback was not being checked. This made it impossible to cancel queries like `SELECT (SELECT max(number) FROM system.numbers) + 1` using Ctrl+C or KILL QUERY. Poll the `getInteractiveCancelCallback` between timed pulls (100ms timeout) during scalar subquery execution, in both the new analyzer and the old interpreter paths. Closes ClickHouse#1576 Co-Authored-By: Claude Opus 4.6 <[email protected]>
3549540 to
753d36f
Compare
Verify that a scalar subquery can be cancelled via Ctrl+C (SIGINT). The test starts `SELECT (SELECT max(number) FROM system.numbers) + 1`, sends SIGINT to the client, and checks for `QUERY_WAS_CANCELLED` in `system.query_log`. Co-Authored-By: Claude Opus 4.6 <[email protected]>
Algunenano
left a comment
There was a problem hiding this comment.
Let's use Setting::interactive_delay instead of hardcoding 100 where possible please
`IN` subqueries are built via `CompletedPipelineExecutor` in `FutureSetFromSubquery::buildSetInplace` and `buildOrderedSetInplace`, where the interactive cancel callback was not being checked. Set the cancel callback on the executor so that Ctrl+C can interrupt these subqueries as well. Also extend the stateless test to cover `IN (subquery)` cancellation with a MergeTree table, and remove the unnecessary exit code check. Co-Authored-By: Claude Opus 4.6 <[email protected]>
Replace the hardcoded 100ms pull timeout with `Setting::interactive_delay / 1000` in all cancellation polling loops for scalar and IN subqueries, so the cancellation check interval respects the user-configured setting. Co-Authored-By: Claude Opus 4.6 <[email protected]>
Only use timed pull and invoke `cancel_callback` when the interactive cancel callback is set. Otherwise fall back to the plain blocking `pull` to avoid unnecessary overhead in non-interactive sessions. Co-Authored-By: Claude Opus 4.6 <[email protected]>
Hi @Algunenano , I have fixed the problem. However, I am unable to reproduce the error for 04003_cancel_scalar_subquery on my local machine 🤔 |
The previous query `SELECT max(number) FROM system.numbers` could complete before the cancellation signal arrived, making the test unreliable. Replace it with `SELECT number FROM system.numbers`, which produces an infinite stream and ensures the query is still running when `KILL QUERY` is issued. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
697001b to
118b3cd
Compare
Use `SELECT max(rand()) FROM system.numbers` as the IN subquery instead of `SELECT number FROM system.numbers`. The aggregation never completes because it reads from an infinite stream, which better matches the scalar subquery cancellation scenario and ensures `KILL QUERY` always arrives while the query is still running. Also insert random values into the test table so the rows are unlikely to match the subquery result, preventing any early short-circuit exit.
4aef4ff to
d9e4e27
Compare
…ution The issue was that calling `sendProgress` during scalar subquery execution would reset the progress counters via `fetchValuesAndResetPiecewiseAtomically`. This caused the JSON output format to only see the outer query's statistics (1 row) instead of including the scalar subquery's reads (100000 rows). The fix introduces a dedicated `SubqueryCancelCallback` that checks for cancellation without sending progress updates. This callback: - Still checks for cancel packets from the client - Still sends profile events and logs - Does NOT call `sendProgress`, avoiding the counter reset Changes: - Add `subquery_cancel_callback` member and getter/setter in `Context.h` - Create `make_cancel_callback(bool send_progress)` helper in `TCPHandler.cpp` to generate callbacks with optional progress sending - Update scalar subquery execution (`evaluateScalarSubqueryIfNeeded.cpp`, `ExecuteScalarSubqueriesVisitor.cpp`) to use `getSubqueryCancelCallback` - Update IN subquery execution (`PreparedSets.cpp`) to use the new callback The statistics are now correctly accumulated in `state.progress` and visible in the JSON output format, while still allowing query cancellation to work. Fixes test: `02136_scalar_read_rows_json` Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
|
Hi @Algunenano, I have fixed all the other reproducible failures. However, I am unable to reproduce the error for 04003_cancel_scalar_subquery locally, despite trying on multiple machines. Do you have any insights? |
Refactor the test to use helper functions `wait_query_started` and `kill_query` similar to `02932_kill_query_sleep`, making it more robust and maintainable. Changes: - Add `wait_query_started` function to reliably wait for query to appear in `system.processes` with timeout handling - Add `kill_query` function to use `KILL QUERY` instead of `kill -INT` and wait for query to actually be cancelled - Use unique query IDs with `$RANDOM` suffix to avoid conflicts - Add descriptive echo messages for better test output readability - Reduce sleep interval from 0.5s to 0.1s for faster test execution The test now follows ClickHouse testing best practices and is more reliable in CI environments. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Algunenano
left a comment
There was a problem hiding this comment.
I don't understand this query, or the code changes do not match the tests.
The test is creating a query with a scalar subquery, and killing it. But this already worked and has worked for years.
What doesn't work is if you send that query and then press Ctrl+C in the client.
…pattern" This reverts commit 518be3b.
| throw Exception(ErrorCodes::BAD_ARGUMENTS, "Cannot evaluate correlated scalar subquery"); | ||
|
|
||
| auto & context = scope.context; | ||
| auto cancel_callback = context->getQueryContext()->getSubqueryCancelCallback(); |
There was a problem hiding this comment.
This will throw THERE_IS_NO_QUERY whenever the context has no query context (e.g. EXPLAIN PLAN, analysis-only paths, or contexts constructed outside a real query). Same applies to the three other call sites in ExecuteScalarSubqueriesVisitor.cpp and PreparedSets.cpp.
All four should be guarded:
auto cancel_callback = context->hasQueryContext()
? context->getQueryContext()->getSubqueryCancelCallback()
: InteractiveCancelCallback{};This is also the most likely cause of the flaky test — the unguarded call crashes the client immediately, before it registers in system.processes.
| throw Exception(ErrorCodes::BAD_ARGUMENTS, "Cannot evaluate correlated scalar subquery"); | ||
|
|
||
| auto & context = scope.context; | ||
| auto cancel_callback = context->getQueryContext()->getSubqueryCancelCallback(); |
There was a problem hiding this comment.
Minor: interactive_delay is in microseconds, so dividing by 1000 gives milliseconds. If the setting is less than 1000 (i.e. < 1ms), interactive_delay_ms becomes 0, which would busy-loop in executor->pull(chunk, 0). Consider clamping to a minimum:
const UInt64 interactive_delay_ms = std::max<UInt64>(100, context->getSettingsRef()[Setting::interactive_delay] / 1000);| while (chunk.getNumRows() == 0 && executor->pull(chunk, interactive_delay_ms)) | ||
| cancel_callback(); | ||
| } | ||
| else |
There was a problem hiding this comment.
The return value of cancel_callback() is discarded here (and in all other call sites). The callback returns true when the query should be cancelled — ignoring it means the cancellation only takes effect on the next pull() call, which might block for another interactive_delay_ms.
Consider:
while (chunk.getNumRows() == 0 && executor->pull(chunk, interactive_delay_ms))
if (cancel_callback())
break;|
|
||
| using InteractiveCancelCallback = std::function<bool()>; | ||
| InteractiveCancelCallback interactive_cancel_callback; /// Callback for usage in interactive sessions with CompletedPipelineExecutor | ||
| InteractiveCancelCallback subquery_cancel_callback; /// Lightweight callback for subquery cancellation without sending progress |
There was a problem hiding this comment.
Worth considering whether a separate subquery_cancel_callback field is necessary. The only difference from interactive_cancel_callback is skipping sendProgress(). An alternative would be to pass a bool send_progress parameter to the existing callback, or have the subquery execution sites just call the existing interactive_cancel_callback and handle the progress reset issue differently.
Not a blocker, but adding per-concern callbacks to Context adds up over time.
There was a problem hiding this comment.
Same thought here. I'd rather reduce the field and reuse interactive_cancel_callback. Also, what's the issue with sending progress generated by subqueries? AFAICT we only execute it every interactive_delay or 100ms, so it seems fine to send progress during the query analysis.
In fact I'd suggest this is desirable, the user sees the query is doing work instead of appearing stuck
There was a problem hiding this comment.
Also, what's the issue with sending progress generated by subqueries?
When sendProgress() is called, it internally calls fetchValuesAndResetPiecewiseAtomically(), which resets the progress counters after reading them. This is fine for normal query execution where progress is sent periodically and the counters accumulate between sends.
However, for scalar and IN subqueries:
- Subqueries execute during the analysis/preparation phase - they run before the main query execution starts
- Progress counters are shared - the subquery uses the same progress tracking mechanism as the main query
- Timing conflict - When we call sendProgress() during subquery execution (via the cancel callback), it resets the counters while the subquery is still running
- Statistics loss - The JSON formatter reads from state.progress.read_rows, which gets reset to near-zero, showing only 1 row instead of the actual 100,001 rows read. This is the reason why 02136_scalar_read_rows_json failed in the init commit
There was a problem hiding this comment.
I need to investigate this. I think having progress is desirable, so we should have both
There was a problem hiding this comment.
Sounds good, I'll wait for your investigation and update the code accordingly.
|
|
||
| query_state->query_context->setInteractiveCancelCallback( | ||
| [this, &query_state]() | ||
| /// Helper lambda to create cancel callbacks with optional progress sending |
There was a problem hiding this comment.
The make_cancel_callback refactor is nice — clean extraction.
The comment on line 790 says "without sending progress" but the callback still calls sendSelectProfileEvents() and sendLogs(). Might be more accurate to say "without resetting progress counters" or "without sending progress packets".
| @@ -0,0 +1,42 @@ | |||
| #!/usr/bin/env bash | |||
There was a problem hiding this comment.
A few issues with the test:
-
The poll loop doesn't detect early client death. If the client crashes (e.g. from the unguarded
getQueryContext()above), the loop spins for 30 seconds thenkill -INTfails with "No such process". Addkill -0 $client_pid 2>/dev/null || breakin the loop. -
The
query_logquery should filter ontype = 'ExceptionWhileProcessing'— without it you might match the initialQueryStartevent which has no exception, or get duplicate lines. -
No cleanup
trap— if the test fails midway,${CLICKHOUSE_TEST_UNIQUE_NAME}_tis left behind. Addtrap 'cleanup' EXITat the top.
Fix potential crashes when accessing `getSubqueryCancelCallback` in contexts where no query context exists. This can happen in certain execution paths where subqueries are evaluated outside of a normal query execution context. The issue was that calling `getQueryContext()->getSubqueryCancelCallback()` would crash if `getQueryContext()` returned null. Now we check `hasQueryContext()` first before attempting to access the cancel callback. Changes: - `evaluateScalarSubqueryIfNeeded.cpp`: Check `hasQueryContext()` before accessing subquery cancel callback, return nullptr if no context - `ExecuteScalarSubqueriesVisitor.cpp`: Same check for old interpreter path - `PreparedSets.cpp`: Check `hasQueryContext()` before setting cancel callback for IN subquery execution (two locations) This fixes CI failures where queries without query context were attempting to access the cancel callback and crashing. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Ensure that the interactive delay used for cancel callback polling during subquery execution has a minimum value of 100ms. This prevents issues with very small or zero delay values that could cause excessive CPU usage from tight polling loops. Changes: - `evaluateScalarSubqueryIfNeeded.cpp`: Use `std::max(UInt64(100), ...)` to enforce minimum 100ms delay for scalar subquery cancellation polling - `ExecuteScalarSubqueriesVisitor.cpp`: Same for old interpreter path - `PreparedSets.cpp`: Same for IN subquery execution (two locations) This matches the pattern used elsewhere in the codebase and ensures reasonable polling intervals even when `interactive_delay` is set to a very small value. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Add `max_rows_to_read = 0` and `max_bytes_to_read = 0` settings to both test queries to prevent resource limit exceptions during cancellation testing. Without these settings, the queries may fail with resource limit errors instead of the expected `QUERY_WAS_CANCELLED` exception. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
1d4e9fb to
c7a2bd8
Compare
Wrap the subquery cancel callback to check its return value and throw `QUERY_WAS_CANCELLED_BY_CLIENT` exception when it returns true. This ensures proper cancellation handling for scalar and IN subqueries in both the analyzer and old interpreter paths. The wrapper is applied when retrieving the callback from the query context, so all call sites automatically get the exception-throwing behavior without needing individual checks. Modified files: - evaluateScalarSubqueryIfNeeded.cpp: Wrap callback for scalar subqueries - ExecuteScalarSubqueriesVisitor.cpp: Wrap callback for scalar subqueries (old interpreter) - PreparedSets.cpp: Wrap callback for IN subqueries (2 locations) Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
| throw Exception(ErrorCodes::QUERY_WAS_CANCELLED_BY_CLIENT, "Received 'Cancel' packet from the client, canceling the query."); | ||
| return false; | ||
| }; | ||
| } |
There was a problem hiding this comment.
Cancel callback wrapping duplicated across four call sites
Medium Severity
The identical cancel-callback wrapping logic (check raw_cancel_callback(), throw QUERY_WAS_CANCELLED_BY_CLIENT if true, return false) is copy-pasted across four call sites in three files (evaluateScalarSubqueryIfNeeded.cpp, ExecuteScalarSubqueriesVisitor.cpp, PreparedSets.cpp ×2). If the cancellation behavior ever needs adjustment (e.g., different error message, additional cleanup, or logging), all four copies must be updated in lockstep. A single helper function—either a free function or a method on Context—would eliminate this duplication.
Additional Locations (2)
There was a problem hiding this comment.
This is a valid concern. There are 5 copies of this.
Let's deduplicate the wrapper into a helper and use interactive_cancel_callback directly.
| wait $client_pid | ||
|
|
||
| $CLICKHOUSE_CLIENT --query "SYSTEM FLUSH LOGS query_log" | ||
| $CLICKHOUSE_CLIENT --query "SELECT exception FROM system.query_log WHERE query_id = '$CLICKHOUSE_TEST_UNIQUE_NAME' AND current_database = '$CLICKHOUSE_DATABASE'" | grep -oF "QUERY_WAS_CANCELLED" |
There was a problem hiding this comment.
Test query_log query lacks type filter for reliability
Low Severity
The system.query_log queries at lines 20 and 40 don't filter on type (e.g., type = 'ExceptionWhileProcessing'). Multiple log entries can exist per query_id (QueryStart, ExceptionWhileProcessing, etc.). While QueryStart rows typically have empty exception fields so grep ignores them, the test is fragile—if duplicate exception rows appear or log behavior changes, the output could contain more or fewer QUERY_WAS_CANCELLED lines than the reference expects.
Additional Locations (1)
|
Hi @Algunenano, I have fixed all the failed tests. 04003_cancel_scalar_subquery now supports cancellation via Ctrl-C. Do you have any further comments? |
|
@YjyJeff I'm currently busy, but I'll review it as soon as I can. Thank you! |
|
Please write the changelog entry. |
| throw Exception(ErrorCodes::QUERY_WAS_CANCELLED_BY_CLIENT, "Received 'Cancel' packet from the client, canceling the query."); | ||
| return false; | ||
| }; | ||
| } |
There was a problem hiding this comment.
This is a valid concern. There are 5 copies of this.
Let's deduplicate the wrapper into a helper and use interactive_cancel_callback directly.
|
|
||
| using InteractiveCancelCallback = std::function<bool()>; | ||
| InteractiveCancelCallback interactive_cancel_callback; /// Callback for usage in interactive sessions with CompletedPipelineExecutor | ||
| InteractiveCancelCallback subquery_cancel_callback; /// Lightweight callback for subquery cancellation without sending progress |
There was a problem hiding this comment.
Same thought here. I'd rather reduce the field and reuse interactive_cancel_callback. Also, what's the issue with sending progress generated by subqueries? AFAICT we only execute it every interactive_delay or 100ms, so it seems fine to send progress during the query analysis.
In fact I'd suggest this is desirable, the user sees the query is doing work instead of appearing stuck
| throw Exception(ErrorCodes::QUERY_WAS_CANCELLED_BY_CLIENT, "Received 'Cancel' packet from the client, canceling the query."); | ||
| return false; | ||
| }; | ||
| executor.setCancelCallback(std::move(cancel_callback), std::max(UInt64(100), context->getSettingsRef()[Setting::interactive_delay] / 1000)); |
There was a problem hiding this comment.
We don't need a wrapper in this file. CompletedPipelineExecutor::execute polls the callback every interactive_timeout_ms, and when it returns true, calls data->executor->cancel() which stops the pipeline. The cancelled pipeline will then cause the set not to be fully built, and the existing "timeout_overflow_mode=break" logic at line 359 already handles that case.
Co-authored-by: Raúl Marín <[email protected]>
| throw Exception(ErrorCodes::QUERY_WAS_CANCELLED_BY_CLIENT, "Received 'Cancel' packet from the client, canceling the query."); | ||
| return false; | ||
| }; | ||
| executor.setCancelCallback(std::move(cancel_callback), std::max(UInt64(100), context->getSettingsRef()[Setting::interactive_delay] / 1000)); |
There was a problem hiding this comment.
Cancel callback throws instead of returning true to executor
Medium Severity
The cancel callback passed to CompletedPipelineExecutor::setCancelCallback throws an exception instead of returning true. The CompletedPipelineExecutor API contract states "If returns true, query would be cancelled" — all other callers (executeQuery.cpp, LocalConnection.cpp) follow this by returning true. Throwing bypasses the executor's normal cancel path: data->executor->cancel() is never called through line 94 of CompletedPipelineExecutor::execute(), and the data->has_exception check at line 98 is skipped, silently dropping any exception from the pipeline execution thread.
Additional Locations (1)
LLVM Coverage Report
PR changed lines: PR changed-lines coverage: 95.83% (138/144) |
|
FYI: I'm testing this locally with the suggested changes to see the test failures and how we can overcome them. I'll push the changes and fixes when I'm happy with them |
|
It took quite some time and there were too many followups to request it to be addressed in this PR (clickhouse local cancellation, progress bars, etc), so I've created a PR that supersedes this one: #100230 |


Scalar subqueries are executed during the query analysis phase, where the interactive cancel callback was not being checked. This made it impossible to cancel queries like
SELECT (SELECT max(number) FROM system.numbers) + 1using Ctrl+C.Poll the
getInteractiveCancelCallbackbetween timed pulls (100ms timeout) during scalar subquery execution, in both the new analyzer and the old interpreter paths.Closes #1576
Related Roadmap #93288
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Add support for cancelling scalar and IN subqueries. Previously, long-running scalar and IN subqueries could not be interrupted by the client, requiring the query to complete or timeout. Now these subqueries respect cancellation requests (Ctrl-C) and terminate promptly
Documentation entry for user-facing changes
Note
Medium Risk
Touches query execution/cancellation plumbing across multiple subquery code paths and introduces timed pull behavior; regressions could affect responsiveness or subquery execution flow under interactive clients.
Overview
Scalar subqueries and
INsubqueries can now be interrupted by client cancel (e.g., Ctrl+C) instead of running until completion/timeout.This introduces a dedicated
subquery_cancel_callbackinContextand wires it fromTCPHandleras a lightweight cancel path (no progress sending). Subquery execution paths in both the analyzer (evaluateScalarSubqueryIfNeeded), legacy scalar visitor (ExecuteScalarSubqueriesVisitor), and prepared set building (PreparedSets) now poll cancellation between timed executor pulls usinginteractive_delayand throwQUERY_WAS_CANCELLED_BY_CLIENTwhen triggered, with a new stateless test covering scalar andINcases.Written by Cursor Bugbot for commit 5821cf8. This will update automatically on new commits. Configure here.