Set stop_query in TCPHandler callbacks on read failure#99272
Merged
Algunenano merged 3 commits intoClickHouse:masterfrom Mar 13, 2026
Merged
Set stop_query in TCPHandler callbacks on read failure#99272Algunenano merged 3 commits intoClickHouse:masterfrom
Algunenano merged 3 commits intoClickHouse:masterfrom
Conversation
Followup to ClickHouse#98955. The previous fix only covered `receivePacketsExpectCancel`, but the same race exists between any two callbacks that read from the TCP `ReadBuffer` `*in`. When one callback's read fails (canceling the buffer), the exception unwinds and releases `callback_mutex`. Another callback waiting on the mutex can then acquire it and attempt to read from the now-canceled buffer before `checkIfQueryCanceled` sees the cancellation, triggering: "Logical error: 'ReadBuffer is canceled. Can't read from it.'" Fix: wrap the read sections of all four callbacks (`ExternalTablesInitializer`, `InputBlocksReaderCallback`, `ClusterFunctionReadTaskCallback`, `MergeTreeReadTaskCallback`) in try/catch that sets `state.stop_query = true` before rethrowing, matching the existing pattern in `receivePacketsExpectCancel`. Closes ClickHouse#99258 Co-Authored-By: Claude Opus 4.6 <[email protected]>
Contributor
|
Workflow [PR], commit [8a1f5e8] Summary: ❌
AI ReviewSummaryThis PR fixes a real cancellation race in ClickHouse Rules
Final Verdict
|
tests/queries/0_stateless/04035_url_cluster_callback_cancel_race.sh
Outdated
Show resolved
Hide resolved
CLICKHOUSE_URL already includes '?', so use '&' to append query parameters. Also add a descriptive error message to the style check that catches this mistake. Co-Authored-By: Claude Opus 4.6 <[email protected]>
tests/queries/0_stateless/04035_url_cluster_callback_cancel_race.sh
Outdated
Show resolved
Hide resolved
alexey-milovidov
approved these changes
Mar 13, 2026
Contributor
LLVM Coverage Report
PR changed lines: PR changed-lines coverage: 90.91% (60/66, 0 noise lines excluded) |
Merged
via the queue into
ClickHouse:master
with commit Mar 13, 2026
ee7bd41
162 of 164 checks passed
1 task
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

Followup to #98955. The previous fix only covered
receivePacketsExpectCancel, but the same race exists between any two callbacks that read from the TCPReadBuffer*in.When one callback's read fails (canceling the buffer), the exception unwinds and releases
callback_mutex. Another callback waiting on the mutex can then acquire it and attempt to read from the now-canceled buffer beforecheckIfQueryCanceledsees the cancellation, triggering: "Logical error: 'ReadBuffer is canceled. Can't read from it.'"Fix: wrap the read sections of all four callbacks (
ExternalTablesInitializer,InputBlocksReaderCallback,ClusterFunctionReadTaskCallback,MergeTreeReadTaskCallback) in try/catch that setsstate.stop_query = truebefore rethrowing, matching the existing pattern inreceivePacketsExpectCancel.Closes #99258
cc @alexey-milovidov
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
...
Documentation entry for user-facing changes
Note
Medium Risk
Touches TCP query execution callbacks and cancellation signaling; changes are localized but affect concurrency/error paths in the server’s native protocol handling.
Overview
Prevents a race in
TCPHandlerwhere concurrent pipeline callbacks could attempt to read from an already-canceled input buffer by wrapping the read portions of several callbacks intry/catchand settingquery_state->stop_query = truebefore rethrowing.Adds a stateless regression test (
04035_url_cluster_callback_cancel_race) that reproduces the failure viaurl()+ parallel replicas settings, and tightens CI style checks to emit a clearer error when shell tests append?toCLICKHOUSE_URL(which already includes it).Written by Cursor Bugbot for commit 8a1f5e8. This will update automatically on new commits. Configure here.