Skip to content

Set stop_query in TCPHandler callbacks on read failure#99272

Merged
Algunenano merged 3 commits intoClickHouse:masterfrom
Algunenano:fix-tcp-handler-callback-cancel-race
Mar 13, 2026
Merged

Set stop_query in TCPHandler callbacks on read failure#99272
Algunenano merged 3 commits intoClickHouse:masterfrom
Algunenano:fix-tcp-handler-callback-cancel-race

Conversation

@Algunenano
Copy link
Copy Markdown
Member

@Algunenano Algunenano commented Mar 11, 2026

Followup to #98955. The previous fix only covered receivePacketsExpectCancel, but the same race exists between any two callbacks that read from the TCP ReadBuffer *in.

When one callback's read fails (canceling the buffer), the exception unwinds and releases callback_mutex. Another callback waiting on the mutex can then acquire it and attempt to read from the now-canceled buffer before checkIfQueryCanceled sees the cancellation, triggering: "Logical error: 'ReadBuffer is canceled. Can't read from it.'"

Fix: wrap the read sections of all four callbacks (ExternalTablesInitializer, InputBlocksReaderCallback, ClusterFunctionReadTaskCallback, MergeTreeReadTaskCallback) in try/catch that sets state.stop_query = true before rethrowing, matching the existing pattern in receivePacketsExpectCancel.

Closes #99258
cc @alexey-milovidov

Changelog category (leave one):

  • CI Fix or Improvement (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

...

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Note

Medium Risk
Touches TCP query execution callbacks and cancellation signaling; changes are localized but affect concurrency/error paths in the server’s native protocol handling.

Overview
Prevents a race in TCPHandler where concurrent pipeline callbacks could attempt to read from an already-canceled input buffer by wrapping the read portions of several callbacks in try/catch and setting query_state->stop_query = true before rethrowing.

Adds a stateless regression test (04035_url_cluster_callback_cancel_race) that reproduces the failure via url() + parallel replicas settings, and tightens CI style checks to emit a clearer error when shell tests append ? to CLICKHOUSE_URL (which already includes it).

Written by Cursor Bugbot for commit 8a1f5e8. This will update automatically on new commits. Configure here.

Followup to ClickHouse#98955. The previous fix only covered `receivePacketsExpectCancel`,
but the same race exists between any two callbacks that read from the TCP
`ReadBuffer` `*in`. When one callback's read fails (canceling the buffer),
the exception unwinds and releases `callback_mutex`. Another callback waiting
on the mutex can then acquire it and attempt to read from the now-canceled
buffer before `checkIfQueryCanceled` sees the cancellation, triggering:
"Logical error: 'ReadBuffer is canceled. Can't read from it.'"

Fix: wrap the read sections of all four callbacks (`ExternalTablesInitializer`,
`InputBlocksReaderCallback`, `ClusterFunctionReadTaskCallback`,
`MergeTreeReadTaskCallback`) in try/catch that sets `state.stop_query = true`
before rethrowing, matching the existing pattern in `receivePacketsExpectCancel`.

Closes ClickHouse#99258

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Mar 11, 2026

Workflow [PR], commit [8a1f5e8]

Summary:

job_name test_name status info comment
Stateless tests (amd_tsan, parallel, 2/2) failure
02515_fix_any_parsing FAIL cidb IGNORED
Logical error: 'assertHasValidVersionMetadata()' (STID: 2508-41de) FAIL cidb IGNORED

AI Review

Summary

This PR fixes a real cancellation race in TCPHandler callback read paths by setting stop_query before rethrowing on callback failures, and adds a targeted stateless regression test for the canceled-ReadBuffer Logical error path. The changes are localized, consistent with the existing receivePacketsExpectCancel safety pattern, and the updated test assertion now correctly fails when Logical error appears.

ClickHouse Rules

Item Status Notes
Deletion logging
Serialization versioning
Core-area scrutiny
No test removal
Experimental gate
No magic constants
Backward compatibility
SettingsHistory.cpp
Safe rollout
Compilation time

Final Verdict

  • Status: ✅ Approve

@clickhouse-gh clickhouse-gh bot added the pr-ci label Mar 11, 2026
Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

CLICKHOUSE_URL already includes '?', so use '&' to append query
parameters. Also add a descriptive error message to the style check
that catches this mistake.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@alexey-milovidov alexey-milovidov self-assigned this Mar 13, 2026
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Mar 13, 2026

LLVM Coverage Report

Metric Baseline Current Δ
Lines 83.80% 83.80% +0.00%
Functions 23.90% 23.90% +0.00%
Branches 76.30% 76.30% +0.00%

PR changed lines: PR changed-lines coverage: 90.91% (60/66, 0 noise lines excluded)
Diff coverage report
Uncovered code

@Algunenano Algunenano added this pull request to the merge queue Mar 13, 2026
Merged via the queue into ClickHouse:master with commit ee7bd41 Mar 13, 2026
162 of 164 checks passed
@Algunenano Algunenano deleted the fix-tcp-handler-callback-cancel-race branch March 13, 2026 09:58
@robot-clickhouse-ci-1 robot-clickhouse-ci-1 added the pr-synced-to-cloud The PR is synced to the cloud repo label Mar 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-ci pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Logical error: 'ReadBuffer is canceled. Can't read from it.' (STID: 2508-2b71)

3 participants