Skip to content

Fix race condition causing 'ReadBuffer is canceled' exception#98955

Merged
alexey-milovidov merged 1 commit intomasterfrom
fix-readbuffer-canceled-race
Mar 7, 2026
Merged

Fix race condition causing 'ReadBuffer is canceled' exception#98955
alexey-milovidov merged 1 commit intomasterfrom
fix-readbuffer-canceled-race

Conversation

@alexey-milovidov
Copy link
Copy Markdown
Member

@alexey-milovidov alexey-milovidov commented Mar 7, 2026

Summary

  • Fixed a race condition in TCPHandler where receivePacketsExpectCancel could cancel the TCP ReadBuffer and then release callback_mutex during exception stack unwinding, allowing a pipeline worker thread to acquire the mutex and attempt to read from the canceled buffer before the executor is canceled
  • The worker thread passes checkIfQueryCanceled because stop_query hasn't been set yet, then hits chassert(!isCanceled()) in ReadBuffer::next
  • Fix: set state.stop_query = true in receivePacketsExpectCancel's catch block before re-throwing, so callbacks see the cancellation immediately

CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=98895&sha=d7e87bb2b7ecb3490ad63a12113fbc92fc8e488e&name_0=PR&name_1=BuzzHouse%20%28amd_msan%29

Closes #98866

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fix a race condition that could cause a "ReadBuffer is canceled" exception in queries using urlCluster or similar cluster table functions.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

🤖 Generated with Claude Code


Note

Medium Risk
Touches TCP query-cancellation/error paths and changes how cancellation is signaled across threads, which could affect query termination behavior under network failures.

Overview
Fixes a race in TCPHandler::receivePacketsExpectCancel() by wrapping cancel-packet polling/reading in a try/catch and setting state.stop_query = true before rethrowing on any read/parsing error or disconnect.

This ensures other callback-driven worker threads see cancellation via checkIfQueryCanceled() immediately after the connection breaks, avoiding reads from a canceled ReadBuffer that previously could trigger "ReadBuffer is canceled" exceptions.

Written by Cursor Bugbot for commit e858e33. This will update automatically on new commits. Configure here.

…ter function callbacks

When `receivePacketsExpectCancel` encounters an error reading from the
client TCP connection, it throws an exception while holding `callback_mutex`.
During stack unwinding, the mutex is released, and a pipeline worker thread
waiting on the mutex can acquire it and attempt to read from the now-canceled
`ReadBuffer` via `ClusterFunctionReadTaskCallback` (or similar callbacks).
The worker thread passes `checkIfQueryCanceled` because `stop_query` hasn't
been set yet (the main thread hasn't reached `executor.cancel()` yet).
The worker then hits `chassert(!isCanceled())` in `ReadBuffer::next`.

Fix: set `state.stop_query = true` in the catch block of
`receivePacketsExpectCancel` before re-throwing, so that callbacks waiting
on the mutex will see the cancellation via `checkIfQueryCanceled` immediately
after acquiring it.

Closes #98866
https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=98895&sha=d7e87bb2b7ecb3490ad63a12113fbc92fc8e488e&name_0=PR&name_1=BuzzHouse%20%28amd_msan%29

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Mar 7, 2026

Workflow [PR], commit [e858e33]

Summary:

@clickhouse-gh clickhouse-gh bot added the pr-bugfix Pull request with bugfix, not backported by default label Mar 7, 2026
Copy link
Copy Markdown
Member Author

@alexey-milovidov alexey-milovidov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

@alexey-milovidov alexey-milovidov self-assigned this Mar 7, 2026
@alexey-milovidov alexey-milovidov merged commit d51d508 into master Mar 7, 2026
149 of 150 checks passed
@alexey-milovidov alexey-milovidov deleted the fix-readbuffer-canceled-race branch March 7, 2026 08:09
@robot-clickhouse robot-clickhouse added the pr-synced-to-cloud The PR is synced to the cloud repo label Mar 7, 2026
Algunenano added a commit to Algunenano/ClickHouse that referenced this pull request Mar 11, 2026
Followup to ClickHouse#98955. The previous fix only covered `receivePacketsExpectCancel`,
but the same race exists between any two callbacks that read from the TCP
`ReadBuffer` `*in`. When one callback's read fails (canceling the buffer),
the exception unwinds and releases `callback_mutex`. Another callback waiting
on the mutex can then acquire it and attempt to read from the now-canceled
buffer before `checkIfQueryCanceled` sees the cancellation, triggering:
"Logical error: 'ReadBuffer is canceled. Can't read from it.'"

Fix: wrap the read sections of all four callbacks (`ExternalTablesInitializer`,
`InputBlocksReaderCallback`, `ClusterFunctionReadTaskCallback`,
`MergeTreeReadTaskCallback`) in try/catch that sets `state.stop_query = true`
before rethrowing, matching the existing pattern in `receivePacketsExpectCancel`.

Closes ClickHouse#99258

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-bugfix Pull request with bugfix, not backported by default pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Logical error: 'ReadBuffer is canceled. Can't read from it.' (STID: 2508-26e4)

2 participants