Skip to content

Fix ReadBuffer canceled assertion in TCPHandler idle loop#100536

Closed
nerve-bot wants to merge 1 commit intoClickHouse:masterfrom
nerve-bot:fix-readbuffer-canceled-exception-swallowed
Closed

Fix ReadBuffer canceled assertion in TCPHandler idle loop#100536
nerve-bot wants to merge 1 commit intoClickHouse:masterfrom
nerve-bot:fix-readbuffer-canceled-exception-swallowed

Conversation

@nerve-bot
Copy link
Copy Markdown
Contributor

@nerve-bot nerve-bot commented Mar 24, 2026

When a pipeline callback (e.g. ClusterFunctionReadTaskCallback via urlCluster) reads
from the TCP ReadBuffer and the read fails, ReadBuffer::next() catches the exception
and calls cancel(), setting the canceled flag permanently. If the exception is handled
but the connection is not closed, the TCPHandler::runImpl() idle loop calls in->eof()
on the canceled buffer. In debug/sanitizer builds this triggers
chassert(!isCanceled()) inside ReadBuffer::next(), aborting the server.

Observed as recurring stress test crashes on master (amd_tsan, amd_msan, amd_debug):

  • STID 2508-2913: TCPHandler::runImpl() at idle loop in->eof()
  • STID 2508-25af: same crash, debug build variant

Changes:

  • Add ReadBuffer::eofOrCanceled() — checks isCanceled() before eof(), so the
    assertion is never reached on a canceled buffer.
  • TCPHandler::runImpl() idle loop now calls in->eofOrCanceled() instead of in->eof().
  • Add explicit rethrowExceptionIfHas() after cancel() in
    PullingAsyncPipelineExecutor::pull() as a defensive measure.
  • Add regression test (gtest_read_buffer_cancel): calls eofOrCanceled() on a canceled
    buffer — crashes without the fix, passes with it.

Previous fix attempts (#98955, #100400) addressed related race conditions but did not
cover this idle-loop path.

Closes #100581

Changelog category (leave one):

  • CI Fix or Improvement (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

...

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Mar 24, 2026

CLA assistant check
All committers have signed the CLA.

@nerve-bot nerve-bot force-pushed the fix-readbuffer-canceled-exception-swallowed branch from 2aa7f64 to 494f746 Compare March 24, 2026 04:10
@pufit pufit added the can be tested Allows running workflows for external contributors label Mar 24, 2026
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Mar 24, 2026

Workflow [PR], commit [95e1ecd]

Summary:

job_name test_name status info comment
Stateless tests (amd_asan_ubsan, distributed plan, parallel, 2/2) failure
02931_rewrite_sum_column_and_constant FAIL cidb
AST fuzzer (arm_asan_ubsan) failure
AddressSanitizer: stack-use-after-scope (STID: 2136-3203) FAIL cidb, issue
AST fuzzer (amd_tsan) failure
ThreadSanitizer (STID: 1290-1eae) FAIL cidb

AI Review

Summary

This PR fixes an exception path where TCPHandler could call eof on a canceled ReadBuffer, triggering debug/sanitizer assertion failures; it also makes exception propagation in PullingAsyncPipelineExecutor::pull explicit in the finished path and adds regression coverage in gtest_read_buffer_cancel. I did not find correctness, safety, compatibility, or rollout blockers in the current HEAD diff.

ClickHouse Rules

Item Status Notes
Deletion logging
Serialization versioning
Core-area scrutiny
No test removal
Experimental gate
No magic constants
Backward compatibility
SettingsChangesHistory.cpp
PR metadata quality
Safe rollout
Compilation time

Final Verdict

  • Status: ✅ Approve

@clickhouse-gh clickhouse-gh bot added the pr-bugfix Pull request with bugfix, not backported by default label Mar 24, 2026
#include <string>
#include <IO/ReadBuffer.h>
#include <IO/ReadBufferFromString.h>
#include <IO/ReadHelpers.h>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused include: #include <IO/ReadHelpers.h> does not appear to be used in this test file.

Please remove it to keep dependencies minimal and avoid unnecessary transitive includes in test targets.

TCPHandler::runImpl() idle loop called in->eof() on a ReadBuffer that
had been canceled by a previous query's pipeline callback. In debug and
sanitizer builds, ReadBuffer::next() has chassert(!isCanceled()) which
aborts the server process. This was observed as recurring "Logical error:
ReadBuffer is canceled" crashes in stress tests (amd_tsan, amd_msan,
amd_debug) — STID 2508-2913, 2508-25af.

Add ReadBuffer::eofOrCanceled() which checks isCanceled() before eof(),
preventing the assertion. TCPHandler idle loop now uses eofOrCanceled().

Also add explicit rethrowExceptionIfHas() after cancel() in
PullingAsyncPipelineExecutor::pull() as a defensive measure for the
is_execution_finished path.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@nerve-bot nerve-bot force-pushed the fix-readbuffer-canceled-exception-swallowed branch from 494f746 to 95e1ecd Compare March 24, 2026 04:30
@clickhouse-gh clickhouse-gh bot added pr-ci and removed pr-bugfix Pull request with bugfix, not backported by default labels Mar 24, 2026
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Mar 24, 2026

LLVM Coverage Report

Metric Baseline Current Δ
Lines 83.90% 83.90% +0.00%
Functions 24.40% 24.20% -0.20%
Branches 76.50% 76.40% -0.10%

PR changed lines: PR changed-lines coverage: 100.00% (43/43, 0 noise lines excluded)
Diff coverage report
Uncovered code

@pufit
Copy link
Copy Markdown
Member

pufit commented Mar 29, 2026

#100666

@pufit pufit closed this Mar 29, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-ci

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Logical error: 'ReadBuffer is canceled. Can't read from it.' (STID: 2508-2913)

4 participants