Fix ReadBuffer canceled assertion in TCPHandler idle loop#100536
Fix ReadBuffer canceled assertion in TCPHandler idle loop#100536nerve-bot wants to merge 1 commit intoClickHouse:masterfrom
Conversation
2aa7f64 to
494f746
Compare
|
Workflow [PR], commit [95e1ecd] Summary: ❌
AI ReviewSummaryThis PR fixes an exception path where ClickHouse Rules
Final Verdict
|
| #include <string> | ||
| #include <IO/ReadBuffer.h> | ||
| #include <IO/ReadBufferFromString.h> | ||
| #include <IO/ReadHelpers.h> |
There was a problem hiding this comment.
Unused include: #include <IO/ReadHelpers.h> does not appear to be used in this test file.
Please remove it to keep dependencies minimal and avoid unnecessary transitive includes in test targets.
TCPHandler::runImpl() idle loop called in->eof() on a ReadBuffer that had been canceled by a previous query's pipeline callback. In debug and sanitizer builds, ReadBuffer::next() has chassert(!isCanceled()) which aborts the server process. This was observed as recurring "Logical error: ReadBuffer is canceled" crashes in stress tests (amd_tsan, amd_msan, amd_debug) — STID 2508-2913, 2508-25af. Add ReadBuffer::eofOrCanceled() which checks isCanceled() before eof(), preventing the assertion. TCPHandler idle loop now uses eofOrCanceled(). Also add explicit rethrowExceptionIfHas() after cancel() in PullingAsyncPipelineExecutor::pull() as a defensive measure for the is_execution_finished path. Co-Authored-By: Claude Opus 4.6 <[email protected]>
494f746 to
95e1ecd
Compare
LLVM Coverage Report
PR changed lines: PR changed-lines coverage: 100.00% (43/43, 0 noise lines excluded) |
When a pipeline callback (e.g.
ClusterFunctionReadTaskCallbackviaurlCluster) readsfrom the TCP
ReadBufferand the read fails,ReadBuffer::next()catches the exceptionand calls
cancel(), setting thecanceledflag permanently. If the exception is handledbut the connection is not closed, the
TCPHandler::runImpl()idle loop callsin->eof()on the canceled buffer. In debug/sanitizer builds this triggers
chassert(!isCanceled())insideReadBuffer::next(), aborting the server.Observed as recurring stress test crashes on master (amd_tsan, amd_msan, amd_debug):
TCPHandler::runImpl()at idle loopin->eof()Changes:
ReadBuffer::eofOrCanceled()— checksisCanceled()beforeeof(), so theassertion is never reached on a canceled buffer.
TCPHandler::runImpl()idle loop now callsin->eofOrCanceled()instead ofin->eof().rethrowExceptionIfHas()aftercancel()inPullingAsyncPipelineExecutor::pull()as a defensive measure.gtest_read_buffer_cancel): callseofOrCanceled()on a canceledbuffer — crashes without the fix, passes with it.
Previous fix attempts (#98955, #100400) addressed related race conditions but did not
cover this idle-loop path.
Closes #100581
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
...
Documentation entry for user-facing changes