Skip to content

Fix abnormal server termination when http client goes away#20464

Merged
alesapin merged 2 commits intoClickHouse:masterfrom
azat:abnormal-server-termination-fix
Feb 15, 2021
Merged

Fix abnormal server termination when http client goes away#20464
alesapin merged 2 commits intoClickHouse:masterfrom
azat:abnormal-server-termination-fix

Conversation

@azat
Copy link
Copy Markdown
Member

@azat azat commented Feb 13, 2021

Changelog category (leave one):

  • Bug Fix

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Fix abnormal server termination when http client goes away

Fixes: #19451 (please add the no-backport label, since it has not bee included into any release yet)

For details, see commit descriptions

@robot-clickhouse robot-clickhouse added the pr-bugfix Pull request with bugfix, not backported by default label Feb 13, 2021
azat added 2 commits February 13, 2021 13:22
In [1] stress tests found:

    2021.02.12 14:20:58.800988 [ 17728 ] {de3e7894-b401-4f7d-8530-90cd5ab06682} <Debug> executeQuery: (from [::1]:45792, using production parser) (comment: /usr/share/clickhouse-test/queries/0_stateless/01520_client_print_query_id.expect) SELECT * FROM numbers(34599)
    2021.02.12 14:20:58.916484 [ 17728 ] {de3e7894-b401-4f7d-8530-90cd5ab06682} <Trace> ContextAccess (default): Access granted: CREATE TEMPORARY TABLE ON *.*
    2021.02.12 14:20:59.071980 [ 17728 ] {de3e7894-b401-4f7d-8530-90cd5ab06682} <Trace> InterpreterSelectQuery: FetchColumns -> Complete
    2021.02.12 14:21:10.708202 [ 17728 ] {de3e7894-b401-4f7d-8530-90cd5ab06682} <Information> executeQuery: Read 34599 rows, 270.30 KiB in 11.876294055 sec., 2913 rows/sec., 22.76 KiB/sec.
    2021.02.12 14:22:10.506261 [ 17728 ] {de3e7894-b401-4f7d-8530-90cd5ab06682} <Debug> DynamicQueryHandler: Done processing query
    2021.02.12 14:22:18.238037 [ 375 ] {} <Fatal> BaseDaemon: (version 21.3.1.5996, build id: 8DBCED54529C989F7AD4D991F51410774D55DE6C) (from thread 17728) Terminate called for uncaught exception:
    Code: 24, e.displayText() = DB::Exception: Cannot write to ostream at offset 262994, Stack trace (when copying this message, always include the lines below):

    0. ./obj-x86_64-linux-gnu/../contrib/libcxx/include/exception:0: Poco::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) @ 0x15c976cb in /usr/bin/clickhouse
    1. ./obj-x86_64-linux-gnu/../src/Common/Exception.cpp:56: DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) @ 0x8c9320e in /usr/bin/clickhouse
    2. ./obj-x86_64-linux-gnu/../src/IO/WriteBufferFromOStream.cpp:0: DB::WriteBufferFromOStream::nextImpl() @ 0x8d54da5 in /usr/bin/clickhouse
    3. ./obj-x86_64-linux-gnu/../src/IO/BufferBase.h:39: DB::WriteBufferFromOStream::~WriteBufferFromOStream() @ 0x8d551d7 in /usr/bin/clickhouse
    4. ./obj-x86_64-linux-gnu/../src/IO/WriteBufferFromOStream.cpp:44: DB::Write

    2021.02.12 14:22:18.811071 [ 18134 ] {} <Fatal> BaseDaemon: ########################################
    2021.02.12 14:22:18.878935 [ 18134 ] {} <Fatal> BaseDaemon: (version 21.3.1.5996, build id: 8DBCED54529C989F7AD4D991F51410774D55DE6C) (from thread 17728) (query_id: de3e7894-b401-4f7d-8530-90cd5ab06682) Received signal Aborted (6)
    2021.02.12 14:22:18.943148 [ 18134 ] {} <Fatal> BaseDaemon:
    2021.02.12 14:22:19.007073 [ 18134 ] {} <Fatal> BaseDaemon: Stack trace: 0x7f109932018b 0x7f10992ff859 0x8bb33ae 0x8e301dd 0x17dac8c4 0x17dac7c7 0x8c3fe0b 0x8d552c5 0x8d552ea 0x11a29914 0x11a2a2ca 0x12f96092 0x12f8c65e 0x12f84300 0x15b84110 0x15bc0913 0x15bc103f 0x15d29a12 0x15d27fb0 0x15d267b8 0x8badbad 0x7f10994d5609 0x7f10993fc293
    2021.02.12 14:22:19.255998 [ 18134 ] {} <Fatal> BaseDaemon: 5. raise @ 0x4618b in /usr/lib/x86_64-linux-gnu/libc-2.31.so
    2021.02.12 14:22:19.270203 [ 18134 ] {} <Fatal> BaseDaemon: 6. abort @ 0x25859 in /usr/lib/x86_64-linux-gnu/libc-2.31.so
    2021.02.12 14:22:50.108918 [ 370 ] {} <Fatal> Application: Child process was terminated by signal 6.

  [1]: https://clickhouse-test-reports.s3.yandex.net/19580/6aecb62416ece880cbb8ee3a803e14d841388dde/stress_test_(thread).html#fail1

Verified locally by commenting out->next() call in
WriteBufferFromHTTPServerResponse::nextImpl(), adding a sleep(1) and
canceling HTTP request before it finished, the stacktrace as follow:

    [ 6351 ] {} <Fatal> BaseDaemon: (version 21.3.1.1, build id: 9B40466BF3D2F5AED78A52A995A4A2FD3116787C) (from thread 6677) Terminate called for uncaught exception:
    Code: 24, e.displayText() = DB::Exception: Cannot write to ostream at offset 4, Stack trace (when copying this message, always include the lines below):

    0. /src/ch/clickhouse/.cmake/../src/Common/StackTrace.cpp:298: StackTrace::tryCapture() @ 0x30a52a in /src/ch/clickhouse/.cmake/src/libclickhouse_common_iod.so
    1. /src/ch/clickhouse/.cmake/../src/Common/StackTrace.cpp:260: StackTrace::StackTrace() @ 0x30a4e5 in /src/ch/clickhouse/.cmake/src/libclickhouse_common_iod.so
    2. /src/ch/clickhouse/.cmake/../src/Common/Exception.cpp:53: DB::Exception::Exception(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, bool) @ 0x2a61ae in /src/ch/clickhouse/.cmake/src/libclickhouse_common_iod.so
    3. /src/ch/clickhouse/.cmake/../src/IO/WriteBufferFromOStream.cpp:22: DB::WriteBufferFromOStream::nextImpl() @ 0x3b468a in /src/ch/clickhouse/.cmake/src/libclickhouse_common_iod.so
    4. /src/ch/clickhouse/.cmake/../src/IO/WriteBuffer.h:47

    <snip>
    [ 8966 ] {} <Fatal> BaseDaemon: 7. __cxxabiv1::__terminate(void (*)()) @ 0x1784ca in /src/ch/clickhouse/.cmake/contrib/replxx-cmake/libreplxxd.so
    <snip>
    [ 8966 ] {} <Fatal> BaseDaemon: 10. /src/ch/clickhouse/.cmake/../src/IO/WriteBufferFromOStream.cpp:0: DB::WriteBufferFromOStream::~WriteBufferFromOStream() @ 0x3b48c1 in /src/ch/clickhouse/.cmake/src/libclickhouse_common_iod.so
    [ 8966 ] {} <Fatal> BaseDaemon: 11. /src/ch/clickhouse/.cmake/../src/IO/WriteBufferFromOStream.cpp:44: DB::WriteBufferFromOStream::~WriteBufferFromOStream() @ 0x3b48ec in /src/ch/clickhouse/.cmake/src/libclickhouse_common_iod.so
    <snip>
    [ 8966 ] {} <Fatal> BaseDaemon: 14. /src/ch/clickhouse/.cmake/../src/IO/WriteBufferFromHTTPServerResponse.cpp:218: DB::WriteBufferFromHTTPServerResponse::~WriteBufferFromHTTPServerResponse() @ 0x3b33cd in /src/ch/clickhouse/.cmake/src/libclickhouse_common_iod.so
    <snip>
    [ 8966 ] {} <Fatal> BaseDaemon: 22. /src/ch/clickhouse/.cmake/../src/Server/HTTPHandler.h:43: DB::HTTPHandler::Output::~Output() @ 0x260421 in /src/ch/clickhouse/.cmake/src/libclickhouse_serverd.so
    [ 8966 ] {} <Fatal> BaseDaemon: 23. /src/ch/clickhouse/.cmake/../src/Server/HTTPHandler.cpp:778: DB::HTTPHandler::handleRequest(Poco::Net::HTTPServerRequest&, Poco::Net::HTTPServerResponse&) @ 0x253fd4 in /src/ch/clickhouse/.cmake/src/libclickhouse_serverd.so
@azat azat force-pushed the abnormal-server-termination-fix branch from cd6ba0a to 33f54cd Compare February 13, 2021 10:23
@alesapin alesapin self-assigned this Feb 15, 2021
@alesapin alesapin merged commit 9557bca into ClickHouse:master Feb 15, 2021
@azat azat deleted the abnormal-server-termination-fix branch February 15, 2021 18:04
azat added a commit to azat/ClickHouse that referenced this pull request Feb 19, 2021
Even after ClickHouse#20464 it was still possible, for example [1].

    2021.02.19 11:40:21.886191 [ 68373 ] {} <Trace> DynamicQueryHandler: Request URI: /?database=test_ds2d6y&log_comment=/usr/share/clickhouse-test/queries/0_stateless/01302_aggregate_state_exception_memory_leak.sh&enable_http_compression=1&http_zlib_compression_level=1

    <snip>

    2021.02.19 11:41:35.289940 [ 365 ] {} <Fatal> BaseDaemon: (version 21.3.1.6058, build id: 8D46D65205E2C8B7FE408A0B4EC76CA0483F9E92) (from thread 68373) Terminate called for uncaught exception:
    Code: 24, e.displayText() = DB::Exception: Cannot write to ostream at offset 262568, Stack trace (when copying this message, always include the lines below):

    0. ./obj-x86_64-linux-gnu/../contrib/libcxx/include/exception:0: Poco::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) @ 0x15b3c7db in /usr/bin/clickhouse
    1. ./obj-x86_64-linux-gnu/../src/Common/Exception.cpp:56: DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) @ 0x8aba66e in /usr/bin/clickhouse
    2. ./obj-x86_64-linux-gnu/../src/IO/WriteBufferFromOStream.cpp:0: DB::WriteBufferFromOStream::nextImpl() @ 0x8b8c105 in /usr/bin/clickhouse
    3. ./obj-x86_64-linux-gnu/../src/IO/BufferBase.h:39: DB::WriteBufferFromOStream::~WriteBufferFromOStream() @ 0x8b8c537 in /usr/bin/clickhouse
    4. ./obj-x86_64-linux-gnu/../src/IO/WriteBufferFromOStream.cpp:44: DB::Write

  [1]: https://clickhouse-test-reports.s3.yandex.net/16481/5d150cce4778dd14f58dcff67435bdec1efa155b/stress_test_(thread).html#fail1

And according to this partial stacktrace it seems that the dtor of
WriteBufferFromOStream was called from
WriteBufferFromHTTPServerResponse, since the class name starts from
DB::Write*

The problem is that if first time WriteBufferFromOStream::next() fails,
it will reset position to make next write no-op, however
WriteBufferFromHTTPServerResponse::next() will set position to available
buffer back, and next() will throw again, but this time it can be from
dtor.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-bugfix Pull request with bugfix, not backported by default pr-no-backport

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants