Skip to content

Fix uncaught exception when HTTP client goes away#20981

Merged
alexey-milovidov merged 3 commits intoClickHouse:masterfrom
azat:http-client-reset-uncaught-exception
Feb 20, 2021
Merged

Fix uncaught exception when HTTP client goes away#20981
alexey-milovidov merged 3 commits intoClickHouse:masterfrom
azat:http-client-reset-uncaught-exception

Conversation

@azat
Copy link
Copy Markdown
Member

@azat azat commented Feb 19, 2021

Changelog category (leave one):

  • Not for changelog.

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Fix uncaught exception when HTTP client goes away

Detailed description / Documentation draft:
Even after #20464 it was still possible, for example 1.

2021.02.19 11:40:21.886191 [ 68373 ] {} <Trace> DynamicQueryHandler: Request URI: /?database=test_ds2d6y&log_comment=/usr/share/clickhouse-test/queries/0_stateless/01302_aggregate_state_exception_memory_leak.sh&enable_http_compression=1&http_zlib_compression_level=1

<snip>

2021.02.19 11:41:35.289940 [ 365 ] {} <Fatal> BaseDaemon: (version 21.3.1.6058, build id: 8D46D65205E2C8B7FE408A0B4EC76CA0483F9E92) (from thread 68373) Terminate called for uncaught exception:
Code: 24, e.displayText() = DB::Exception: Cannot write to ostream at offset 262568, Stack trace (when copying this message, always include the lines below):

0. ./obj-x86_64-linux-gnu/../contrib/libcxx/include/exception:0: Poco::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) @ 0x15b3c7db in /usr/bin/clickhouse
1. ./obj-x86_64-linux-gnu/../src/Common/Exception.cpp:56: DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) @ 0x8aba66e in /usr/bin/clickhouse
2. ./obj-x86_64-linux-gnu/../src/IO/WriteBufferFromOStream.cpp:0: DB::WriteBufferFromOStream::nextImpl() @ 0x8b8c105 in /usr/bin/clickhouse
3. ./obj-x86_64-linux-gnu/../src/IO/BufferBase.h:39: DB::WriteBufferFromOStream::~WriteBufferFromOStream() @ 0x8b8c537 in /usr/bin/clickhouse
4. ./obj-x86_64-linux-gnu/../src/IO/WriteBufferFromOStream.cpp:44: DB::Write

And according to this partial stacktrace it seems that the dtor of
WriteBufferFromOStream was called from
WriteBufferFromHTTPServerResponse, since the class name starts from
DB::Write*

The problem is that if first time WriteBufferFromOStream::next() fails,
it will reset position to make next write no-op, however
WriteBufferFromHTTPServerResponse::next() will set position to available
buffer back, and next() will throw again, but this time it can be from
dtor.

Refs: #19451

Even after ClickHouse#20464 it was still possible, for example [1].

    2021.02.19 11:40:21.886191 [ 68373 ] {} <Trace> DynamicQueryHandler: Request URI: /?database=test_ds2d6y&log_comment=/usr/share/clickhouse-test/queries/0_stateless/01302_aggregate_state_exception_memory_leak.sh&enable_http_compression=1&http_zlib_compression_level=1

    <snip>

    2021.02.19 11:41:35.289940 [ 365 ] {} <Fatal> BaseDaemon: (version 21.3.1.6058, build id: 8D46D65205E2C8B7FE408A0B4EC76CA0483F9E92) (from thread 68373) Terminate called for uncaught exception:
    Code: 24, e.displayText() = DB::Exception: Cannot write to ostream at offset 262568, Stack trace (when copying this message, always include the lines below):

    0. ./obj-x86_64-linux-gnu/../contrib/libcxx/include/exception:0: Poco::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) @ 0x15b3c7db in /usr/bin/clickhouse
    1. ./obj-x86_64-linux-gnu/../src/Common/Exception.cpp:56: DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) @ 0x8aba66e in /usr/bin/clickhouse
    2. ./obj-x86_64-linux-gnu/../src/IO/WriteBufferFromOStream.cpp:0: DB::WriteBufferFromOStream::nextImpl() @ 0x8b8c105 in /usr/bin/clickhouse
    3. ./obj-x86_64-linux-gnu/../src/IO/BufferBase.h:39: DB::WriteBufferFromOStream::~WriteBufferFromOStream() @ 0x8b8c537 in /usr/bin/clickhouse
    4. ./obj-x86_64-linux-gnu/../src/IO/WriteBufferFromOStream.cpp:44: DB::Write

  [1]: https://clickhouse-test-reports.s3.yandex.net/16481/5d150cce4778dd14f58dcff67435bdec1efa155b/stress_test_(thread).html#fail1

And according to this partial stacktrace it seems that the dtor of
WriteBufferFromOStream was called from
WriteBufferFromHTTPServerResponse, since the class name starts from
DB::Write*

The problem is that if first time WriteBufferFromOStream::next() fails,
it will reset position to make next write no-op, however
WriteBufferFromHTTPServerResponse::next() will set position to available
buffer back, and next() will throw again, but this time it can be from
dtor.
@robot-clickhouse robot-clickhouse added the pr-bugfix Pull request with bugfix, not backported by default label Feb 19, 2021
@alexey-milovidov alexey-milovidov self-assigned this Feb 19, 2021
azat added 2 commits February 20, 2021 10:15
…ing finalize())

Since I saw the following:

    0. DB::WriteBufferFromOStream::nextImpl()
    1. DB::WriteBufferFromHTTPServerResponse::nextImpl()
    2. DB::WriteBufferFromHTTPServerResponse::finalize()
    3. DB::WriteBufferFromHTTPServerResponse::~WriteBufferFromHTTPServerResponse()
    4. DB::StaticRequestHandler::handleRequest(Poco::Net::HTTPServerRequest&, Poco::Net::HTTPServerResponse&)
    5. Poco::Net::HTTPServerConnection::run()
    6. Poco::Net::TCPServerConnection::start()
}
catch (...)
{
out.finalize();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What did you try to achieve by it?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is hard to remember, but likely if the writeStringBinary was triggered flush, then it may throw, so we need to call finalize reset the pos so that next will be no-op (note, that cancel was not there in 2021).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-not-for-changelog This PR should not be mentioned in the changelog

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants