Disable send_logs_level for INSERT into Distributed to avoid possible hang #35075
Conversation
@Mergifyio update
✅ Branch has been successfully updated

There are failed checks.
Force-pushed f3e0550 to 779e7c8
@Mergifyio update

@azat Checks have failed. @Mergifyio update
Force-pushed 779e7c8 to 98e1e45

@azat The bugfix check says the test passed on the previous build: https://s3.amazonaws.com/clickhouse-test-reports/35075/98e1e45f860e96edd3bf002c47fa827bd8a61c69/stateless_tests_bugfix_validate_check.html
Force-pushed 98e1e45 to 81c3be8
Yep, the problem was in the test; fixed.
Now, without this patch, the test does fail:
However, the fix is not complete:

@azat let's continue.
Force-pushed 5f7eb94 to 4a064e5
So, I've thought about this, and there is no way to consume Log packets properly in the case of INSERT into Distributed, so I simply disable send_logs_level for those queries. Also note that this is no different for the user from what ClickHouse did before: previously it simply did not consume those packets, so the client never saw those messages anyway.
And there was another issue because 10 seconds was not enough for the server to stop:
Force-pushed b2751f9 to c00b66d
Force-pushed c00b66d to 100410c
Timeout - OK

OK, but let's add a no-bc tag to avoid hung queries in stress tests for the backward compatibility check.
In case of INSERT into a Distributed table with send_logs_level != none
it is possible to receive tons of Log packets, and without consuming
them properly the socket buffer fills up, and eventually the query
hangs.
This happens because the receiver will not read data until it has sent
its log packets, but the sender does not read those Log packets, so the
receiver hangs, and hence the sender hangs too, because the receiver no
longer consumes Data packets.
In the initial version of this patch I tried to properly consume Log
packets, but it is not possible to ensure that all Log packets have been
consumed before writing Data blocks. That is, with the current protocol
implementation it is not possible to fix Log packet consumption properly
to avoid the deadlock, so send_logs_level has simply been disabled.
But note that this is no different for the user from what ClickHouse did
before: previously it simply did not consume those packets, so the
client never saw those messages anyway.
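The mutual blocking above can be sketched with plain sockets (a minimal Python illustration, not ClickHouse code): both peers write without ever reading, and once both kernel socket buffers are full, neither side's send can make progress. Non-blocking sockets are used here so the sketch terminates instead of actually hanging the way the queries did.

```python
import socket

def fill_send_buffer(sock):
    """Write until both our send buffer and the peer's receive buffer
    are full, i.e. until a blocking socket would hang on send()."""
    sock.setblocking(False)
    sent = 0
    try:
        while True:
            sent += sock.send(b"x" * 65536)
    except BlockingIOError:
        # A blocking socket would be stuck right here.
        return sent

a, b = socket.socketpair()
sent_a = fill_send_buffer(a)  # "sender": keeps pushing Data packets
sent_b = fill_send_buffer(b)  # "receiver": keeps pushing Log packets

# Neither side has read anything, so neither can send any further:
# with blocking I/O this is exactly the mutual hang from the traces
# below (both sides stuck in sendBytes, both Recv-Q and Send-Q full).
print(sent_a > 0 and sent_b > 0)

a.close()
b.close()
```

This also shows why the `ss` output below has large Recv-Q and Send-Q on both connections: each process is blocked writing while its own receive queue sits unread.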
<details>
The receiver:
Poco::Net::SocketImpl::poll(Poco::Timespan const&, int)
Poco::Net::SocketImpl::sendBytes(void const*, int, int)
Poco::Net::StreamSocketImpl::sendBytes(void const*, int, int)
DB::WriteBufferFromPocoSocket::nextImpl()
DB::TCPHandler::sendLogData(DB::Block const&)
DB::TCPHandler::sendLogs()
DB::TCPHandler::readDataNext()
DB::TCPHandler::processInsertQuery()
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
ESTAB 4331792 211637 127.0.0.1:9000 127.0.0.1:24446 users:(("clickhouse-serv",pid=46874,fd=3850))
The sender:
Poco::Net::SocketImpl::poll(Poco::Timespan const&, int)
Poco::Net::SocketImpl::sendBytes(void const*, int, int)
Poco::Net::StreamSocketImpl::sendBytes(void const*, int, int)
DB::WriteBufferFromPocoSocket::nextImpl()
DB::WriteBuffer::write(char const*, unsigned long)
DB::CompressedWriteBuffer::nextImpl()
DB::WriteBuffer::write(char const*, unsigned long)
DB::SerializationString::serializeBinaryBulk(DB::IColumn const&, DB::WriteBuffer&, unsigned long, unsigned long) const
DB::NativeWriter::write(DB::Block const&)
DB::Connection::sendData(DB::Block const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool)
DB::RemoteInserter::write(DB::Block)
DB::RemoteSink::consume(DB::Chunk)
DB::SinkToStorage::onConsume(DB::Chunk)
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
ESTAB 67883 3008240 127.0.0.1:24446 127.0.0.1:9000 users:(("clickhouse-serv",pid=41610,fd=25))
</details>
v2: rebase to use clickhouse_client_timeout and add clickhouse_test_wait_queries
v3: use KILL QUERY
v4: adjust the test
v5: disable send_logs_level for INSERT into Distributed
v6: add no-backward-compatibility-check tag
Signed-off-by: Azat Khuzhin <[email protected]>
Force-pushed 100410c to 7210be1
Right now RemoteInserter does not read ProfileEvents for INSERT; it handles them only after sending the query or on finish. But ClickHouse#37391 sends them for each INSERT block, and sometimes there may be no ProfileEvents packet at all, since it sends only non-empty blocks. This adds too much complexity, and anyway ProfileEvents are useless for the server, so let's send them only if the query is initial (i.e. sent by a user).

Note that it is okay to change the logic of sending ProfileEvents without changing DBMS_TCP_PROTOCOL_VERSION, because there have been no public releases with the original patch included yet.

Fixes: ClickHouse#37391
Refs: ClickHouse#35075
Signed-off-by: Azat Khuzhin <[email protected]>
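The gist of that decision can be sketched as follows (a hedged illustration; the names `QueryKind` and `should_send_profile_events` are hypothetical, not the actual ClickHouse API): per-block ProfileEvents are only worth emitting when the peer is a real client, i.e. when the query is the initial one issued by a user, not a secondary server-to-server query such as the internal INSERT into a Distributed table's shards.

```python
from enum import Enum

class QueryKind(Enum):
    # Illustrative names, mirroring the initial/secondary query
    # distinction described in the comment above.
    INITIAL_QUERY = 1     # issued directly by a user/client
    SECONDARY_QUERY = 2   # issued by another server (e.g. RemoteInserter)

def should_send_profile_events(query_kind: QueryKind) -> bool:
    """Send per-block ProfileEvents only to a real client; a server
    acting as the receiver of a secondary query would just drop them."""
    return query_kind is QueryKind.INITIAL_QUERY

print(should_send_profile_events(QueryKind.INITIAL_QUERY))    # True
print(should_send_profile_events(QueryKind.SECONDARY_QUERY))  # False
```

The design point is that gating on the query kind avoids teaching RemoteInserter to interleave ProfileEvents packets with Data blocks, which is where the extra complexity would have come from.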
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Disable send_logs_level for INSERT into Distributed to avoid possible hang