
Implementation of Hedged Requests #19291

Merged
KochetovNicolai merged 80 commits into ClickHouse:master from Avogar:hedged-requests
Mar 3, 2021

Conversation

@Avogar
Member

@Avogar Avogar commented Jan 19, 2021

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Hedged Requests for remote queries. When the setting use_hedged_requests is enabled (the default), ClickHouse may establish multiple connections with different replicas for a query. A new connection is opened if an existing connection with a replica was not established within hedged_connection_timeout, or if no data was received within receive_data_timeout. The query uses the first connection that sends a non-empty progress packet (or a data packet, if allow_changing_replica_until_first_data_packet is enabled); the other connections are cancelled. Queries with max_parallel_replicas > 1 are supported.

Detailed description / Documentation draft:
Using Hedged Requests allows reducing tail latency in distributed queries. It introduces new timeouts:
hedged_connection_timeout (Milliseconds) - if we can't establish a connection with a replica within this timeout, we start working with the next replica, without cancelling the connection to the previous one.
receive_data_timeout (Seconds) - this timeout is set when the query is sent to a replica; if we don't receive the first packet of data and make no progress in query execution within this timeout, we start working with the next replica, without cancelling the connection to the previous one.
Working with multiple replicas is performed using epoll. Hedged Requests also support parallel distributed query execution (see the setting max_parallel_replicas).
There is also a special setting allow_changing_replica_until_first_data_packet (0 by default). If it is enabled, we can start a new connection at any point before receiving the first data packet, even if we have already made some progress (but the progress hasn't been updated within receive_data_timeout); otherwise, changing the replica is disabled after the first time we make progress.

This behaviour is controlled by the setting use_hedged_requests (1 by default).
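The timeout logic above can be sketched as a simplified model (pure Python, not the actual epoll-based C++ implementation; the replica delays and timeout values below are made-up illustrative numbers):

```python
# Simplified model of the hedged-requests decision logic described above.
# Not the real ClickHouse implementation; delays and timeouts are illustrative.

HEDGED_CONNECTION_TIMEOUT = 0.1   # seconds (the real setting is in ms)
RECEIVE_DATA_TIMEOUT = 2.0        # seconds

def start_times(replica_delays, connection_timeout, data_timeout):
    """Absolute time at which a connection attempt starts for each replica:
    replica i+1 is tried when replica i has neither connected within
    connection_timeout nor delivered data within data_timeout."""
    starts = []
    t = 0.0
    for connect_delay, data_delay in replica_delays:
        starts.append(t)
        if connect_delay > connection_timeout:
            t += connection_timeout          # gave up on connecting
        else:
            t += connect_delay + data_timeout  # connected, but no data in time
    return starts

def winner(replica_delays, connection_timeout, data_timeout):
    """Index of the replica whose first data packet arrives earliest."""
    starts = start_times(replica_delays, connection_timeout, data_timeout)
    finish = [s + c + d for s, (c, d) in zip(starts, replica_delays)]
    return min(range(len(finish)), key=finish.__getitem__)

# Replica 0 connects fast but is very slow to send data; replica 1,
# started after receive_data_timeout expires, still finishes first:
delays = [(0.05, 10.0), (0.05, 0.3)]
print(winner(delays, HEDGED_CONNECTION_TIMEOUT, RECEIVE_DATA_TIMEOUT))  # → 1
```

Once any replica delivers its first data (or progress) packet, the query sticks with that connection and the others are cancelled, which is what bounds the tail latency.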

@robot-clickhouse robot-clickhouse added the pr-improvement Pull request with some product improvements label Jan 19, 2021
@Avogar Avogar marked this pull request as draft January 19, 2021 19:42
@Avogar Avogar marked this pull request as ready for review January 19, 2021 19:43
@Avogar Avogar marked this pull request as draft January 19, 2021 19:44

void TCPHandler::sendData(const Block & block)
{
    /// For testing hedged requests
Member


Probably we can just run a query like
SELECT sleep(x) and have a different x stored in the table on different nodes of the distributed table? We could even disable replication and insert different rows into a replicated table.
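The test idea suggested here could be modeled roughly as follows (a pure-Python sketch of the idea, not an integration test; the node names and sleep values are hypothetical):

```python
# Rough model of the suggested test: each node stores a different x, so a
# distributed `SELECT sleep(x)` returns first from the node with the
# smallest x. Node names and sleep values here are hypothetical.

node_sleep = {"node_1": 5, "node_2": 0, "node_3": 3}

def fastest_replica(sleep_per_node):
    """The replica whose `SELECT sleep(x)` would return first."""
    return min(sleep_per_node, key=sleep_per_node.get)

print(fastest_replica(node_sleep))  # → node_2
```

With hedged requests enabled, the query should end up served by the fastest node even when a slower one is tried first.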

@KochetovNicolai KochetovNicolai self-assigned this Jan 22, 2021
@Avogar Avogar marked this pull request as ready for review February 2, 2021 12:15
while (true)
{
    {
        AsyncCallbackSetter async_setter(receiver.connection, ReadCallback{receiver, sink});
Member


AsyncCallbackSetter is a lightweight structure, cheap to set per packet.
But do we need to reset it every time? Why not store it as a Routine field...

Member Author

@Avogar Avogar Feb 26, 2021


I just thought it's not good to leave this ReadCallback in the connection after a packet is received, because (theoretically) we could use this connection for something else, but maybe that logic was flawed. What do you think? Should I move it to a Routine field?

Member


Hm, maybe that's true. If we store it in the Routine, the Routine could be destroyed after the connection has already been reused by something else.

Member


Let's keep it in the callback then.
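The pattern settled on here (install the read callback only for the scope of one receive, then reset it so a reused connection never sees a stale callback) is an RAII guard in the C++ code. A rough Python analogue, using a context manager and a hypothetical Connection class (not the ClickHouse API), might look like:

```python
# Rough Python analogue of the AsyncCallbackSetter discussed above: the
# callback is installed only for the scope of the `with` block and removed
# on exit, so a reused connection never keeps a stale callback.
# The Connection class here is hypothetical, not the ClickHouse API.

class Connection:
    def __init__(self):
        self.async_callback = None

class AsyncCallbackSetter:
    def __init__(self, connection, callback):
        self.connection = connection
        self.callback = callback

    def __enter__(self):
        self.connection.async_callback = self.callback
        return self

    def __exit__(self, *exc):
        # Reset on scope exit: the connection may be reused elsewhere.
        self.connection.async_callback = None

conn = Connection()
with AsyncCallbackSetter(conn, lambda fd, desc: None):
    assert conn.async_callback is not None  # active while receiving a packet
assert conn.async_callback is None          # clean afterwards
```

Keeping the guard per packet costs almost nothing and guarantees the connection is returned to a clean state, which is the argument made in the thread.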

@KochetovNicolai
Member

Everything else LGTM in general. (I have not checked the tests properly so far.)

@KochetovNicolai KochetovNicolai merged commit cb12216 into ClickHouse:master Mar 3, 2021
M(Seconds, receive_timeout, DBMS_DEFAULT_RECEIVE_TIMEOUT_SEC, "", 0) \
M(Seconds, send_timeout, DBMS_DEFAULT_SEND_TIMEOUT_SEC, "", 0) \
M(Seconds, tcp_keep_alive_timeout, 0, "The time in seconds the connection needs to remain idle before TCP starts sending keepalive probes", 0) \
M(Milliseconds, hedged_connection_timeout, DBMS_DEFAULT_HEDGED_CONNECTION_TIMEOUT_MS, "Connection timeout for establishing connection with replica for Hedged requests", 0) \
Member


Should it have _ms suffix like other settings?

Comment on lines +223 to +224
M(Int64, sleep_in_send_tables_status, 0, "Time to sleep in sending tables status response in TCPHandler", 0) \
M(Int64, sleep_in_send_data, 0, "Time to sleep in sending data in TCPHandler", 0) \
Member


Seconds granularity is too coarse; how about converting to ms?


for name in NODES:
    if name != 'node':
        NODES[name] = cluster.add_instance(name, with_zookeeper=True, user_configs=['configs/users1.xml'])
Member


Looks like the test does not really require ZooKeeper and ReplicatedMergeTree, since it uses remote_servers, right?

Member Author


Yes, you're right; it would be good to remove ZooKeeper and ReplicatedMergeTree from this test. I will do it, thanks.

@alexey-milovidov
Member

@azat See #21886

@azat
Member

azat commented Mar 21, 2021

> @azat See #21886

Thanks, I missed it.
It does convert hedged_connection_timeout_ms/receive_data_timeout_ms, but not sleep_in_send_data/sleep_in_send_tables_status.

@Avogar
Member Author

Avogar commented Mar 22, 2021

@azat The sleep_in_send_data/sleep_in_send_tables_status settings are needed only for testing, so I had no reason to change them to ms. But if you really need these settings to be in ms for some testing reason, I will change them.

@azat
Member

azat commented Mar 22, 2021

> @azat sleep_in_send_data/sleep_in_send_tables_status settings are needed only for testing and I had no reason to change them to ms. But if you really need this settings to be in ms for some testing reasons, I will change it.

@Avogar Indeed, I was going to use them for stress testing distributed queries - #21944; it would be great if you could convert them to ms.

@Avogar
Member Author

Avogar commented Mar 22, 2021

@azat Ok, will convert them.


Labels

pr-improvement Pull request with some product improvements submodule changed At least one submodule changed in this PR.


6 participants