
Improve async reading from socket#47229

Merged
Avogar merged 31 commits into ClickHouse:master from Avogar:non-blocking-connect
Apr 21, 2023

Conversation

@Avogar
Member

@Avogar Avogar commented Mar 3, 2023

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Add async connection to socket and async writing to socket. Make creating connections and sending query/external tables async across shards. Refactor code with fibers. Closes #46931. We will be able to increase connect_timeout_with_failover_ms by default after this PR (#5188)

CC: @KochetovNicolai, @azat
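The non-blocking connect pattern this PR is built around can be sketched outside ClickHouse roughly like this (a minimal Python sketch assuming Linux epoll; `async_connect` and every other name here is invented for illustration and does not come from the ClickHouse codebase):

```python
import errno
import select
import socket

def async_connect(address, timeout_ms=1000):
    """Start a non-blocking connect and wait for the fd via epoll."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    err = sock.connect_ex(address)  # returns immediately instead of blocking
    if err not in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
        sock.close()
        raise OSError(err, "connect failed")
    epoll = select.epoll()
    epoll.register(sock.fileno(), select.EPOLLOUT)
    ready = epoll.poll(timeout_ms / 1000.0)  # a fiber would yield here instead
    epoll.close()
    if not ready:
        sock.close()
        raise TimeoutError("connect timed out")
    # Writability only means the connect finished; SO_ERROR says whether it succeeded.
    err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
    if err != 0:
        sock.close()
        raise OSError(err, "connect failed")
    return sock

# Self-contained demo: connect to a listener we open ourselves.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
conn = async_connect(listener.getsockname())
connected = conn.getpeername() == listener.getsockname()
conn.close()
listener.close()
```

In the PR itself the wait is not a blocking `poll` call: the fiber is suspended and the socket fd is watched asynchronously, so many connections can make progress at once. The sketch only shows the socket-level mechanics.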

@Avogar Avogar force-pushed the non-blocking-connect branch from 76b795d to 3fae85f on March 3, 2023 19:58
@robot-clickhouse-ci-1 robot-clickhouse-ci-1 added the pr-improvement Pull request with some product improvements label Mar 3, 2023
@nickitat nickitat self-assigned this Mar 6, 2023
@Avogar Avogar marked this pull request as ready for review March 9, 2023 12:04
@Avogar Avogar force-pushed the non-blocking-connect branch from f30202c to 9596c21 on March 15, 2023 13:01
@Avogar Avogar marked this pull request as draft March 16, 2023 15:38
@Avogar Avogar marked this pull request as ready for review March 27, 2023 11:50

receive_timeout_usec = timeout.totalMicroseconds();
connection_fd_description = fd_description;
epoll.add(connection_fd, events);
Member


There is some intersection with the code in ConnectionEstablisher; maybe it could be moved into a base class.

Member Author


Yes, there is some intersection, but also lots of differences. I tried to come up with a base class, but to be honest it did not look better than the current version, so I decided not to do it.

Member

@azat azat left a comment


Great, plus now the code has fewer atomics and some abstractions!

Also, maybe it is worth writing a test? Introduce another setting like sleep_before_receiving_query_ms (now there is only sleep_after_receiving_query_ms), write a query large enough to exceed the socket buffer, and see how long it takes: without these patches it should take sleep_before_receiving_query_ms * shards, while with them only sleep_before_receiving_query_ms.
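The timing this proposal predicts can be modeled locally (a hypothetical Python sketch, not the proposed ClickHouse test; `send_to_shard` and the constants are invented): sending to the shards one after another costs the per-shard sleep multiplied by the number of shards, while sending concurrently costs roughly one sleep.

```python
import threading
import time

SHARDS = 6
SLEEP = 0.1  # stands in for sleep_before_receiving_query_ms

def send_to_shard():
    time.sleep(SLEEP)  # models the shard stalling before reading the query

# Blocking model: one shard after another.
start = time.monotonic()
for _ in range(SHARDS):
    send_to_shard()
sequential = time.monotonic() - start

# Async model: all shards progress at once (threads stand in for fibers).
start = time.monotonic()
threads = [threading.Thread(target=send_to_shard) for _ in range(SHARDS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
concurrent = time.monotonic() - start
```

Here `sequential` comes out near SLEEP * SHARDS and `concurrent` near a single SLEEP, which is the asymmetry the proposed test would assert on.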

{
checkTimeout(/* blocking= */ true);
to_destroy = std::move(to_destroy).resume();
resumeUnlocked();
Member


Now this code works with the existing fiber, while previously it had been detached first. Though I guess it should not be an issue?

Member Author


It's OK: we call cancelBefore under the fiber_lock mutex in AsyncTaskExecutor.
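That locking invariant can be sketched like this (a rough Python model in which a generator stands in for the fiber; this toy `AsyncTaskExecutor` is invented for illustration and is not the real class): because resume and cancel both take the same lock, cancellation can safely work with the live fiber and can never interleave with a resume in flight.

```python
import threading

class AsyncTaskExecutor:
    def __init__(self, steps):
        self.fiber_lock = threading.Lock()
        self.fiber = iter(steps)          # a generator stands in for the fiber
        self.log = []

    def resume(self):
        with self.fiber_lock:
            try:
                self.log.append(("resumed", next(self.fiber)))
            except StopIteration:
                pass                      # fiber already finished or detached

    def cancel(self):
        with self.fiber_lock:             # cancelBefore runs under fiber_lock
            self.log.append(("cancelled", None))
            self.fiber = iter(())         # detach: nothing left to resume

def task():
    yield "step1"
    yield "step2"

ex = AsyncTaskExecutor(task())
ex.resume()
ex.cancel()
ex.resume()  # fiber already detached; this is a no-op
```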

@Avogar
Member Author

Avogar commented Apr 19, 2023

Also, maybe it is worth writing a test?

Sure, I will try to do it.

@Avogar
Member Author

Avogar commented Apr 19, 2023

@nickitat sorry for the slow response to the comments, I was distracted by other tasks. Could you continue reviewing, please?

@Avogar
Member Author

Avogar commented Apr 20, 2023

Some small testing:
First, test that we send the query asynchronously across shards. We can use the fact that temporary tables are sent together with the query, so we can create a big temporary table and check whether it is sent asynchronously across shards:
Temporary table:

avogar-dev :) create temporary table test (number UInt64, s String)

CREATE TEMPORARY TABLE test
(
    `number` UInt64,
    `s` String
)

Query id: 8edf267a-7897-4a60-89b3-16d47a607dcd

Ok.

0 rows in set. Elapsed: 0.002 sec.

avogar-dev :) insert into test select number, randomString(number % 1000) from numbers(1000000)

INSERT INTO test SELECT
    number,
    randomString(number % 1000)
FROM numbers(1000000)

Query id: 420a0b9e-9c67-45e8-a41d-2dee0f7c1748

Ok.

0 rows in set. Elapsed: 0.550 sec. Processed 1.05 million rows, 8.37 MB (1.90 million rows/s., 15.22 MB/s.)

First, with sleep_after_receiving_query_ms=0:

avogar-dev :) select * from remote('127.0.0.{1,2,3,4,5,6}', 'system.numbers') limit 10 format Null settings prefer_localhost_replica=0, async_query_sending_for_remote=0, max_threads=1

SELECT *
FROM remote('127.0.0.{1,2,3,4,5,6}', 'system.numbers')
LIMIT 10
FORMAT `Null`
SETTINGS prefer_localhost_replica = 0, async_query_sending_for_remote = 0, max_threads = 1

Query id: 021f8fed-14e7-4488-932a-a55aa6db33f0

Ok.

0 rows in set. Elapsed: 7.572 sec.

avogar-dev :) select * from remote('127.0.0.{1,2,3,4,5,6}', 'system.numbers') limit 10 format Null settings prefer_localhost_replica=0, async_query_sending_for_remote=1, max_threads=1

SELECT *
FROM remote('127.0.0.{1,2,3,4,5,6}', 'system.numbers')
LIMIT 10
FORMAT `Null`
SETTINGS prefer_localhost_replica = 0, async_query_sending_for_remote = 1, max_threads = 1

Query id: c1554b49-ef27-452c-baac-ffd212c391e9

Ok.

0 rows in set. Elapsed: 7.193 sec.

We can already see some improvement. Now with sleep_after_receiving_query_ms=5000 (so each shard will sleep for 5 seconds before reading the temporary table):

avogar-dev :) select * from remote('127.0.0.{1,2,3,4,5,6}', 'system.numbers') limit 10 format Null settings prefer_localhost_replica=0, async_query_sending_for_remote=0, max_threads=1

SELECT *
FROM remote('127.0.0.{1,2,3,4,5,6}', 'system.numbers')
LIMIT 10
FORMAT `Null`
SETTINGS prefer_localhost_replica = 0, async_query_sending_for_remote = 0, max_threads = 1

Query id: bd0046ec-35a8-47a2-b9fc-b080f07ddf0e

Ok.

0 rows in set. Elapsed: 37.512 sec.

avogar-dev :) select * from remote('127.0.0.{1,2,3,4,5,6}', 'system.numbers') limit 10 format Null settings prefer_localhost_replica=0, async_query_sending_for_remote=1, max_threads=1

SELECT *
FROM remote('127.0.0.{1,2,3,4,5,6}', 'system.numbers')
LIMIT 10
FORMAT `Null`
SETTINGS prefer_localhost_replica = 0, async_query_sending_for_remote = 1, max_threads = 1

Query id: cc27cab4-cc8c-4745-b125-a5bd00ffbe4c

Ok.

0 rows in set. Elapsed: 11.984 sec.

As we can see, we now definitely send the query/temporary tables asynchronously across shards :)

Now, test that we connect asynchronously across replicas and shards:
Before:

avogar-dev :) select * from remote('{129.0.0.1|127.0.0.1,129.0.0.2|127.0.0.2}', 'system.numbers') limit 10 format Null settings prefer_localhost_replica=0,  max_threads=1, hedged_connection_timeout_ms=1000, connect_timeout_with_failover_ms=5000

SELECT *
FROM remote('{129.0.0.1|127.0.0.1,129.0.0.2|127.0.0.2}', 'system.numbers')
LIMIT 10
FORMAT `Null`
SETTINGS prefer_localhost_replica = 0, max_threads = 1, hedged_connection_timeout_ms = 1000, connect_timeout_with_failover_ms = 5000

Query id: 474fd53e-e064-4ef8-80c1-77c8fffb5dd2

Ok.

0 rows in set. Elapsed: 10.380 sec.

10 seconds: we spend 5 seconds trying to connect to a replica on each shard and do not switch replicas during that time.

Now:

:) select * from remote('{129.0.0.1|127.0.0.1,129.0.0.2|127.0.0.2}', 'system.numbers') limit 10 format Null settings prefer_localhost_replica=0, async_query_sending_for_remote=0, max_threads=1, hedged_connection_timeout_ms=1000, connect_timeout_with_failover_ms=5000

SELECT *
FROM remote('{129.0.0.1|127.0.0.1,129.0.0.2|127.0.0.2}', 'system.numbers')
LIMIT 10
FORMAT `Null`
SETTINGS prefer_localhost_replica = 0, async_query_sending_for_remote = 0, max_threads = 1, hedged_connection_timeout_ms = 1000, connect_timeout_with_failover_ms = 5000

Query id: de3ab98c-ee08-4288-9b86-cfce613fbcac

Ok.

0 rows in set. Elapsed: 2.035 sec.

avogar-dev :) select * from remote('{129.0.0.1|127.0.0.1,129.0.0.2|127.0.0.2}', 'system.numbers') limit 10 format Null settings prefer_localhost_replica=0, async_query_sending_for_remote=1, max_threads=1, hedged_connection_timeout_ms=1000, connect_timeout_with_failover_ms=5000

SELECT *
FROM remote('{129.0.0.1|127.0.0.1,129.0.0.2|127.0.0.2}', 'system.numbers')
LIMIT 10
FORMAT `Null`
SETTINGS prefer_localhost_replica = 0, async_query_sending_for_remote = 1, max_threads = 1, hedged_connection_timeout_ms = 1000, connect_timeout_with_failover_ms = 5000

Query id: 9c6c99ae-310b-459c-afff-3c359227331b

Ok.

0 rows in set. Elapsed: 1.145 sec.

With async_query_sending_for_remote=0 we get 2 seconds: connections are not made asynchronously across shards, but they are asynchronous across replicas (according to hedged_connection_timeout_ms = 1000). With async_query_sending_for_remote=1 we get 1 second: we connect asynchronously across both shards and replicas.

I will add some integration tests, but they won't rely on query execution time (that could lead to flakiness); I will use some profile events instead.

Member

@nickitat nickitat left a comment


lgtm


Labels

pr-improvement Pull request with some product improvements


Development

Successfully merging this pull request may close these issues.

Hedged requests still don't allow to quickly select a good replica in presence of timeouts on others.
