Fix max_distributed_connections#9673
Conversation
|
Probably we should find better way to test it: |
Actually the test fails correctly without processors, digging |
|
Let me clarify what's going on. When we build query pipeline, there are two parameters we are interested in: the number of streams we build ( Usually,
In order to implement 1., In order to support 2., the number of threads was additionally limited by This pr fixes 3., but broke 2, because It is easy to fix - set |
|
Please also look here. I have forgotten to limit the number of threads for http case. It must also use |
|
Added "bugfix" label because this was significant performance regression that will be visible on large clusters like those we have in Yandex Metrica (and sadly enough is not visible on any testing environments). |
@KochetovNicolai it had been fixed already |
There was a problem hiding this comment.
I suppose using minimum is not the best choice. Imagine query like select number from system.number union all select number from system.numbers. It could have been executed in in 2 threads, but after this change it it will use only one.
I think we should think more about it.
There was a problem hiding this comment.
Indeed (I also though like this), and this will go away, already testing better approach
There was a problem hiding this comment.
Hm, I though that I found better alternative but I didn't (too much interpreters:) )
As for union all:
- if
max_threads=1will be specified than it should use one thread (like before) - if
max_threads=2will be specified than it will be runned in parallel (like before, since each interpreter will havemax_streams=2)
@KochetovNicolai what did I miss?
There was a problem hiding this comment.
FWIW:
$ time clickhouse-client --max_threads 1 -q 'select sleep(1) union all select sleep(1)'
0
0
real 0m2.322s
$ time clickhouse-client --max_threads 2 -q 'select sleep(1) union all select sleep(1)'
0
0
real 0m1.327s
$ time clickhouse-client --max_threads 2 -q 'select sleep(1) union all select sleep(1)' --experimental_use_processors 0
0
0
real 0m1.419s
$ time clickhouse-client --max_threads 1 -q 'select sleep(1) union all select sleep(1)' --experimental_use_processors 0
0
0
real 0m2.421s
Thank you! |
af92dc6 to
7659d44
Compare
7659d44 to
afbe87c
Compare
|
Tests failures looks unrelated:
|
@KochetovNicolai added a test to avoid regression - 7c24cee9b79878d20278be89c3994ba59d5aa1ca |
dbms/tests/queries/0_stateless/01085_max_distributed_connections_http.sh
Outdated
Show resolved
Hide resolved
7c24cee to
3d126e4
Compare
|
01085_max_distributed_connections_http fails :( |
|
Actually, it seems that reference file is wrong. |
if (max_streams > 1 && !is_remote)
max_streams *= settings.max_streams_to_max_threads_ratio;I think we need to limit the number of threads here too. |
Yep, you are right! |
So revert the last change? |
Well, let's not revert anything. |
3d126e4 to
b90d094
Compare
00111_shard_external_sort_distributed breaks everything after, but I cannot find a lot from the logs (maybe clickhouse-server had been killed by OOM, especially TSAN version) P.S. recent changes passes all tests except for 01085_max_distributed_connections_http, where wrong .reference file had been provided) |
00965_live_view_watch_heartbeat (does not looks relevant) P.S. I can rebase to force CI |
With Processors max_distributed_connections does not used anymore, since it always uses not more then max_threads (see TCPHandler::processOrdinaryQueryWithProcessors()/ QueryPipeline::execute()/PipelineExecutor::execute()) So just use QueryPipeline::getNumThreads(), that takes max_distributed_connections into account.
b90d094 to
550cfef
Compare
Looks flacky:
|
|
Unfortunately, these tests clearly indicate a bug (most likely not in your PR). |
…ions Fix max_distributed_connections (cherry picked from commit 9a2760d)
With prefer_localhost_replica=1 max_distributed_connections uses max_threads before this patch. Fixes: ClickHouse#14936 Previous fix: ClickHouse#9673
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Fix max_distributed_connections (w/ and w/o Processors)