Use background thread pool for distributed sends #10263
alexey-milovidov merged 4 commits into ClickHouse:master
Fails in upstream/master too
Added |
Does not look related
Uses |
test_inserts_batching did not fix. |
Include info about: - kafka streaming - dns cache updates
…ibuted sends After ClickHouse#8756 the problem with 1 thread for each (distributed table, disk) for distributed sends became even worse (since there can be multiple disks), so use a predefined thread pool for these tasks, which can be controlled with the background_distributed_schedule_pool_size knob.
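To illustrate the difference between one thread per (distributed table, disk) and a predefined pool, here is a minimal standalone sketch. It is not the ClickHouse implementation: `FixedPool` and `sendAll` are hypothetical names, and the "send" is just a counter increment. The point is only that the number of threads is fixed by the pool size, while the number of queued send tasks can grow with the number of tables and disks.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical fixed-size pool: the thread count is set once, at construction.
class FixedPool
{
public:
    explicit FixedPool(size_t size)
    {
        for (size_t i = 0; i < size; ++i)
            workers_.emplace_back([this] { run(); });
    }

    // Lets the workers drain the queue, then joins them.
    ~FixedPool()
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        cv_.notify_all();
        for (auto & worker : workers_)
            worker.join();
    }

    void schedule(std::function<void()> task)
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

    size_t threads() const { return workers_.size(); }

private:
    void run()
    {
        for (;;)
        {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
                if (tasks_.empty())
                    return; // stop was requested and the queue is drained
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task(); // run outside the lock so other workers can proceed
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    bool stop_ = false;
    std::vector<std::thread> workers_;
};

// Schedules one stand-in "send" task per (table, disk) directory and
// returns how many of them ran. Assumes pool_size > 0.
size_t sendAll(size_t pool_size, size_t directories)
{
    std::atomic<size_t> sent{0};
    {
        FixedPool pool(pool_size);
        for (size_t i = 0; i < directories; ++i)
            pool.schedule([&sent] { ++sent; });
    } // the pool destructor waits for all queued tasks here
    return sent.load();
}
```

With this shape, `sendAll(16, 100)` processes 100 directories on 16 threads instead of starting 100 threads.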
Actually, I am not sure that "pr-improvement" is enough, since problems with distributed sends can be pretty tricky to debug (e.g. why they became slower for a "regular" user). Maybe "backward incompatible" is better?
It's only relevant when using a huge number of Distributed tables (a rare case). And it should not become slower, as we have 16 background threads. Data is sent almost as is, without any processing on our side, so either it saturates the network or there are slow peers (which will require additional debugging).
Follow-up-for: ClickHouse#10315 Follow-up-for: ClickHouse#10263
CurrentMetrics::Increment adds the amount to the specified metric only for the lifetime of the object, but this is not the intention, since DistributedFilesToInsert is a gauge, and after ClickHouse#10263 the callback can exit (and be entered again later; for example, after SYSTEM STOP DISTRIBUTED SEND it will always exit, until SYSTEM START DISTRIBUTED SEND). So make the Increment a member of the class (this will also fix possible issues with subtracting the value on DROP TABLE).
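The lifetime issue described in this commit message can be sketched in isolation. This is a simplified model, not the ClickHouse code: `GaugeIncrement`, `Monitor`, `scopedCallback` and the global `files_to_insert` are hypothetical stand-ins for CurrentMetrics::Increment, the directory monitor, its callback and the DistributedFilesToInsert gauge.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for a gauge metric such as DistributedFilesToInsert.
std::atomic<int64_t> files_to_insert{0};

// Minimal RAII increment: adds `amount` on construction and
// subtracts whatever it currently holds on destruction.
class GaugeIncrement
{
public:
    explicit GaugeIncrement(int64_t amount = 0) : amount_(amount) { files_to_insert += amount_; }
    ~GaugeIncrement() { files_to_insert -= amount_; }

    GaugeIncrement(const GaugeIncrement &) = delete;
    GaugeIncrement & operator=(const GaugeIncrement &) = delete;

    // Replaces the held amount, adjusting the gauge by the difference.
    void changeTo(int64_t new_amount)
    {
        files_to_insert += new_amount - amount_;
        amount_ = new_amount;
    }

private:
    int64_t amount_;
};

// Scoped variant: the increment lives only inside the callback, so the
// gauge falls back as soon as the callback returns, even if files remain.
int64_t scopedCallback(int64_t pending_files)
{
    GaugeIncrement increment(pending_files);
    return files_to_insert.load(); // value seen while the callback runs
}

// Member variant: the increment lives as long as the monitor object, so the
// gauge keeps its value between callbacks and is released once on destruction.
class Monitor
{
public:
    void callback(int64_t pending_files) { metric_.changeTo(pending_files); }

private:
    GaugeIncrement metric_{0};
};
```

In the scoped variant the gauge reads zero between callbacks; in the member variant it keeps the last reported value until the `Monitor` is destroyed, which is the behavior a gauge is expected to have.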
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Use background thread pool (background_schedule_pool_size) for distributed sends
Detailed description / Documentation draft:
After #8756 the problem with background threads for distributed sends became even worse (since a thread is created per volume).
Fixes: #9551
Refs: #8756
See-also: #10315 (same thing for the Buffer engine)