Allow arbitrary number of merges to process concurrently according to their priorities. #22381

@alexey-milovidov

Current behaviour

A merge operation occupies a slot in the background pool.
Once a merge has started, we can only let it run until it finishes or cancel it.
For this reason, we have logic that limits the size of merges:
- if half of the pool is occupied and a big merge is already being processed, then only smaller merges (by a specific ratio) can be started.

This logic is needed to avoid stalls in smaller merges: big merges should not occupy the pool for a long time and small merges should be able to proceed.
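
To make the rule above concrete, here is a toy reading of it in C++. This is not the actual ClickHouse code: `PoolState`, `can_start_merge`, the `big_merge_bytes` threshold, and the way `ratio` is applied are all assumptions made for illustration.

```cpp
#include <cstddef>

// Hypothetical snapshot of the background pool (names are assumptions).
struct PoolState {
    std::size_t pool_size;              // total number of slots
    std::size_t busy_slots;             // slots held by running merges
    std::size_t largest_running_bytes;  // size of the biggest running merge, 0 if none
};

// Admission check, one possible reading of the rule in the text:
// once half of the pool is busy and a "big" merge is running,
// only merges smaller than ratio * (size of that big merge) may start.
bool can_start_merge(const PoolState & pool,
                     std::size_t merge_bytes,
                     std::size_t big_merge_bytes,  // assumed threshold for "big"
                     double ratio)                 // assumed meaning of "specific ratio"
{
    if (pool.busy_slots >= pool.pool_size)
        return false;  // no free slot at all

    const bool half_occupied = pool.busy_slots * 2 >= pool.pool_size;
    const bool big_merge_running = pool.largest_running_bytes >= big_merge_bytes;

    if (half_occupied && big_merge_running)
        return static_cast<double>(merge_bytes)
            <= ratio * static_cast<double>(pool.largest_running_bytes);

    return true;
}
```

The point of the sketch is that the decision is made once, at admission time: a merge that is rejected has to wait, and a merge that is admitted keeps its slot until it finishes.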

Describe the solution you'd like

Allow more merges to start. Every merge has an inner loop in which a block of data is processed. Wrap one iteration of this inner loop (a unit of work) in a std::function. These units of work will be pushed to another pool for processing (a container under a mutex). A separate thread pool will select units of work to process according to their priorities. The priority logic can be as simple as: the smaller the total size to merge, the higher the priority.

The size of this thread pool can be smaller, and it will be configurable by the user. For example, on extremely slow cloud instances with network disks it can be just 2.

Big merges will effectively be paused whenever smaller merges need to be processed. It is then no longer a concern if multiple big merges are assigned at once.
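
A minimal sketch of the idea, to show the moving parts rather than a real implementation. All names here (`MergeStep`, `MergeStepScheduler`, `demo_small_merge_preempts_big`) are hypothetical, and the sketch assumes each merge can be expressed as a step function that processes one block and reports whether more blocks remain.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// One unit of work: process a single block of a merge.
// Returns true while the merge still has blocks left. (Assumed interface.)
using MergeStep = std::function<bool()>;

class MergeStepScheduler {
public:
    // Register a merge. Smaller total_bytes means higher priority.
    void add(std::size_t total_bytes, MergeStep step) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ready_.emplace(total_bytes, std::move(step));
        }
        cv_.notify_one();
    }

    // Process everything on num_threads workers; returns when all merges are done.
    void run(std::size_t num_threads) {
        std::vector<std::thread> workers;
        for (std::size_t i = 0; i < num_threads; ++i)
            workers.emplace_back([this] { worker(); });
        for (auto & w : workers)
            w.join();
    }

private:
    void worker() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            cv_.wait(lock, [this] { return !ready_.empty() || running_ == 0; });
            if (ready_.empty())
                break;  // nothing queued and nothing running: all merges finished

            // The container is ordered by total size, so begin() is the smallest merge.
            auto it = ready_.begin();
            const std::size_t total_bytes = it->first;
            MergeStep step = std::move(it->second);
            ready_.erase(it);
            ++running_;

            lock.unlock();
            const bool has_more = step();  // one block, processed outside the mutex
            lock.lock();

            --running_;
            if (has_more)
                ready_.emplace(total_bytes, std::move(step));  // requeue the rest
            cv_.notify_all();
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::multimap<std::size_t, MergeStep> ready_;  // merges with a pending step
    std::size_t running_ = 0;                      // steps currently executing
};

// Usage sketch: one worker; a small merge arrives while a big one is in progress.
std::vector<std::string> demo_small_merge_preempts_big() {
    MergeStepScheduler scheduler;
    std::vector<std::string> log;  // written only by the single worker thread
    int big_left = 3, small_left = 2;

    MergeStep small = [&] { log.push_back("small"); return --small_left > 0; };
    MergeStep big = [&] {
        log.push_back("big");
        if (big_left == 3)
            scheduler.add(10, small);  // the small merge shows up mid-way
        return --big_left > 0;
    };

    scheduler.add(1000, big);
    scheduler.run(1);
    return log;  // big, small, small, big, big
}
```

With a single worker, the big merge runs one block, the small merge then takes over until it is finished, and only after that does the big merge resume. A started merge holds no thread between blocks, so the number of merges in flight is decoupled from the number of threads. A real implementation would also need cancellation, error handling, and probably some protection against starvation of big merges, none of which is shown here.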
