Allow arbitrary number of merges to process concurrently according to their priorities. #22381
Description
Current behaviour
A merge operation occupies a slot in the background pool.
Once a merge operation has started, we can either continue processing it until it finishes or cancel it.
For this reason, we have logic to limit the size of merge operations:
- if half of the pool is occupied and a big merge is already being processed, then only smaller merges (by a specific ratio) can be processed.
This logic is needed to avoid stalling smaller merges: big merges should not occupy the pool for a long time, and small merges should be able to proceed.
Describe the solution you'd like
Allow more merges to start. Every merge has an inner loop in which a block of data is processed. Wrap one iteration of this inner loop (a unit of work) in a std::function. These units of work are pushed to another pool for processing (a container under a mutex). A separate thread pool selects units of work to process according to their priorities. The priority logic can be simple: the smaller the total size to merge, the higher the priority.
The size of this thread pool can be smaller, and it will be configurable by the user. For example, on extremely slow cloud instances with network disks it can be just 2.
Big merges will be effectively paused whenever smaller merges need to be processed. There is no concern if multiple big merges are assigned.
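To make the proposal concrete, here is a minimal sketch of the scheduling idea. It is not ClickHouse code: `MergeTask`, `MergeScheduler`, `runOne` and `makeMerge` are hypothetical names, and the "merge" only appends its name to a log instead of processing a block. The pending units of work live in a container under a mutex, ordered so that the merge with the smallest total size is picked first. A merge is removed from the container while one of its units runs, so no two threads would ever work on the same merge at once.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical type: one merge, exposed as a sequence of units of work.
struct MergeTask {
    std::string name;
    std::size_t total_bytes;     // total size to merge; smaller = higher priority
    std::function<bool()> step;  // one inner-loop iteration; true while blocks remain
};

// Hypothetical scheduler: a priority queue of merges under a mutex.
class MergeScheduler {
public:
    void add(MergeTask task) {
        std::lock_guard<std::mutex> lock(mutex_);
        queue_.push(std::move(task));
    }

    // Runs one unit of work of the highest-priority merge.
    // Returns false if no merge is pending.
    bool runOne() {
        std::unique_lock<std::mutex> lock(mutex_);
        if (queue_.empty())
            return false;
        MergeTask task = queue_.top();
        queue_.pop();   // the merge is out of the queue while its unit runs
        lock.unlock();  // do the actual work without holding the mutex

        if (task.step())
            add(std::move(task));  // not finished: re-queue for its next block
        return true;
    }

private:
    // std::priority_queue keeps the "greatest" element on top, so a task
    // with more bytes must compare as lower priority.
    struct LowerPriority {
        bool operator()(const MergeTask & a, const MergeTask & b) const {
            return a.total_bytes > b.total_bytes;
        }
    };

    std::mutex mutex_;
    std::priority_queue<MergeTask, std::vector<MergeTask>, LowerPriority> queue_;
};

// Demo helper (an assumption for illustration): a fake merge of `blocks`
// blocks that records its name in `log` each time a block is "merged".
// `log` is not synchronized, so this helper is for single-threaded demos only.
MergeTask makeMerge(std::string name, std::size_t total_bytes,
                    std::size_t blocks, std::vector<std::string> & log) {
    assert(blocks > 0);
    auto remaining = std::make_shared<std::size_t>(blocks);
    return MergeTask{name, total_bytes, [name, remaining, &log] {
        log.push_back(name);      // stands in for merging one block
        return --*remaining > 0;  // true while more blocks remain
    }};
}
```

In this sketch each thread of the small pool would simply loop on `runOne()`; a real implementation would also need workers to wait for new work rather than return. If a three-block merge named "big" (1000 bytes) has already processed one block when a two-block merge named "small" (10 bytes) is added, the remaining units are processed in the order small, small, big, big. The big merge is paused between blocks, without being cancelled, until the small one completes.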