Option to increase concurrency on MergeTreeDataPart removal during Cleanup. #6372
Description
Hello ClickHouse maintainers, I'm James and I am the person responsible for administering our cluster at Sentry. We love what you've built at Yandex, so much that we replaced our Postgres infrastructure with it and now use ClickHouse as our primary search and storage system. I'd like to propose a feature that would allow us to write to our production cluster at a greater frequency while also enabling us to increase our column count.
In our current production cluster, write frequency is bound by the single-threaded cleanup of obsolete parts. This is due to the number of files (from both column count and Nullable types) that need to be unlinked for each part. If we increase our write frequency, the single thread cannot keep up with the number of parts to remove, and the replica(s) will eventually refuse writes (as they should).
If this process were multi-threaded, we would be able to increase our write frequency and our column count. The disks we currently use are Google Compute Engine SSD Persistent Disks, which allow concurrent syscalls like unlink.
Either an option to enable/disable concurrent cleanup or a configurable number of threads to submit to the BackgroundSchedulePool would be sufficient for us. Anything that gives us the concurrency that our disks support would be greatly appreciated.
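To illustrate the behaviour I have in mind, here is a minimal sketch in Python. It is not ClickHouse code and does not reflect how the cleanup is actually implemented; `remove_part`, `make_fake_part`, and `max_threads` are hypothetical names, and the two-files-per-column layout is only a stand-in for a real part directory. With `max_threads=1` it unlinks files one at a time, as the cleanup does today; with a larger value it issues the unlink calls from a thread pool.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def remove_part(part_dir: str, max_threads: int = 1) -> int:
    """Unlink every file in part_dir, then remove the directory itself.

    max_threads=1 mimics a single-threaded cleanup; larger values issue
    the unlink calls concurrently. Returns the number of files removed.
    (Hypothetical helper, for illustration only.)
    """
    with os.scandir(part_dir) as entries:
        paths = [entry.path for entry in entries if entry.is_file()]

    if max_threads <= 1:
        for path in paths:
            os.unlink(path)
    else:
        with ThreadPoolExecutor(max_workers=max_threads) as pool:
            # list() waits for all tasks and re-raises any unlink error.
            list(pool.map(os.unlink, paths))

    os.rmdir(part_dir)
    return len(paths)


def make_fake_part(root: str, name: str, columns: int) -> str:
    """Create a stand-in part directory with two empty files per column."""
    part_dir = os.path.join(root, name)
    os.mkdir(part_dir)
    for i in range(columns):
        for suffix in (".bin", ".mrk"):
            open(os.path.join(part_dir, f"col{i}{suffix}"), "w").close()
    return part_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        part = make_fake_part(root, "fake_part", columns=50)
        print(remove_part(part, max_threads=8))
```

A single thread-count setting would cover both forms of the request: a value of 1 keeps the current behaviour, and anything higher enables concurrent removal.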