Speed up parts loading process of MergeTree by metadata cache#32928
Speed up parts loading process of MergeTree by metadata cache#32928CurtizJ merged 71 commits intoClickHouse:masterfrom
Conversation
47bab74 to
37ba800
Compare
|
@Mergifyio update |
❌ Pull request can't be updated with latest base branch changesDetailsMergify needs the author permission to update the base branch of the pull request. |
@CurtizJ I have no idea why this happend, do you know why ? |
You need to enable edits by maintainers (on the right menu of the PR): |
Sometimes it can be impossible if you have very old fork. |
I can't find this option in PR. Indeed I had very old fork in 2019 |
AFAIR it is not about fork time (and I had more older fork), but about fork in some orgs. |
@taiyang-li you can commit as usual and it should work well. |
ok |
|
@CurtizJ If I want to wrap function First style: #if USE_ROCKSDB
void load(const MergeTreeData & storage, const PartMetaCachePtr & meta_cache, const DiskPtr & disk, const String & part_path);
#else
void load(const MergeTreeData & storage, const DiskPtr & disk, const String & part_path);
#endSecond style void load(const MergeTreeData & storage,
#if USE_ROCKSDB
const PartMetaCachePtr & meta_cache,
#endif
const DiskPtr & disk,
const String & part_path); |
|
I prefer the first variant. |
|
Failed test |
|
Failed test |
|
All checks have passed @CurtizJ |
|
@taiyang-li Some tests have failed because of randomization of settings. The CI is unstable for the last several days, sorry for that. The checks weren't started at all for the last commit, but the previous commit before merge of master was ok, so let's merge. |
|
@CurtizJ Thanks! |
getauxval() from glibc-compatibility did not work always correctly: - It does not work after setenv(), and this breaks vsyscalls, like sched_getcpu() [1] (and BaseDaemon.cpp always set TZ if timezone is defined, which is true for CI [2]). Also note, that fixing setenv() will not fix LSan, since the culprit is getauxval() [1]: https://bugzilla.redhat.com/show_bug.cgi?id=1163404 [2]: ClickHouse#32928 (comment) - Another think that is definitely broken is LSan (Leak Sanitizer), it relies on worked getauxval() but it does not work if __environ is not initialized yet (there is even a commit about this). And because of, at least, one leak had been introduced [3]: [3]: ClickHouse#33840 Fix this by using /proc/self/auxv with fallback to environ solution to make it compatible with environment that does not allow reading from auxv (or no procfs). v2: add fallback to environ solution v3: fix return value for __auxv_init_procfs() Refs: ClickHouse#33957 Signed-off-by: Azat Khuzhin <[email protected]>
getauxval() from glibc-compatibility did not work always correctly: - It does not work after setenv(), and this breaks vsyscalls, like sched_getcpu() [1] (and BaseDaemon.cpp always set TZ if timezone is defined, which is true for CI [2]). Also note, that fixing setenv() will not fix LSan, since the culprit is getauxval() [1]: https://bugzilla.redhat.com/show_bug.cgi?id=1163404 [2]: ClickHouse#32928 (comment) - Another think that is definitely broken is LSan (Leak Sanitizer), it relies on worked getauxval() but it does not work if __environ is not initialized yet (there is even a commit about this). And because of, at least, one leak had been introduced [3]: [3]: ClickHouse#33840 Fix this by using /proc/self/auxv with fallback to environ solution to make it compatible with environment that does not allow reading from auxv (or no procfs). v2: add fallback to environ solution v3: fix return value for __auxv_init_procfs() (cherry picked from commit f187c34) v4: more verbose message on errors, CI founds [1]: AUXV already has value (529267711) [1]: https://s3.amazonaws.com/clickhouse-test-reports/39103/2325f7e8442d1672ce5fb43b11039b6a8937e298/stress_test__memory__actions_.html v5: break at AT_NULL v6: ignore AT_IGNORE v7: suppress TSan and remove superior check to avoid abort() in case of race v8: proper suppressions (not inner function but itself) Refs: ClickHouse#33957 Signed-off-by: Azat Khuzhin <[email protected]>
|
@taiyang-li Just out of curiosity, how do you noticed or observed the performance bottleneck of metadata loading? Is there any tools to help? Thanks. |
|
@cangyin You can notice that disk io util is very high before optimization, mainly because of too many read operations of small metadata files. So using rocksdb to speed up loading is a natural idea. |
|
We found that using this cache capability indeed resolved the startup time issue, but we also discovered a memory leak. Especially under high concurrency, the host would hit the memory limit and restart once a week. Although this MR was later deleted, the memory leak issue was never resolved. We would like to continue using this feature.Here is the leaked stack. ==3135795== |
|
Yes, that's why we removed this feature a few years ago (and burned it). |


Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Speed up parts loading process of MergeTree to accelerate starting up of clickhouse-server.
With this improvement, clickhouse-server was able to decrease starting up time from 75 minutes to 20 seconds, with 700k mergetree parts.
Detailed description / Documentation draft:
We use rocksdb to cache metadata files of mergetree parts. When clickhouse-server starts up and loads parts parallelly, first get metadata from rocksdb with file path as key, if cache miss happends, then read metadata file from IDisk and update cache.
System table
merge_tree_meta_cacheand functioncheckPartMetaCachewere implemented to check cache consistency conveniently.Thanks for the initial working of @sundy-li.