Reduce lock contention for MergeTree tables (by renaming parts without holding lock)#61973
Conversation
|
This is an automated comment for commit 6cfd5b2 with description of existing statuses. It's updated for the latest CI running ❌ Click here to open a full report in a separate page
Successful checks
|
3c10ef3 to
21e9798
Compare
21e9798 to
757c78f
Compare
|
Likely related test failures:
Unrelated test failures:
|
|
Related:
Unrelated:
|
8878b54 to
adcf7d5
Compare
|
Still some issues:
|
5c1e6a3 to
e08e975
Compare
|
|
This one is ready. Can someone take a look please? I have some doubts about that now the part is added to the data_parts_vector before rename and it may leads to removing detached parts, and triggering LOGICAL_ERROR, I've fixed it in the rollback code by checking if it is detached part, then revert it to temporary state, instead of removing it. But in general renaming under lock, is very bad, since it can hold the lock for way more longer time than it should be, and during this time you can't work with the table at all. |
|
Can someone take a look at this one? |
|
Kind ping |
|
@azat, need to fix all the tests. |
|
@alexey-milovidov tests failures does not looks related, but it is quite old, so I will do a rebase at least. |
e08e975 to
ec2b3a3
Compare
|
ec2b3a3 to
ffa2ab7
Compare
|
Signed-off-by: Azat Khuzhin <[email protected]>
…t holding lock) Under heavy load, or not so heavy but with fsync_part_directory=1, time that renameTo() holds DataPartsLock will be increased, and this will affect almost every operation with this table. On one of production clusters I saw ~60 seconds with fsync_part_directory=1. Move the renameTo() out from the critical section. v2: instead of using DataPartsLock.lock.lock()/unlock() move the renameTo() into MergeTreeData::Transaction::commit() Signed-off-by: Azat Khuzhin <[email protected]>
Signed-off-by: Azat Khuzhin <[email protected]>
Signed-off-by: Azat Khuzhin <[email protected]>
Since there is an assertion that does not allows to remove detached parts during cleanup, which sounds good in general, but breaks this new code. Signed-off-by: Azat Khuzhin <[email protected]>
The problem was that with this patch set renameMergedTemporaryPart() is
called without temporary_directory_lock holded (in MergeTask), since it
is reseted just before calling renameMergedTemporaryPart(), and this can
be seen in logs:
2024.03.29 19:56:42.126919 [ 1341 ] {ea7a3fd2-cf47-4ec7-91a5-51c69fba1b95::-8_0_138_2_2} <Trace> test_btnct5cr.alter_table_0 (ea7a3fd2-cf47-4ec7-91a5-51c69fba1b95) (MergerMutator): Merged 50 parts: [-8_0_0_0_2, -8_138_138_0] -> -8_0_138_2_2
2024.03.29 19:56:42.127034 [ 1341 ] {ea7a3fd2-cf47-4ec7-91a5-51c69fba1b95::-8_0_138_2_2} <Debug> test_btnct5cr.alter_table_0 (ea7a3fd2-cf47-4ec7-91a5-51c69fba1b95): Committing part -8_0_138_2_2 to zookeeper
2024.03.29 19:56:42.128462 [ 884 ] {} <Warning> test_btnct5cr.alter_table_0 (ea7a3fd2-cf47-4ec7-91a5-51c69fba1b95): Removing temporary directory /var/lib/clickhouse/store/ea7/ea7a3fd2-cf47-4ec7-91a5-51c69fba1b95/tmp_merge_-8_0_138_2_2/
2024.03.29 19:56:42.128647 [ 1341 ] {ea7a3fd2-cf47-4ec7-91a5-51c69fba1b95::-8_0_138_2_2} <Debug> test_btnct5cr.alter_table_0 (ea7a3fd2-cf47-4ec7-91a5-51c69fba1b95): Part -8_0_138_2_2 committed to zookeeper
...
2024.03.29 19:56:54.586084 [ 57841 ] {bf240267-0620-4294-afc1-479c58e6be89} <Error> executeQuery: std::exception. Code: 1001, type: std::__1::__fs::filesystem::filesystem_error, e.what() = filesystem error: in file_size: No such file or directory ["/var/lib/clickhouse/store/ea7/ea7a3fd2-cf47-4ec7-91a5-51c69fba1b95/-8_0_138_2_2/data.cmrk3"]
This should fix failures of 00993_system_parts_race_condition_drop_zookeeper in [1].
[1]: https://s3.amazonaws.com/clickhouse-test-reports/61973/f6f826c85dd5b7bb8db16286fd10dcf441a440f7/stateless_tests__coverage__[4_6].html
Though now it looks hackish...
Signed-off-by: Azat Khuzhin <[email protected]>
Signed-off-by: Azat Khuzhin <[email protected]>
Signed-off-by: Azat Khuzhin <[email protected]>
…d_parts_need_rename
CI founds [1]:
Logical error: 'precommitted_parts.size() >= precommitted_parts_need_rename.size()'
[1]: https://s3.amazonaws.com/clickhouse-test-reports/61973/5c1e6a3e956917bdbb7eaa467934e5b75f17a923/stateless_tests__tsan__s3_storage__[5_5].html
The problem is that after precommitted_parts cleaned from detached parts
it may be less then precommitted_parts_need_rename, so to avoid this,
let's just copy it to a new container.
Signed-off-by: Azat Khuzhin <[email protected]>
ffa2ab7 to
6cfd5b2
Compare
|
|
@azat, please resubmit. |
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
This change was reverted
Reduce lock contention for MergeTree tables (by renaming parts without holding lock)
Under heavy load, or not so heavy but with fsync_part_directory=1, time that renameTo() holds DataPartsLock will be increased, and this will affect almost every operation with this table.
On one of production clusters I saw ~60 seconds with fsync_part_directory=1.
Move the renameTo() out from the critical section.