Conversation
| @@ -275,7 +291,7 @@ bool MergeTreeDataMergerMutator::selectPartsToMerge( | |||
| prev_part = ∂ | |||
There was a problem hiding this comment.
The bug is here. If we found a new partition we will reset prev_part to nullptr at line 255, but after that restore it back. So on the next iteration, we will skip can_merge_callback check for the first part in the partition, because partition_id != *prev_partition_id and unconditionally add this part to the candidates list. Even if this part already participates in merge/mutation. MergeSelectors can easily select this part and assign a merge.
Previously we have seen very similar behavior for ReplicatedMergeTree in 00992 and 00993 tests.
There was a problem hiding this comment.
By the way, does it lead only to extra merges being assigned (with no harm) + subsequent exception with logical error in code? Or something more dangerous?
There was a problem hiding this comment.
I think the worst consequence is unprocessable entries https://github.com/clickhouse/ClickHouse/blob/master/src/Storages/StorageReplicatedMergeTree.cpp#L1465 in the ReplicatedMergeTreeQueue (we have seen this rare error in 00992 and 00993 tests).
|
|
||
| /// Parts are belong to partitions. Only parts within same partition could be merged. | ||
| using PartsInPartition = std::vector<Part>; | ||
| using PartsRange = std::vector<Part>; |
There was a problem hiding this comment.
I've renamed these aliases because they don't represent partitions, but just continuous parts range. Previous naming was slightly confusing.
Co-authored-by: Nikita Mikhaylov <[email protected]>
| # This test was introduced to check concurrency for TTLs merges and mutations | ||
| # but it revealed a bug when we assign different merges to the same part | ||
| # on the borders of partitions. | ||
| def test_no_ttl_merges_in_busy_pool(started_cluster): |
There was a problem hiding this comment.
Actually crash reproduces only in debug build. In other modes it will hang until timeout.
Backport #14444 to 20.8: Fix select parts for merge
Backport #14444 to 20.7: Fix select parts for merge
Backport #14444 to 20.5: Fix select parts for merge
Backport #14444 to 20.6: Fix select parts for merge
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Fix bug which leads to wrong merges assignment if table has partitions with a single part.