Delete parts by replacing them with empty parts#41145
CheSema merged 40 commits into ClickHouse:master from
Conversation
We can also create an empty covering part on disk for
@kssenii, 02417_load_marks_async fails in Fast Test, blocking other tests from starting:
The test 01825_type_json_schema_race_long failed. I am investigating it. At first look it seems like Object('json') has to be wrapped in Tuple. This PR seems to be related: #41290. UPD: test fixed with commit. UPD2:
Hooray. Dreams come true.
This should help with it.
I saw a server crash with the error `Part 20230603_61687_61697_1 intersects part 20230603_61690_61695_1 (state Active) It is a bug. (LOGICAL_ERROR)`. This was on v22.12. I can see that the error message has since been adjusted, but I don't believe the bug itself has been resolved.

I believe the bug comes from this change (c976b28), associated with this pull request (ClickHouse#41145). Specifically, the changes to `src/Storages/MergeTree/MergeTreeData.cpp` seem possibly incorrect to me, since the logic used when a part is covered by another part appears to have changed as described below.

The change modifies the function `renameTempPartAndReplaceImpl` so that it calls `getPartHierarchy` where previously it called `getActivePartsToReplace`. These two functions have similar implementations, with one notable difference: in the latter, execution breaks out early if a part covering the new part is found. In the outer function, there was a condition looking for this covering part which, if hit, would print a warning and return false. Without that check, execution instead continues, sees that there is an intersecting part, and crashes with the exception. I believe the "intersecting" part would also be considered a "covering" part, since in this case it covers a block range extending from before to after the range of the new part.

I believe a solution is to simply re-add the condition that checks for a covering part and return early in that case. However, my assessment could be very wrong; I can't say I understand this code all that well, and I don't really know how I could write a test for this either.

One other relevant detail: this occurred at the moment I was running a mutation on a *different* partition of the table. It happened on two occasions, so I'm fairly sure it's connected to the mutation, although I don't know why operations on different partitions would affect each other in this way.
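To illustrate the covering-vs-intersecting distinction above, here is a minimal Python sketch (my own modelling, not ClickHouse's actual C++): part names like `20230603_61687_61697_1` encode `partition_minblock_maxblock_level`, and any part that covers another part's block range necessarily also intersects it, which is why skipping the early covering check lets the intersection check fire.

```python
# Hypothetical model of MergeTree part block ranges; the types and function
# names here are my own, not ClickHouse's real data structures.
from typing import NamedTuple

class PartRange(NamedTuple):
    min_block: int
    max_block: int

def covers(outer: PartRange, inner: PartRange) -> bool:
    # outer fully contains inner's block range
    return outer.min_block <= inner.min_block and inner.max_block <= outer.max_block

def intersects(a: PartRange, b: PartRange) -> bool:
    # the two block ranges overlap at all
    return a.min_block <= b.max_block and b.min_block <= a.max_block

# Block ranges taken from the error message: 61687_61697 vs 61690_61695.
a = PartRange(61687, 61697)
b = PartRange(61690, 61695)

# Covering implies intersecting, so a code path that checks for
# intersection first will trip over a covering part.
assert covers(a, b)
assert intersects(a, b)
```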
Does DETACH PART also make an empty partition? (DETACH is not listed in the affected methods.)
Yes, DETACH PART[ITION] works the same way as DROP PART[ITION].
This still seems to be an issue when using a volume with an S3 disk and a cache. Resurrected partitions are driving us crazy. Any advice?
Try to create a minimal repro and report the issue.
The issue:
#33457
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
This PR changes how the following queries delete parts: TRUNCATE TABLE, ALTER TABLE DROP PART, and ALTER TABLE DROP PARTITION. These queries now create empty parts that cover the old parts. This makes TRUNCATE work without an exclusive lock, which means concurrent reads are not blocked. These queries are also now durable: if a request succeeds, no resurrected parts appear later. Note that atomicity is achieved only within transaction scope.
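As a rough sketch of the mechanism the changelog entry describes (illustrative Python, not the actual C++ implementation): the new empty part gets a block range spanning every part in the partition and a level above all of them, so the old parts become outdated and are cleaned up later instead of being unlinked in place.

```python
# Illustrative sketch only: part metadata modelled as (min_block, max_block,
# level) tuples; ClickHouse's real part objects are far richer than this.

def make_empty_covering_part(parts):
    """Build an empty part whose block range covers every part in the partition."""
    min_block = min(p[0] for p in parts)
    max_block = max(p[1] for p in parts)
    level = max(p[2] for p in parts) + 1  # outranks every part it covers
    return (min_block, max_block, level)

# Three hypothetical parts in one partition:
old_parts = [(1, 3, 1), (4, 4, 0), (5, 9, 2)]
print(make_empty_covering_part(old_parts))  # (1, 9, 3)
```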