Improved `IN PARTITION` mutations by quickhouse · Pull Request #41344 · ClickHouse/ClickHouse

quickhouse · 2022-09-15T09:50:32Z

Changelog category (leave one):

Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Improved IN PARTITION mutations.

Information about CI checks: https://clickhouse.com/docs/en/development/continuous-integration/

filimonov · 2022-09-19T08:01:20Z

src/Storages/StorageMergeTree.cpp

Otherwise it will be empty (NULL).

So for example 3 is not possible?

If getPartitionIdsAffectedByCommands can return only an empty or a single-element list, why does it return a list? Maybe it should use optional?

This is a common code with StorageReplicatedMergeTree, we can't change it.

Nobody asks to change it (I think the comment above was about reliability and readability of this code that relies on implicit invariant that probably does not even exist). Please add a comment that explains why this function always returns one partition for StorageMergeTree (I don't understand why). And throw LOGICAL_ERROR if it returned more than one partition.

@tavplubix it is not me who says it returns only one partition for StorageMergeTree, but @filimonov. When it returns more than one partition, the optimization is not applied.

Please add a comment
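The behaviour described in this thread (apply the optimization only when exactly one partition is affected, otherwise leave the value empty) can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `singleAffectedPartition` and `PartitionIds` are hypothetical names, and the real `getPartitionIdsAffectedByCommands` may return a different container type.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical alias; stands in for whatever the shared helper returns.
using PartitionIds = std::vector<std::string>;

// Hypothetical helper: collapse the list of affected partitions into an optional.
// Returns the partition id only when exactly one partition is affected;
// for zero or several partitions it returns std::nullopt, meaning
// "treat the mutation as storage-wide, do not apply the optimization".
inline std::optional<std::string> singleAffectedPartition(const PartitionIds & affected)
{
    if (affected.size() == 1)
        return affected.front();
    return std::nullopt;
}
```

Making the "zero, one, or many" cases explicit in one place also gives a natural spot for the comment the reviewers asked for.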

src/Storages/StorageMergeTree.h

src/Storages/StorageMergeTree.cpp

src/Storages/MergeTree/MergeTreeMutationEntry.cpp

src/Storages/StorageMergeTree.cpp

tavplubix · 2022-09-28T17:09:36Z

Seems like this implementation is something completely different from the solution proposed by @filimonov. Is it expected? Please add a comment with something like a design doc describing your solution and explaining how (and why) it works.

quickhouse · 2022-10-03T08:27:03Z

Seems like this implementation is something completely different from the solution proposed by @filimonov. Is it expected? Please add a comment with something like a design doc describing your solution and explaining how (and why) it works.

I fixed this in 47c9fd5; the implementation is now in sync with the proposal. The mutation-by-partition map allows iteration over current_mutations_by_version to stop early.

tavplubix · 2022-10-03T11:38:14Z

src/Storages/MergeTree/MergeTreeMutationEntry.cpp

+        if (partition_id)
+        {
+            *out << "partition id: ";
+            writeEscapedString(*partition_id, *out);
+            *out << "\n";
+        }


Do we really need to serialize partition_id here? Seems like we already have partition id inside commands

We don't need to serialize it specifically, but we need a way to distinguish old storage-wide mutations from new ones. If we don't change the format, we will try to recalculate partition_id for older and possibly unfinished partition-specific mutations, skip unrelated partitions, and get stuck on merges when in the same partition a subset of parts is mutated beyond correct version, or get stuck being unable to mutate another subset beyond it because there is no related mutation for that. I think this serialization is a neat way to avoid that.

subset of parts is mutated beyond correct version

AFAIK it should not be a problem

I mean that we have different mutations:

with partition_id and IN PARTITION expression, which are new partition-specific mutations — these shall create partition-specific changes;

without partition_id and IN PARTITION expression, which are old or new storage-wide mutations — these create storage-wide changes;

without partition_id but with IN PARTITION expression, which are old partition-specific mutations. They were supposed to make storage-wide changes earlier (before this patch) and may be partially finished on some parts, so we must continue working the same way on them. If we remove these guys from remaining really unaffected parts (parts_to_do), we would have to rollback unaffected parts which already mutated to that version, or we will have finished mutation and different versions in a partition which will prevent merges to happen until new mutation (in that partition or whole storage) appears.

Generally, I think it is a good thing to have partition set calculated at the moment when the mutation is created, anyway.
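The three kinds of mutation entries listed above can be sketched as a small classification function. This is only an illustration of the argument, under the assumption that the presence of a serialized partition id is what marks a new entry; `MutationKind`, `MutationEntrySketch`, `classify` and `restrictsPartsToDo` are hypothetical names and do not come from the PR.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical names, for illustration only; not the types used in the PR.
enum class MutationKind
{
    PartitionSpecific, // new entry: serialized partition_id and IN PARTITION
    StorageWide,       // old or new entry: neither partition_id nor IN PARTITION
    LegacyInPartition, // old entry: IN PARTITION but no serialized partition_id
};

struct MutationEntrySketch
{
    std::optional<std::string> partition_id; // set only if the entry has a "partition id: " line
    bool has_in_partition_clause = false;    // whether the commands carry IN PARTITION
};

inline MutationKind classify(const MutationEntrySketch & entry)
{
    if (entry.partition_id)
    {
        // The thread does not describe a partition_id without IN PARTITION,
        // so this sketch treats that combination as a programming error.
        assert(entry.has_in_partition_clause);
        return MutationKind::PartitionSpecific;
    }
    return entry.has_in_partition_clause ? MutationKind::LegacyInPartition : MutationKind::StorageWide;
}

// Only new partition-specific entries may narrow the set of parts to process;
// legacy IN PARTITION entries keep their old storage-wide behaviour.
inline bool restrictsPartsToDo(MutationKind kind)
{
    return kind == MutationKind::PartitionSpecific;
}
```

The point of the sketch is that without some marker in the entry, the first and third cases cannot be told apart from the commands alone.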

I understand your point, but I'm not sure about

we must continue working the same way on them. If we remove these guys from remaining really unaffected parts (parts_to_do), we would have to rollback unaffected parts which already mutated to that version

I agree that it's undesirable to have some parts with data version greater than expected version for the partition. But AFAIK it will not break anything.

will prevent merges to happen until new mutation (in that partition or whole storage) appears

It will prevent only merges of parts with different data versions, seems like it's not critical. Also it probably can be addressed by taking last_mutation_by_partition into account in getCurrentMutationVersion.

I don't like serializing partition id for two reasons:

It duplicates partition id that we already have in IN PARTITION clause (and we already serialize it).

It unconditionally changes format of mutation entries without any settings. It will make downgrade impossible once you run IN PARTITION mutation. IN PARTITION is not a new feature, so this change is backward incompatible. We can introduce a setting like merge_tree_mutation_entry_version or serialize_partition_id_for_in_partition_mutations for compatibility, but I don't like this option. It's better to avoid changing format, and seems like it's possible to avoid it.

tavplubix · 2022-10-03T11:41:42Z

src/Storages/StorageMergeTree.h

    /// This set have to be used with `currently_processing_in_background_mutex`.
    DataParts currently_merging_mutating_parts;

+    /// Accessed under `currently_processing_in_background_mutex` lock.


Consider using TSA annotations

@tavplubix it does not quite fit in this case because there is (at least) one usage of std::unique_lock instead of std::lock_guard in selectPartsToMerge() and we would not be able to build complete tree of function calls properly annotated.

https://lists.llvm.org/pipermail/cfe-dev/2016-November/051468.html

You can use TSA_SUPPRESS_WARNING_FOR_READ in selectPartsToMerge

This is impossible because std::unique_lock has no thread annotations. There is nothing to suppress; what is needed instead is to mark the mutex as locked for Clang. It would work if we only used std::lock_guard.

It is possible and it works fine with std::unique_lock and TSA_SUPPRESS_WARNING_FOR_READ
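The approach suggested in this thread can be illustrated with a small self-contained sketch: annotate the guarded member, and wrap the single read that happens under std::unique_lock in a lambda that opts out of the analysis. The `SKETCH_*` macros and the `PartsInProgress` class are hypothetical stand-ins written for this example (they are not ClickHouse's own TSA macros), and whether std::lock_guard itself is annotated depends on the standard library and its build flags.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <set>
#include <string>

// Hypothetical local macros; they expand to nothing on compilers other than Clang.
#if defined(__clang__)
#    define SKETCH_GUARDED_BY(m) __attribute__((guarded_by(m)))
#    define SKETCH_NO_TSA __attribute__((no_thread_safety_analysis))
#else
#    define SKETCH_GUARDED_BY(m)
#    define SKETCH_NO_TSA
#endif

// Suppress the analysis for one read only, by evaluating it inside a lambda
// that is excluded from thread safety analysis.
#define SKETCH_SUPPRESS_WARNING_FOR_READ(x) ([&]() SKETCH_NO_TSA -> const auto & { return (x); }())

class PartsInProgress
{
public:
    void add(const std::string & part_name)
    {
        std::lock_guard<std::mutex> lock(mutex);
        parts.insert(part_name);
    }

    std::size_t countUnderUniqueLock() const
    {
        // The mutex really is held here, but the analysis may not see that
        // through std::unique_lock, so only this read is exempted.
        std::unique_lock<std::mutex> lock(mutex);
        return SKETCH_SUPPRESS_WARNING_FOR_READ(parts).size();
    }

private:
    mutable std::mutex mutex;
    std::set<std::string> parts SKETCH_GUARDED_BY(mutex);
};
```

Compared with disabling the analysis for the whole function, the per-read suppression keeps every other access to the member checked.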

src/Storages/StorageMergeTree.cpp


quickhouse · 2022-10-13T15:10:05Z

Problem with exceptions was caused by lack of error handling in Kazoo ZooKeeper client python-zk/kazoo#672 . By default use_keeper=True for all test instances. Example: https://gist.github.com/quickhouse/09d6da34f35a13024b1f3432c6969c03

quickhouse · 2022-10-13T15:25:36Z

The problem with exceptions is caused by poor interoperability between logging and pytest and seems unfixable (pytest-dev/pytest#5502). I would recommend throwing pytest away. The problem may randomly appear in any test in the test suite. Example of output: https://gist.github.com/quickhouse/09d6da34f35a13024b1f3432c6969c03 .

alexey-milovidov · 2023-01-29T17:10:57Z

Improved IN PARTITION mutations.

@quickhouse how exactly?

quickhouse · 2023-02-02T10:26:01Z

I apologize, I am completely exhausted and can't look at this in the near future.


filimonov · 2023-02-02T11:13:12Z

@alexey-milovidov the implementation of ALTER UPDATE/DELETE IN PARTITION for MergeTree was incorrect and was rolled back #40589

That PR was implementing that concept https://gist.github.com/filimonov/a2c79385159efe2935e616ac7492632b (discussed with @tavplubix )

But since Vladimir (@quickhouse) did not finish that task, I'm closing the PR; we will reassign the task to finish it (we will send a new PR).

aadant · 2023-02-08T17:11:35Z

@filimonov: please also fix the user experience of this feature. If the IN PARTITION mutation touches only selected parts (by virtue of partition pruning), the current implementation still shows all parts (I wonder whether it is even processing them, as I see the parts_to_do counter decrease over time!). My test was mutating 1 partition and I saw thousands of them in system.mutations.

quickhouse changed the title ~~Improved IN PARTITION mutations.~~ Improved IN PARTITION mutations Sep 15, 2022

robot-ch-test-poll added the pr-improvement Pull request with some product improvements label Sep 15, 2022

tavplubix self-assigned this Sep 15, 2022

filimonov reviewed Sep 19, 2022

src/Storages/StorageMergeTree.h (outdated, resolved)

filimonov reviewed Sep 19, 2022

src/Storages/StorageMergeTree.h (outdated, resolved)

Apply mutations in StorageMergeTree according IN PARTITION information.

401e96c

quickhouse force-pushed the betternohardlinking branch 2 times, most recently from 401e96c to 1be12b8 Compare September 28, 2022 10:42

quickhouse marked this pull request as ready for review September 28, 2022 10:42

tavplubix added the can be tested Allows running workflows for external contributors label Sep 28, 2022

filimonov reviewed Sep 28, 2022

src/Storages/StorageMergeTree.cpp (outdated, resolved)

tavplubix reviewed Sep 28, 2022

src/Storages/MergeTree/MergeTreeMutationEntry.cpp (outdated, resolved)

src/Storages/StorageMergeTree.cpp (outdated, resolved)

src/Storages/StorageMergeTree.cpp (outdated, resolved)

Added shortcut for storage-wide mutations.

47c9fd5

quickhouse force-pushed the betternohardlinking branch from 1be12b8 to 47c9fd5 Compare October 3, 2022 08:26

tavplubix reviewed Oct 3, 2022

src/Storages/StorageMergeTree.cpp (resolved)

tavplubix marked this pull request as draft October 3, 2022 11:51

quickhouse added 3 commits October 5, 2022 14:45

Style fix.

038ea2f

Loading of mutations for better performance.

bcfc588

Added test for ALTER ... IN PARTITION.

557858f

quickhouse force-pushed the betternohardlinking branch from 1653b7e to 557858f Compare October 5, 2022 21:42

Added and rewritten tests in `test_mutations_in_partitions_of_merge_tree`.

a6ec470

quickhouse force-pushed the betternohardlinking branch from 26f8045 to a6ec470 Compare October 6, 2022 07:01

quickhouse marked this pull request as ready for review October 6, 2022 07:02

quickhouse requested a review from tavplubix October 6, 2022 07:02

Better name for a test.

568f829

quickhouse added 3 commits October 12, 2022 14:17

Style.

7d454ab

Removed kill=True from tests when restarting ClickHouse (it generates errors in logs when run on replicated ClickHouse instances).

735e062

Revert "Removed kill=True from tests when restarting ClickHouse (it generates errors in logs when run on replicated ClickHouse instances)." This reverts commit 735e062.

69ecddd

Fixed tidy messages.

1004bed

quickhouse force-pushed the betternohardlinking branch from b5fd89d to 1004bed Compare October 20, 2022 06:53

tavplubix marked this pull request as draft November 28, 2022 15:23

azat mentioned this pull request Dec 26, 2022

tests/integration: suppress exceptions during logging (due to pytest) #44618

Merged

Merge branch 'master' into betternohardlinking

f5e4352

filimonov closed this Feb 2, 2023

tavplubix mentioned this pull request Feb 9, 2023

Add new table engine UniqueMergeTree #44534

Closed

tavplubix mentioned this pull request Feb 20, 2023

Scan relevant partitions only when executing mutations #46594

Open

ilejn mentioned this pull request Apr 19, 2023

Simplified partition mutations #48941

Open

tavplubix mentioned this pull request May 30, 2023

Mutation with "in partition" not working #35494

Closed

den-crane mentioned this pull request Jul 7, 2024

RAM memory usage for ALTER DELETE IN PARTITION 23.12.6.19 #66177

Closed
