Skip to content

Improved IN PARTITION mutations#41344

Closed
quickhouse wants to merge 12 commits intoClickHouse:masterfrom
quickhouse:betternohardlinking
Closed

Improved IN PARTITION mutations#41344
quickhouse wants to merge 12 commits intoClickHouse:masterfrom
quickhouse:betternohardlinking

Conversation

@quickhouse
Copy link
Copy Markdown
Contributor

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Improved IN PARTITION mutations.

Information about CI checks: https://clickhouse.com/docs/en/development/continuous-integration/

@quickhouse quickhouse changed the title Improved IN PARTITION mutations. Improved IN PARTITION mutations Sep 15, 2022
@robot-ch-test-poll robot-ch-test-poll added the pr-improvement Pull request with some product improvements label Sep 15, 2022
@tavplubix tavplubix self-assigned this Sep 15, 2022
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise it will be empty (NULL).

Copy link
Copy Markdown
Contributor

@filimonov filimonov Sep 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So for example 3 is not possible?

If getPartitionIdsAffectedByCommands can return empty or single element list - why it return list? May be it should use optional?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a common code with StorageReplicatedMergeTree, we can't change it.

Copy link
Copy Markdown
Member

@tavplubix tavplubix Oct 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nobody asks to change it (I think the comment above was about reliability and readability of this code that relies on implicit invariant that probably does not even exist). Please add a comment that explains why this function always returns one partition for StorageMergeTree (I don't understand why). And throw LOGICAL_ERROR if it returned more than one partition.

Copy link
Copy Markdown
Contributor Author

@quickhouse quickhouse Oct 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tavplubix it's not me saying it returns only one partition for StorageMergeTree, but @filimonov . When it returns more than one partition, the optimization will not be applied.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment

@quickhouse quickhouse force-pushed the betternohardlinking branch 2 times, most recently from 401e96c to 1be12b8 Compare September 28, 2022 10:42
@quickhouse quickhouse marked this pull request as ready for review September 28, 2022 10:42
@tavplubix tavplubix added the can be tested Allows running workflows for external contributors label Sep 28, 2022
@tavplubix
Copy link
Copy Markdown
Member

tavplubix commented Sep 28, 2022

Seems like this implementation is something completely different from the solution proposed by @filimonov. Is it expected? Please add a comment with something like design doc describing you solution and explaining how (and why) it works.

@quickhouse quickhouse force-pushed the betternohardlinking branch from 1be12b8 to 47c9fd5 Compare October 3, 2022 08:26
@quickhouse
Copy link
Copy Markdown
Contributor Author

quickhouse commented Oct 3, 2022

Seems like this implementation is something completely different from the solution proposed by @filimonov. Is it expected? Please add a comment with something like design doc describing you solution and explaining how (and why) it works.

I fixed this in 47c9fd5, implementation is in sync with the proposal now. Mutation by partition map allows to stop iterating by current_mutations_by_version early.

Comment on lines +74 to +79
if (partition_id)
{
*out << "partition id: ";
writeEscapedString(*partition_id, *out);
*out << "\n";
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to serialize partition_id here? Seems like we already have partition id inside commands

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need serialization of it specifically, but we need a way to distinguish old storage-wide mutations from new. If we don't change format, we will try to recalculate paritition_id for older and possibly unfinished partition-specific mutations, skip unrelated partitions and stick on merges when in the same partition subset of parts is mutated beyond correct version, or stick being unable to mutate another subset beyond it because there is no related mutation for that. I think this serialization is a neat way to avoid that.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

subset of parts is mutated beyond correct version

AFAIK it should not be a problem

Copy link
Copy Markdown
Contributor Author

@quickhouse quickhouse Oct 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean that we have different mutations:

  1. with partition_id and IN PARTITION expression, which are new partition-specific mutations — these shall create partition-specific changes;
  2. without partition_id and IN PARTITION expression, which are old or new storage-wide mutations — these create storage-wide changes;
  3. without partition_id but with IN PARTITION expression, which are old partition-specific mutations. They were supposed to make storage-wide changes earlier (before this patch) and may be partially finished on some parts, so we must continue working the same way on them. If we remove these guys from remaining really unaffected parts (parts_to_do), we would have to rollback unaffected parts which already mutated to that version, or we will have finished mutation and different versions in a partition which will prevent merges to happen until new mutation (in that partition or whole storage) appears.

Generally, I think it is a good thing to have partition set calculated at the moment when the mutation is created, anyway.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand your point, but I'm not sure about

we must continue working the same way on them. If we remove these guys from remaining really unaffected parts (parts_to_do), we would have to rollback unaffected parts which already mutated to that version

I agree that it's undesirable to have some parts with data version greater than expected version for the partition. But AFAIK it will not break anything.

will prevent merges to happen until new mutation (in that partition or whole storage) appears

It will prevent only merges of parts with different data versions, seems like it's not critical. Also it probably can be addressed by taking last_mutation_by_partition into account in getCurrentMutationVersion.

I don't like serializing partition id for two reasons:

  • It duplicates partition id that we already have in IN PARTITION clause (and we already serialize it).
  • It unconditionally changes format of mutation entries without any settings. It will make downgrade impossible once you run IN PARTITION mutation. IN PARTITION is not a new feature, so this change is backward incompatible. We can introduce a setting like merge_tree_mutation_entry_version or serialize_partition_id_for_in_partition_mutations for compatibility, but I don't like this option. It's better to avoid changing format, and seems like it's possible to avoid it.

/// This set have to be used with `currently_processing_in_background_mutex`.
DataParts currently_merging_mutating_parts;

/// Accessed under `currently_processing_in_background_mutex` lock.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using TSA annotations

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, cool!

Copy link
Copy Markdown
Contributor Author

@quickhouse quickhouse Oct 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tavplubix it does not quite fit in this case because there is (at least) one usage of std::unique_lock instead of std::lock_guard in selectPartsToMerge() and we would not be able to build complete tree of function calls properly annotated.

https://lists.llvm.org/pipermail/cfe-dev/2016-November/051468.html

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use TSA_SUPPRESS_WARNING_FOR_READ in selectPartsToMerge

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is impossible because std::unique_lock has no thread annotations. There is nothing to suppress, however, but it is needed to mark mutex as locked for Clang instead. It would work if we only used std::lock_guard.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is possible and it works fine with std::unique_lock and TSA_SUPPRESS_WARNING_FOR_READ

@tavplubix tavplubix marked this pull request as draft October 3, 2022 11:51
@quickhouse quickhouse force-pushed the betternohardlinking branch from 1653b7e to 557858f Compare October 5, 2022 21:42
@quickhouse quickhouse force-pushed the betternohardlinking branch from 26f8045 to a6ec470 Compare October 6, 2022 07:01
@quickhouse quickhouse marked this pull request as ready for review October 6, 2022 07:02
@quickhouse quickhouse requested a review from tavplubix October 6, 2022 07:02
…es errors in logs when run on replicated ClickHouse instances).
… generates errors in logs when run on replicated ClickHouse instances)."

This reverts commit 735e062.
@quickhouse
Copy link
Copy Markdown
Contributor Author

quickhouse commented Oct 13, 2022

Problem with exceptions was caused by lack of error handling in Kazoo ZooKeeper client python-zk/kazoo#672 . By default use_keeper=True for all test instances. Example: https://gist.github.com/quickhouse/09d6da34f35a13024b1f3432c6969c03

@quickhouse
Copy link
Copy Markdown
Contributor Author

Problem with exceptions is caused by wrong interoperability of logging and pytest and seems unfixable pytest-dev/pytest#5502 . I would recommend to throw pytest away. The problem may randomly appear in any test in test suite. Example of output: https://gist.github.com/quickhouse/09d6da34f35a13024b1f3432c6969c03 .

@alexey-milovidov
Copy link
Copy Markdown
Member

Improved IN PARTITION mutations.

@quickhouse how exactly?

@quickhouse
Copy link
Copy Markdown
Contributor Author

quickhouse commented Feb 2, 2023 via email

@filimonov
Copy link
Copy Markdown
Contributor

filimonov commented Feb 2, 2023

@alexey-milovidov the implementation of ALTER UPDATE/DELETE IN PARTITION for MergeTree was incorrect and was rolled back #40589

That PR was implementing that concept https://gist.github.com/filimonov/a2c79385159efe2935e616ac7492632b (discussed with @tavplubix )

But since Vladimir ( @quickhouse ) did not finish that task, I'm closing the PR we will reassign the task to finish it (we will send a new PR)

@filimonov filimonov closed this Feb 2, 2023
@aadant
Copy link
Copy Markdown

aadant commented Feb 8, 2023

@filimonov : please also fix the user experience of this feature, if the IN PARTITION is touching selected parts (by virtue of partition pruning), Current implementation would show all parts (I wonder if it is even processing them as I see the counter parts_to_do decreases over time !). My test was mutating 1 partition and I saw thousands of them in system.mutations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-improvement Pull request with some product improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants