Skip to content

Add unique identifiers IMergeTreeDataPart structure#16033

Merged
alesapin merged 6 commits intoClickHouse:masterfrom
nvartolomei:nv/parts-uuid
Nov 22, 2020
Merged

Add unique identifiers IMergeTreeDataPart structure#16033
alesapin merged 6 commits intoClickHouse:masterfrom
nvartolomei:nv/parts-uuid

Conversation

@nvartolomei
Copy link
Copy Markdown
Contributor

@nvartolomei nvartolomei commented Oct 15, 2020

Add part unique identifiers. Not generated unless assign_part_uuids is set to 1 in MergeTree settings (can be done only if all replicas are upgraded).

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

@robot-clickhouse robot-clickhouse added the pr-not-for-changelog This PR should not be mentioned in the changelog label Oct 15, 2020
@alesapin alesapin self-assigned this Oct 20, 2020
Copy link
Copy Markdown
Member

@alesapin alesapin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can add UUID generation for parts in this PR?

@nvartolomei
Copy link
Copy Markdown
Contributor Author

@nvartolomei nvartolomei marked this pull request as ready for review November 4, 2020 12:17
Copy link
Copy Markdown
Member

@alesapin alesapin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, clear to me, need clarification for format_version.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other replicas will fetch part with UUID and get it from this file?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The UUID now is exchanged via part exchange protocol. For InMemory parts it is read from the wire directly. For Disk parts: uuid.txt file is fetched and then uuid is loaded together with other metadata in loadColumnsChecksumsIndexes.

Adding it to ZK would is redundant but might be viewed as consistent with other log entires. Though using it as source of truth wouldn't be very convenient (pass it as an argument to Fetcher::fetchPart?).

For now uuids are not generated at all, they are present only if the
part is updated manually (as you can see in the integration test).

The only place where they can be seen today by an end user is in
`system.parts` table. I was looking for hiding this column behind an
option but couldn't find an easy way to do that.

Likely this is also required for WAL, but need to think how not to break
compatibility.

Relates to #13574, #13574

Next 1: In the upcoming PR the plan is to integrate de-duplication based on
these fingerprints in the query pipeline.

Next 2: We'll enable automatic generation of uuids and come up with a
way for conditionally sending uuids when processing distributed queries
only when part movement is in progress.
Avoid breaking backwards compatibility by default for now.
We are breaking backwards compatibility anyway (but agted by a setting)
@alesapin
Copy link
Copy Markdown
Member

2020-11-20 18:13:00 01541_max_memory_usage_for_user:                                        [ FAIL ] 21.16 sec. - having stderror:
2020-11-20 18:13:00 curl: (28) Operation timed out after 10001 milliseconds with 10030730 bytes received
2020-11-20 18:13:00 

Not related to changes seems like a flaky test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-not-for-changelog This PR should not be mentioned in the changelog

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants