Skip to content

start deduplication tokens migration#97562

Merged
CheSema merged 3 commits intomasterfrom
chesema-start-migration
Feb 24, 2026
Merged

start deduplication tokens migration#97562
CheSema merged 3 commits intomasterfrom
chesema-start-migration

Conversation

@CheSema
Copy link
Copy Markdown
Member

@CheSema CheSema commented Feb 21, 2026

#95409
Start migration of deduplication hashes with insert_deduplication_version = compatible_double_hashes

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Start migration of deduplication hashes

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

@CheSema CheSema marked this pull request as draft February 21, 2026 11:55
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Feb 21, 2026

Workflow [PR], commit [505c89b]

Summary:

job_name test_name status info comment
AST fuzzer (amd_debug) failure
Assertion `px != 0' failed (STID: 0250-2e6a) FAIL cidb
AST fuzzer (amd_ubsan) failure
Server unresponsive: memory limit exceeded FAIL cidb
Finish Workflow failure
python3 ./ci/jobs/scripts/workflow_hooks/feature_docs.py failure

@clickhouse-gh clickhouse-gh bot added the pr-feature Pull request with new product feature label Feb 21, 2026
@CheSema CheSema mentioned this pull request Feb 21, 2026
1 task
@CheSema CheSema marked this pull request as ready for review February 23, 2026 12:39
@azat azat self-assigned this Feb 23, 2026
Copy link
Copy Markdown
Member

@azat azat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general looks OK, since in case of downgrade we will have old hashes, and deduplication will continue to work

Also isn't it too fast enabling it the same release as it was added? I know it should be pretty compatible, but still. If you are OK, I'm OK too.

ORDER BY key
PARTITION BY part
SETTINGS non_replicated_deduplication_window=3, disk='s3_plain_rewritable';
SETTINGS non_replicated_deduplication_window=6, disk='s3_plain_rewritable';
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this reduce twice the window for RMT as well? Maybe we need to account this differently, i.e. calculate only new hashes?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only for non_replicated_deduplication_window it will reduce the window due to double set of hashes.

Copy link
Copy Markdown
Member Author

@CheSema CheSema Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For RMT the second set of hashes is stored in the separate place, no reduction but x2 consumption to store them.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is only for non-RMT I think it is OK

/// Settings without path are top-level server settings (no nesting).
#define LIST_OF_SERVER_SETTINGS_WITHOUT_PATH(DECLARE, ALIAS) \
DECLARE(InsertDeduplicationVersions, insert_deduplication_version, InsertDeduplicationVersions::OLD_SEPARATE_HASHES, R"(
DECLARE(InsertDeduplicationVersions, insert_deduplication_version, InsertDeduplicationVersions::COMPATIBLE_DOUBLE_HASHES, R"(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to guard this under compatibility setting? Or it is intended?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is the plan. New instances goes with COMPATIBLE_DOUBLE_HASHES, old ones with OLD_SEPARATE_HASHES under compatibility

@CheSema CheSema added this pull request to the merge queue Feb 24, 2026
Merged via the queue into master with commit 2d7d9b2 Feb 24, 2026
144 of 148 checks passed
@CheSema CheSema deleted the chesema-start-migration branch February 24, 2026 12:00
@robot-ch-test-poll1 robot-ch-test-poll1 added the pr-synced-to-cloud The PR is synced to the cloud repo label Feb 24, 2026
Algunenano pushed a commit to Algunenano/ClickHouse that referenced this pull request Feb 24, 2026
…ration

start deduplication tokens migration
@CheSema CheSema added the pr-must-backport Pull request should be backported intentionally. Use this label with great care! label Feb 25, 2026
@CheSema
Copy link
Copy Markdown
Member Author

CheSema commented Feb 25, 2026

It has to be in 26.2 to start migration on it

@CheSema
Copy link
Copy Markdown
Member Author

CheSema commented Feb 25, 2026

Interesting, no 26.2 in backports

@CheSema CheSema added v26.2-must-backport and removed pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR pr-must-backport Pull request should be backported intentionally. Use this label with great care! labels Feb 25, 2026
@robot-clickhouse-ci-1 robot-clickhouse-ci-1 added the pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR label Feb 25, 2026
@robot-ch-test-poll2 robot-ch-test-poll2 added the pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore label Feb 25, 2026
@CheSema CheSema removed pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR labels Feb 25, 2026
@robot-ch-test-poll2 robot-ch-test-poll2 added the pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR label Feb 25, 2026
@robot-clickhouse robot-clickhouse added the pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore label Feb 25, 2026
@CheSema CheSema removed pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR labels Feb 25, 2026
@CheSema
Copy link
Copy Markdown
Member Author

CheSema commented Feb 25, 2026

It is a part of 26.2

@robot-ch-test-poll2 robot-ch-test-poll2 added pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore labels Feb 25, 2026
@alexbakharew
Copy link
Copy Markdown
Contributor

Looks like it is a backward incompatible change, we need to check it once again

@CheSema
Copy link
Copy Markdown
Member Author

CheSema commented Mar 2, 2026

Looks like it is a backward incompatible change, we need to check it once again

Any details?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore pr-feature Pull request with new product feature pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR pr-synced-to-cloud The PR is synced to the cloud repo v26.2-must-backport

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants