migration deduplications hashes with insert_deduplication_version setting#95409
Merged
Workflow [PR], commit [0ee4d4c] Summary: ❌
Pull request overview
This PR introduces a new server setting deduplication_unification_stage that enables migration from separate deduplication hashes for sync and async inserts to a unified deduplication hash scheme. The setting supports three stages: old_separate_hashes (default, backward compatible), compatible_double_hashes (transition stage using both hash types), and new_unified_hash (final state using only unified hashes).
Changes:
- Added deduplication_unification_stage server setting with three migration stages
- Introduced DeduplicationHash struct to encapsulate hash type and block ID generation
- Modified deduplication logic across replicated and non-replicated tables to support multiple hash types
- Added integration tests covering migration scenarios between different stages
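The staged migration described above can be illustrated with a small sketch. This is not the PR's code: every name here (`Stage`, `HashKind`, `hashKindsFor`, `blockId`, `tryInsert`) is hypothetical, the block ID format is a toy stand-in, and the real `DeduplicationHash` struct may be organized quite differently. It only assumes what the overview states: the old stage keeps separate hashes for sync and async inserts, the transition stage uses both hash types, and the final stage uses only the unified hash.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical names throughout; the PR's real DeduplicationHash struct may look different.
enum class Stage { OldSeparateHashes, CompatibleDoubleHashes, NewUnifiedHash };
enum class HashKind { LegacySync, LegacyAsync, Unified };

// Which hash kinds an insert records and checks at a given stage.
std::vector<HashKind> hashKindsFor(Stage stage, bool is_async)
{
    const HashKind legacy = is_async ? HashKind::LegacyAsync : HashKind::LegacySync;
    switch (stage)
    {
        case Stage::OldSeparateHashes:      return {legacy};
        case Stage::CompatibleDoubleHashes: return {legacy, HashKind::Unified};
        case Stage::NewUnifiedHash:         return {HashKind::Unified};
    }
    assert(false && "unknown stage");
    return {};
}

// Toy stand-in for a block ID: a prefix per hash kind plus a digest of the block data.
std::string blockId(HashKind kind, const std::string & block_digest)
{
    switch (kind)
    {
        case HashKind::LegacySync:  return "sync_" + block_digest;
        case HashKind::LegacyAsync: return "async_" + block_digest;
        case HashKind::Unified:     return "unified_" + block_digest;
    }
    assert(false && "unknown hash kind");
    return {};
}

// Returns true if the block was inserted, false if it was dropped as a duplicate.
bool tryInsert(std::set<std::string> & known_ids, Stage stage, bool is_async, const std::string & block_digest)
{
    std::vector<std::string> ids;
    for (HashKind kind : hashKindsFor(stage, is_async))
        ids.push_back(blockId(kind, block_digest));

    // The block is a duplicate if any of its IDs is already known.
    for (const auto & id : ids)
        if (known_ids.count(id))
            return false;

    known_ids.insert(ids.begin(), ids.end());
    return true;
}
```

In this model, a block written under the old stage is still recognized when it is retried under the transition stage, because the legacy ID is checked alongside the unified one. A block written under the transition stage is recognized under the final stage, even when the retry switches between sync and async, because the unified ID was already recorded. Jumping straight from the old stage to the final stage would miss duplicates of blocks written before the switch, which is presumably why an intermediate stage exists.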
Reviewed changes
Copilot reviewed 29 out of 30 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| src/Core/ServerSettings.h/cpp | Added deduplication_unification_stage server setting definition |
| src/Core/SettingsEnums.h/cpp | Added DeduplicationUnificationStage enum with three stages |
| src/Interpreters/InsertDeduplication.h/cpp | Introduced DeduplicationHash struct and refactored hash generation logic |
| src/Storages/StorageReplicatedMergeTree.h/cpp | Added deduplication_hashes_cache member and ZooKeeper paths for unified hashes |
| src/Storages/MergeTree/ReplicatedMergeTreeSink.h/cpp | Updated commitPart signature to use DeduplicationHash instead of string block IDs |
| src/Storages/MergeTree/AsyncBlockIDsCache.h/cpp | Generalized cache to work with DeduplicationHash objects and configurable directory names |
| src/Storages/MergeTree/ReplicatedMergeTreeCleanupThread.cpp | Added cleanup logic for deduplication_hashes directory |
| tests/integration/test_migrtation_deduplication_hash/* | Added integration tests for migration scenarios and sync/async deduplication |
azat (Member) reviewed on Feb 6, 2026:
I've looked at everything except for ReplicatedMergeTreeSink for now.
Title changed: "deduplication_unification_stage setting" to "insert_deduplication_version setting"
azat approved these changes on Feb 6, 2026
Motivation: #95160
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
The server setting insert_deduplication_version makes it possible to migrate to a unified deduplication hash.
Documentation entry for user-facing changes