Skip to content

Non replicated merge tree deduplication#22514

Merged
alesapin merged 23 commits intomasterfrom
merge_tree_deduplication
Apr 7, 2021
Merged

Non replicated merge tree deduplication#22514
alesapin merged 23 commits intomasterfrom
merge_tree_deduplication

Conversation

@alesapin
Copy link
Copy Markdown
Member

@alesapin alesapin commented Apr 2, 2021

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Add new setting non_replicated_deduplication_window for non-replicated MergeTree inserts deduplication.

@robot-clickhouse robot-clickhouse added doc-alert pr-feature Pull request with new product feature labels Apr 2, 2021
@alesapin alesapin marked this pull request as ready for review April 2, 2021 17:50
@alesapin
Copy link
Copy Markdown
Member Author

alesapin commented Apr 2, 2021

Need to try enabled setting by default

/// Preserves order using linked list and remove elements
/// on overflow in FIFO order.
template <typename V>
class LimitedOrderedHashMap
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LRUCache?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it's not the last recently used map, it's the last recently added map.

@alesapin
Copy link
Copy Markdown
Member Author

alesapin commented Apr 5, 2021

At least sever didn't crash. I'll check failures.

@alesapin
Copy link
Copy Markdown
Member Author

alesapin commented Apr 5, 2021

Support of deduplication for MergeTree in old syntax is quite complicated because we need to serialize part name, which depends on storage partition key, which depends on metadata snapshot. So I'll just ban it.

@alesapin
Copy link
Copy Markdown
Member Author

alesapin commented Apr 5, 2021

Ok, everything looks expected. Integration s3 merge tree test depends on the order. Func tests just insert equal data. It's especially not obvious for 01197_summing_enum test.

@alesapin
Copy link
Copy Markdown
Member Author

alesapin commented Apr 5, 2021

I'll disable the setting by default and ban for old-syntax merge tree.

@nikitamikhaylov nikitamikhaylov self-assigned this Apr 5, 2021
@alesapin
Copy link
Copy Markdown
Member Author

alesapin commented Apr 6, 2021

need to support alter setting

@alesapin
Copy link
Copy Markdown
Member Author

alesapin commented Apr 6, 2021

Let's wait for performance, but I'm almost sure it's not affected.

@alesapin
Copy link
Copy Markdown
Member Author

alesapin commented Apr 7, 2021

Yes, no related changes in performance.

@alesapin alesapin merged commit 9fd251e into master Apr 7, 2021
@alesapin alesapin deleted the merge_tree_deduplication branch April 7, 2021 07:19
@gyuton
Copy link
Copy Markdown
Contributor

gyuton commented Apr 7, 2021

Internal documentation ticket: DOCSUP-8347.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-feature Pull request with new product feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants