add docs about insertion deduplication on retries#2394

Merged
justindeguzman merged 13 commits into main from chesema-insertion-deduplication-on-retries
Jun 28, 2024

Conversation

@CheSema
Member

@CheSema CheSema commented Jun 11, 2024

No description provided.

That transformed data also has to be deduplicated on retries. ClickHouse deduplicates it in the same way it deduplicates data inserted into the target table.
Users can control this process with settings on the table under the materialized view: `replicated_deduplication_window`, `replicated_deduplication_window_seconds`, and `non_replicated_deduplication_window`. They can also use the profile setting `deduplicate_blocks_in_dependent_materialized_views`.
For blocks inserted into tables under materialized views, ClickHouse calculates `block_id` as a hash of a string that concatenates the `block_id`s for the source table with additional parts that help distinguish blocks after the materialized view transformation, such as the source view's ID and the sequential number of the block. This makes deduplication for materialized views work correctly: data is distinguished by its originally inserted data, no matter how it was transformed on its way to the destination table under the materialized view.
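
The `block_id` derivation described above can be illustrated with a small sketch. This is not ClickHouse's implementation: the hash function (SHA-256), the `|` separator, and the names `derived_block_id` and `DedupWindow` are assumptions made for illustration, and the window here is an unbounded set rather than one limited by `replicated_deduplication_window`.

```python
import hashlib


def derived_block_id(source_block_id: str, view_id: str, block_seq: int) -> str:
    """Hypothetical block_id for a block written through a materialized view.

    Combines the source table's block_id with the view's id and the block's
    sequential number, then hashes the result. The real string layout and
    hash function used by ClickHouse are not stated in the text above.
    """
    key = f"{source_block_id}|{view_id}|{block_seq}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


class DedupWindow:
    """Simplified deduplication window: remembers every block_id it has seen."""

    def __init__(self) -> None:
        self.seen: set[str] = set()

    def insert(self, block_id: str) -> bool:
        """Return True if the block is accepted, False if it is a duplicate."""
        if block_id in self.seen:
            return False
        self.seen.add(block_id)
        return True


window = DedupWindow()

# First attempt of an insert that passes through view "mv_a".
first = window.insert(derived_block_id("src-block-1", "mv_a", 0))
# A retry of the same insert yields the same derived id and is dropped.
retry = window.insert(derived_block_id("src-block-1", "mv_a", 0))
# The same source block arriving through a different view is kept.
other_view = window.insert(derived_block_id("src-block-1", "mv_b", 0))
# A second block produced from the same source block is kept as well.
next_block = window.insert(derived_block_id("src-block-1", "mv_a", 1))

print(first, retry, other_view, next_block)
```

Because the derived id depends only on the source `block_id`, the view, and the block's position, a retried insert maps to the same id regardless of how the view transformed the rows.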

Member

Should this also mention that `update_insert_deduplication_token_in_dependent_materialized_views` is deprecated after some version?

@justindeguzman justindeguzman self-requested a review June 28, 2024 23:01
@justindeguzman justindeguzman marked this pull request as ready for review June 28, 2024 23:01
@CheSema
Member Author

CheSema commented Jun 28, 2024

Did these docs go to production? The feature has not been merged yet...

@justindeguzman
Contributor

Sorry about that, will revert asap. I'll just hide it from the sidebar and it won't be accessible. I thought you wanted me to edit it and then merge.

@CheSema
Member Author

CheSema commented Jun 28, 2024

Yes, I want that.
But I think it is better to merge it together with the feature.
ClickHouse/ClickHouse#61601

I should have filled the description for that PR as well, sorry.

@CheSema
Member Author

CheSema commented Jul 12, 2024

> Sorry about that, will revert asap. I'll just hide it from the sidebar and it won't be accessible. I thought you wanted me to edit it and then merge.

Did you revert it?

@justindeguzman
Contributor

Yes this was removed

@CheSema
Member Author

CheSema commented Jul 17, 2024

The code is committed; I will resurrect this doc in a day or two.

@justindeguzman justindeguzman deleted the chesema-insertion-deduplication-on-retries branch August 29, 2024 22:11
3 participants