Revert "Revert "insertion deduplication on retries for materialized views"" #66144
This is an automated comment for commit bcd08b8 with a description of existing statuses. It is updated for the latest CI run. ❌
Successful checks
src/Interpreters/Squashing.cpp
I added chassert(aggr_chunk) to check that Squashing produces chunks that convert to bool as true.
This must always hold; otherwise a chunk might be dropped by some processor even though it is not empty, because its chunk info contains the data.
The chunk also has to have the same columns as the processor header. This is why we have to set the columns from the header.
BUT: sometimes chunks with 0 columns pass through the pipeline, and that is a valid situation. This is why we have to set the number of rows when there are no columns.
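A minimal model of the invariant described above. SimpleChunk and makeSquashedChunk are hypothetical, simplified stand-ins, not ClickHouse's Chunk API: column data is reduced to column names, and the sketch assumes a chunk counts as non-empty when it has rows or columns. It shows why a squashed chunk whose real data lives only in its attached info needs either the header's columns or an explicit row count.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a pipeline chunk.
// Column data is elided: only column names and the row count are tracked.
struct SimpleChunk
{
    std::vector<std::string> columns;  // column names, mirroring the processor header
    std::size_t num_rows = 0;
    std::shared_ptr<void> info;        // payload attached as "chunk info"

    // Assumption: a chunk is non-empty if it has rows or columns.
    explicit operator bool() const { return num_rows > 0 || !columns.empty(); }
};

// Builds the chunk emitted by a squashing step whose real data travels in `info`.
SimpleChunk makeSquashedChunk(const std::vector<std::string> & header,
                              std::size_t pending_rows,
                              std::shared_ptr<void> pending_data)
{
    SimpleChunk result;
    result.info = std::move(pending_data);
    // Take the columns from the header so the chunk matches the processor header.
    result.columns = header;
    // With a 0-column header the chunk would still evaluate to false,
    // so carry the row count explicitly in that case.
    if (header.empty())
        result.num_rows = pending_rows;
    return result;
}
```

Without the num_rows fallback, a squashed chunk built from a 0-column header would evaluate to false and could be skipped downstream even though its info holds data.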
01275_parallel_mv: related flaky test. It is flaky only on aarch64.
Before these changes, all parts were written and committed in the consume() method. This was not intentional. By design, written parts could be delayed and committed later, at the next call to consume() or in onFinish(). But very convoluted code around deduplication made this impossible. I fixed it, so committing a part can now be delayed. That means such parts must be canceled properly. I will address this in #66279.
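The delayed-commit pattern described above can be sketched as follows. DelayedCommitSink, PendingPart, and onCancel are hypothetical names, not ClickHouse classes or methods; the sketch only shows a part written in one consume() call being committed on the next consume() or in onFinish(), and being rolled back if the insert is canceled first.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical part that has been written but not yet committed.
struct PendingPart
{
    std::string name;
};

class DelayedCommitSink
{
public:
    // Write a new part; commit the previously written one (if any) first.
    void consume(const std::string & part_name)
    {
        commitPending();
        pending = PendingPart{part_name};
    }

    // Commit whatever is still pending at the end of the insert.
    void onFinish() { commitPending(); }

    // Cancellation must drop a part that was written but not committed.
    void onCancel()
    {
        if (pending)
            rolled_back.push_back(pending->name);
        pending.reset();
    }

    std::vector<std::string> committed;
    std::vector<std::string> rolled_back;

private:
    void commitPending()
    {
        if (pending)
            committed.push_back(pending->name);
        pending.reset();
    }

    std::optional<PendingPart> pending;
};
```

The key consequence is that at any moment one part may be written but uncommitted, so cancellation has to handle it explicitly.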
@@ -1,5 +1,5 @@
 #!/usr/bin/env bash
-# Tags: long, no-fasttest, no-parallel
+# Tags: long, no-fasttest, no-parallel, no-asan
Could the need to mark some tests as slow indicate that the code slowed down noticeably? These tests seem good to have with ASan, though.
I do not know how to mark them as 'slower_than_600_sec'.
As I wrote before, the ASan + Azure combination runs for 10-11 minutes, and I do not know how to exclude only that combination.
If we already have no-s3-storage, then we should also have (or add) no-azure-storage.
Cool, there is already a no-azure-blob-storage tag (see ClickHouse/tests/clickhouse-test, line 1206 in 93abd4a).
It is just undocumented; I found it when I tried to implement it. I will change the tags in a separate PR.
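A rough sketch of how a "# Tags:" header line can drive test skipping. It is written in C++ only for consistency with the other examples here; it is not the actual clickhouse-test logic (which is a Python script), and parseTags, RunConfig, and shouldSkip are hypothetical names, as is the mapping from run configuration to tag names.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Parse a line like "# Tags: long, no-fasttest, no-parallel, no-asan" into a set of tags.
std::set<std::string> parseTags(const std::string & line)
{
    std::set<std::string> tags;
    const std::string prefix = "# Tags:";
    if (line.rfind(prefix, 0) != 0)
        return tags;  // not a tags line
    std::stringstream rest(line.substr(prefix.size()));
    std::string tag;
    while (std::getline(rest, tag, ','))
    {
        // Trim surrounding spaces.
        const auto first = tag.find_first_not_of(' ');
        const auto last = tag.find_last_not_of(' ');
        if (first != std::string::npos)
            tags.insert(tag.substr(first, last - first + 1));
    }
    return tags;
}

// Hypothetical run configuration of a CI job.
struct RunConfig
{
    bool asan = false;
    bool azure_blob_storage = false;
};

// A test is skipped if any "no-*" tag matches the current configuration.
bool shouldSkip(const std::set<std::string> & tags, const RunConfig & cfg)
{
    if (cfg.asan && tags.count("no-asan"))
        return true;
    if (cfg.azure_blob_storage && tags.count("no-azure-blob-storage"))
        return true;
    return false;
}
```

As the thread notes, each tag excludes a whole dimension: no-asan skips every ASan run and no-azure-blob-storage skips every Azure run, so neither targets only the ASan + Azure combination.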
PullRequestCI / Builds_2 (pull_request) was skipped. I need it, so I will wait for one more round of CI.
I see the tidy build is broken: #66552
Stateless tests (debug, s3 storage) [2/2]
kssenii left a comment
LGTM, but please check #66144 (comment). Maybe change this in a separate PR?
Cloud fork sync has failed
* [GLUTEN-1632][CH] Daily Update ClickHouse Version (20240718)
* Fix build due to ClickHouse/ClickHouse#66144
---------
Co-authored-by: kyligence-git <[email protected]>
Co-authored-by: Chang Chen <[email protected]>
Yes, it does. PS: there is already a test.
Reverts #66134
Here I looked at the tests more carefully and fixed bugs found in stress tests, such as #66122.
Implements ideas from #60008
Docs in progress: ClickHouse/clickhouse-docs#2394
I improved deduplication by enhancing the annotation of chunks at the pipeline level.
Now each chunk can have several attached structures with the base class ChunkInfo, which differ by their derived type. This annotation is passed along with the chunks through the Processors. See Chunk::ChunkInfoCollection and CollectionOfDerivedItems<ChunkInfo>.
The deduplication token for each chunk is written as TokenInfo (a class derived from ChunkInfo) by SetInitialTokenTransform. After that, the token can be updated. See DeduplicationToken::TokenInfo::BuildingStage.
The initial value of TokenInfo is taken either from the insert_deduplication_token setting or calculated as a hash of the inserted data.
In order to distinguish equal blocks that should not be deduplicated, TokenInfo is updated with more detailed information about the source of the data, such as the names of the MVs on the way to the table.
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
This PR changes how deduplication for MV works.
Fixed a lot of cases like:
The setting update_insert_deduplication_token_in_dependent_materialized_views is deprecated. The deduplication token for blocks inserted into an MV is always calculated based on the source data.
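A minimal sketch of the annotation scheme described in the PR description, under stated assumptions: ChunkInfoCollection below is a hypothetical std::type_index-keyed map, not the real Chunk::ChunkInfoCollection; makeInitialToken is a hypothetical helper; and std::hash stands in for whatever hash ClickHouse actually computes over the inserted data. BuildingStage is not modeled. It shows a TokenInfo derived from ChunkInfo, an initial token taken from insert_deduplication_token or from a data hash, and the token being extended with MV names so that equal blocks routed through different MVs get different tokens.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <optional>
#include <string>
#include <typeindex>
#include <typeinfo>

// Base class for annotations attached to a chunk.
struct ChunkInfo
{
    virtual ~ChunkInfo() = default;
};

// Hypothetical collection holding at most one info per derived type.
class ChunkInfoCollection
{
public:
    template <typename T>
    void add(std::shared_ptr<T> info) { infos[std::type_index(typeid(T))] = std::move(info); }

    template <typename T>
    std::shared_ptr<T> get() const
    {
        auto it = infos.find(std::type_index(typeid(T)));
        if (it == infos.end())
            return nullptr;
        return std::static_pointer_cast<T>(it->second);
    }

private:
    std::map<std::type_index, std::shared_ptr<ChunkInfo>> infos;
};

// Simplified deduplication token carried with the chunk.
struct TokenInfo : ChunkInfo
{
    std::string token;

    // Append source details (e.g. an MV name) so identical data from different paths differs.
    void extend(const std::string & part) { token += ":" + part; }
};

// Initial token: the user-provided setting if present, otherwise a hash of the data.
std::shared_ptr<TokenInfo> makeInitialToken(const std::optional<std::string> & insert_deduplication_token,
                                            const std::string & block_data)
{
    auto info = std::make_shared<TokenInfo>();
    if (insert_deduplication_token)
        info->token = *insert_deduplication_token;
    else
        info->token = std::to_string(std::hash<std::string>{}(block_data));
    return info;
}
```

Keying the collection by derived type lets each transform look up only the annotation it cares about while the others travel along with the chunk unchanged.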