Validate SummungMergeTree columns to sum are not in sort key, fix partition key validation. Fix auto resolve of summing cols.#78022
Conversation
20b393d to
1739cd3
Compare
|
BTW it's possible to use CREATE TABLE t0 (c0 SimpleAggregateFunction(sum, Int64))
ENGINE = SummingMergeTree() ORDER BY (c0%1);
INSERT INTO TABLE t0 (c0) VALUES (2),(1),(2);
OPTIMIZE TABLE t0 FINAL;
SELECT * FROM t0;
┌─c0─┐
│ 5 │
└────┘ |
|
Ah, the same issue with AggregatingMergeTree CREATE TABLE t0 (c0 SimpleAggregateFunction(sum, Int64))
ENGINE = AggregatingMergeTree() ORDER BY (c0%1);
INSERT INTO TABLE t0 (c0) VALUES (2),(1),(2);
OPTIMIZE TABLE t0 FINAL;
SELECT * FROM t0;
┌─c0─┐
│ 5 │
└────┘ |
This one works fine after the patch, it returns single line 2 as expected |
No, no, no. It should be forbidden It should behave the same as |
I am not sure, it's a different check on column comparability, wrapping it with expression casts it to the comparable type, and then it seems to work fine. With that patch in such case, SummingMergeTree won't mistakenly aggregate it, so despite it being a strange choice for the sorting key, it should work ok |
|
OK. Fine. From my POV there is no difference: |
From my POV too, and both are valid and behave equally. The only difference is that in the second case, you can use SumMerge on c0 in a select query. SimpleAggregateFunction won't be |
It obviously leads to a corruption of a Primary Key and must be forbidden. |
In your example Int64 and SimpleAggregateFunction(sum, Int64) behave the same way with the same result. PK corrupted because column c0 is aggregated, however, it shouldn't and this PR fixes exactly that, for both cases. |
2cbac96 to
e6ce234
Compare
…tition key validation. Fix auto resolve of summing cols.
e6ce234 to
765cc85
Compare
src/Core/SettingsChangesHistory.cpp
Outdated
| addSettingsChanges(merge_tree_settings_changes_history, "25.4", | ||
| { | ||
| {"max_merge_delayed_streams_for_parallel_write", 1000, 100, "New setting"}, | ||
| {"allow_summing_columns_in_partition_or_order_key", true, false, "Don't allow summing of partition or sorting key columns"}, |
There was a problem hiding this comment.
Let's specify that this is a new setting and not a change of the value of an existing setting:
New setting to allow summing...
The information that the behavior is changed from allowing to denying is stated in true to false.
| std::set_intersection(columns_to_sum.begin(), columns_to_sum.end(), | ||
| sorting_key_columns.begin(), sorting_key_columns.end(), | ||
| std::back_inserter(names_intersection)); |
There was a problem hiding this comment.
Are both columns_to_sum and sorting_key_columns ranges sorted to use std::set_intersection?
In the above snippet, sorting is applied to check for duplicates:
auto columns_to_sum_copy = columns_to_sum;
std::sort(columns_to_sum_copy.begin(), columns_to_sum_copy.end());
if (const auto it = std::adjacent_find(columns_to_sum_copy.begin(), columns_to_sum_copy.end()); it != columns_to_sum_copy.end())
throw Exception(ErrorCodes::BAD_ARGUMENTS, "Column {} is listed multiple times in the list of columns to sum", *it);
This implies that columns_to_sum may not be sorted.
We should use columns_to_sum_copy instead of columns_to_sum, and it's probably worth renaming it to columns_to_sum_sorted.
The type of sorting_key_columns is a vector of strings, I'm not sure about the ordering.
https://github.com/ClickHouse/ClickHouse/blob/master/src/Core/Names.h#L13
There was a problem hiding this comment.
It's also not working for existing partition key validation https://fiddle.clickhouse.com/3db966f5-79e6-47b9-a9e9-70bcb06e28ca
4d2e00b
* [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250405) * Fix Build due to ClickHouse/ClickHouse#78515 * remove temp code due to #9192 * Fix test fail due to ClickHouse/ClickHouse#78022 --------- Co-authored-by: kyligence-git <[email protected]> Co-authored-by: Chang chen <[email protected]>
Closes: #76866
Except mentioned issue, it fixes cases like, when merged rows end up in the wrong partition.
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Enhance SummingMergeTree validation to skip aggregation for columns used in partition or sort keys.
Documentation entry for user-facing changes