Fix create statement when using default minmax indicies#87723
Fix create statement when using default minmax indicies#87723SmitaRKulkarni wants to merge 5 commits intomasterfrom
Conversation
|
Workflow [PR], commit [d9b11ae] Summary: ❌
|
rschu1ze
left a comment
There was a problem hiding this comment.
@SmitaRKulkarni LGTM. Lets merge from master and merge?
|
@rschu1ze : Thanks for checking. the fails look related, fixing |
| /// if something change in columns. | ||
| void recalculateWithNewColumns(const ColumnsDescription & new_columns, ContextPtr context); | ||
|
|
||
| bool isImplicitlyCreated() const { return name.starts_with(IMPLICITLY_ADDED_MINMAX_INDEX_PREFIX); } |
There was a problem hiding this comment.
To be honest, I liked the previous approach better (= an index is implicit if and only if its name has prefix IMPLICITLY_ADDED_MINMAX_INDEX_PREFIX). It seems easier to reason about than the additonal bool bit.
What I don't understand in the original repro is this: After alter table t drop column s;, only column a with min-max index auto_minmax_index_a remains. Neither a nor auto_minmax_index_a were touched by the DDL. When we run the second show create table t;, why is ClickHouse not able to figure out from the index name prefix that it is an implicit index (so that it can avoid printing it in show create table?
There was a problem hiding this comment.
To be honest, I liked the previous approach better (= an index is implicit if and only if its name has prefix IMPLICITLY_ADDED_MINMAX_INDEX_PREFIX). It seems easier to reason about than the additonal bool bit.
The condition is only true if minmax index flags are enabled. I mean if both minmax index flags are disabled, then a user can create index with IMPLICITLY_ADDED_MINMAX_INDEX_PREFIX prefix.
Also it was failing to even verify that index starting with IMPLICITLY_ADDED_MINMAX_INDEX_PREFIX prefix should not be allowed, as it was not copying those indices (even before verification step)
(
ClickHouse/src/Databases/DatabasesCommon.cpp
Line 149 in d9b11ae
https://github.com/ClickHouse/ClickHouse/actions/runs/18264843109/job/51997742195#step:5:9763
What I don't understand in the original repro is this: After alter table t drop column s;, only column a with min-max index auto_minmax_index_a remains. Neither a nor auto_minmax_index_a were touched by the DDL. When we run the second show create table t;, why is ClickHouse not able to figure out from the index name prefix that it is an implicit index (so that it can avoid printing it in show create table?
the check in formatIndices was missing, which is added in this PR.
Hope that answers your question.
There was a problem hiding this comment.
Also it was failing to even verify that index starting with IMPLICITLY_ADDED_MINMAX_INDEX_PREFIX prefix should not be allowed, as it was not copying those indices (even before verification step so this test failed. https://github.com/ClickHouse/ClickHouse/actions/runs/18264843109/job/51997742195#step:5:9763
I see and I could reproduce locally what you describe.
As of today, we reject indexes with names starting with the implicit prefix in CREATE and ALTER TABLE if the setting is on.
I then asked myself the question what is the difference between a normal, explicitly created minmax index and an implicitly created minmax index.
Difference 1: Implicit minmax indexes should be created implicitly if the setting is on.
Difference 2: We want to hide implicit minmax indexes in SHOW CREATE TABLE (but they may appear in system.data_skipping_indices).
Are there other significant differences? I couldn't think of any.
My next thought was ...
- why don't we allow the user to create minmax indexes with the implicit prefix (difference 1). I doubt that anyone will do this in practice, but if someone does the only "damage" will be that the new index won't be shown in
SHOW CREATE TABLE. We could document this behavior. (*) - to make aforementioned situation a little bit harder to happen, we could prefix the prefix with an underscore, i.e.
_auto_minmax_index_. This makes its name similar to existing internal virtual columns that follow the same naming scheme.
The question is what should happen in the case of (*), i.e. the user creates an index with implicit prefix and the setting is on. Note that is a very edge case.
- if the explicit index name is equal to an index name which the setting would create, then we are done
- if the explicit index name is different to the index name which the setting would create (e.g. the user creates index
auto_minmax_index_XYZon a columna, then we'd create two indexes - the explicitly created one and the implicit one.
The benefit of this will be that the code will only ever need to check for the prefix in the index name, and not maintain extra boolean state (like in this PR).
There was a problem hiding this comment.
why don't we allow the user to create minmax indexes with the implicit prefix (difference 1). I doubt that anyone will do this in practice, but if someone does the only "damage" will be that the new index won't be shown in SHOW CREATE TABLE. We could document this behavior. (*)
In my opinion, it seems like this could lead to more confusion.
But more importantly I am not clear how we will even implement this. We will implement isImplicitlyCreated function to just check the prefix and return true or false. And we will keep the check in formatIndices i.e dont copy implicitly created indices. And remove all the other changes.
With just this change, we will still have this fail
https://github.com/ClickHouse/ClickHouse/actions/runs/18264843109/job/51997742195#step:5:9763
Correct me if I am wrong.
Feel free to update the code in this branch according to your suggestion, if we dont like it we can revert back. Then it will be more clear how you are suggesting to implement.
|
Closed in favour of ##91429 |
Resolves #87286
cc @rschu1ze @CurtizJ
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
When using implicitly created minmax indices (MergeTree settings
add_minmax_index_for_numeric_columnsandadd_minmax_index_for_string_columns), statementSHOW CREATE TABLEnow always shows the correct output, even afterALTER TABLE. (issue #87286)