Skip to content

Fix create statement when using default minmax indicies#87723

Closed
SmitaRKulkarni wants to merge 5 commits intomasterfrom
Fix_create_with_indices
Closed

Fix create statement when using default minmax indicies#87723
SmitaRKulkarni wants to merge 5 commits intomasterfrom
Fix_create_with_indices

Conversation

@SmitaRKulkarni
Copy link
Copy Markdown
Member

@SmitaRKulkarni SmitaRKulkarni commented Sep 26, 2025

Resolves #87286

cc @rschu1ze @CurtizJ

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

When using implicitly created minmax indices (MergeTree settings add_minmax_index_for_numeric_columns and add_minmax_index_for_string_columns), statement SHOW CREATE TABLE now always shows the correct output, even after ALTER TABLE. (issue #87286)

@SmitaRKulkarni SmitaRKulkarni marked this pull request as draft September 26, 2025 19:59
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Sep 26, 2025

Workflow [PR], commit [d9b11ae]

Summary:

job_name test_name status info comment
Unit tests (tsan) failure
Performance Comparison (amd_release, master_head, 2/3) failure
Configure failure
Performance Comparison (amd_release, master_head, 3/3) failure
Start failure

@clickhouse-gh clickhouse-gh bot added the pr-bugfix Pull request with bugfix, not backported by default label Sep 26, 2025
@rschu1ze rschu1ze marked this pull request as ready for review October 5, 2025 21:53
Copy link
Copy Markdown
Member

@rschu1ze rschu1ze left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SmitaRKulkarni LGTM. Lets merge from master and merge?

@SmitaRKulkarni
Copy link
Copy Markdown
Member Author

@rschu1ze : Thanks for checking. the fails look related, fixing

/// if something change in columns.
void recalculateWithNewColumns(const ColumnsDescription & new_columns, ContextPtr context);

bool isImplicitlyCreated() const { return name.starts_with(IMPLICITLY_ADDED_MINMAX_INDEX_PREFIX); }
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I liked the previous approach better (= an index is implicit if and only if its name has prefix IMPLICITLY_ADDED_MINMAX_INDEX_PREFIX). It seems easier to reason about than the additonal bool bit.

What I don't understand in the original repro is this: After alter table t drop column s;, only column a with min-max index auto_minmax_index_a remains. Neither a nor auto_minmax_index_a were touched by the DDL. When we run the second show create table t;, why is ClickHouse not able to figure out from the index name prefix that it is an implicit index (so that it can avoid printing it in show create table?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I liked the previous approach better (= an index is implicit if and only if its name has prefix IMPLICITLY_ADDED_MINMAX_INDEX_PREFIX). It seems easier to reason about than the additonal bool bit.

The condition is only true if minmax index flags are enabled. I mean if both minmax index flags are disabled, then a user can create index with IMPLICITLY_ADDED_MINMAX_INDEX_PREFIX prefix.
Also it was failing to even verify that index starting with IMPLICITLY_ADDED_MINMAX_INDEX_PREFIX prefix should not be allowed, as it was not copying those indices (even before verification step)
(

ASTPtr new_indices = InterpreterCreateQuery::formatIndices(metadata.secondary_indices);
) so this test failed.
https://github.com/ClickHouse/ClickHouse/actions/runs/18264843109/job/51997742195#step:5:9763

What I don't understand in the original repro is this: After alter table t drop column s;, only column a with min-max index auto_minmax_index_a remains. Neither a nor auto_minmax_index_a were touched by the DDL. When we run the second show create table t;, why is ClickHouse not able to figure out from the index name prefix that it is an implicit index (so that it can avoid printing it in show create table?

the check in formatIndices was missing, which is added in this PR.

Hope that answers your question.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also it was failing to even verify that index starting with IMPLICITLY_ADDED_MINMAX_INDEX_PREFIX prefix should not be allowed, as it was not copying those indices (even before verification step so this test failed. https://github.com/ClickHouse/ClickHouse/actions/runs/18264843109/job/51997742195#step:5:9763

I see and I could reproduce locally what you describe.

As of today, we reject indexes with names starting with the implicit prefix in CREATE and ALTER TABLE if the setting is on.

I then asked myself the question what is the difference between a normal, explicitly created minmax index and an implicitly created minmax index.

Difference 1: Implicit minmax indexes should be created implicitly if the setting is on.

Difference 2: We want to hide implicit minmax indexes in SHOW CREATE TABLE (but they may appear in system.data_skipping_indices).

Are there other significant differences? I couldn't think of any.
My next thought was ...

  • why don't we allow the user to create minmax indexes with the implicit prefix (difference 1). I doubt that anyone will do this in practice, but if someone does the only "damage" will be that the new index won't be shown in SHOW CREATE TABLE. We could document this behavior. (*)
  • to make aforementioned situation a little bit harder to happen, we could prefix the prefix with an underscore, i.e. _auto_minmax_index_. This makes its name similar to existing internal virtual columns that follow the same naming scheme.

The question is what should happen in the case of (*), i.e. the user creates an index with implicit prefix and the setting is on. Note that is a very edge case.

  • if the explicit index name is equal to an index name which the setting would create, then we are done
  • if the explicit index name is different to the index name which the setting would create (e.g. the user creates index auto_minmax_index_XYZ on a column a, then we'd create two indexes - the explicitly created one and the implicit one.

The benefit of this will be that the code will only ever need to check for the prefix in the index name, and not maintain extra boolean state (like in this PR).

Copy link
Copy Markdown
Member Author

@SmitaRKulkarni SmitaRKulkarni Nov 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rschu1ze

why don't we allow the user to create minmax indexes with the implicit prefix (difference 1). I doubt that anyone will do this in practice, but if someone does the only "damage" will be that the new index won't be shown in SHOW CREATE TABLE. We could document this behavior. (*)

In my opinion, it seems like this could lead to more confusion.

But more importantly I am not clear how we will even implement this. We will implement isImplicitlyCreated function to just check the prefix and return true or false. And we will keep the check in formatIndices i.e dont copy implicitly created indices. And remove all the other changes.
With just this change, we will still have this fail
https://github.com/ClickHouse/ClickHouse/actions/runs/18264843109/job/51997742195#step:5:9763

Correct me if I am wrong.
Feel free to update the code in this branch according to your suggestion, if we dont like it we can revert back. Then it will be more clear how you are suggesting to implement.

@SmitaRKulkarni
Copy link
Copy Markdown
Member Author

Closed in favour of ##91429

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-bugfix Pull request with bugfix, not backported by default

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CREATE STATEMENT altered after ALTER when using add_minmax_index_for_xxx flags

2 participants