Be more graceful with existing tables with inverted indexes#64656
Be more graceful with existing tables with inverted indexes#64656rschu1ze merged 4 commits intoClickHouse:masterfrom
inverted indexes#64656Conversation
|
This is an automated comment for commit ef0bff6 with description of existing statuses. It's updated for the latest CI running ❌ Click here to open a full report in a separate page
Successful checks
|
eef96bb to
344314d
Compare
344314d to
d5ec72d
Compare
| /// Index type 'inverted' was renamed to 'full_text' in May 2024. | ||
| /// Tables with old indexes can be loaded during a transition period. We still want let users know that they should drop existing | ||
| /// indexes and re-create them. Function `createIndexGranule` is called whenever the index is used by queries. Reject the query if we | ||
| /// are an old index. |
There was a problem hiding this comment.
| /// are an old index. | |
| /// have an old index. |
| SET allow_experimental_inverted_index = 0; | ||
| CREATE TABLE tab(k UInt64, s String, INDEX idx(s) TYPE inverted(2)) ENGINE = MergeTree() ORDER BY k; |
There was a problem hiding this comment.
This looks bad in my opinion. It is not documented, but if somebody is sloppy, they can create a table with inverted indices, even though they should use full_text. Cannot we have the backdoor only for attach queries? Then if I am not mistaken we should be able to attach tables with inverted indices, but won't be able to create new ones.
There was a problem hiding this comment.
@antaljanosbenjamin I fully agree with you but only in a perfect world.
Consider this: Inverted indexes are experimental, enabling the corresponding setting is only fine for testing. Even while testing, anything can - in theory - happen, data loss, corruption, etc. From a product perspective, breaking experimental features is acceptable (it is not nice though).
My expectation is that nobody in their right mind uses inverted indexes in production. The number of affected users should therefore converge towards zero. This PR accommodates for the handful of users where this assumption does not hold. Consider this PR more of a quickfix in a situation where a real fix (i.e. what you propose) isn't worth the trouble as the addressed problems happens very rarely right now and ceases to happen in future. It will cease to happen in future because the backdoor isn't documented and it will be disabled after a transition period at the end of the year.
So in some sense, this PR isn't perfect but it improves over simply breaking old persistences ...
There was a problem hiding this comment.
this PR isn't perfect but it improves over simply breaking old persistences
Okay, this is true, so I am okay merging this as is.
There was a problem hiding this comment.
@rschu1ze I came across this PR and I saw that you already discussed this, but I have one suggestion, not sure have you considered it or not, so what do you think about the following:
- make
allow_experimental_inverted_indexan alias forallow_experimental_full_text_index(that way clickhouse-client will report it as a warning, though not a lot of users look into them, but if we are talking about development env, then maybe the situation is different) - keep throwing exception on SELECT that uses "inverted" indexes (so that users have to migrate to "fulltext")
- allow attaching "inverted" indexes (to make server start)
- forbid creating new tables with "inverted" indexes (to avoid such tables in future)
Since right now it is still possible to create this deprecated index (and only with allow_experimental_inverted_index=0) which at least looks odd (even though SELECT is forbidden).
Plus if your workaround will be kept till the end of the year, this time should be enough for all users of this experimental feature to migrate to new index (for whose who do update ClickHouse). And also that way it will be done in the same fashion as for other futures.
Thoughts?
There was a problem hiding this comment.
@azat I pushed a follow-up to this yesterday. The situation based on this PR and the follow-up is actually similar to what you describe:
- creating new
invertedindexes is still possible (assumingallow_experimental_inverted_index = 1) invertedindexes can be attached (again assumingallow_experimental_inverted_index = 1)- to encourage migration: 1. an exception is thrown during usage (making newly created and attached indexes useless) and 2. the docs no longer mention
allow_experimental_inverted_index.
About warnings shown by clickhouse-client: Yes, developers and CLI people may check. Any GUI or driver that connects to ClickHouse via HTTP will not see the warning and therefore not bother 😢
I really appreciate your thoughts but in sum, I guess we think too much about gracefully migrating an experimental feature. The number of production use cases is expected to converge to zero. The main reason for pushing this PR and the follow-up is that ClickHouse server refused to start when it encounters old inverted indexes (see the PR message) - as long as this blocker is removed I'd be good.
There was a problem hiding this comment.
creating new inverted indexes is still possible (assuming allow_experimental_inverted_index = 1)
This still looks odd, what is the reason to allow this?
There was a problem hiding this comment.
No good reason except that I made this PR in a bit of a hurry and did not find out quickly where to distinguish table "creation" vs. "attaching" in the code. For a non-experimental feature, that's imho okay.
|
Integration tests (aarch64) [5/6]: #64451 A sync is green. Lgtm, forcing MergeableCheck green now. |
Conflicts between ClickHouse#64720 and ClickHouse#64656 Signed-off-by: Azat Khuzhin <[email protected]>
Backport #64656 to 24.5: Be more graceful with existing tables with `inverted` indexes
Index types 'annoy' and 'usearch' were removed and replaced by 'vector_similarity' indexes in an earlier commit. This means unfortuantely, that if customers have tables with these indexes and upgrade, their database might not start anymore - the system loads the metadata at startup, thinks something is wrong with such tables, and halts immediately. This commit adds support for loading and attaching such indexes back. Data insert or use (search) return an error which recommends a migration to 'vector_similarity' indexes. The implementation is generally similar to what has recently been implemented for 'full_text' indexes [1, 2]. [1] ClickHouse#64656 [2] ClickHouse#64846
Index types 'annoy' and 'usearch' were removed and replaced by 'vector_similarity' indexes in an earlier commit. This means unfortuantely, that if customers have tables with these indexes and upgrade, their database might not start anymore - the system loads the metadata at startup, thinks something is wrong with such tables, and halts immediately. This commit adds support for loading and attaching such indexes back. Data insert or use (search) return an error which recommends a migration to 'vector_similarity' indexes. The implementation is generally similar to what has recently been implemented for 'full_text' indexes [1, 2]. [1] ClickHouse#64656 [2] ClickHouse#64846
#62884 renamed index type
invertedtofull_text. As a result, tables with existing indexes of typeindexcould no longer be loaded. When the load happens during database startup (this is the default behavior), the database throws an exceptionUnknown Index type 'inverted'and immediately shut down due to corrupt metadata. One could argue that these indexes are experimental, but the behavior is still painful.This PR re-enables loading of tables with
invertedindexes but these indexes will throw an exception (Please drop the index and re-create it) as soon as they are used by a SELECT query.Changelog category (leave one):