Add new index data skipping minmax index format for proper Nullable support#27250
Add new index data skipping minmax index format for proper Nullable support#27250akuzm merged 1 commit intoClickHouse:masterfrom
Conversation
|
…upport Note, that it cannot be done w/o new extension, since index does not have any header. v2: use IDisk interface for existence check v3: remove extra file existence check v4: fix MATERIALIZE INDEX
ab0712b to
038241b
Compare
|
|
||
| // NULL_LAST | ||
| if (min_val.isNull()) | ||
| min_val = PositiveInfinity(); |
There was a problem hiding this comment.
Ohhh so this questionable PR that added infinities into the variant type was merged...
By the way, can you serialize it as PositiveInfinity right away? Having to convert it afterwards looks weird.
There was a problem hiding this comment.
I guess we can have a small function deserializeBinaryNullAsPositiveInfinity() and put the logic there. These infinity varants share the same serialization bits as NULL.
There was a problem hiding this comment.
I guess we can have a small function deserializeBinaryNullAsPositiveInfinity() and put the logic the
@amosbird You mean MergeTreeIndexgranuleMinMax::deserializeBinaryNullAsPositiveInfinity() right?
There was a problem hiding this comment.
Oh, I just realized that we don't use writeBinary(Field) here, but instead for some reason we use the data type serialization to serialize a Field, which obviously can't serialize the special Field values. Never mind, it's all equally awkward, let's just keep it as is.
There was a problem hiding this comment.
So instead we have to serialize what we can, and then guess what it was. At least it's always NULLS LAST, for now.
| if (command.type == MutationCommand::Type::DROP_INDEX) | ||
| { | ||
| if (source_part->checksums.has(INDEX_FILE_PREFIX + command.column_name + ".idx")) | ||
| if (source_part->checksums.has(INDEX_FILE_PREFIX + command.column_name + ".idx2")) |
There was a problem hiding this comment.
I wonder when we are going to have proper metadata instead of this. With columns, the situation is already becoming unmanageable, I think Anton is going to add something like "serialization.txt" in his PR for sparse columns, which has the necessary metadata. Not sure it applies to the indexes though.
There was a problem hiding this comment.
I was thinking about header. But I decided not to put it here since this will complicate the patch.
Note that serialization.txt (looked briefly at #22535) will not solve such hunks automatically (you still need to know all the extension here, or add some interface into serializer to return them), while header will.
| } | ||
| else | ||
| { | ||
| min_val = Null(); |
There was a problem hiding this comment.
So the difference of the new format is that it can serialize null values for min and max separately? And the old format serialized nulls for both because "one is enough", as the comment said. But now we want more precise nullable support so we want to serialize both.
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Add new index data skipping minmax index format for proper Nullable support
Detailed description / Documentation draft:
Note, that right now it includes #27197 will be rebased once it will be merged
Cc: @amosbird
Cc: @alexey-milovidov
Refs: #12455