Skip to content

Add new index data skipping minmax index format for proper Nullable support#27250

Merged
akuzm merged 1 commit intoClickHouse:masterfrom
azat:minmax-nullable-new-format
Aug 11, 2021
Merged

Add new index data skipping minmax index format for proper Nullable support#27250
akuzm merged 1 commit intoClickHouse:masterfrom
azat:minmax-nullable-new-format

Conversation

@azat
Copy link
Copy Markdown
Member

@azat azat commented Aug 5, 2021

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Add new index data skipping minmax index format for proper Nullable support

Detailed description / Documentation draft:
Note, that right now it includes #27197 will be rebased once it will be merged

Cc: @amosbird
Cc: @alexey-milovidov

Refs: #12455

@robot-clickhouse robot-clickhouse added the pr-improvement Pull request with some product improvements label Aug 5, 2021
@azat
Copy link
Copy Markdown
Member Author

azat commented Aug 5, 2021

AST fuzzer (ASan) — 0x604001408d10 is located 0 bytes inside of 48-byte region [0x604001408d10,0x604001408d40)

0x604001408d10 is located 0 bytes inside of 48-byte region [0x604001408d10,0x604001408d40) SUMMARY: AddressSanitizer: heap-use-after-free obj-x86_64-linux-gnu/../src/Interpreters/AsynchronousMetrics.cpp in DB::AsynchronousMetrics::update(std::__1::chrono::time_point > >) Received signal -3 Received signal Unknown signal (-3)

#27266

…upport

Note, that it cannot be done w/o new extension, since index does not
have any header.

v2: use IDisk interface for existence check
v3: remove extra file existence check
v4: fix MATERIALIZE INDEX
@azat azat force-pushed the minmax-nullable-new-format branch from ab0712b to 038241b Compare August 8, 2021 16:30

// NULL_LAST
if (min_val.isNull())
min_val = PositiveInfinity();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohhh so this questionable PR that added infinities into the variant type was merged...
By the way, can you serialize it as PositiveInfinity right away? Having to convert it afterwards looks weird.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we can have a small function deserializeBinaryNullAsPositiveInfinity() and put the logic there. These infinity varants share the same serialization bits as NULL.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we can have a small function deserializeBinaryNullAsPositiveInfinity() and put the logic the

@amosbird You mean MergeTreeIndexgranuleMinMax::deserializeBinaryNullAsPositiveInfinity() right?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I just realized that we don't use writeBinary(Field) here, but instead for some reason we use the data type serialization to serialize a Field, which obviously can't serialize the special Field values. Never mind, it's all equally awkward, let's just keep it as is.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So instead we have to serialize what we can, and then guess what it was. At least it's always NULLS LAST, for now.

if (command.type == MutationCommand::Type::DROP_INDEX)
{
if (source_part->checksums.has(INDEX_FILE_PREFIX + command.column_name + ".idx"))
if (source_part->checksums.has(INDEX_FILE_PREFIX + command.column_name + ".idx2"))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder when we are going to have proper metadata instead of this. With columns, the situation is already becoming unmanageable, I think Anton is going to add something like "serialization.txt" in his PR for sparse columns, which has the necessary metadata. Not sure it applies to the indexes though.

Copy link
Copy Markdown
Member Author

@azat azat Aug 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about header. But I decided not to put it here since this will complicate the patch.
Note that serialization.txt (looked briefly at #22535) will not solve such hunks automatically (you still need to know all the extension here, or add some interface into serializer to return them), while header will.

}
else
{
min_val = Null();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the difference of the new format is that it can serialize null values for min and max separately? And the old format serialized nulls for both because "one is enough", as the comment said. But now we want more precise nullable support so we want to serialize both.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right.

Copy link
Copy Markdown
Contributor

@akuzm akuzm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-improvement Pull request with some product improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants