Skip to content

Nullable primary key with correct KeyCondition#12455

Merged
alexey-milovidov merged 3 commits intoClickHouse:masterfrom
amosbird:npc
Jul 18, 2021
Merged

Nullable primary key with correct KeyCondition#12455
alexey-milovidov merged 3 commits intoClickHouse:masterfrom
amosbird:npc

Conversation

@amosbird
Copy link
Copy Markdown
Collaborator

@amosbird amosbird commented Jul 13, 2020

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Now KeyConditions can correctly skip nullable keys, including isNull and isNotNull. #12433 Cond.

PS. this pr also piggybacks a test for Field comparison. Let's see if CI complains.

-- Update

It seems Null comparison in Field.h can have the same semantic as in SQL.

Detailed description / Documentation draft:

Nothing special. NOTE, after creating a MergeTree table with nullable primary keys, ClickHouse won't be able to rollabck to previous versions. (potential backward incompatibility)

@amosbird
Copy link
Copy Markdown
Collaborator Author

@robot-clickhouse robot-clickhouse added the pr-improvement Pull request with some product improvements label Jul 13, 2020
@amosbird amosbird force-pushed the npc branch 3 times, most recently from abc2ba5 to cbafe08 Compare July 14, 2020 02:57
@amosbird
Copy link
Copy Markdown
Collaborator Author

After futher thoughts, two special Field types are added (-Inf and +Inf) which explicitly helps special comparisons such as NULL_LAST. Field Null should not be used as comparison argument.

@amosbird amosbird force-pushed the npc branch 2 times, most recently from f081a12 to 345db99 Compare July 14, 2020 13:37
@Akazz Akazz self-assigned this Jul 14, 2020
@amosbird amosbird force-pushed the npc branch 4 times, most recently from bfb1212 to b820ace Compare July 21, 2020 02:35
@amosbird amosbird force-pushed the npc branch 2 times, most recently from e257ee2 to 0824256 Compare July 28, 2020 15:37
@zlobober
Copy link
Copy Markdown
Contributor

@amosbird, Sorry for the late reply. I like idea of introducing special fields '-inf' and '+inf' very much, it would make much easier expressing open/half-open intervals and rays in some situations. Do you have any plans on finishing this PR?

To make it clear: I am not a ClickHouse maintainer, I am developing an internal Yandex software using ClickHouse as computational engine. Your proposed change could actually help in my case :)

@amosbird
Copy link
Copy Markdown
Collaborator Author

amosbird commented Jul 31, 2020

@zlobober Hi, this pr is already finished. I've covered almost all cases related to Nullable primary keys. It's currently waiting for review.

@azat
Copy link
Copy Markdown
Member

azat commented Aug 1, 2020

FWIW I've applied this PR into my build and I did not find any problems yet

src/Core/Field.h Outdated
Comment on lines 244 to 257
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amosbird can you update your PR after recent changes?

2020-09-03 05:55:34 /ClickHouse/src/Core/Field.h:462:17: error: enumeration values 'Null', 'NegativeInfinity', and 'PositiveInfinity' not handled in switch [-Werror,-Wswitch]
2020-09-03 05:55:34         switch (which)
2020-09-03 05:55:34                 ^
2020-09-03 05:55:34 /ClickHouse/src/Core/Field.h:497:17: error: enumeration values 'Null', 'NegativeInfinity', and 'PositiveInfinity' not handled in switch [-Werror,-Wswitch]
2020-09-03 05:55:34         switch (which)
2020-09-03 05:55:34                 ^

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@azat
Copy link
Copy Markdown
Member

azat commented Oct 13, 2020

@alexey-milovidov what is the status of this PR? Are there some concerns about it?

@alexey-milovidov
Copy link
Copy Markdown
Member

@azat I did not review this PR and no one from my team decided to take it.

@alexey-milovidov
Copy link
Copy Markdown
Member

Yandex projects require manual fixing from our side due to changed interfaces...

@amosbird
Copy link
Copy Markdown
Collaborator Author

Weird CI errors...

@azat
Copy link
Copy Markdown
Member

azat commented Apr 20, 2021

after creating a MergeTree table with nullable primary keys, ClickHouse won't be able to rollabck to previous versions. (potential backward incompatibility)

@amosbird BTW right now if you will run newer clickhouse and then rollback to the version w/o this patch you will not get any errors, but simply index analysis will work incorrectly (for Nullable), maybe it worth adding some guard against this?

@amosbird
Copy link
Copy Markdown
Collaborator Author

after creating a MergeTree table with nullable primary keys, ClickHouse won't be able to rollabck to previous versions. (potential backward incompatibility)

@amosbird BTW right now if you will run newer clickhouse and then rollback to the version w/o this patch you will not get any errors, but simply index analysis will work incorrectly (for Nullable), maybe it worth adding some guard against this?

Hmm, what kind of guards do you suggest?

@azat
Copy link
Copy Markdown
Member

azat commented Apr 21, 2021

Hmm, what kind of guards do you suggest?

I guess new setting for MergeTree will be enough

@alexey-milovidov alexey-milovidov self-assigned this Jun 21, 2021
@alexey-milovidov alexey-milovidov merged commit b52411a into ClickHouse:master Jul 18, 2021
Comment on lines -41 to +43

if (!type->isNullable())
{
serialization->serializeBinary(hyperrectangle[i].left, ostr);
serialization->serializeBinary(hyperrectangle[i].right, ostr);
}
else
{
bool is_null = hyperrectangle[i].left.isNull() || hyperrectangle[i].right.isNull(); // one is enough
writeBinary(is_null, ostr);
if (!is_null)
{
serialization->serializeBinary(hyperrectangle[i].left, ostr);
serialization->serializeBinary(hyperrectangle[i].right, ostr);
}
}
serialization->serializeBinary(hyperrectangle[i].left, ostr);
serialization->serializeBinary(hyperrectangle[i].right, ostr);
Copy link
Copy Markdown
Member

@azat azat Aug 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change breaks on disk format (and the relevant change in the MergeTreeIndexGranuleMinMax::deserializeBinary)

Too bad that I checked this patchset only for compatibility after reverting this patch - #12455 (comment) (use case: I've applied it manually, then revert it, and data skipping indexes over Nullable column had been broken)

But this patchset actually breaks compatibility with older versions of clickhouse for Nullable data skipping indexes after simple upgrade:

Here is a simple reproducer:

--
-- run this with 21.6 or similar (i.e. w/o this patch)
--

CREATE TABLE data
(
    `key` Int,
    `value` Nullable(Int),
    INDEX value_index value TYPE minmax GRANULARITY 1
)
ENGINE = MergeTree
ORDER BY key;

INSERT INTO data SELECT
    number,
    number
FROM numbers(10000);

SELECT * FROM data WHERE value = 20000 SETTINGS force_data_skipping_indices = 'value_index' SETTINGS force_data_skipping_indices = 'value_index', max_rows_to_read=1; 
-- now upgrade and run the query again
SELECT * FROM data WHERE value = 20000 SETTINGS force_data_skipping_indices = 'value_index' SETTINGS force_data_skipping_indices = 'value_index', max_rows_to_read=1;

And it will fail because of on disk format changes:

$ ll --time-style=+ data/*/data/all_1_1_0/skp*.idx
-rw-r----- 1 azat azat 36  data/with_nullable_patch/data/all_1_1_0/skp_idx_value_index.idx
-rw-r----- 1 azat azat 37  data/without_nullable_patch/data/all_1_1_0/skp_idx_value_index.idx

$ md5sum data/*/data/all_1_1_0/skp*.idx
a19c95c4a14506c65665a1e30ab404bf  data/with_nullable_patch/data/all_1_1_0/skp_idx_value_index.idx
e50e2fcfa873b232196623d56ab26105  data/without_nullable_patch/data/all_1_1_0/skp_idx_value_index.idx

Nice that there is no stable release with this patch included yet.
But personally I, don't use stable releases and how I wish this patchset created new on disk format...

Note that you may create data skipping indexes over Nullable column even before this patchset

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind sending a PR restoring old on disk format?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

azat added a commit to azat/ClickHouse that referenced this pull request Aug 4, 2021
[1] breaks on disk format (and the relevant change in the:

  [1]: ClickHouse#12455 (comment)

Too bad that I checked this patchset only for compatibility after
reverting this patch [2] (use case: I've applied it manually, then
revert it, and data skipping indexes over Nullable column had been
broken)

  [2]: ClickHouse#12455 (comment)

But this patchset actually breaks compatibility with older versions of
clickhouse for Nullable data skipping indexes after simple upgrade:

Here is a simple reproducer:

    --
    -- run this with 21.6 or similar (i.e. w/o this patch)
    --

    CREATE TABLE data
    (
        `key` Int,
        `value` Nullable(Int),
        INDEX value_index value TYPE minmax GRANULARITY 1
    )
    ENGINE = MergeTree
    ORDER BY key;

    INSERT INTO data SELECT
        number,
        number
    FROM numbers(10000);

    SELECT * FROM data WHERE value = 20000 SETTINGS force_data_skipping_indices = 'value_index' SETTINGS force_data_skipping_indices = 'value_index', max_rows_to_read=1;

Now upgrade and run the query again:

    SELECT * FROM data WHERE value = 20000 SETTINGS force_data_skipping_indices = 'value_index' SETTINGS force_data_skipping_indices = 'value_index', max_rows_to_read=1;

And it will fail because of on disk format changes:

    $ ll --time-style=+ data/*/data/all_1_1_0/skp*.idx
    -rw-r----- 1 azat azat 36  data/with_nullable_patch/data/all_1_1_0/skp_idx_value_index.idx
    -rw-r----- 1 azat azat 37  data/without_nullable_patch/data/all_1_1_0/skp_idx_value_index.idx

    $ md5sum data/*/data/all_1_1_0/skp*.idx
    a19c95c4a14506c65665a1e30ab404bf  data/with_nullable_patch/data/all_1_1_0/skp_idx_value_index.idx
    e50e2fcfa873b232196623d56ab26105  data/without_nullable_patch/data/all_1_1_0/skp_idx_value_index.idx

Note, that there is no stable release with this patch included yet, so
no need to backport.

Also note that you may create data skipping indexes over Nullable
column even before [3].

  [3]: ClickHouse#12455

v2: break cases when granulas has Null in values due to backward
compatibility
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-improvement Pull request with some product improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants