-
Notifications
You must be signed in to change notification settings - Fork 8.3k
Secondary indexes should be used for queries with FINAL #70292
Description
Describe the situation
Filtering by secondary indexes is disabled for queries with FINAL, because skipping a granule can change the presence or values of records in other granules for the same sorting key as for skipped records.
See #34243
How to solve
If a query contains FINAL, secondary indices should retain granules that could contain records with the same value of the sorting key as in the records in granules that are not going to be skipped.
As a result, they will skip less amount of granules because skipped ranges will be smaller.
Example
Let's suppose we have a table
CREATE TABLE table
(
id String,
payload String,
PRIMARY KEY (id),
INDEX ix (payload) TYPE = set(100)
)
ENGINE = ReplacingMergeTree
containing the following granules:
granule 1: primary key: A ix: (abc, hello)
id payload
---
A abc
A hello
B abc
granule 2: primary key: B ix: (world, xyz)
---
B world
B xyz
granule 3: primary key: B ix: (hello)
---
B hello
granule 4: primary key: C ix: (abc)
---
C abc
granule 5: primary key: C ix: (def)
---
C def
granule 6: primary key: D ix: (world)
---
D world
If we do SELECT * FROM table, it returns:
A abc
A hello
B abc
B world
B xyz
B hello
C abc
C def
D world
If we do SELECT * FROM table FINAL, it returns:
A hello
B hello
C def
D world
If we do SELECT * FROM table WHERE payload = 'world', it returns:
B world
D world
If we do SELECT * FROM table FINAL WHERE payload = 'world' with the setting use_skip_indexes_if_final, it returns:
B world
D world
while it should return
D world
because the index skipped the granule 3 which contained a replacement for key B, while it should not.
Let's look at the granules, under the filter payload = 'world':
SKIPPED granule 1: primary key: A possible values of PK: [A..B] ix: (abc, hello)
PASSED granule 2: primary key: B possible values of PK: [B..B] ix: (world, xyz)
SKIPPED granule 3: primary key: B possible values of PK: [B..C] ix: (hello)
SKIPPED granule 4: primary key: C possible values of PK: [C..C] ix: (abc)
SKIPPED granule 5: primary key: D possible values of PK: [C..D] ix: (def)
PASSED granule 6: primary key: D possible values of PK: [D..?] ix: (world)
Granule 1 is skipped. But it could contain the same values of PK (B) as in the passed granule 2 - then it should not be skipped.
Granule 3 is skipped. But it could contain the same values of PK (B) as in the passed granule 2 - then it should not be skipped.
Granule 4 is skipped. It is ok because the values of PK in this granule (C) don't intersect with any of the passed granules.
Granule 5 is skipped. But it could contain the same values of PK (D) as in the passed granule 6 - then it should not be skipped.
The correct application of the index should look like this:
PASSED granule 1: primary key: A possible values of PK: [A..B] ix: (abc, hello)
PASSED granule 2: primary key: B possible values of PK: [B..B] ix: (world, xyz)
PASSED granule 3: primary key: B possible values of PK: [B..C] ix: (hello)
SKIPPED granule 4: primary key: C possible values of PK: [C..C] ix: (abc)
PASSED granule 5: primary key: D possible values of PK: [C..D] ix: (def)
PASSED granule 6: primary key: D possible values of PK: [D..?] ix: (world)