Skip to content

Disable data skipping indexes by default for queries with FINAL#34243

Merged
kitaisreal merged 1 commit intoClickHouse:masterfrom
azat:use_skip_indexes_if_final
Feb 2, 2022
Merged

Disable data skipping indexes by default for queries with FINAL#34243
kitaisreal merged 1 commit intoClickHouse:masterfrom
azat:use_skip_indexes_if_final

Conversation

@azat
Copy link
Copy Markdown
Member

@azat azat commented Feb 2, 2022

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Apply data skipping indexes for queries with FINAL may produce incorrect result. Disable data skipping indexes by default for queries with FINAL (introduce new use_skip_indexes_if_final setting and disable it by default)

NOTE: maybe this should be marked as backward incompatible, since this may decrease the performance. Thoughts?

Refs: #29405 (cc @kitaisreal )
Refs: #25940 (cc @kssenii )
Refs: #8684 (cc @CurtizJ @alexey-milovidov )

@robot-clickhouse robot-clickhouse added the pr-improvement Pull request with some product improvements label Feb 2, 2022
@kitaisreal kitaisreal self-assigned this Feb 2, 2022
This patch adds use_skip_indexes_if_final setting, that is OFF by
default. Since skipping data for queries with FINAL may produce
incorrect result.

Signed-off-by: Azat Khuzhin <[email protected]>
@azat azat force-pushed the use_skip_indexes_if_final branch from e90993c to 1d19851 Compare February 2, 2022 10:31
@kitaisreal
Copy link
Copy Markdown
Contributor

@azat could you please provide example, when using skip indexes in final can introduce different results. Maybe it is better to try fix this without setting.

@azat
Copy link
Copy Markdown
Member Author

azat commented Feb 2, 2022

could you please provide example, when using skip indexes in final can introduce different results. Maybe it is better to try fix this without setting.

There is example in the tests, I've added few comments into it, hope it helps:

-- This tests will show the difference in data with use_skip_indexes_if_final and w/o

CREATE TABLE data_02201 (
    key Int,
    value_max SimpleAggregateFunction(max, Int),
    INDEX idx value_max TYPE minmax GRANULARITY 1
)
Engine=AggregatingMergeTree()
ORDER BY key
PARTITION BY key;

SYSTEM STOP MERGES data_02201;

INSERT INTO data_02201 SELECT number, number FROM numbers(10);
INSERT INTO data_02201 SELECT number, number+1 FROM numbers(10);

--
-- So here we have index for value_max which is SimpleAggregateFunction(max), so it depends on merging the data:
--
SELECT * FROM data_02201 FINAL WHERE value_max = 1 ORDER BY key, value_max SETTINGS use_skip_indexes=1, use_skip_indexes_if_final=0;
-- 0	1
SELECT * FROM data_02201 FINAL WHERE value_max = 1 ORDER BY key, value_max SETTINGS use_skip_indexes=1, use_skip_indexes_if_final=1;
-- 0	1
-- 1	1

So the problem is kind of similar to the PREWHERE case (there is optimize_move_to_prewhere_if_final setting for it, discussion is in #8684).

@kitaisreal
Copy link
Copy Markdown
Contributor

@azat make sense to introduce such setting for consistency with optimize_move_to_prewhere_if_final .

@kitaisreal kitaisreal merged commit 57d16ba into ClickHouse:master Feb 2, 2022
@azat azat deleted the use_skip_indexes_if_final branch February 3, 2022 04:40
@beda42
Copy link
Copy Markdown

beda42 commented Jun 7, 2022

Hello, sorry for commenting on an already merged pull request. I just got bitten by this change of behavior where my queries using skip index got suddenly very much slower after Clickhouse upgrade.

I would like to understand the conditions where the problem with using a skip index can and cannot occur. Is it only relevant for indexes where aggregation is involved? I am using a set(0) index on plain integer values. Is it safe to turn use_skip_indexes_if_final on in such a case?

@azat
Copy link
Copy Markdown
Member Author

azat commented Jun 30, 2022

Hi @beda42 , sorry for the delay, and no problem on commenting here.

I would like to understand the conditions where the problem with using a skip index can and cannot occur. Is it only relevant for indexes where aggregation is involved?

Yes, and not just aggregation, but SimpleAggregateFunction.

I am using a set(0) index on plain integer values. Is it safe to turn use_skip_indexes_if_final on in such a case?

Your case should be fine.

@x0st
Copy link
Copy Markdown

x0st commented Apr 28, 2024

@beda42

CREATE TABLE test
(
    `phone` String,
    `arr` Array(UInt32),
    `v` UInt16,
    INDEX arr_idx arr TYPE set(0) GRANULARITY 3
) ENGINE = ReplacingMergeTree(v) ORDER BY phone SETTINGS index_granularity = 8192;
insert into test (phone, arr, v) values ('+123', [1, 2], 1);
insert into test (phone, arr, v) values ('+123', [1, 2, 3], 2);
set use_skip_indexes_if_final = 1;
select * from test final where has(arr, 3) = 0;

produces

┌─phone─┬─arr───┬─v─┐
│ +123  │ [1,2] │ 1 │
└───────┴───────┴───┘

which is incorrect :)

so, use_skip_indexes_if_final = 1 won't work correctly with set(0)

@gaurang101197
Copy link
Copy Markdown

Does it make sense to enable skipping index on column which does not change ? Like high cardinality id columns which does not change. Is it possible to selectively enable specific skipping index ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-improvement Pull request with some product improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants