Allow any deterministic or injective functions in PK to prune granules in KeyCondition#92952
Conversation
amosbird
left a comment
There was a problem hiding this comment.
Overall LGTM. The idea is clear.
| /// however, there could be other functions that can throw on some inputs. | ||
| try | ||
| { | ||
| dag.actions->execute(block, false, true); |
There was a problem hiding this comment.
IIRC, we don't allow using exceptions as part of normal control flow.
There was a problem hiding this comment.
I am not very proud of the try-catch block, but I do not see a good alternative: we cannot know in advance whether a given constant can be transformed safely. ClickHouse has too many deterministic functions and many of them throw on invalid input. We also cannot let those errors propagate, because no pruning is better than the query failing to execute due to an exception during index analysis.
Let me know if you have any ideas on how to avoid this.
| if (out_column->isNullable()) | ||
| { | ||
| const auto & n = assert_cast<const ColumnNullable &>(*out_column); | ||
| for (char8_t b : n.getNullMapData()) |
There was a problem hiding this comment.
I don't understand the null-handling logic here.
There was a problem hiding this comment.
This null-handling is to check if the dag-transformed column contains any NULL or not. If it contains any NULL, then we reject and do not create the atom.
I have refactored the function to make it more readable. Let me know what you think.
|
|
||
| /// Looking for possible transformation of `column = constant` into `partition_expr = function(constant)` | ||
| bool KeyCondition::canConstantBeWrappedByFunctions( | ||
| bool KeyCondition::extractDeterministicFunctionsDagFromKey( |
There was a problem hiding this comment.
A comment with an example would help clarify what is being extracted.
| } | ||
|
|
||
|
|
||
| bool applyDeterministicDagToColumn( |
There was a problem hiding this comment.
A comment with an example would help explain the transformation.
| Example of a deterministic (but non-injective) primary key: | ||
| ```sql | ||
| ENGINE = MergeTree() | ||
| ORDER BY cityHash64(user_id) |
There was a problem hiding this comment.
Maybe length(user_id) will make a more intuitive example (not everyone might know why cityHash isn't injective, l. 308).
There was a problem hiding this comment.
Good point. I will update it.
|
This query: SET use_statistics=1;
SET optimize_move_to_prewhere=1;
DROP TABLE IF EXISTS test;
CREATE TABLE test (x Float64, y UInt64)
ENGINE = MergeTree
ORDER BY y
SETTINGS index_granularity=1, auto_statistics_types='minmax';
INSERT INTO test
SELECT if(number % 10 = 0, toFloat64('nan'), toFloat64(number)), number
FROM numbers(100000);
SELECT count() FROM test WHERE x < toFloat64('nan') AND y > 99990;has the following UBSAN error (found by This bug is also reproducible in |
|
Hello! Could you add a performance test showing data being skipped, then the query latency is improved? |
|
@PedroTadim The
|
Ok, maybe next time you could add a snapshot like that one to show the performance increase. |
…in-pk Allow any deterministic or injective functions in PK to prune granules in KeyCondition




Only deterministic expression in PK:
Example of a deterministic (but non-injective) primary key:
Example predicates that can use the index:
Deterministic + Injective expression in PK:
Example of an injective primary key:
Example predicates that can effectively use the index:
See
tests/queries/0_stateless/03778_deterministic_injective_functions_key_condition.sqlfor concrete examples.Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Allow any deterministic expression in Primary Key to be used for data skipping (e.g.
ORDER BY cityHash64(user_id)/ORDER BY length(user_id)). For deterministic expressions, ClickHouse can apply the expression to query constants and use the result in the primary key index for predicates like=,IN, andhas. If the expression is also injective (e.g.ORDER BY hex(p)orORDER BY reverse(tuple(reverse(p), hex(p)))), we can effectively use the index for the negated forms:!=,NOT IN, andNOT has. Closes #10685. Closes #82161.Documentation entry for user-facing changes