Rewrite LIKE expressions with perfect prefix or suffix#85920
Rewrite LIKE expressions with perfect prefix or suffix#85920Ergus merged 41 commits intoClickHouse:masterfrom
Conversation
This comment was marked as resolved.
This comment was marked as resolved.
|
@zheguang This is a nice optimization and the right direction, thanks. About these two cases:
It only makes sense (in my view) to implement this case if the speedup by the range search is bigger than the slowdown by the LOWER wrapper. Before implementing it, I can recommend testing the speed difference in a synthetic SQL test.
So NOT can be treated by simply inverting the range, right? For example, According to your comment, the overall speedup for TPC-H is indeed small but that is expected. ClickBench unfortunately doesn't have queries which would benefit from the optimization, so no speedup is expected there as well. Anyways, I am pretty sure that |
|
Workflow [PR], commit [b28218b] Summary: ❌
|
|
Wouldn't it be simpler to rewrite perfect prefix like as |
This comment was marked as resolved.
This comment was marked as resolved.
Thanks for the suggestion. My understanding is a range condition works better with an index, for example currently for primary keys. |
Thanks for the suggestion! So for recursion, I lean keeping. My thinking is, for one, this specific usage is in unit test only, and it's almost identical to one in another unit test, https://github.com/ClickHouse/ClickHouse/pull/85920/files#diff-508f54672d863c147616450f9419522b9a267e3ec04f352674a1085ff220408a. (So I ended up with a minor refactoring). As the test cases appear to have short expressions, I think the depth limited parsing currently in place is sufficient to prevent a stack overflow? Incidentally I tracked down the query analyzer's ClickHouse/src/Analyzer/InDepthQueryTreeVisitor.h Lines 195 to 205 in 9bb762c Conceptually though, rewriting this unit-test recursion (and also this query analyzer recursion) would entail a stack that grows until a subtree is fully resolved. |
|
@Ergus Could you double-check / review this PR, please? |
Ergus
left a comment
There was a problem hiding this comment.
LGTM. Still would like to see a simple benchmark comparing performance with and without this optimization enabled, but that's is not an stopper IMHO.
"tests/performance/like_perfect_affix_rewrite.xml" is doing exactly this, no? <!-- Test prefix matching rewrite -->
<query>
SELECT count() FROM tab
WHERE str LIKE {prefix_pattern}
SETTINGS optimize_rewrite_like_perfect_affix=0
</query>
<query>
SELECT count() FROM tab
WHERE str LIKE {prefix_pattern}
SETTINGS optimize_rewrite_like_perfect_affix=1
</query> |
|
@Ergus: Like to review and merge this PR? Thanks! |
I already approved this |
|
Hi @zheguang I just tried to merge this, but there is a new failure in The error looks like a pure format difference. Could you check it please? |
f302e55
This PR addresses Issue #71421.
This PR adds a query tree analyzer pass to rewrite
LIKEexpression with perfect affix (prefix or suffix) intostartsWithorendsWithfunctions. This rewrite can improve query performance on saving compute.The scope of this rewrite is
LIKE: rewrite intostartsWithfor perfect prefix andendsWithfor perfect suffixILIKE: do not uselower()based approach because it's suboptimal: ideally rewrite withstarts/endsWithCaseInsensitive(to be implemented in separate PR first, spike results based on https://github.com/zheguang/ClickHouse/tree/starts-endswith-case)StringorFixedString, nested or not withinLowCardinalityorNullableor both.FixedString(nested or not) is not rewritten, due to discrepancy betweenLIKEandendsWithon null-terminated strings: https://fiddle.clickhouse.com/519ac26c-1f77-4533-a892-bbaa2e4f669dOnce rewritten, the query tree with string functions will be taken up by the query execution plan. Here both the primary key analysis and regular scan will work with the string functions instead.
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Improves performance of
LIKEwith prefix or suffix by using the new default settingoptimize_rewrite_like_perfect_affix.Documentation entry for user-facing changes