Skip to content

Conversation

@gortiz
Copy link
Contributor

@gortiz gortiz commented Jul 15, 2024

We have recently upgrade our Calcite dependency from 1.31 to 1.37, which impacted some queries negatively.

Specifically multi-stage queries using IN with a very large set are spending too much time on broker trying to optimize the query. When using close to 500 entries in the IN set my personal computer was spending close to 6 seconds just to optimize the query.

That was due to some new optimizations added in Calcite 1.32 but mainly due to the way we were using Calcite. Specifically, we were expanding all SEARCH expressions into ORs. That may have been needed in the past when PIPELINE_BREAKER was not implemented in Pinot, but right now it doesn't seem to be needed. But even if we need it sometimes, it is not acceptable to spent so much time on optimization phase.

Given it doesn't look like Calcite optimizations can be turned off, this PR changes Pinot to:

  1. Use Calcite default inSubQueryThreshold, which is 20.
  2. Modify PinotFilterExpandSearchRule so SEARCH expressions are not expanded when they are not range based and the number of elements is larger than 20.

We may add in the future a way to configure that threshold with some config parameter.

@gortiz gortiz requested review from Jackie-Jiang and xiangfu0 July 15, 2024 14:30
@codecov-commenter
Copy link

codecov-commenter commented Jul 15, 2024

Codecov Report

Attention: Patch coverage is 60.00000% with 4 lines in your changes missing coverage. Please review.

Project coverage is 61.98%. Comparing base (59551e4) to head (8356f88).
Report is 758 commits behind head on master.

Files Patch % Lines
...calcite/rel/rules/PinotFilterExpandSearchRule.java 55.55% 1 Missing and 3 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #13605      +/-   ##
============================================
+ Coverage     61.75%   61.98%   +0.23%     
+ Complexity      207      198       -9     
============================================
  Files          2436     2555     +119     
  Lines        133233   140625    +7392     
  Branches      20636    21822    +1186     
============================================
+ Hits          82274    87167    +4893     
- Misses        44911    46848    +1937     
- Partials       6048     6610     +562     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (ø)
java-11 61.91% <60.00%> (+0.20%) ⬆️
java-21 61.84% <60.00%> (+0.21%) ⬆️
skip-bytebuffers-false 61.93% <60.00%> (+0.18%) ⬆️
skip-bytebuffers-true 61.82% <60.00%> (+34.09%) ⬆️
temurin 61.98% <60.00%> (+0.23%) ⬆️
unittests 61.98% <60.00%> (+0.23%) ⬆️
unittests1 46.46% <60.00%> (-0.43%) ⬇️
unittests2 27.73% <0.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Jackie-Jiang Jackie-Jiang added enhancement multi-stage Related to the multi-stage query engine labels Jul 15, 2024
@Jackie-Jiang
Copy link
Contributor

Jackie-Jiang commented Jul 16, 2024

I've made another attempt in #13614 which is trying to keep the same behavior but avoid the expensive operation.

IMO the best solution is to directly keep IN/NOT IN to:

  • Avoid the overhead of pipeline breaker
  • Avoid the overhead of OR/AND

Need more time to explore if this is doable

@gortiz
Copy link
Contributor Author

gortiz commented Jul 17, 2024

Even we may need to apply something like this in the future, we are going to let it be in standby for now given we don't know the impact that join main have, specially in NOT IN case

@gortiz gortiz closed this Jul 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement multi-stage Related to the multi-stage query engine

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants