@yashmayya yashmayya commented Dec 6, 2024

  • Roughly 2x faster query compilation for large IN lists, achieved by eliminating the optimization overhead that CoreRules.FILTER_REDUCE_EXPRESSIONS incurs on the huge number of filter predicates created by PinotFilterExpandSearchRule (IN -> OR; see Multi-stage: Perf issue when IN expression has a lot of entries #13617).
  • This doesn't eliminate all of the query planning overhead: there's still the issue in SqlToRelConverter, where we convert IN to OR no matter the size of the IN list (see the discussion in https://issues.apache.org/jira/browse/CALCITE-6467). The default Calcite threshold is 20, beyond which the IN is converted to a join with a static table; however, that causes query execution overhead for us, since we don't currently optimize such joins and might pay an unnecessary data shuffling cost.
  • The SqlToRelConverter part of the inefficiency will be fixed by upgrading Calcite to 1.39.0, which is due to be released soon.
  • The query compilation test added here took ~16s to run locally prior to this change and ~8s after this change. Here are the CPU flamegraphs:

[Image: flamegraph1]

[Image: flamegraph2]
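
To illustrate why the IN -> OR expansion is costly, here is a minimal sketch using a toy string-based model. The class and method names (InExpansionSketch, expandToOr, keepAsIn) are hypothetical and are not Calcite or Pinot APIs; the sketch only shows that the expanded form hands later rules one predicate per IN entry, while the direct form stays a single predicate.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: nothing here is a Calcite or Pinot class.
final class InExpansionSketch {

  // Expands `column IN (v0, ..., vN-1)` into N separate equality predicates
  // (the disjuncts of one big OR). Any later rule that walks the filter has
  // to visit every one of them.
  static List<String> expandToOr(String column, List<Integer> values) {
    List<String> disjuncts = new ArrayList<>();
    for (int value : values) {
      disjuncts.add(column + " = " + value);
    }
    return disjuncts;
  }

  // Keeps the whole list as a single IN predicate.
  static String keepAsIn(String column, List<Integer> values) {
    StringBuilder sb = new StringBuilder(column).append(" IN (");
    for (int i = 0; i < values.size(); i++) {
      if (i > 0) {
        sb.append(", ");
      }
      sb.append(values.get(i));
    }
    return sb.append(")").toString();
  }

  public static void main(String[] args) {
    List<Integer> values = new ArrayList<>();
    for (int i = 0; i < 10_000; i++) {
      values.add(i);
    }
    System.out.println("OR expansion: " + expandToOr("col", values).size() + " predicates");
    System.out.println("Direct IN: 1 predicate");
  }
}
```

The real cost lives in how the optimizer rules process those predicates, which this toy does not model; it only makes the difference in predicate count concrete.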

  • We don't really need PinotFilterExpandSearchRule: the RelNode -> PlanNode conversion phase already has logic that safely converts SEARCH operators directly to IN / NOT IN, rather than to a bunch of OR'd = expressions, so it carries no risk of incurring optimization overhead from other rules. That logic has been updated to also handle literal-only SEARCH expressions, which can be generated by Calcite's FILTER_REDUCE_EXPRESSIONS rule.
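
As a rough illustration of the direct conversion described above, the sketch below maps a simplified search argument straight to IN or NOT IN. ToySarg and convert are hypothetical names invented for this example: ToySarg is a stand-in that only holds discrete points plus a "complemented" flag, whereas Calcite's real Sarg can also hold ranges, and the actual logic in RexExpressionUtils is not reproduced here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Toy model only: ToySarg is NOT Calcite's Sarg, and convert() is not Pinot's code.
final class SearchToInSketch {

  // Simplified search argument: a list of point values, optionally complemented
  // (meaning "every value except these points").
  static final class ToySarg {
    final List<Integer> points;
    final boolean complemented;

    ToySarg(List<Integer> points, boolean complemented) {
      this.points = points;
      this.complemented = complemented;
    }
  }

  // Converts SEARCH(column, sarg) directly into one IN / NOT IN predicate,
  // never materializing one equality per point.
  static String convert(String column, ToySarg sarg) {
    String values = sarg.points.stream()
        .map(String::valueOf)
        .collect(Collectors.joining(", "));
    return column + (sarg.complemented ? " NOT IN (" : " IN (") + values + ")";
  }

  public static void main(String[] args) {
    System.out.println(convert("col", new ToySarg(Arrays.asList(1, 2, 3), false)));
    System.out.println(convert("col", new ToySarg(Arrays.asList(1, 2, 3), true)));
  }
}
```

Under these assumptions the output stays a single predicate regardless of how many points the search argument holds, which is the property that avoids the overhead from other rules.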

@yashmayya yashmayya added the performance and multi-stage (Related to the multi-stage query engine) labels Dec 6, 2024
codecov-commenter commented Dec 6, 2024

Codecov Report

Attention: Patch coverage is 41.66667% with 14 lines in your changes missing coverage. Please review.

Project coverage is 63.61%. Comparing base (59551e4) to head (30bdc03).
Report is 2287 commits behind head on master.

Files with missing lines Patch % Lines
...inot/query/planner/logical/RexExpressionUtils.java 41.66% 10 Missing and 4 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #14615      +/-   ##
============================================
+ Coverage     61.75%   63.61%   +1.86%     
- Complexity      207     1461    +1254     
============================================
  Files          2436     2772     +336     
  Lines        133233   156247   +23014     
  Branches      20636    23981    +3345     
============================================
+ Hits          82274    99404   +17130     
- Misses        44911    49355    +4444     
- Partials       6048     7488    +1440     
Flag Coverage Δ
custom-integration1 100.00% <ø> (+99.99%) ⬆️
integration 100.00% <ø> (+99.99%) ⬆️
integration1 100.00% <ø> (+99.99%) ⬆️
integration2 0.00% <ø> (ø)
java-11 63.59% <41.66%> (+1.88%) ⬆️
java-21 63.51% <41.66%> (+1.88%) ⬆️
skip-bytebuffers-false 63.61% <41.66%> (+1.86%) ⬆️
skip-bytebuffers-true 63.49% <41.66%> (+35.76%) ⬆️
temurin 63.61% <41.66%> (+1.86%) ⬆️
unittests 63.61% <41.66%> (+1.86%) ⬆️
unittests1 56.15% <41.66%> (+9.26%) ⬆️
unittests2 34.18% <0.00%> (+6.45%) ⬆️

@yashmayya yashmayya force-pushed the optimize-large-in-clauses branch from 037795c to 50c41d1 Compare March 10, 2025 14:13
@yashmayya yashmayya marked this pull request as ready for review March 10, 2025 15:36
@yashmayya yashmayya merged commit 2c7b0dd into apache:master Mar 11, 2025
22 checks passed