-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Fixes filter aggs perf #10356
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes filter aggs perf #10356
Conversation
Codecov Report
@@ Coverage Diff @@
## master #10356 +/- ##
=============================================
+ Coverage 28.01% 70.37% +42.35%
- Complexity 58 5223 +5165
=============================================
Files 2021 2033 +12
Lines 109834 110191 +357
Branches 16710 16750 +40
=============================================
+ Hits 30772 77543 +46771
+ Misses 76033 27221 -48812
- Partials 3029 5427 +2398
Flags with carried forward coverage won't be shown. Click here to find out more.
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
Jackie-Jiang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
You might need to modify some numDocsScanned numbers in some tests since we will scan less docs with the change
...src/main/java/org/apache/pinot/core/query/aggregation/function/AggregationFunctionUtils.java
Outdated
Show resolved
Hide resolved
|
Tests all passing (with decreased docs scanned 🎉) |
When creating the list of aggs (filtered and non-filtered)[1], function is checked for nullness and only if non-null, added to filteredAggregations list. Later, in AggregationFunctionUtils, .getLeft() is checked for nullness and used as a way to partition aggs with/without filters applied[2]. getLeft is never null because of the null check performed in [1]. Therefore no aggs are registered as being paired with the main predicate, instead causing the main filter predicate plan to .run() for each iteration of the loop[3][4] (buildFilterOperator -> new FilterPlanNode with accidental null filterContext i.e. use main filter predicate). Then, a CombinedFilterOperator is created using effectively 2 copies of mainPredicate AND’d together, and a pre-scan of the duplicated main predicate is run for each block[5][6]
I believe that the intent was to use getRight() (i.e. the filter context) for nullness check in [2] rather than [1]. I have this as a draft because BenchmarkQueries.java still seems to indicate that FILTER is significantly slower than CASE in some test scenarios, so there may be more to it.
Separately, an extra empty
AggregationFunction[]was being added to any query with only filtered aggs and no non-filtered aggs, causing many docs to be scanned without any purpose.BenchmarkQueries.javanow shows that FITLER is always as fast as CASE, often faster (just depends on index usage, mostly, but at least it’s never slower).[1]
pinot/pinot-core/src/main/java/org/apache/pinot/core/query/request/context/QueryContext.java
Line 593 in 135f13e
[2]
pinot/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/AggregationFunctionUtils.java
Line 226 in 135f13e
[3]
pinot/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/AggregationFunctionUtils.java
Line 233 in 135f13e
[4]
pinot/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/AggregationFunctionUtils.java
Line 190 in 135f13e
[5]
pinot/pinot-core/src/main/java/org/apache/pinot/core/operator/filter/CombinedFilterOperator.java
Line 62 in 246f4db
[6]
pinot/pinot-core/src/main/java/org/apache/pinot/core/operator/docidsets/FilterBlockDocIdSet.java
Line 48 in 246f4db