Skip to content

Conversation

@egalpin
Copy link
Member

@egalpin egalpin commented Feb 28, 2023

When creating the list of aggs (filtered and non-filtered)[1], function is checked for nullness and only if non-null, added to filteredAggregations list. Later, in AggregationFunctionUtils, .getLeft() is checked for nullness and used as a way to partition aggs with/without filters applied[2]. getLeft is never null because of the null check performed in [1]. Therefore no aggs are registered as being paired with the main predicate, instead causing the main filter predicate plan to .run() for each iteration of the loop[3][4] (buildFilterOperator -> new FilterPlanNode with accidental null filterContext i.e. use main filter predicate). Then, a CombinedFilterOperator is created using effectively 2 copies of mainPredicate AND’d together, and a pre-scan of the duplicated main predicate is run for each block[5][6]

I believe that the intent was to use getRight() (i.e. the filter context) for nullness check in [2] rather than [1]. I have this as a draft because BenchmarkQueries.java still seems to indicate that FILTER is significantly slower than CASE in some test scenarios, so there may be more to it.

Separately, an extra empty AggregationFunction[] was being added to any query with only filtered aggs and no non-filtered aggs, causing many docs to be scanned without any purpose.

BenchmarkQueries.java now shows that FITLER is always as fast as CASE, often faster (just depends on index usage, mostly, but at least it’s never slower).

[1]


[2]
[3]
buildFilterOperator(indexSegment, queryContext, currentFilterExpression);

[4]
[5]
FilterBlockDocIdSet mainFilterDocIdSet = _mainFilterOperator.nextBlock().getNonScanFilterBLockDocIdSet();

[6]

@codecov-commenter
Copy link

codecov-commenter commented Mar 1, 2023

Codecov Report

Merging #10356 (b16d8bc) into master (4258824) will increase coverage by 42.35%.
The diff coverage is 100.00%.

@@              Coverage Diff              @@
##             master   #10356       +/-   ##
=============================================
+ Coverage     28.01%   70.37%   +42.35%     
- Complexity       58     5223     +5165     
=============================================
  Files          2021     2033       +12     
  Lines        109834   110191      +357     
  Branches      16710    16750       +40     
=============================================
+ Hits          30772    77543    +46771     
+ Misses        76033    27221    -48812     
- Partials       3029     5427     +2398     
Flag Coverage Δ
integration1 24.44% <0.00%> (-0.17%) ⬇️
integration2 24.56% <0.00%> (+0.12%) ⬆️
unittests1 67.80% <100.00%> (?)
unittests2 13.76% <0.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...t/core/operator/filter/CombinedFilterOperator.java 84.61% <100.00%> (+84.61%) ⬆️
...aggregation/function/AggregationFunctionUtils.java 80.20% <100.00%> (+44.03%) ⬆️
...pache/pinot/core/query/optimizer/filter/Range.java 87.75% <0.00%> (-4.09%) ⬇️
...pache/pinot/core/query/utils/idset/EmptyIdSet.java 25.00% <0.00%> (ø)
...ot/common/function/scalar/ComparisonFunctions.java 42.85% <0.00%> (ø)
...anager/realtime/SegmentBuildTimeLeaseExtender.java 63.23% <0.00%> (ø)
...nction/DistinctCountBitmapAggregationFunction.java 54.92% <0.00%> (ø)
...apache/pinot/ingestion/jobs/SegmentTarPushJob.java 70.27% <0.00%> (ø)
...org/apache/pinot/ingestion/utils/PushLocation.java 72.72% <0.00%> (ø)
...ker/api/resources/BrokerEchoWithAutoDiscovery.java 0.00% <0.00%> (ø)
... and 1402 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@egalpin egalpin changed the title Fixes mixup of getRight/getLeft resulting in many non-filter aggs Fixes filter aggs perf Mar 1, 2023
@egalpin egalpin marked this pull request as ready for review March 1, 2023 19:59
Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

You might need to modify some numDocsScanned numbers in some tests since we will scan less docs with the change

@egalpin
Copy link
Member Author

egalpin commented Mar 1, 2023

Tests all passing (with decreased docs scanned 🎉)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants