Enhance `and` filter predicate evaluation efficiency #9336

jasperjiaguo · 2022-09-05T01:59:20Z

For and connected predicates, evaluate the bitmap and with the lowest number of matching docIds first, so that we limit the number of roaring bitmap containers for comparison from the beginning, this will minimize the effort of and application

Running BenchmarkAndDocIdIterator:

speedup: around 3x after reordering suboptimal bitmap array of size 5: matching % [5/10, 4/10, 3/10, 2/10, 1/10] -> [1/10, 2/10, 3/10, 4/10, 5/10]
sorting overhead: 2.5%
please take a look at BenchmarkAndDocIdIterator.java for details

codecov-commenter · 2022-09-05T02:50:32Z

Codecov Report

Merging #9336 (1c56dd7) into master (0f4bcfc) will decrease coverage by 1.33%.
The diff coverage is 100.00%.

@@             Coverage Diff              @@
##             master    #9336      +/-   ##
============================================
- Coverage     69.80%   68.47%   -1.34%     
- Complexity     4777     5069     +292     
============================================
  Files          1875     1884       +9     
  Lines         99860   100283     +423     
  Branches      15194    15253      +59     
============================================
- Hits          69706    68667    -1039     
- Misses        25231    26681    +1450     
- Partials       4923     4935      +12

Flag	Coverage Δ
integration1	`26.18% <100.00%> (-0.02%)`	⬇️
integration2	`?`
unittests1	`66.94% <100.00%> (-0.16%)`	⬇️
unittests2	`15.30% <0.00%> (+0.04%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
...che/pinot/core/operator/docidsets/AndDocIdSet.java	`100.00% <100.00%> (ø)`
...t/core/plan/StreamingInstanceResponsePlanNode.java	`0.00% <0.00%> (-100.00%)`	⬇️
...ore/operator/streaming/StreamingResponseUtils.java	`0.00% <0.00%> (-100.00%)`	⬇️
...server/starter/helix/SegmentReloadStatusValue.java	`0.00% <0.00%> (-100.00%)`	⬇️
.../operator/blocks/results/MetadataResultsBlock.java	`0.00% <0.00%> (-100.00%)`	⬇️
...ager/realtime/PeerSchemeSplitSegmentCommitter.java	`0.00% <0.00%> (-100.00%)`	⬇️
...urces/ServerReloadControllerJobStatusResponse.java	`0.00% <0.00%> (-100.00%)`	⬇️
...ator/streaming/StreamingSelectionOnlyOperator.java	`0.00% <0.00%> (-90.00%)`	⬇️
...he/pinot/core/plan/StreamingSelectionPlanNode.java	`0.00% <0.00%> (-88.89%)`	⬇️
...che/pinot/common/auth/StaticTokenAuthProvider.java	`0.00% <0.00%> (-87.50%)`	⬇️
... and 195 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Jackie-Jiang

This is a great optimization. Can you please share some numbers of the gain? We need to evaluate the overhead of cardinality computation vs the gain. We can also explore if there are some methods we can use to do AND on multiple bitmaps efficiently

jasperjiaguo · 2022-09-09T03:02:39Z

@Jackie-Jiang I have added a benchmark case and we can try with different bitmap array setups. This will be the first PR of the AND optimization. Still thinking how to optimize the AndDocIdSet for mixed bitmap/scanning with column cardinality.

siddharthteotia · 2022-09-09T05:24:58Z

@jasperjiaguo you might want to capture a Flamegraph as well (attach yourkit to local IDE)...

@Jackie-Jiang - we plan to do a couple of optimizations including this one in multiple PRs. For example:

making decision using the cardinality (higher first) and reordering the tree potentially (but could be less useful than this since this is accurate)
skipping inverted index based evaluation for very low cardinality. We have a Flamegraph for this that shows the clear overhead.

@jasperjiaguo - also add the applyAnd to this list.

siddharthteotia · 2022-09-09T05:32:18Z

pinot-perf/src/main/java/org/apache/pinot/perf/BenchmarkAndDocIdIterator.java

+
+@State(Scope.Benchmark)
+public class BenchmarkAndDocIdIterator {
+  private static final int NUM_DOCS = 10_000;


May be try on few different cardinalities / selectivities ?

The getDocIds().getCardinality() performance will vary accordingly imo.

getCardinality() doesn't really count the cardinality every time from scratch, instead it sums the saved cardinality counter from each container, I wouldn't be too worried about it's performance, but will definitely try on a few different cardinalities / selectivities setups.

That depends if you have runs or not, if there are runs the cardinality needs to be computed by summing the lengths of the runs. In general computing the cardinality is quite cheap though.

siddharthteotia

LGTM. We can also put this under a flag for now with disabled. Can then do more testing on perf cluster using production data using our respective use cases and enable in OSS eventually.

richardstartin · 2022-09-09T11:33:14Z

Please take a look at BufferFastAggregation.and which eliminates most work in an intersection by exploiting the bitmaps' tree structure to eliminate work and should produce a very sparse bitmap except in degenerate cases.

Jackie-Jiang · 2022-09-09T21:41:50Z

Can we evaluate the solution by @richardstartin and see if that gives better result? I feel it should give better performance because cardinality itself cannot represent the distribution of the values.

jasperjiaguo · 2022-09-09T22:13:41Z

@richardstartin @Jackie-Jiang Yes I have done some experiments with BufferFastAggregation.and, code see #9370. However, it didn't give very promising results:

using BufferFastAggregation alone without sorting yields an 1.1x speedup while 60% overhead in degenerate case
sorting on top of this gives marginal speedup

jasperjiaguo · 2022-09-09T23:17:23Z

new results with more randomized degenerate test input, can see that the overhead of getting cardinality and sorting is really small (below the margine of error)

However the overhead of BufferFastAggregation is really large (~160% more !!!) in this randomized degenerate case. We probably want to evaluate BufferFastAggregation.and more

jasperjiaguo · 2022-09-10T00:23:57Z

another selectivity setup: [5/25, 4/25, 3/25, 2/25/, 1/25]

richardstartin · 2022-09-10T01:50:49Z

At a glance not using the stream API to convert the iterators to bitmaps and using the array overload instead of the iterator overload should improve the results. I seem to recall BufferFastAggregation.and being more efficient for larger numbers of bitmaps, so it may not shine in this particular context.

jasperjiaguo · 2022-09-10T05:34:48Z

At a glance not using the stream API to convert the iterators to bitmaps and using the array overload instead of the iterator overload should improve the results. I seem to recall BufferFastAggregation.and being more efficient for larger numbers of bitmaps, so it may not shine in this particular context.

Modified to use array and blackhole. Updated the results. Briefly looked at the workShyAnd and it seems to use the similar idea of intersecting the containers prior to full intersection to minimize the range. It could be the extra preparation work is not worth it when number of bitmaps are small.

jasperjiaguo · 2022-09-10T05:55:00Z

Another set up with low speedup, but the overhead of degeneration remains small

siddharthteotia · 2022-09-12T21:45:52Z

I think the gain here is reasonable. Since BufferFastAggregation based intersection didn't show a clear benefit here, I think we should investigate that separately for larger number of bitmaps. @jasperjiaguo will follow-up with a separate PR on that and continue on #9370

siddharthteotia · 2022-09-16T23:10:36Z

@jasperjiaguo you might want to capture a Flamegraph as well (attach yourkit to local IDE)...

@Jackie-Jiang - we plan to do a couple of optimizations including this one in multiple PRs. For example:

making decision using the cardinality (higher first) and reordering the tree potentially (but could be less useful than this since this is accurate)

skipping inverted index based evaluation for very low cardinality. We have a Flamegraph for this that shows the clear overhead.

@jasperjiaguo - also add the applyAnd to this list.

It seems like point (2) above about skipping index based eval and use scan instead is same as suggested by @richardstartin in #7600

Let me try to find the profile we had for this. Want to see if we are talking about similar or two different potential optimizations

jasperjiaguo added 2 commits September 4, 2022 18:54

Improve the efficiency of and evaluation

03237f9

Improve the efficiency of and evaluation

f268f99

Improve the efficiency of and evaluation

44623b1

Jackie-Jiang reviewed Sep 7, 2022

View reviewed changes

jasperjiaguo added 3 commits September 8, 2022 19:42

Improve the efficiency of and evaluation

2dc306e

Improve the efficiency of and evaluation

584e946

Improve the efficiency of and evaluation

ce03a31

Improve the efficiency of and evaluation

4cd639d

jasperjiaguo marked this pull request as ready for review September 9, 2022 03:15

jasperjiaguo changed the title ~~[WIP][Not ready for review] Enhance and filter predicate evaluation efficiency~~ Enhance and filter predicate evaluation efficiency Sep 9, 2022

siddharthteotia reviewed Sep 9, 2022

View reviewed changes

siddharthteotia approved these changes Sep 9, 2022

View reviewed changes

jasperjiaguo mentioned this pull request Sep 9, 2022

[DO NOT MERGE][Benchmark] And evaluation enhancement benchmark with BufferFastAggregation.and #9370

Closed

jasperjiaguo added 4 commits September 9, 2022 15:40

Improve the efficiency of and evaluation

23be2b0

Improve the efficiency of and evaluation

436cad7

Improve the efficiency of and evaluation

43ccf75

Improve the efficiency of and evaluation

acaaa84

Improve the efficiency of and evaluation

6891374

Improve the efficiency of and evaluation

1c56dd7

siddharthteotia merged commit 7648a96 into apache:master Sep 12, 2022

Enhance and filter predicate evaluation efficiency #9336

Enhance and filter predicate evaluation efficiency #9336

Uh oh!

Conversation

jasperjiaguo commented Sep 5, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov-commenter commented Sep 5, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Jackie-Jiang left a comment

Choose a reason for hiding this comment

Uh oh!

jasperjiaguo commented Sep 9, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

siddharthteotia commented Sep 9, 2022

Uh oh!

siddharthteotia Sep 9, 2022

Choose a reason for hiding this comment

Uh oh!

jasperjiaguo Sep 9, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

richardstartin Sep 10, 2022

Choose a reason for hiding this comment

Uh oh!

siddharthteotia left a comment

Choose a reason for hiding this comment

Uh oh!

richardstartin commented Sep 9, 2022

Uh oh!

Jackie-Jiang commented Sep 9, 2022

Uh oh!

jasperjiaguo commented Sep 9, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jasperjiaguo commented Sep 9, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jasperjiaguo commented Sep 10, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

richardstartin commented Sep 10, 2022

Uh oh!

jasperjiaguo commented Sep 10, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jasperjiaguo commented Sep 10, 2022

Uh oh!

siddharthteotia commented Sep 12, 2022

Uh oh!

siddharthteotia commented Sep 16, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Enhance `and` filter predicate evaluation efficiency #9336

Enhance `and` filter predicate evaluation efficiency #9336

jasperjiaguo commented Sep 5, 2022 •

edited

Loading

codecov-commenter commented Sep 5, 2022 •

edited

Loading

jasperjiaguo commented Sep 9, 2022 •

edited

Loading

jasperjiaguo Sep 9, 2022 •

edited

Loading

jasperjiaguo commented Sep 9, 2022 •

edited

Loading

jasperjiaguo commented Sep 9, 2022 •

edited

Loading

jasperjiaguo commented Sep 10, 2022 •

edited

Loading

jasperjiaguo commented Sep 10, 2022 •

edited

Loading