Enhance and filter predicate evaluation efficiency
#9336
Conversation
Codecov Report
```diff
@@             Coverage Diff              @@
##             master    #9336      +/-  ##
============================================
- Coverage     69.80%   68.47%     -1.34%
- Complexity     4777     5069      +292
============================================
  Files          1875     1884        +9
  Lines         99860   100283      +423
  Branches      15194    15253       +59
============================================
- Hits          69706    68667     -1039
- Misses        25231    26681     +1450
- Partials       4923     4935       +12
```
Jackie-Jiang left a comment:
This is a great optimization. Can you please share some numbers on the gain? We need to weigh the overhead of the cardinality computation against the gain. We can also explore whether there are methods we can use to AND multiple bitmaps efficiently.
@Jackie-Jiang I have added a benchmark case, and we can try different bitmap array setups. This will be the first PR of the AND optimization. I am still thinking about how to optimize the AndDocIdSet for the mixed bitmap/scanning case using column cardinality.
@jasperjiaguo you might want to capture a flame graph as well (attach YourKit to your local IDE)... @Jackie-Jiang - we plan to do a couple of optimizations, including this one, in multiple PRs. For example:
@jasperjiaguo - also add the
```java
@State(Scope.Benchmark)
public class BenchmarkAndDocIdIterator {
  private static final int NUM_DOCS = 10_000;
```
Maybe try a few different cardinalities / selectivities? The getDocIds().getCardinality() performance will vary accordingly, IMO.
getCardinality() doesn't actually count the cardinality from scratch every time; instead, it sums the saved cardinality counter from each container. I wouldn't be too worried about its performance, but I will definitely try a few different cardinality / selectivity setups.
That depends on whether you have runs or not: if there are runs, the cardinality needs to be computed by summing the lengths of the runs. In general, though, computing the cardinality is quite cheap.
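For reference, a minimal sketch of the container-level behavior being discussed, assuming the standard org.roaringbitmap API (not code from this PR):

```java
import org.roaringbitmap.RoaringBitmap;

public class CardinalityCost {
  public static void main(String[] args) {
    RoaringBitmap bitmap = new RoaringBitmap();
    bitmap.add(0L, 1_000_000L); // one long range of set bits

    // Array and bitmap containers track their own cardinality, so
    // getCardinality() just sums a per-container counter.
    System.out.println(bitmap.getCardinality()); // 1000000

    // runOptimize() may convert containers to run containers; for those,
    // the cardinality is derived by summing the run lengths on each call.
    bitmap.runOptimize();
    System.out.println(bitmap.getCardinality()); // 1000000, slightly more work per container
  }
}
```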
siddharthteotia left a comment:
LGTM. We can also put this under a flag for now, disabled by default. We can then do more testing on a perf cluster using production data for our respective use cases, and eventually enable it in OSS.
Please take a look at
Can we evaluate the solution by @richardstartin and see if that gives better results? I feel it should give better performance, because cardinality by itself cannot represent the distribution of the values.
@richardstartin @Jackie-Jiang Yes, I have done some experiments with
New results with more randomized, degenerate test input: we can see that the overhead of getting the cardinality and sorting is really small (below the margin of error). However, the overhead of
At a glance, not using the stream API to convert the iterators to bitmaps, and using the array overload instead of the iterator overload, should improve the results. I seem to recall BufferFastAggregation.and being more efficient for larger numbers of bitmaps, so it may not shine in this particular context.
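A sketch of the suggested change, assuming RoaringBitmap's buffer API (the surrounding class and method names here are hypothetical, not the PR's code):

```java
import java.util.List;
import org.roaringbitmap.buffer.BufferFastAggregation;
import org.roaringbitmap.buffer.ImmutableRoaringBitmap;
import org.roaringbitmap.buffer.MutableRoaringBitmap;

public class AndAggregationSketch {
  // Collect the bitmaps into an array up front and call the varargs
  // overload, rather than feeding BufferFastAggregation an iterator.
  static MutableRoaringBitmap intersect(List<ImmutableRoaringBitmap> bitmaps) {
    return BufferFastAggregation.and(bitmaps.toArray(new ImmutableRoaringBitmap[0]));
  }
}
```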
Modified to use the array overload and a JMH Blackhole. Updated the results. Briefly looked at the
I think the gain here is reasonable. Since
It seems like point (2) above, about skipping index-based evaluation and using a scan instead, is the same as what @richardstartin suggested in #7600. Let me try to find the profile we had for this. I want to see whether we are talking about similar or two different potential optimizations.




For AND-connected predicates, evaluate the bitmap with the lowest number of matching docIds first, so that we limit the number of roaring bitmap containers to compare from the beginning; this minimizes the effort of the `and` application. Running BenchmarkAndDocIdIterator:

- Speedup: around 3x after reordering a suboptimal bitmap array of size 5: matching % [5/10, 4/10, 3/10, 2/10, 1/10] -> [1/10, 2/10, 3/10, 4/10, 5/10]
- Sorting overhead: 2.5%

Please take a look at BenchmarkAndDocIdIterator.java for details.
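A minimal sketch of the reordering described above, assuming RoaringBitmap's buffer API (an illustration, not the PR's actual AndDocIdSet code; andSortedByCardinality is a hypothetical helper):

```java
import java.util.Arrays;
import java.util.Comparator;
import org.roaringbitmap.buffer.ImmutableRoaringBitmap;
import org.roaringbitmap.buffer.MutableRoaringBitmap;

public class SortedAndSketch {
  // AND the bitmaps starting from the one with the fewest matching docIds,
  // so the earliest intersections shrink the container set for later ones.
  static MutableRoaringBitmap andSortedByCardinality(ImmutableRoaringBitmap[] bitmaps) {
    ImmutableRoaringBitmap[] sorted = bitmaps.clone();
    // getCardinality() mostly sums per-container counters, so the sort is
    // cheap relative to the intersection work it can save.
    Arrays.sort(sorted, Comparator.comparingInt(ImmutableRoaringBitmap::getCardinality));
    MutableRoaringBitmap result = sorted[0].toMutableRoaringBitmap();
    for (int i = 1; i < sorted.length && !result.isEmpty(); i++) {
      result.and(sorted[i]); // in-place AND; the result only ever shrinks
    }
    return result;
  }
}
```

Sorting ascending means the smallest intermediate result flows through the whole chain, which is where the roughly 3x speedup on the degenerate ordering comes from.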