-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Optimize reading data from postings/impacts enums. #12664
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This implements the following optimizations: - `ImpactsEnum#advance()` and `PostingsEnum#advance()` now need a single comparison instead of two when the target has already been decoded in the current block. - `ImpactsEnum#nextDoc()` now gets a dedicated implementation instead of delegating to `advance()`. Initial support for dynamic pruning via WAND always used `advance()` in Lucene, but we are now using `nextDoc()` since we switched to `MAXSCORE`. - `ImpactsEnum#nextDoc()` and `ImpactsEnum#advance()` now decode frequencies lazily. The code was already assuming frequencies would get decoded lazily, it was just missing a small change to actually work.
|
All |
| pforUtil.decodeAndPrefixSum(docIn, accum, docBuffer); | ||
| if (indexHasFreqs) { | ||
| pforUtil.decode(docIn, freqBuffer); | ||
| isFreqsRead = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was the change needed to decode freqs lazily.
|
|
||
| while (level >= 0) { | ||
| if (target > skipDoc[level]) { | ||
| if (target > skipDoc[level] || skipDoc[level] == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit annoying I had to change skip lists, but it was required so that skip lists and postings blocks remain aligned. Otherwise advanceShallow(0) would set nextSkipDoc equal to 0 instead of the last doc ID of the first block.
|
I tried to re-inline advance() but it still suggested a small slowdown for The slowdown on I'm tempted to not think too hard about |
|
Another potential reason for the slowdowns is that they're caused by the changes to |
|
I'll split this PR into smaller ones, so that it's easier to understand the impact of each change. |
This implements the following optimizations:
ImpactsEnum#advance()andPostingsEnum#advance()now need a single comparison instead of two when the target has already been decoded in the current block.ImpactsEnum#nextDoc()now gets a dedicated implementation instead of delegating toadvance(). Initial support for dynamic pruning via WAND always usedadvance()in Lucene, but we are now usingnextDoc()since we switched toMAXSCORE.ImpactsEnum#nextDoc()andImpactsEnum#advance()now decode frequencies lazily. The code was already assuming frequencies would get decoded lazily, it was just missing a small change to actually work.