-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Implement #docIDRunEnd() on PostingsEnum. #14390
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This implements `BlockPostingsEnum#docIDRunEnd()` by comparing the delta between doc IDs and between doc counts on the various skip levels.
gf2121
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! I left some minor comments.
|
|
||
| @Override | ||
| public int docIDRunEnd() throws IOException { | ||
| // Note: this assumes that BLOCK_SIZE == 128, this bit of the code would need to be changed if |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Can we assert this assumption here? As i tried to change BLOCK_SIZE before, i personally prefer an explicit AssertionError rather than comments :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to do this but this gives a compile error that fails the build, I'm unsure how to do it otherwise
1. ERROR in lucene/core/src/java/org/apache/lucene/codecs/lucene101/Lucene101PostingsReader.java (at line 1066)
assert BLOCK_SIZE == 2 * Long.SIZE;
^^^^^^^^^^^^^^^^^^^^^^^^^^^
Comparing identical expressions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This trick seems work:
boolean level0IsDense =
encoding == DeltaEncoding.UNARY
&& docBitSet.getBits()[0] == -1L
&& docBitSet.getBits()[1] == -1L;
if (level0IsDense) {
assert Long.bitCount(docBitSet.getBits()[0]) + Long.bitCount(docBitSet.getBits()[1])
== BLOCK_SIZE : "We are assuming BLOCK_SIZE == 128 here.";
...
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I ended up using another trick that looked a bit more natural by extracting the block size to a variable.
| private static void checkDocIDRuns(DocIdSetIterator iterator) throws IOException { | ||
| int prevDoc = -1; | ||
| int runEnd = 0; | ||
| for (int doc = iterator.nextDoc(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we also need to test docIDRunEnd works properly against operations other than nextDoc? E.g. advance, intoBitset, advanceShallow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wondered about it. We already have tests that test nextDoc vs. advance or nextDoc vs. intoBitSet. And "sane" impls of docIDRun should be free of side effects. So I thought that it would be enough this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for explanation, that sounds good enough today.
| // Note: this assumes that BLOCK_SIZE == 128, this bit of the code would need to be changed if | ||
| // the block size was changed. | ||
| // Hack to avoid compiler warning that both sides of the equal sign are identical. | ||
| long blockSize = BLOCK_SIZE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
This implements `BlockPostingsEnum#docIDRunEnd()` by comparing the delta between doc IDs and between doc counts on the various skip levels.
This implements `BlockPostingsEnum#docIDRunEnd()` by comparing the delta between doc IDs and between doc counts on the various skip levels.
This implements
BlockPostingsEnum#docIDRunEnd()by comparing the delta between doc IDs and between doc counts on the various skip levels.