Skip to content

Conversation

@jpountz
Copy link
Contributor

@jpountz jpountz commented Mar 21, 2025

This implements BlockPostingsEnum#docIDRunEnd() by comparing the delta between doc IDs and between doc counts on the various skip levels.

This implements `BlockPostingsEnum#docIDRunEnd()` by comparing the delta
between doc IDs and between doc counts on the various skip levels.
Copy link
Contributor

@gf2121 gf2121 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I left some minor comments.


@Override
public int docIDRunEnd() throws IOException {
// Note: this assumes that BLOCK_SIZE == 128, this bit of the code would need to be changed if
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Can we assert this assumption here? As i tried to change BLOCK_SIZE before, i personally prefer an explicit AssertionError rather than comments :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to do this but this gives a compile error that fails the build, I'm unsure how to do it otherwise

1. ERROR in lucene/core/src/java/org/apache/lucene/codecs/lucene101/Lucene101PostingsReader.java (at line 1066)
        assert BLOCK_SIZE == 2 * Long.SIZE;
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
Comparing identical expressions

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This trick seems work:

boolean level0IsDense =
    encoding == DeltaEncoding.UNARY
        && docBitSet.getBits()[0] == -1L
        && docBitSet.getBits()[1] == -1L;

if (level0IsDense) {
  assert Long.bitCount(docBitSet.getBits()[0]) + Long.bitCount(docBitSet.getBits()[1])
      == BLOCK_SIZE : "We are assuming BLOCK_SIZE == 128 here.";
  ...
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I ended up using another trick that looked a bit more natural by extracting the block size to a variable.

private static void checkDocIDRuns(DocIdSetIterator iterator) throws IOException {
int prevDoc = -1;
int runEnd = 0;
for (int doc = iterator.nextDoc();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we also need to test docIDRunEnd works properly against operations other than nextDoc? E.g. advance, intoBitset, advanceShallow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wondered about it. We already have tests that test nextDoc vs. advance or nextDoc vs. intoBitSet. And "sane" impls of docIDRun should be free of side effects. So I thought that it would be enough this way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for explanation, that sounds good enough today.

// Note: this assumes that BLOCK_SIZE == 128, this bit of the code would need to be changed if
// the block size was changed.
// Hack to avoid compiler warning that both sides of the equal sign are identical.
long blockSize = BLOCK_SIZE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@jpountz jpountz merged commit db00348 into apache:main Mar 24, 2025
7 checks passed
@jpountz jpountz deleted the postings_docIDRunEnd branch March 24, 2025 17:19
jpountz added a commit that referenced this pull request Mar 24, 2025
This implements `BlockPostingsEnum#docIDRunEnd()` by comparing the delta
between doc IDs and between doc counts on the various skip levels.
gf2121 pushed a commit to gf2121/lucene that referenced this pull request May 21, 2025
This implements `BlockPostingsEnum#docIDRunEnd()` by comparing the delta
between doc IDs and between doc counts on the various skip levels.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants