Skip to content

Conversation

@benwtrent
Copy link
Member

When we utilize the underlying approximation here, its possible that its provided by a TwoPhase iterator. Meaning, we never actually call matches() after advancing the approximation and thus the two can get out of sync. This can result in EOFs as described in the related issue.

The other way I could see to fix this is to force calling matches() on every item in the essential queue later in the while loop. However, since this was the ONLY place in all of MaxScoreBulkScorer that used the approximation for the scorers, I opted to just not use it and use an iterator.

I welcome discussion, this took a long time to figure out as this part of the code is very interdependent with various other parts of the code :/ and have had many changes over the last couple of minor releases.

Still working on a test...maybe we should add assertions to the TwoPhase iterator that ensures matches is called...I am not sure.

closes: #15324

@benwtrent benwtrent added this to the 10.4.0 milestone Oct 29, 2025
@github-actions
Copy link
Contributor

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.

Comment on lines 108 to 116
private boolean assertVerifiedMatchesApproximationDocID() {
int docID = docID();
for (DisiWrapper w = verifiedMatches; w != null; w = w.next) {
if (w.approximation.docID() != docID) {
return false;
}
}
return true;
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This SEEMS like correct behavior right? If we have a verified match, all their approximations should be on the current docID?

Or do I misunderstand what "approximation" means? Maybe we need to assert against w.doc? However, even then, I would expect w.doc == w.approximation once things are all verified right?

@jpountz has basically rewritten this entire class over its life time LOL. I am not sure if he is available for feedback here.

Maybe @romseygeek ?

@benwtrent
Copy link
Member Author

benwtrent commented Oct 29, 2025

Adding some additional context. It seems that verifiedMatches, since its used in the disjunction scorers need to all be on the same doc.

See:

  @Override
  protected float score(DisiWrapper topList) throws IOException {
    double score = 0;

    for (DisiWrapper w = topList; w != null; w = w.next) {
      score += w.scorable.score();
    }
    return (float) score;
  }

That just assumes the topList (which might be coming from verifiedMatches), are all on the same doc, and that doc matches the assumed iteration position.

We only saw this desynchronizing as an issue when we saw an EOF. But it honestly seems like we are just scoring things incorrectly, fairly often.

Here are some screen shots (sorry for the quality, but wanted to provide some tangible proof of what we are seeing).

Here is one where we are obviously scoring a different document as the verifiedMatches point to a different doc than the iterator. (expected: 13283, verifiedMatch: 15554)

image

Here is one that trips an EOF as the verifiedMatches is iterated to NO_MORE_DOCs, but the iterator assumes its on a valid document (expected: 138666).

image

Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for digging this one! Indeed this scorer cannot work with two-phase iterators... It's a pity that tests didn't catch it, I wonder if we could randomly wrap queries with RandomApproximationQuery in the test framework to catch such problems.

@benwtrent benwtrent merged commit 27abe1c into apache:main Oct 30, 2025
12 checks passed
@benwtrent benwtrent deleted the fix-issue-15324 branch October 30, 2025 12:22
benwtrent added a commit that referenced this pull request Oct 30, 2025
* Utilize iterator and not approximation in MaxScoreBulkScorer

* CHANGES

* add assert on verifiedMatches

* adjust assertion

* adding some random approx query wrapping

* iter
benwtrent added a commit that referenced this pull request Oct 30, 2025
* Utilize iterator and not approximation in MaxScoreBulkScorer

* CHANGES

* add assert on verifiedMatches

* adjust assertion

* adding some random approx query wrapping

* iter
@thecoop thecoop modified the milestones: 10.4.0, 10.3.2 Nov 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Strange EOF in 10.1.0 with MaxScoreBulkScorer#scoreInnerWindowWithFilter

3 participants