-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Utilize iterator and not approximation in MaxScoreBulkScorer #15380
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR. |
| private boolean assertVerifiedMatchesApproximationDocID() { | ||
| int docID = docID(); | ||
| for (DisiWrapper w = verifiedMatches; w != null; w = w.next) { | ||
| if (w.approximation.docID() != docID) { | ||
| return false; | ||
| } | ||
| } | ||
| return true; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This SEEMS like correct behavior right? If we have a verified match, all their approximations should be on the current docID?
Or do I misunderstand what "approximation" means? Maybe we need to assert against w.doc? However, even then, I would expect w.doc == w.approximation once things are all verified right?
@jpountz has basically rewritten this entire class over its life time LOL. I am not sure if he is available for feedback here.
Maybe @romseygeek ?
jpountz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for digging this one! Indeed this scorer cannot work with two-phase iterators... It's a pity that tests didn't catch it, I wonder if we could randomly wrap queries with RandomApproximationQuery in the test framework to catch such problems.
* Utilize iterator and not approximation in MaxScoreBulkScorer * CHANGES * add assert on verifiedMatches * adjust assertion * adding some random approx query wrapping * iter
* Utilize iterator and not approximation in MaxScoreBulkScorer * CHANGES * add assert on verifiedMatches * adjust assertion * adding some random approx query wrapping * iter


When we utilize the underlying approximation here, its possible that its provided by a TwoPhase iterator. Meaning, we never actually call
matches()after advancing the approximation and thus the two can get out of sync. This can result in EOFs as described in the related issue.The other way I could see to fix this is to force calling
matches()on every item in the essential queue later in the while loop. However, since this was the ONLY place in all ofMaxScoreBulkScorerthat used the approximation for the scorers, I opted to just not use it and use an iterator.I welcome discussion, this took a long time to figure out as this part of the code is very interdependent with various other parts of the code :/ and have had many changes over the last couple of minor releases.
Still working on a test...maybe we should add assertions to the TwoPhase iterator that ensures
matchesis called...I am not sure.closes: #15324