-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Correct TermOrdValComparator competitive iterator intoBitSet implementation #14523
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Correct TermOrdValComparator competitive iterator intoBitSet implementation #14523
Conversation
…re similar to other impls
lucene/core/src/java/org/apache/lucene/search/DisjunctionDISIApproximation.java
Outdated
Show resolved
Hide resolved
lucene/core/src/java/org/apache/lucene/search/comparators/NumericComparator.java
Outdated
Show resolved
Hide resolved
lucene/core/src/java/org/apache/lucene/search/comparators/TermOrdValComparator.java
Show resolved
Hide resolved
lucene/core/src/java/org/apache/lucene/search/comparators/TermOrdValComparator.java
Show resolved
Hide resolved
lucene/core/src/java/org/apache/lucene/search/comparators/NumericComparator.java
Outdated
Show resolved
Hide resolved
lucene/core/src/java/org/apache/lucene/search/DenseConjunctionBulkScorer.java
Show resolved
Hide resolved
ChrisHegarty
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
lucene/core/src/java/org/apache/lucene/search/DenseConjunctionBulkScorer.java
Outdated
Show resolved
Hide resolved
lucene/core/src/java/org/apache/lucene/search/DisjunctionDISIApproximation.java
Outdated
Show resolved
Hide resolved
lucene/core/src/java/org/apache/lucene/search/DocIdSetIterator.java
Outdated
Show resolved
Hide resolved
| public void intoBitSet(int upTo, FixedBitSet bitSet, int offset) throws IOException { | ||
| // The competitive iterator is usually a BitSetIterator, which has an optimized | ||
| // implementation of #intoBitSet. | ||
| if (competitiveIterator.docID() < doc) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part is strange. All other intoBitSet impls that I can find call the competitiveIterator to advance to at least offset, not to some internal doc. I suppose NumericComparator is unique?
lucene/core/src/java/org/apache/lucene/search/DisjunctionDISIApproximation.java
Outdated
Show resolved
Hide resolved
lucene/core/src/java/org/apache/lucene/search/DisjunctionDISIApproximation.java
Outdated
Show resolved
Hide resolved
lucene/core/src/java/org/apache/lucene/search/comparators/TermOrdValComparator.java
Show resolved
Hide resolved
gf2121
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you
| LeafCollector leafCollector = collector.getLeafCollector(context); | ||
| BulkScorer bulkScorer = weight.bulkScorer(context); | ||
| // We need to split on this specific doc ID so that the current doc of the competitive | ||
| // iterator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: maybe format this single-line word better?
…tation (#14523) Similar to @ChrisHegarty 's change for the count fix, this will move up the max check to before the into bit set. It seems like we could be calling intobitset erroneously on some edge cases. Additionally, TermOrdValComparator's competitive iterator will ensure that the docsWithField iterator (used when not all docs have a given field), is at least the baseline of the scoring window when doing intoBitSet. closes: #14517
…tation (#14523) Similar to @ChrisHegarty 's change for the count fix, this will move up the max check to before the into bit set. It seems like we could be calling intobitset erroneously on some edge cases. Additionally, TermOrdValComparator's competitive iterator will ensure that the docsWithField iterator (used when not all docs have a given field), is at least the baseline of the scoring window when doing intoBitSet. closes: #14517
| if (disjunction == null) { | ||
| if (docsWithField != null) { | ||
| // we need to be absolutely sure that the iterator is at least at offset | ||
| if (docsWithField.docID() < offset) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The contract from DocIdSetIterator#intoBitSet requires starting from the current doc rather than the first doc on or after offset, so it would be more correct to advance to doc here. In practice, it wouldn't make a difference today since all call sites first advance iterators to offset before calling intoBitSet, but it may not always be the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The contract from DocIdSetIterator#intoBitSet requires starting from the current doc rather than the first doc on or after offset, so it would be more correct to advance to doc here.
So things might got broken when current doc is after the first doc like
CompetitiveIterator competitiveIterator = CompetitiveIterator.of(1, 2, 3, 4);
competitiveIterator.advance(2); // advance current doc to 2
competitiveIterator.update(0, 2000); // update to doc value
final int offset = 1;
if (competitiveIterator.docID() < offset) {
competitiveIterator.advance(offset);
}
competitiveIterator.intoBitSet(upTo, bitset, offset); // we should start from 2 instead of 1.
? Nice catch!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, exactly.
…tation (apache#14523) Similar to @ChrisHegarty 's change for the count fix, this will move up the max check to before the into bit set. It seems like we could be calling intobitset erroneously on some edge cases. Additionally, TermOrdValComparator's competitive iterator will ensure that the docsWithField iterator (used when not all docs have a given field), is at least the baseline of the scoring window when doing intoBitSet. closes: apache#14517
Similar to @ChrisHegarty 's change for the count fix, this will move up the
maxcheck to before the into bit set. It seems like we could be calling intobitset erroneously on some edge cases.Additionally, TermOrdValComparator's competitive iterator will ensure that the docsWithField iterator (used when not all docs have a given field), is at least the baseline of the scoring window when doing intoBitSet.
closes: #14517