-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Speed up TermQuery #14709
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed up TermQuery #14709
Conversation
|
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you will stop receiving this reminder on future updates to the PR. |
|
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you will stop receiving this reminder on future updates to the PR. |
|
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you will stop receiving this reminder on future updates to the PR. |
jpountz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice speedup! Term queries are fast, though a term query on the is one of the slowest queries in the Tantivy benchmark, so it's nice to get it optimized.
| public int score(LeafCollector collector, Bits acceptDocs, int min, int max) throws IOException { | ||
| if (collector.competitiveIterator() != null) { | ||
| return new Weight.DefaultBulkScorer(scorer).score(collector, acceptDocs, min, max); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this should be an implementation detail of DefaultBulkScorer instead of a different class. Doing something like
if (scoreMode == TOP_SCORES && competitiveIterator == null) {
// new optimization
} else {
// existing DefaultBulkScorer code
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for feedback! I moved the impl into DefaultBulkScorer.
if (scoreMode == TOP_SCORES && competitiveIterator == null)
As description showing, exhaustive execution get optimized as well so i use scoreMode.needsScores instead.
|
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you will stop receiving this reminder on future updates to the PR. |
|
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you will stop receiving this reminder on future updates to the PR. |
jpountz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry about my last suggestion, I had missed that DefaultBulkScorer had no way to know if scores are needed or not yet, so I think I like your previous approach a bit better to keep DefaultBulkScorer clean.
|
|
||
| if (impactsDisi != null) { | ||
| impactsDisi.ensureCompetitive(); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should rather put it at the beginning of the below for loop. For instance, imagine that the first block of docs returned only has deleted docs, then it will fetch a new block. It would be good to check if this block is competitive before fetching this new block as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, nice catch!
|
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you will stop receiving this reminder on future updates to the PR. |
// nightly-benchmarks-results-changed //
* main: (32 commits) update os.makedirs with pathlib mkdir (apache#14710) Optimize AbstractKnnVectorQuery#createBitSet with intoBitset (apache#14674) Implement #docIDRunEnd() on PostingsEnum. (apache#14693) Speed up TermQuery (apache#14709) Refactor main top-n bulk scorers to evaluate hits in a more term-at-a-time fashion. (apache#14701) Fix WindowsFS test failure seen on Policeman Jenkins (apache#14706) Use a temporary repository location to download certain ecj versions ("drops") (apache#14703) Add assumption to ignore occasional test failures due to disconnected graphs (apache#14696) Return MatchNoDocsQuery when IndexOrDocValuesQuery::rewrite does not match (apache#14700) Minor access modifier adjustment to a couple of lucene90 backward compat types (apache#14695) Speed up exhaustive evaluation. (apache#14679) Specify and test that IOContext is immutable (apache#14686) deps(java): bump org.gradle.toolchains.foojay-resolver-convention (apache#14691) deps(java): bump org.eclipse.jgit:org.eclipse.jgit (apache#14692) Clean up how the test framework creates asserting scorables. (apache#14452) Make competitive iterators more robust. (apache#14532) Remove DISIDocIdStream. (apache#14550) Implement AssertingPostingsEnum#intoBitSet. (apache#14675) Fix patience knn queries to work with seeded knn queries (apache#14688) Added toString() method to BytesRefBuilder (apache#14676) ...
|
This change yielded a good speedup on nightly benchmarks, I pushed an annotation. https://benchmarks.mikemccandless.com/Term.html |
This tries to speed up
TermQuerywith the new APIScorer#nextDocsAndScoresTopN
Exhaustive