-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Remove DISIDocIdStream. #14550
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove DISIDocIdStream. #14550
Conversation
I had initially introduced `DISIDocIdStream` to avoid introducing regressions when `DenseConjunctionBulkScorer` started accepting single clauses. However, benchmarks on apache#14532 suggested that going through `DISIDocIdStream` is slower than loading docs into a bit set first and then iterating the bit set, when the postings list has many of its blocks encoded as bit sets. This makes sense, the way how `BitSetDocIdStream` iterates set bits saves a number of operations compared with calling `FixedBitSet#nextSetBit` in a loop. So I'm suggesting removing `DISIDocIdStream` for now for simplicity.
gf2121
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excited to see a combination of simplification and optimization!
|
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution! |
|
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you will stop receiving this reminder on future updates to the PR. |
ChrisHegarty
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I had initially introduced `DISIDocIdStream` to avoid introducing regressions when `DenseConjunctionBulkScorer` started accepting single clauses. However, benchmarks on #14532 suggested that going through `DISIDocIdStream` is slower than loading docs into a bit set first and then iterating the bit set, when the postings list has many of its blocks encoded as bit sets. This makes sense, the way how `BitSetDocIdStream` iterates set bits saves a number of operations compared with calling `FixedBitSet#nextSetBit` in a loop. So I'm suggesting removing `DISIDocIdStream` for now for simplicity.
* main: (32 commits) update os.makedirs with pathlib mkdir (apache#14710) Optimize AbstractKnnVectorQuery#createBitSet with intoBitset (apache#14674) Implement #docIDRunEnd() on PostingsEnum. (apache#14693) Speed up TermQuery (apache#14709) Refactor main top-n bulk scorers to evaluate hits in a more term-at-a-time fashion. (apache#14701) Fix WindowsFS test failure seen on Policeman Jenkins (apache#14706) Use a temporary repository location to download certain ecj versions ("drops") (apache#14703) Add assumption to ignore occasional test failures due to disconnected graphs (apache#14696) Return MatchNoDocsQuery when IndexOrDocValuesQuery::rewrite does not match (apache#14700) Minor access modifier adjustment to a couple of lucene90 backward compat types (apache#14695) Speed up exhaustive evaluation. (apache#14679) Specify and test that IOContext is immutable (apache#14686) deps(java): bump org.gradle.toolchains.foojay-resolver-convention (apache#14691) deps(java): bump org.eclipse.jgit:org.eclipse.jgit (apache#14692) Clean up how the test framework creates asserting scorables. (apache#14452) Make competitive iterators more robust. (apache#14532) Remove DISIDocIdStream. (apache#14550) Implement AssertingPostingsEnum#intoBitSet. (apache#14675) Fix patience knn queries to work with seeded knn queries (apache#14688) Added toString() method to BytesRefBuilder (apache#14676) ...
I had initially introduced
DISIDocIdStreamto avoid introducing regressions whenDenseConjunctionBulkScorerstarted accepting single clauses. However, benchmarks on #14532 suggested that going throughDISIDocIdStreamis slower than loading docs into a bit set first and then iterating the bit set, when the postings list has many of its blocks encoded as bit sets.This makes sense, the way how
BitSetDocIdStreamiterates set bits saves a number of operations compared with callingFixedBitSet#nextSetBitin a loop.So I'm suggesting removing
DISIDocIdStreamfor now for simplicity.