Skip to content

Conversation

@jpountz
Copy link
Contributor

@jpountz jpountz commented Apr 24, 2025

I had initially introduced DISIDocIdStream to avoid introducing regressions when DenseConjunctionBulkScorer started accepting single clauses. However, benchmarks on #14532 suggested that going through DISIDocIdStream is slower than loading docs into a bit set first and then iterating the bit set, when the postings list has many of its blocks encoded as bit sets.

This makes sense, the way how BitSetDocIdStream iterates set bits saves a number of operations compared with calling FixedBitSet#nextSetBit in a loop.

So I'm suggesting removing DISIDocIdStream for now for simplicity.

I had initially introduced `DISIDocIdStream` to avoid introducing regressions
when `DenseConjunctionBulkScorer` started accepting single clauses. However,
benchmarks on apache#14532 suggested that going through `DISIDocIdStream` is slower
than loading docs into a bit set first and then iterating the bit set, when the
postings list has many of its blocks encoded as bit sets.

This makes sense, the way how `BitSetDocIdStream` iterates set bits saves a
number of operations compared with calling `FixedBitSet#nextSetBit` in a loop.

So I'm suggesting removing `DISIDocIdStream` for now for simplicity.
Copy link
Contributor

@gf2121 gf2121 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excited to see a combination of simplification and optimization!

@github-actions
Copy link
Contributor

github-actions bot commented May 9, 2025

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label May 9, 2025
@github-actions
Copy link
Contributor

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you will stop receiving this reminder on future updates to the PR.

@github-actions github-actions bot removed the Stale label May 16, 2025
Copy link
Contributor

@ChrisHegarty ChrisHegarty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jpountz jpountz added this to the 10.3.0 milestone May 20, 2025
@jpountz jpountz merged commit 26b3a15 into apache:main May 20, 2025
7 checks passed
@jpountz jpountz deleted the remove_disi_docidstream_opto branch May 20, 2025 21:11
jpountz added a commit that referenced this pull request May 20, 2025
I had initially introduced `DISIDocIdStream` to avoid introducing regressions
when `DenseConjunctionBulkScorer` started accepting single clauses. However,
benchmarks on #14532 suggested that going through `DISIDocIdStream` is slower
than loading docs into a bit set first and then iterating the bit set, when the
postings list has many of its blocks encoded as bit sets.

This makes sense, the way how `BitSetDocIdStream` iterates set bits saves a
number of operations compared with calling `FixedBitSet#nextSetBit` in a loop.

So I'm suggesting removing `DISIDocIdStream` for now for simplicity.
weizijun added a commit to weizijun/lucene that referenced this pull request May 27, 2025
* main: (32 commits)
  update os.makedirs with pathlib mkdir (apache#14710)
  Optimize AbstractKnnVectorQuery#createBitSet with intoBitset (apache#14674)
  Implement #docIDRunEnd() on PostingsEnum. (apache#14693)
  Speed up TermQuery (apache#14709)
  Refactor main top-n bulk scorers to evaluate hits in a more term-at-a-time fashion. (apache#14701)
  Fix WindowsFS test failure seen on Policeman Jenkins (apache#14706)
  Use a temporary repository location to download certain ecj versions ("drops") (apache#14703)
  Add assumption to ignore occasional test failures due to disconnected graphs (apache#14696)
  Return MatchNoDocsQuery when IndexOrDocValuesQuery::rewrite does not match (apache#14700)
  Minor access modifier adjustment to a couple of lucene90 backward compat types (apache#14695)
  Speed up exhaustive evaluation. (apache#14679)
  Specify and test that IOContext is immutable (apache#14686)
  deps(java): bump org.gradle.toolchains.foojay-resolver-convention (apache#14691)
  deps(java): bump org.eclipse.jgit:org.eclipse.jgit (apache#14692)
  Clean up how the test framework creates asserting scorables. (apache#14452)
  Make competitive iterators more robust. (apache#14532)
  Remove DISIDocIdStream. (apache#14550)
  Implement AssertingPostingsEnum#intoBitSet. (apache#14675)
  Fix patience knn queries to work with seeded knn queries (apache#14688)
  Added toString() method to BytesRefBuilder (apache#14676)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants