Skip to content

MOD-12223: Fix BM25STD underflow wraparound#7223

Merged
raz-mon merged 7 commits intomasterfrom
razmon-fix_scoring
Nov 9, 2025
Merged

MOD-12223: Fix BM25STD underflow wraparound#7223
raz-mon merged 7 commits intomasterfrom
razmon-fix_scoring

Conversation

@raz-mon
Copy link
Collaborator

@raz-mon raz-mon commented Nov 5, 2025

  1. Current: Upon having more documents in the inverted-index of a term than the number of documents in the index, an underflow occurs in our BM25STD scorer, causing the score to jump rapidly.
  2. Change: We settle for a temporary fix of setting termDocs = totalDocs in this case, such that we do not underflow.
  3. Outcome: The score does not jump rapidly in this case.

Do note that this "discrepancy" will be fixed once the GC runs, such that the correction is temporary only, yet needed until then.

Also, we currently do not test that after the GC runs we get the same scores, due to other metrics that are not appropriately updated (e.g., totalDocLen). Once they will be fixed we will be able to test that as well.


Note

Prevent BM25STD IDF underflow by clamping doc counts; add targeted underflow test, adjust a skip for expiration test, and remove an outdated scorer test.

  • Scoring (BM25STD):
    • Clamp totalDocs to MAX(totalDocs, termDocs) in CalculateIDF_BM25 to prevent underflow when totalDocs < termDocs (e.g., before GC) in src/iterators/inverted_index_iterator.c.
  • Tests:
    • Add testBM25STDUnderflow validating stable BM25STD scores without underflow in tests/pytests/test_scorers.py.
    • Remove testScoreReplace from tests/pytests/test_scorers.py.
    • Broaden skip to include cluster for testLazyTextFieldExpiration in tests/pytests/test_expire.py.

Written by Cursor Bugbot for commit 4b26ae0. This will update automatically on new commits. Configure here.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes an underflow issue in the BM25STD IDF calculation that occurred during deletion/update operations before garbage collection runs. The fix prevents negative values in the IDF computation when termDocs temporarily exceeds totalDocs.

  • Adds a guard condition in CalculateIDF_BM25 to handle cases where totalDocs < termDocs
  • Introduces a test case that validates the fix by simulating the problematic scenario
  • Uses a ternary operator to apply a fallback calculation when the underflow condition is detected

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
src/iterators/inverted_index_iterator.c Adds underflow protection in BM25 IDF calculation with conditional logic
tests/pytests/test_scorers.py Adds regression test for BM25STD underflow scenario during updates

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@github-actions github-actions bot added the size:S label Nov 5, 2025
Copy link
Collaborator

@GuyAv46 GuyAv46 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome

GuyAv46
GuyAv46 previously approved these changes Nov 5, 2025
Copy link
Collaborator

@GuyAv46 GuyAv46 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎣

alonre24
alonre24 previously approved these changes Nov 5, 2025
@raz-mon raz-mon enabled auto-merge November 5, 2025 09:28
@raz-mon raz-mon dismissed stale reviews from alonre24 and GuyAv46 via db32217 November 5, 2025 12:12
GuyAv46
GuyAv46 previously approved these changes Nov 5, 2025
@codecov
Copy link

codecov bot commented Nov 5, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 85.01%. Comparing base (7184778) to head (4b26ae0).
⚠️ Report is 7 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7223      +/-   ##
==========================================
- Coverage   85.24%   85.01%   -0.24%     
==========================================
  Files         341      341              
  Lines       52593    51662     -931     
  Branches    12526    12526              
==========================================
- Hits        44833    43919     -914     
+ Misses       7568     7551      -17     
  Partials      192      192              
Flag Coverage Δ
flow 84.25% <100.00%> (-0.01%) ⬇️
unit 51.57% <100.00%> (-0.73%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@raz-mon raz-mon added this pull request to the merge queue Nov 6, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to a conflict with the base branch Nov 6, 2025
@raz-mon raz-mon added this pull request to the merge queue Nov 6, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 6, 2025
@raz-mon raz-mon added this pull request to the merge queue Nov 6, 2025
github-merge-queue bot pushed a commit that referenced this pull request Nov 6, 2025
* Fix BM25STD underflow wraparound

* Fix comment

* Fix documentation and tweak test

* Address review

* Remove old test

* Skip cluster on expiration test

* Fix leak in test
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 6, 2025
@raz-mon raz-mon added this pull request to the merge queue Nov 8, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 8, 2025
@raz-mon raz-mon added this pull request to the merge queue Nov 8, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 8, 2025
@raz-mon raz-mon added this pull request to the merge queue Nov 8, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 8, 2025
@raz-mon raz-mon added this pull request to the merge queue Nov 9, 2025
Merged via the queue into master with commit d95056f Nov 9, 2025
19 checks passed
@raz-mon raz-mon deleted the razmon-fix_scoring branch November 9, 2025 08:45
@redisearch-backport-pull-request
Copy link
Contributor

Backport failed for 2.8, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 2.8
git worktree add -d .worktree/backport-7223-to-2.8 origin/2.8
cd .worktree/backport-7223-to-2.8
git switch --create backport-7223-to-2.8
git cherry-pick -x d95056f383377d2c88badbb7da7909e85163b077

@redisearch-backport-pull-request
Copy link
Contributor

Backport failed for 2.10, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 2.10
git worktree add -d .worktree/backport-7223-to-2.10 origin/2.10
cd .worktree/backport-7223-to-2.10
git switch --create backport-7223-to-2.10
git cherry-pick -x d95056f383377d2c88badbb7da7909e85163b077

redisearch-backport-pull-request bot pushed a commit that referenced this pull request Nov 9, 2025
* Fix BM25STD underflow wraparound

* Fix comment

* Fix documentation and tweak test

* Address review

* Remove old test

* Skip cluster on expiration test

* Fix leak in test

(cherry picked from commit d95056f)
@redisearch-backport-pull-request
Copy link
Contributor

Successfully created backport PR for 8.2:

redisearch-backport-pull-request bot pushed a commit that referenced this pull request Nov 9, 2025
* Fix BM25STD underflow wraparound

* Fix comment

* Fix documentation and tweak test

* Address review

* Remove old test

* Skip cluster on expiration test

* Fix leak in test

(cherry picked from commit d95056f)
@redisearch-backport-pull-request
Copy link
Contributor

Successfully created backport PR for 8.4:

github-merge-queue bot pushed a commit that referenced this pull request Nov 9, 2025
MOD-12223: Fix BM25STD underflow wraparound (#7223)

* Fix BM25STD underflow wraparound

* Fix comment

* Fix documentation and tweak test

* Address review

* Remove old test

* Skip cluster on expiration test

* Fix leak in test

(cherry picked from commit d95056f)

Co-authored-by: Raz Monsonego <[email protected]>
github-merge-queue bot pushed a commit that referenced this pull request Nov 9, 2025
* MOD-12223: Fix BM25STD underflow wraparound (#7223)

* Fix BM25STD underflow wraparound

* Fix comment

* Fix documentation and tweak test

* Address review

* Remove old test

* Skip cluster on expiration test

* Fix leak in test

(cherry picked from commit d95056f)

* Enable unstable features for default-scorer config

* Fix in duplicate function (actually used)

* Change the default scorer after enabling unstable features

* Fix comment

---------

Co-authored-by: Raz Monsonego <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants