MOD-12223: Fix BM25STD underflow wraparound#7223
Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR fixes an underflow issue in the BM25STD IDF calculation that occurred during deletion/update operations before garbage collection runs. The fix prevents negative values in the IDF computation when termDocs temporarily exceeds totalDocs.
- Adds a guard condition in
CalculateIDF_BM25to handle cases wheretotalDocs < termDocs - Introduces a test case that validates the fix by simulating the problematic scenario
- Uses a ternary operator to apply a fallback calculation when the underflow condition is detected
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/iterators/inverted_index_iterator.c | Adds underflow protection in BM25 IDF calculation with conditional logic |
| tests/pytests/test_scorers.py | Adds regression test for BM25STD underflow scenario during updates |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #7223 +/- ##
==========================================
- Coverage 85.24% 85.01% -0.24%
==========================================
Files 341 341
Lines 52593 51662 -931
Branches 12526 12526
==========================================
- Hits 44833 43919 -914
+ Misses 7568 7551 -17
Partials 192 192
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
* Fix BM25STD underflow wraparound * Fix comment * Fix documentation and tweak test * Address review * Remove old test * Skip cluster on expiration test * Fix leak in test
|
Backport failed for Please cherry-pick the changes locally and resolve any conflicts. git fetch origin 2.8
git worktree add -d .worktree/backport-7223-to-2.8 origin/2.8
cd .worktree/backport-7223-to-2.8
git switch --create backport-7223-to-2.8
git cherry-pick -x d95056f383377d2c88badbb7da7909e85163b077 |
|
Backport failed for Please cherry-pick the changes locally and resolve any conflicts. git fetch origin 2.10
git worktree add -d .worktree/backport-7223-to-2.10 origin/2.10
cd .worktree/backport-7223-to-2.10
git switch --create backport-7223-to-2.10
git cherry-pick -x d95056f383377d2c88badbb7da7909e85163b077 |
* Fix BM25STD underflow wraparound * Fix comment * Fix documentation and tweak test * Address review * Remove old test * Skip cluster on expiration test * Fix leak in test (cherry picked from commit d95056f)
|
Successfully created backport PR for |
* Fix BM25STD underflow wraparound * Fix comment * Fix documentation and tweak test * Address review * Remove old test * Skip cluster on expiration test * Fix leak in test (cherry picked from commit d95056f)
|
Successfully created backport PR for |
MOD-12223: Fix BM25STD underflow wraparound (#7223) * Fix BM25STD underflow wraparound * Fix comment * Fix documentation and tweak test * Address review * Remove old test * Skip cluster on expiration test * Fix leak in test (cherry picked from commit d95056f) Co-authored-by: Raz Monsonego <[email protected]>
* MOD-12223: Fix BM25STD underflow wraparound (#7223) * Fix BM25STD underflow wraparound * Fix comment * Fix documentation and tweak test * Address review * Remove old test * Skip cluster on expiration test * Fix leak in test (cherry picked from commit d95056f) * Enable unstable features for default-scorer config * Fix in duplicate function (actually used) * Change the default scorer after enabling unstable features * Fix comment --------- Co-authored-by: Raz Monsonego <[email protected]>
BM25STDscorer, causing the score to jump rapidly.termDocs = totalDocsin this case, such that we do not underflow.Do note that this "discrepancy" will be fixed once the GC runs, such that the correction is temporary only, yet needed until then.
Also, we currently do not test that after the GC runs we get the same scores, due to other metrics that are not appropriately updated (e.g.,
totalDocLen). Once they will be fixed we will be able to test that as well.Note
Prevent BM25STD IDF underflow by clamping doc counts; add targeted underflow test, adjust a skip for expiration test, and remove an outdated scorer test.
totalDocstoMAX(totalDocs, termDocs)inCalculateIDF_BM25to prevent underflow whentotalDocs < termDocs(e.g., before GC) insrc/iterators/inverted_index_iterator.c.testBM25STDUnderflowvalidating stable BM25STD scores without underflow intests/pytests/test_scorers.py.testScoreReplacefromtests/pytests/test_scorers.py.testLazyTextFieldExpirationintests/pytests/test_expire.py.Written by Cursor Bugbot for commit 4b26ae0. This will update automatically on new commits. Configure here.