[2.10] MOD-7896: Add weight factor to BM25 scoring calculation #5704
[2.10] MOD-7896: Add weight factor to BM25 scoring calculation #5704Itzikvaknin merged 1 commit into2.10from
Conversation
* Update BM25 scoring calculations to incorporate weight factor in explanations * Add weight to explainscore Expand testTFIDFScorerExplanation in test_scorers * Add testBM25STDScoreWithWeight and testBM25ScoreWithWeight to test_scorers * Fix testOptionalAndWildcardScoring * Fix testBM25STDScorerExplanation * Add docs to scorer_with_weight_test Fix BM25STD calculation * Remove unnecessary isCluster condition from test_scorers * Add hash seed to force docs distribution * Name change * Fix test_scorers:testBM25ScorerExplanation and test_scorers:testBM25STDScorerExplanation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## 2.10 #5704 +/- ##
=======================================
Coverage 87.65% 87.66%
=======================================
Files 196 196
Lines 34257 34256 -1
=======================================
+ Hits 30029 30031 +2
+ Misses 4228 4225 -3 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
PR Overview
This PR backports changes from an upstream pull request to add a weight factor to the BM25 scoring calculation. Key changes include updating document key names (by appending {hash_tag}) in tests, revising expected scoring explanation strings to include the weight factor, and adding new tests to verify proper weight handling.
Reviewed Changes
| File | Description |
|---|---|
| tests/pytests/test_scorers.py | Adjustments to document keys and expected scoring strings for BM25 and BM25STD, plus new tests for weighted scoring |
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
tests/pytests/test_scorers.py:146
- The removal of the cluster mode conditional block may cause test failures in cluster environments if the output differs from non-cluster setups. Consider reinstituting cluster-specific assertions or verifying that cluster outputs match the current expectations.
if env.isCluster():
| conn.execute_command('HSET', 'doc1', 'title', 'hello world') | ||
| conn.execute_command('HSET', 'doc2', 'title', 'hello world cat dog') | ||
|
|
There was a problem hiding this comment.
In the scorer_with_weight_test, document keys are not appended with {hash_tag} unlike in the other tests, which may lead to cross-shard issues in cluster mode. Consider using consistent document key naming (with {hash_tag}) to ensure proper sharding.
| conn.execute_command('HSET', 'doc1', 'title', 'hello world') | |
| conn.execute_command('HSET', 'doc2', 'title', 'hello world cat dog') | |
| conn.execute_command('HSET', 'doc1{hash_tag}', 'title', 'hello world') | |
| conn.execute_command('HSET', 'doc2{hash_tag}', 'title', 'hello world cat dog') |
Backport #5670 to 2.10