Skip to content

MOD-7896: Add weight factor to BM25 scoring calculation#5670

Merged
Itzikvaknin merged 11 commits intomasterfrom
ivaknin-bm25-weights-issue
Feb 26, 2025
Merged

MOD-7896: Add weight factor to BM25 scoring calculation#5670
Itzikvaknin merged 11 commits intomasterfrom
ivaknin-bm25-weights-issue

Conversation

@Itzikvaknin
Copy link
Collaborator

@Itzikvaknin Itzikvaknin commented Feb 19, 2025

This PR fixes an issue where the weight attribute was not applied when querying a single term using the BM25 scorer, as demonstrated in the example below:

DB:

JSON.SET key:1 $ '{"tag": "foo", "content": "The cat is on the mat", "description": "The cat is on the mat", "content_vector": [1,1], "description_vector": [1,1]}'
JSON.SET key:2 $ '{"tag": "foo", "content": "My dog and cat are the best animals to ever exist in the universe.  My cat\'s name is Fred", "description": "My dog and cat are the best animals to ever exist in the universe.  My cat\'s name is Fred", "content_vector": [1,2], "description_vector": [1,2]}'
JSON.SET key:3 $ '{"tag": "bar", "content": "There was a solar eclipse yesterday", "description": "There was a solar eclipse yesterday", "content_vector": [1,3], "description_vector": [1,3]}'
FT.CREATE idx ON JSON PREFIX 1 key: SCHEMA $.tag AS tag TAG $.content AS content TEXT $.description AS description TEXT $.content_vector AS content_vector VECTOR FLAT 6 TYPE FLOAT32 DIM 2 DISTANCE_METRIC COSINE $.description_vector AS description_vector VECTOR FLAT 6 TYPE FLOAT32 DIM 2 DISTANCE_METRIC COSINE

Behavior before the fix:

> FT.SEARCH idx "@content:cat" SCORER BM25 WITHSCORES
1) "2"
2) "key:2"
3) "0.33898304194357526"
4) 1) "$"
   2) "{\"tag\":\"foo\",\"content\":\"My dog and cat are the best animals to ever exist in the universe.  My cat's name is Fred\",\"description\":\"My dog and cat are the best animals to ever exist in the universe.  My cat's name is Fred\",\"content_vector\":[1,2],\"description_vector\":[1,2]}"
5) "key:1"
6) "0.20408162619858095"
7) 1) "$"
   2) "{\"tag\":\"foo\",\"content\":\"The cat is on the mat\",\"description\":\"The cat is on the mat\",\"content_vector\":[1,1],\"description_vector\":[1,1]}"


> FT.SEARCH idx "((@content:cat) => {$weight: 0.3;})" SCORER BM25 WITHSCORES
1) "2"
2) "key:2"
3) "0.33898304194357526"
4) 1) "$"
   2) "{\"tag\":\"foo\",\"content\":\"My dog and cat are the best animals to ever exist in the universe.  My cat's name is Fred\",\"description\":\"My dog and cat are the best animals to ever exist in the universe.  My cat's name is Fred\",\"content_vector\":[1,2],\"description_vector\":[1,2]}"
5) "key:1"
6) "0.20408162619858095"
7) 1) "$"
   2) "{\"tag\":\"foo\",\"content\":\"The cat is on the mat\",\"description\":\"The cat is on the mat\",\"content_vector\":[1,1],\"description_vector\":[1,1]}"

Behavior after the fix:

> FT.SEARCH idx "@content:cat" SCORER BM25 WITHSCORES
1) (integer) 2
2) "key:2"
3) "0.33898304194357526"
4) 1) "$"
   2) "{\"tag\":\"foo\",\"content\":\"My dog and cat are the best animals to ever exist in the universe.  My cat's name is Fred\",\"description\":\"My dog and cat are the best animals to ever exist in the universe.  My cat's name is Fred\",\"content_vector\":[1,2],\"description_vector\":[1,2]}"
5) "key:1"
6) "0.20408162619858095"
7) 1) "$"
   2) "{\"tag\":\"foo\",\"content\":\"The cat is on the mat\",\"description\":\"The cat is on the mat\",\"content_vector\":[1,1],\"description_vector\":[1,1]}"


> FT.SEARCH idx "((@content:cat) => {$weight: 0.3;})" SCORER BM25 WITHSCORES
1) (integer) 2
2) "key:2"
3) "0.10169491258307257"
4) 1) "$"
   2) "{\"tag\":\"foo\",\"content\":\"My dog and cat are the best animals to ever exist in the universe.  My cat's name is Fred\",\"description\":\"My dog and cat are the best animals to ever exist in the universe.  My cat's name is Fred\",\"content_vector\":[1,2],\"description_vector\":[1,2]}"
5) "key:1"
6) "0.06122448785957428"
7) 1) "$"
   2) "{\"tag\":\"foo\",\"content\":\"The cat is on the mat\",\"description\":\"The cat is on the mat\",\"content_vector\":[1,1],\"description_vector\":[1,1]}"

EXPLAIN was updated accordingly:

> FT.SEARCH idx '((@content:(cat => {$weight: 0.5;}|dog) => {$weight: 0.7;}) => {$weight: 0.3;})' WITHSCORES EXPLAINSCORE SCORER BM25 NOCONTENT
1) (integer) 2
2) "key:2"
3) 1) "0.17329643201068487"
   2) 1) Final BM25 : words BM25 0.17 * document score 1.00 / slop 1
      2) 1) 1) (Weight 0.30 * children BM25 0.58)
            2) 1) (0.17 = Weight 0.50 * IDF 1.00 * F 4 / (F 4 + k1 1.2 * (1 - b 0.5 + b 0.5 * Average Len 12.00)))
               2) (0.41 = Weight 1.00 * IDF 2.00 * F 2 / (F 2 + k1 1.2 * (1 - b 0.5 + b 0.5 * Average Len 12.00)))
4) "key:1"
5) 1) "0.03061224392978714"
   2) 1) Final BM25 : words BM25 0.03 * document score 1.00 / slop 1
      2) 1) 1) (Weight 0.30 * children BM25 0.10)
            2) 1) (0.10 = Weight 0.50 * IDF 1.00 * F 2 / (F 2 + k1 1.2 * (1 - b 0.5 + b 0.5 * Average Len 12.00)))

@codecov
Copy link

codecov bot commented Feb 19, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.02%. Comparing base (c2bab6e) to head (22a17df).
Report is 8 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5670      +/-   ##
==========================================
- Coverage   88.04%   88.02%   -0.03%     
==========================================
  Files         197      197              
  Lines       35577    35576       -1     
==========================================
- Hits        31322    31314       -8     
- Misses       4255     4262       +7     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@raz-mon
Copy link
Collaborator

raz-mon commented Feb 20, 2025

@Itzikvaknin Update the PR comment please.
Remove the default part you did not use, and add some details as to when the issue happened and what is the new behavior. An example would be great

Copy link
Collaborator

@raz-mon raz-mon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice
See a few comments

Fix BM25STD calculation
@Itzikvaknin Itzikvaknin requested a review from raz-mon February 20, 2025 18:27
@Itzikvaknin Itzikvaknin requested a review from raz-mon February 23, 2025 10:01
raz-mon
raz-mon previously approved these changes Feb 23, 2025
raz-mon
raz-mon previously approved these changes Feb 23, 2025
Copy link
Collaborator

@raz-mon raz-mon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😎

@Itzikvaknin Itzikvaknin deleted the ivaknin-bm25-weights-issue branch February 26, 2025 15:28
@Itzikvaknin
Copy link
Collaborator Author

/backport

@redisearch-backport-pull-request
Copy link
Contributor

Backport failed for 2.10, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 2.10
git worktree add -d .worktree/backport-5670-to-2.10 origin/2.10
cd .worktree/backport-5670-to-2.10
git switch --create backport-5670-to-2.10
git cherry-pick -x 65a4c9e396228d024d7ca4d34a9e99bccbcced31

@redisearch-backport-pull-request
Copy link
Contributor

Successfully created backport PR for 8.0:

redisearch-backport-pull-request bot pushed a commit that referenced this pull request Feb 27, 2025
* Update BM25 scoring calculations to incorporate weight factor in explanations

* Add weight to explainscore
Expand testTFIDFScorerExplanation in test_scorers

* Add testBM25STDScoreWithWeight and testBM25ScoreWithWeight
to test_scorers

* Fix testOptionalAndWildcardScoring

* Fix testBM25STDScorerExplanation

* Add docs to scorer_with_weight_test
Fix BM25STD calculation

* Remove unnecessary isCluster condition from test_scorers

* Add hash seed to force docs distribution

* Name change

* Fix test_scorers:testBM25ScorerExplanation and test_scorers:testBM25STDScorerExplanation

(cherry picked from commit 65a4c9e)
Itzikvaknin added a commit that referenced this pull request Feb 27, 2025
* Update BM25 scoring calculations to incorporate weight factor in explanations

* Add weight to explainscore
Expand testTFIDFScorerExplanation in test_scorers

* Add testBM25STDScoreWithWeight and testBM25ScoreWithWeight
to test_scorers

* Fix testOptionalAndWildcardScoring

* Fix testBM25STDScorerExplanation

* Add docs to scorer_with_weight_test
Fix BM25STD calculation

* Remove unnecessary isCluster condition from test_scorers

* Add hash seed to force docs distribution

* Name change

* Fix test_scorers:testBM25ScorerExplanation and test_scorers:testBM25STDScorerExplanation
Itzikvaknin added a commit that referenced this pull request Feb 27, 2025
* Update BM25 scoring calculations to incorporate weight factor in explanations

* Add weight to explainscore
Expand testTFIDFScorerExplanation in test_scorers

* Add testBM25STDScoreWithWeight and testBM25ScoreWithWeight
to test_scorers

* Fix testOptionalAndWildcardScoring

* Fix testBM25STDScorerExplanation

* Add docs to scorer_with_weight_test
Fix BM25STD calculation

* Remove unnecessary isCluster condition from test_scorers

* Add hash seed to force docs distribution

* Name change

* Fix test_scorers:testBM25ScorerExplanation and test_scorers:testBM25STDScorerExplanation
github-merge-queue bot pushed a commit that referenced this pull request Feb 27, 2025
MOD-7896: Add weight factor to BM25 scoring calculation (#5670)

* Update BM25 scoring calculations to incorporate weight factor in explanations

* Add weight to explainscore
Expand testTFIDFScorerExplanation in test_scorers

* Add testBM25STDScoreWithWeight and testBM25ScoreWithWeight
to test_scorers

* Fix testOptionalAndWildcardScoring

* Fix testBM25STDScorerExplanation

* Add docs to scorer_with_weight_test
Fix BM25STD calculation

* Remove unnecessary isCluster condition from test_scorers

* Add hash seed to force docs distribution

* Name change

* Fix test_scorers:testBM25ScorerExplanation and test_scorers:testBM25STDScorerExplanation

(cherry picked from commit 65a4c9e)

Co-authored-by: Itzikvaknin <[email protected]>
github-merge-queue bot pushed a commit that referenced this pull request Feb 27, 2025
MOD-7896: Add weight factor to BM25 scoring calculation (#5670)

* Update BM25 scoring calculations to incorporate weight factor in explanations

* Add weight to explainscore
Expand testTFIDFScorerExplanation in test_scorers

* Add testBM25STDScoreWithWeight and testBM25ScoreWithWeight
to test_scorers

* Fix testOptionalAndWildcardScoring

* Fix testBM25STDScorerExplanation

* Add docs to scorer_with_weight_test
Fix BM25STD calculation

* Remove unnecessary isCluster condition from test_scorers

* Add hash seed to force docs distribution

* Name change

* Fix test_scorers:testBM25ScorerExplanation and test_scorers:testBM25STDScorerExplanation
@Itzikvaknin
Copy link
Collaborator Author

/backport

Itzikvaknin added a commit that referenced this pull request Mar 2, 2025
* Update BM25 scoring calculations to incorporate weight factor in explanations

* Add weight to explainscore
Expand testTFIDFScorerExplanation in test_scorers

* Add testBM25STDScoreWithWeight and testBM25ScoreWithWeight
to test_scorers

* Fix testOptionalAndWildcardScoring

* Fix testBM25STDScorerExplanation

* Add docs to scorer_with_weight_test
Fix BM25STD calculation

* Remove unnecessary isCluster condition from test_scorers

* Add hash seed to force docs distribution

* Name change

* Fix test_scorers:testBM25ScorerExplanation and test_scorers:testBM25STDScorerExplanation
Itzikvaknin added a commit that referenced this pull request Mar 2, 2025
* Update BM25 scoring calculations to incorporate weight factor in explanations

* Add weight to explainscore
Expand testTFIDFScorerExplanation in test_scorers

* Add testBM25STDScoreWithWeight and testBM25ScoreWithWeight
to test_scorers

* Fix testOptionalAndWildcardScoring

* Fix testBM25STDScorerExplanation

* Add docs to scorer_with_weight_test
Fix BM25STD calculation

* Remove unnecessary isCluster condition from test_scorers

* Add hash seed to force docs distribution

* Name change

* Fix test_scorers:testBM25ScorerExplanation and test_scorers:testBM25STDScorerExplanation
Itzikvaknin added a commit that referenced this pull request Mar 2, 2025
* Update BM25 scoring calculations to incorporate weight factor in explanations

* Add weight to explainscore
Expand testTFIDFScorerExplanation in test_scorers

* Add testBM25STDScoreWithWeight and testBM25ScoreWithWeight
to test_scorers

* Fix testOptionalAndWildcardScoring

* Fix testBM25STDScorerExplanation

* Add docs to scorer_with_weight_test
Fix BM25STD calculation

* Remove unnecessary isCluster condition from test_scorers

* Add hash seed to force docs distribution

* Name change

* Fix test_scorers:testBM25ScorerExplanation and test_scorers:testBM25STDScorerExplanation
github-merge-queue bot pushed a commit that referenced this pull request Mar 3, 2025
…#5712)

* MOD-7896: Add weight factor to BM25 scoring calculation (#5670)

* Update BM25 scoring calculations to incorporate weight factor in explanations

* Add weight to explainscore
Expand testTFIDFScorerExplanation in test_scorers

* Add testBM25STDScoreWithWeight and testBM25ScoreWithWeight
to test_scorers

* Fix testOptionalAndWildcardScoring

* Fix testBM25STDScorerExplanation

* Add docs to scorer_with_weight_test
Fix BM25STD calculation

* Remove unnecessary isCluster condition from test_scorers

* Add hash seed to force docs distribution

* Name change

* Fix test_scorers:testBM25ScorerExplanation and test_scorers:testBM25STDScorerExplanation

* Add test with scoring
github-merge-queue bot pushed a commit that referenced this pull request Mar 3, 2025
…#5709)

* MOD-7896: Add weight factor to BM25 scoring calculation (#5670)

* Update BM25 scoring calculations to incorporate weight factor in explanations

* Add weight to explainscore
Expand testTFIDFScorerExplanation in test_scorers

* Add testBM25STDScoreWithWeight and testBM25ScoreWithWeight
to test_scorers

* Fix testOptionalAndWildcardScoring

* Fix testBM25STDScorerExplanation

* Add docs to scorer_with_weight_test
Fix BM25STD calculation

* Remove unnecessary isCluster condition from test_scorers

* Add hash seed to force docs distribution

* Name change

* Fix test_scorers:testBM25ScorerExplanation and test_scorers:testBM25STDScorerExplanation

* Add Env import
LukeMathWalker pushed a commit to LukeMathWalker/RediSearch that referenced this pull request Mar 3, 2025
)

* Update BM25 scoring calculations to incorporate weight factor in explanations

* Add weight to explainscore
Expand testTFIDFScorerExplanation in test_scorers

* Add testBM25STDScoreWithWeight and testBM25ScoreWithWeight
to test_scorers

* Fix testOptionalAndWildcardScoring

* Fix testBM25STDScorerExplanation

* Add docs to scorer_with_weight_test
Fix BM25STD calculation

* Remove unnecessary isCluster condition from test_scorers

* Add hash seed to force docs distribution

* Name change

* Fix test_scorers:testBM25ScorerExplanation and test_scorers:testBM25STDScorerExplanation
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants