Skip to content

[2.10] MOD-7896: Add weight factor to BM25 scoring calculation #5704

Merged
Itzikvaknin merged 1 commit into2.10from
manual-backport-5670-to-2.10
Feb 27, 2025
Merged

[2.10] MOD-7896: Add weight factor to BM25 scoring calculation #5704
Itzikvaknin merged 1 commit into2.10from
manual-backport-5670-to-2.10

Conversation

@Itzikvaknin
Copy link
Collaborator

Backport #5670 to 2.10

* Update BM25 scoring calculations to incorporate weight factor in explanations

* Add weight to explainscore
Expand testTFIDFScorerExplanation in test_scorers

* Add testBM25STDScoreWithWeight and testBM25ScoreWithWeight
to test_scorers

* Fix testOptionalAndWildcardScoring

* Fix testBM25STDScorerExplanation

* Add docs to scorer_with_weight_test
Fix BM25STD calculation

* Remove unnecessary isCluster condition from test_scorers

* Add hash seed to force docs distribution

* Name change

* Fix test_scorers:testBM25ScorerExplanation and test_scorers:testBM25STDScorerExplanation
@codecov
Copy link

codecov bot commented Feb 27, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 87.66%. Comparing base (ff81729) to head (d6fd3a8).
Report is 1 commits behind head on 2.10.

Additional details and impacted files
@@           Coverage Diff           @@
##             2.10    #5704   +/-   ##
=======================================
  Coverage   87.65%   87.66%           
=======================================
  Files         196      196           
  Lines       34257    34256    -1     
=======================================
+ Hits        30029    30031    +2     
+ Misses       4228     4225    -3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Overview

This PR backports changes from an upstream pull request to add a weight factor to the BM25 scoring calculation. Key changes include updating document key names (by appending {hash_tag}) in tests, revising expected scoring explanation strings to include the weight factor, and adding new tests to verify proper weight handling.

Reviewed Changes

File Description
tests/pytests/test_scorers.py Adjustments to document keys and expected scoring strings for BM25 and BM25STD, plus new tests for weighted scoring

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

tests/pytests/test_scorers.py:146

  • The removal of the cluster mode conditional block may cause test failures in cluster environments if the output differs from non-cluster setups. Consider reinstituting cluster-specific assertions or verifying that cluster outputs match the current expectations.
if env.isCluster():

Comment on lines +335 to +337
conn.execute_command('HSET', 'doc1', 'title', 'hello world')
conn.execute_command('HSET', 'doc2', 'title', 'hello world cat dog')

Copy link

Copilot AI Feb 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the scorer_with_weight_test, document keys are not appended with {hash_tag} unlike in the other tests, which may lead to cross-shard issues in cluster mode. Consider using consistent document key naming (with {hash_tag}) to ensure proper sharding.

Suggested change
conn.execute_command('HSET', 'doc1', 'title', 'hello world')
conn.execute_command('HSET', 'doc2', 'title', 'hello world cat dog')
conn.execute_command('HSET', 'doc1{hash_tag}', 'title', 'hello world')
conn.execute_command('HSET', 'doc2{hash_tag}', 'title', 'hello world cat dog')

Copilot uses AI. Check for mistakes.
@Itzikvaknin Itzikvaknin added this pull request to the merge queue Feb 27, 2025
Merged via the queue into 2.10 with commit 6e95550 Feb 27, 2025
10 checks passed
@Itzikvaknin Itzikvaknin deleted the manual-backport-5670-to-2.10 branch February 27, 2025 12:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants