Skip to content

[8.2] [MOD-10037] add default scorer in configuration (#7065)#7083

Merged
JoanFM merged 7 commits into8.2from
backport-7065-to-8.2
Oct 28, 2025
Merged

[8.2] [MOD-10037] add default scorer in configuration (#7065)#7083
JoanFM merged 7 commits into8.2from
backport-7065-to-8.2

Conversation

@JoanFM
Copy link
Collaborator

@JoanFM JoanFM commented Oct 17, 2025

  • feat: add default scorer in configuration

  • fix and test config setters and getters

  • test: add flow tests checking scorer behavior

  • fix: fix as per cursor comments

  • fix: fix as per code review

  • fix: change the init config code

  • test: fix the tests

  • test: refactor tests

  • fix memory leak in getter

  • test: add explicit test BM25STD is the default since start

  • validate default scorer at module init

  • fix validation against extensions

  • apply changes as per review

  • Update tests/cpptests/test_cpp_forkgc.cpp

  • refactor small setter

  • try to fix sanitizer leak

  • test: Known leaks when onModuleLoad fails

  • Update src/module-init/module-init.c


Backport of #7065 to 8.2.
(cherry picked from commit b61c075)

Describe the changes in the pull request

A clear and concise description of what the PR is solving, including:

  1. Current: The current state briefly
  2. Change: What is the change
  3. Outcome: Adding the outcome

Which additional issues this PR fixes

  1. MOD-...
  2. #...

Main objects this PR modified

  1. ...

Mark if applicable

  • This PR introduces API changes
  • This PR introduces serialization changes

Note

Introduce configurable default scorer (CONFIG search-default-scorer / FT.CONFIG DEFAULT_SCORER) used when none is specified, with validation, exposure in INFO, lifecycle management, and comprehensive tests.

  • Config:
    • Add DEFAULT_SCORER (FT.CONFIG) and search-default-scorer (CONFIG) with runtime getters/setters, gated by ENABLE_UNSTABLE_FEATURES, validated against registered scorers, persisted in RSGlobalConfig, included in INFO and config dumps, and registered via Module Config API.
    • Validate default scorer at module init; assert/panic on invalid value. Manage allocated strings and free on unload.
  • Execution/Query:
    • Use RSGlobalConfig.defaultScorer when no SCORER is specified in SEARCH/AGGREGATE and C-API iterators; add assertions. Minor optimizer tweak to treat BM25STD default consistently.
  • Extensions:
    • Expose Extensions_InitDone() to defer validation until extensions are ready.
  • Tests/Mocks:
    • Add extensive pytests for default scorer config and behavior (including extension scorer), and cpptests/init hooks to set default scorer; update redismock to init/cleanup defaultScorer.

Written by Cursor Bugbot for commit d44191e. This will update automatically on new commits. Configure here.

* feat: add default scorer in configuration

* fix and test config setters and getters

* test: add flow tests checking scorer behavior

* fix: fix as per cursor comments

* fix: fix as per code review

* fix: change the init config code

* test: fix the tests

* test: refactor tests

* fix memory leak in getter

* test: add explicit test BM25STD is the default since start

* validate default scorer at module init

* fix validation against extensions

* apply changes as per review

* Update tests/cpptests/test_cpp_forkgc.cpp

* refactor small setter

* try to fix sanitizer leak

* test: Known leaks when onModuleLoad fails

* Update src/module-init/module-init.c

Co-authored-by: kei-nan <[email protected]>

---------

Co-authored-by: kei-nan <[email protected]>
(cherry picked from commit b61c075)
@JoanFM JoanFM changed the title [MOD-10037] add default scorer in configuration (#7065) [8.2] [MOD-10037] add default scorer in configuration (#7065) Oct 17, 2025
@codecov
Copy link

codecov bot commented Oct 17, 2025

Codecov Report

❌ Patch coverage is 97.18310% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.39%. Comparing base (a2e0e10) to head (d44191e).
⚠️ Report is 5 commits behind head on 8.2.

Files with missing lines Patch % Lines
src/aggregate/aggregate_request.c 66.66% 1 Missing ⚠️
src/config.c 98.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##              8.2    #7083      +/-   ##
==========================================
- Coverage   89.46%   89.39%   -0.07%     
==========================================
  Files         253      253              
  Lines       41667    40808     -859     
  Branches     3725     3725              
==========================================
- Hits        37277    36481     -796     
+ Misses       4341     4278      -63     
  Partials       49       49              
Flag Coverage Δ
flow 82.06% <92.95%> (-0.12%) ⬇️
unit 47.35% <11.26%> (-0.23%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@JoanFM JoanFM requested review from Itzikvaknin and kei-nan October 17, 2025 07:36
kei-nan
kei-nan previously approved these changes Oct 19, 2025
@JoanFM JoanFM added this pull request to the merge queue Oct 20, 2025
@JoanFM JoanFM removed this pull request from the merge queue due to a manual request Oct 20, 2025
cursor[bot]

This comment was marked as outdated.

@JoanFM JoanFM force-pushed the backport-7065-to-8.2 branch from 00f18f7 to 1038d34 Compare October 20, 2025 15:26
@JoanFM JoanFM requested review from kei-nan and oshadmi October 21, 2025 10:01
@JoanFM JoanFM closed this Oct 21, 2025
@JoanFM JoanFM reopened this Oct 21, 2025
cursor[bot]

This comment was marked as outdated.

@JoanFM JoanFM requested a review from oshadmi October 22, 2025 07:56
cursor[bot]

This comment was marked as outdated.

cursor[bot]

This comment was marked as outdated.

@JoanFM JoanFM requested a review from nafraf October 28, 2025 13:14

int set_default_scorer_config(const char *name, RedisModuleString *val, void *privdata, RedisModuleString **err) {
REDISMODULE_NOT_USED(name);
bool is_enabled = RSGlobalConfig.defaultScorer == NULL || RSGlobalConfig.enableUnstableFeatures;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to check RSGlobalConfig.defaultScorer == NULL and report error ?
RSGlobalConfig.defaultScorer is set later, on line 206.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the thing is that the first time it is run, it is the time where config is being read or default value is used

@JoanFM JoanFM added this pull request to the merge queue Oct 28, 2025
Merged via the queue into 8.2 with commit b4d17c4 Oct 28, 2025
19 checks passed
@JoanFM JoanFM deleted the backport-7065-to-8.2 branch October 28, 2025 18:01
alonre24 added a commit to alonre24/redis that referenced this pull request Jan 26, 2026
**Bug Fixes:**

* [redis#7219](RediSearch/RediSearch#7219) Fix a concurrency issue on Reducer in `FT.AGGREGATE`
* [redis#7255](RediSearch/RediSearch#7255) Fix `BM25STD` underflow wraparound
* [redis#7275](RediSearch/RediSearch#7275) Report used memory as `unsigned long long` to avoid underflows
* [redis#7264](RediSearch/RediSearch#7264) Fix `totalDocsLen` updates
* [redis#6995](RediSearch/RediSearch#6995) Do not fanout `FT.INFO` to replicas
* [redis#7350](RediSearch/RediSearch#7350) Fix `FT.CREATE` failure with LeanVec parameters on non-Intel architectures
* [redis#7384](RediSearch/RediSearch#7384) Fix index load from RDB temporary memory overhead
* [redis#7459](RediSearch/RediSearch#7459) Fix Fork GC potential double-free on error path
* [redis#7458](RediSearch/RediSearch#7458) Fix a GC performence regression
* [redis#7470](RediSearch/RediSearch#7470) Avoid draining worker thread pool from FLUSH callback to avoid deadlocks
* [redis#7554](RediSearch/RediSearch#7554) Handle the case where `SCORE` is sent alone without extra fields (coordinator)
* [redis#7685](RediSearch/RediSearch#7685) Fix cursor logical leak
* [redis#7794](RediSearch/RediSearch#7794) Fix `cmp_strings()` to correctly handle binary data with embedded NULLs in TOLIST reducer in FT.AGGREGATE
* [redis#7873](RediSearch/RediSearch#7873) Handle warnings in empty `FT.AGGREGATE` replies (cluster)
* [redis#7886](RediSearch/RediSearch#7886) Remove non-TEXT fields from the spec keys dictionary
* [redis#7904](RediSearch/RediSearch#7904) Refactor keys dictionary handling
* [redis#7901](RediSearch/RediSearch#7901) Support multiple warnings in reply
* [redis#8083](RediSearch/RediSearch#8083) Fix incorrect FULLTEXT field metric counts
* [redis#8153](RediSearch/RediSearch#8153) Fix configuration registration issues

**Improvements:**

* [redis#7154](RediSearch/RediSearch#7154) `FT.AGGREGATE` can return Background Indexing OOM warnings
* [redis#7083](RediSearch/RediSearch#7083) Add the default text scorer as a configuration option
* [redis#7341](RediSearch/RediSearch#7341) Rename `FT.PROFILE` counter fields
* [redis#7436](RediSearch/RediSearch#7436) Enhance `FT.PROFILE` with vector search execution details
* [redis#7435](RediSearch/RediSearch#7435) Ensure full `FT.PROFILE` output on timeout with RETURN policy
* [redis#7534](RediSearch/RediSearch#7534) Reduce the number of worker threads asynchronously to avoid deadlocks during queries
* [redis#7614](RediSearch/RediSearch#7614) Track timeout warnings and errors in INFO
* [redis#7646](RediSearch/RediSearch#7646) Track `maxprefixexpansions` warnings and errors in INFO
* [redis#7577](RediSearch/RediSearch#7577) Track query syntax/argument errors (basis for query error tracking)
* [redis#7737](RediSearch/RediSearch#7737) Add `Internal cursor reads` metric to cluster `FT.PROFILE` output
* [redis#7759](RediSearch/RediSearch#7759) Extend indexing metrics
* [redis#7710](RediSearch/RediSearch#7710) Support `WITHCOUNT` keyword in `FT.AGGREGATE`
* [redis#7957](RediSearch/RediSearch#7957) Persist query warnings across cursor reads
* [redis#8054](RediSearch/RediSearch#8054) Add logging for index-related commands
* [redis#8151](RediSearch/RediSearch#8151) Fix shard total profile time reporting in `FT.PROFILE`
* [redis#8103](RediSearch/RediSearch#8103) Output current thread IndexSpec information on crash
YaacovHazan pushed a commit to redis/redis that referenced this pull request Jan 26, 2026
**Bug Fixes:**

* [#7219](RediSearch/RediSearch#7219) Fix a
concurrency issue on Reducer in `FT.AGGREGATE`
* [#7255](RediSearch/RediSearch#7255) Fix
`BM25STD` underflow wraparound
* [#7275](RediSearch/RediSearch#7275) Report
used memory as `unsigned long long` to avoid underflows
* [#7264](RediSearch/RediSearch#7264) Fix
`totalDocsLen` updates
* [#6995](RediSearch/RediSearch#6995) Do not
fanout `FT.INFO` to replicas
* [#7350](RediSearch/RediSearch#7350) Fix
`FT.CREATE` failure with LeanVec parameters on non-Intel architectures
* [#7694](RediSearch/RediSearch#7694) Use
asynchronous jobs for GC in SVS to accelerate execution
* [#7384](RediSearch/RediSearch#7384) Fix index
load from RDB temporary memory overhead
* [#7459](RediSearch/RediSearch#7459) Fix Fork
GC potential double-free on error path
* [#7458](RediSearch/RediSearch#7458) Fix a GC
performence regression
* [#7470](RediSearch/RediSearch#7470) Avoid
draining worker thread pool from FLUSH callback to avoid deadlocks
* [#7554](RediSearch/RediSearch#7554) Handle the
case where `SCORE` is sent alone without extra fields (coordinator)
* [#7685](RediSearch/RediSearch#7685) Fix cursor
logical leak
* [#7794](RediSearch/RediSearch#7794) Fix
`cmp_strings()` to correctly handle binary data with embedded NULLs in
TOLIST reducer in FT.AGGREGATE
* [#7873](RediSearch/RediSearch#7873) Handle
warnings in empty `FT.AGGREGATE` replies (cluster)
* [#7886](RediSearch/RediSearch#7886) Remove
non-TEXT fields from the spec keys dictionary
* [#7904](RediSearch/RediSearch#7904) Refactor
keys dictionary handling
* [#7901](RediSearch/RediSearch#7901) Support
multiple warnings in reply
* [#8083](RediSearch/RediSearch#8083) Fix
incorrect FULLTEXT field metric counts
* [#8153](RediSearch/RediSearch#8153) Fix
configuration registration issues

**Improvements:**

* [#7154](RediSearch/RediSearch#7154)
`FT.AGGREGATE` can return Background Indexing OOM warnings
* [#7083](RediSearch/RediSearch#7083) Add the
default text scorer as a configuration option
* [#7341](RediSearch/RediSearch#7341) Rename
`FT.PROFILE` counter fields
* [#7436](RediSearch/RediSearch#7436) Enhance
`FT.PROFILE` with vector search execution details
* [#7435](RediSearch/RediSearch#7435) Ensure
full `FT.PROFILE` output on timeout with RETURN policy
* [#7534](RediSearch/RediSearch#7534) Reduce the
number of worker threads asynchronously to avoid deadlocks during
queries
* [#7614](RediSearch/RediSearch#7614) Track
timeout warnings and errors in INFO
* [#7646](RediSearch/RediSearch#7646) Track
`maxprefixexpansions` warnings and errors in INFO
* [#7577](RediSearch/RediSearch#7577) Track
query syntax/argument errors (basis for query error tracking)
* [#7737](RediSearch/RediSearch#7737) Add
`Internal cursor reads` metric to cluster `FT.PROFILE` output
* [#7759](RediSearch/RediSearch#7759) Extend
indexing metrics
* [#7710](RediSearch/RediSearch#7710) Support
`WITHCOUNT` keyword in `FT.AGGREGATE`
* [#7957](RediSearch/RediSearch#7957) Persist
query warnings across cursor reads
* [#8054](RediSearch/RediSearch#8054) Add
logging for index-related commands
* [#8151](RediSearch/RediSearch#8151) Fix shard
total profile time reporting in `FT.PROFILE`
* [#8103](RediSearch/RediSearch#8103) Output
current thread IndexSpec information on crash
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants