Skip to content

Handle ASM Notifications - [MOD-11120]#7022

Merged
GuyAv46 merged 39 commits intomasterfrom
guyav-ASM_notifications
Oct 23, 2025
Merged

Handle ASM Notifications - [MOD-11120]#7022
GuyAv46 merged 39 commits intomasterfrom
guyav-ASM_notifications

Conversation

@GuyAv46
Copy link
Collaborator

@GuyAv46 GuyAv46 commented Oct 9, 2025

Subscribe to new notifications and implement expected behavior

This PR relies on previous preparations made in:

  1. Support multiple slot-ranges per shard - [MOD-11613] #6898
  2. Serialize and Restore Schema command - [MOD-11803] #7032

The main logic required from the module to support the ASM feature is:

  1. Propagate auxiliary data to the destination node
  2. Filter documents from queries while the event is ongoing, if the document doesn't belong to an owned slot anymore

We need to propagate index schema definitions and spell-check dictionaries.

For the dictionaries, we use the existing local variant of FT.DICTADD that can handle an entire dictionary in one command, and doesn't fail if the dictionary already exists (we expect 0 to be returned in this case, indicating the entire word set was already present)

For the schema propagation, we use the new internal and local FT._RESTOREIFNX SCHEMA command (see #7032 for more info). This command gets a serialized representation of the schema (similar to Redis RESTORE command) and creates the schema at the destination node if it doesn't exist. Under a valid trigger, this command always returns "OK" (even if the index already exists)

We still rely on key notifications to index documents, and we subscribe to a new key_trim notification for the trimming

Thread Safety - Since we have the multi-threading feature for querying, we can't use the new RedisModule_ClusterCanAccessKeysInSlot as it requires holding the GIL.
Instead, we implement SharedSlotRangeArray in slot_ranges.c, which we set and update from the main thread, and take a thread-safe reference before moving to the background. Slots info may be slightly outdated, but since we currently can't guarantee full and exclusive slot coverage for a query (to be solved in a following release), this is done as a best effort. The decision to use this info to filter results depends on the migration events.

Mark if applicable

  • This PR introduces API changes
  • This PR introduces serialization changes

Note

Adds ASM event handling with slot-based query filtering and schema/dictionary propagation, introduces cached local slot ranges, and updates workers, pipeline, and tests.

  • Cluster/Slots:
    • Add slot_ranges.{c,h} for cached local slot ranges with refcounting and access checks; drop cache on topology/migration changes.
    • Subscribe to ASM events (import/migrate/trim); toggle slot filtering and manage worker threads; add KEY_TRIMMED notification.
    • Drop cached slots on topology updates.
  • Query/Pipeline:
    • Thread-safe capture of local slot ranges in AREQ and pass through QueryPipelineParams to RPQueryIterator.
    • RPQueryIterator filters results by owned slots when enabled; manages slot range lifetime.
    • Adjust pipeline construction signatures and ownership transfers.
  • Data Propagation:
    • Implement Indexes_Propagate and Dictionary_Propagate; propagate schemas (_FT._RESTOREIFNX SCHEMA) and dictionaries (_FT.DICTADD) during ASM migrate.
  • Dictionary:
    • Simplify Dictionary_Add API; fix delete arg types; add Dictionary_Size.
  • Workers:
    • Make event counter re-entrant; log pool size changes; workersThreadPool_OnEventEnd returns status and guards waiting.
  • Misc:
    • Initialize keyspace notifications after spec deserialize; minor log format fix.
    • RedisMock: add cluster APIs and plumbing to record propagated commands and slot ranges.
  • Tests/CI:
    • Add C++ tests for schema/dict propagation and slot ranges; add ASM pytests (some skipped); tweak GH workflows to use unstable Redis tag temporarily.
  • Submodules:
    • Update deps/RedisModulesSDK pointer.

Written by Cursor Bugbot for commit e8da26a. This will update automatically on new commits. Configure here.

@github-actions github-actions bot added the size:S label Oct 9, 2025
@GuyAv46 GuyAv46 force-pushed the guyav-ASM_notifications branch from 63231b1 to 6ab0481 Compare October 13, 2025 11:54
@GuyAv46 GuyAv46 removed the size:S label Oct 13, 2025
@GuyAv46 GuyAv46 changed the base branch from master to guyav-serialize_schema_to_command October 13, 2025 11:55
@GuyAv46 GuyAv46 removed the size:L label Oct 13, 2025
@GuyAv46 GuyAv46 requested a review from tezc October 13, 2025 12:42
@GuyAv46 GuyAv46 force-pushed the guyav-ASM_notifications branch 6 times, most recently from c89f550 to a538911 Compare October 13, 2025 19:52
@GuyAv46 GuyAv46 added enforce:coverage Run coverage flow even on draft pull request enforce:sanitize Run sanitizer flow even on draft pull request labels Oct 14, 2025
@GuyAv46 GuyAv46 marked this pull request as ready for review October 14, 2025 14:20
@GuyAv46 GuyAv46 marked this pull request as draft October 14, 2025 14:20
@codecov
Copy link

codecov bot commented Oct 14, 2025

Codecov Report

❌ Patch coverage is 93.12500% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.56%. Comparing base (d343ba8) to head (e8da26a).
⚠️ Report is 6 commits behind head on master.

Files with missing lines Patch % Lines
src/notifications.c 81.63% 9 Missing ⚠️
src/dictionary.c 97.29% 1 Missing ⚠️
src/spec.c 95.23% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7022      +/-   ##
==========================================
- Coverage   85.58%   85.56%   -0.03%     
==========================================
  Files         327      328       +1     
  Lines       51985    52121     +136     
  Branches    11133    11133              
==========================================
+ Hits        44492    44597     +105     
- Misses       7325     7356      +31     
  Partials      168      168              
Flag Coverage Δ
flow 84.16% <64.37%> (-0.25%) ⬇️
unit 51.81% <56.25%> (+0.06%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@GuyAv46 GuyAv46 force-pushed the guyav-ASM_notifications branch from c62a603 to 012f9f7 Compare October 15, 2025 15:03
Base automatically changed from guyav-serialize_schema_to_command to master October 17, 2025 23:15
@GuyAv46 GuyAv46 force-pushed the guyav-ASM_notifications branch from 012f9f7 to c8bcfc2 Compare October 19, 2025 11:14
Copy link
Collaborator

@alonre24 alonre24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SHAPO

@GuyAv46 GuyAv46 requested a review from tezc October 19, 2025 16:22
@GuyAv46 GuyAv46 removed this pull request from the merge queue due to a manual request Oct 23, 2025
@GuyAv46 GuyAv46 added this pull request to the merge queue Oct 23, 2025
@GuyAv46 GuyAv46 removed this pull request from the merge queue due to a manual request Oct 23, 2025
@GuyAv46 GuyAv46 added this pull request to the merge queue Oct 23, 2025
@GuyAv46 GuyAv46 removed this pull request from the merge queue due to a manual request Oct 23, 2025
@GuyAv46 GuyAv46 added this pull request to the merge queue Oct 23, 2025
@GuyAv46 GuyAv46 removed this pull request from the merge queue due to a manual request Oct 23, 2025
@GuyAv46 GuyAv46 added this pull request to the merge queue Oct 23, 2025
@GuyAv46 GuyAv46 removed this pull request from the merge queue due to a manual request Oct 23, 2025
@JoanFM JoanFM added this pull request to the merge queue Oct 23, 2025
@GuyAv46 GuyAv46 removed this pull request from the merge queue due to a manual request Oct 23, 2025
@cursor

This comment was marked as resolved.

@GuyAv46 GuyAv46 enabled auto-merge October 23, 2025 13:49
@GuyAv46 GuyAv46 added this pull request to the merge queue Oct 23, 2025
Merged via the queue into master with commit 3bd770e Oct 23, 2025
19 checks passed
@GuyAv46 GuyAv46 deleted the guyav-ASM_notifications branch October 23, 2025 16:57
@redisearch-backport-pull-request

This comment was marked as outdated.

@GuyAv46
Copy link
Collaborator Author

GuyAv46 commented Oct 24, 2025

/backport after also backporting #7129, which was missing.

redisearch-backport-pull-request bot pushed a commit that referenced this pull request Oct 24, 2025
* initial registration implementation

* Set up most of the notification events

* address review comment and add logs

* implement propagation

* minor improvements

* small fix

* review fixes

* [TEMP] set CI for new API

* dictionary tidy up

* add first test

* fix helper and test for duplicates

* Add missing sub-command

* add unit-tests for coverage

* change workersThreadPool_OnEvent* behavior to handle overlapping events

* rename events after API update

* implement managed and thread-safe slot ranges API

* drop cached API when needed

* rename API and use in the pipeline

* move includes to C file

* skip current flow test, and introduce a stable one

* fix initial reference count

* implement mock and add a test

* add missing key-space notification subscription

* add a test with adding a new node and propagating the schema

* review fixes

* fix assertion

* leak fix

* use malloc

* leak actual fix

* don't drop the cache on failed events

* add BG variants to flow tests

* update SDK

* Revert "[TEMP] set CI for new API"

This reverts commit 9d2d165.

* use new API RedisModule_ClusterKeySlotC

* run against redis unstable until 8.4 is released

* fix test flakiness - wait for both shards

* increase node timeout for ASM tests

* try adding a sleep

* increase to 5

(cherry picked from commit 3bd770e)
@redisearch-backport-pull-request
Copy link
Contributor

Successfully created backport PR for 8.4:

github-merge-queue bot pushed a commit that referenced this pull request Oct 24, 2025
Handle ASM Notifications - [MOD-11120] (#7022)

* initial registration implementation

* Set up most of the notification events

* address review comment and add logs

* implement propagation

* minor improvements

* small fix

* review fixes

* [TEMP] set CI for new API

* dictionary tidy up

* add first test

* fix helper and test for duplicates

* Add missing sub-command

* add unit-tests for coverage

* change workersThreadPool_OnEvent* behavior to handle overlapping events

* rename events after API update

* implement managed and thread-safe slot ranges API

* drop cached API when needed

* rename API and use in the pipeline

* move includes to C file

* skip current flow test, and introduce a stable one

* fix initial reference count

* implement mock and add a test

* add missing key-space notification subscription

* add a test with adding a new node and propagating the schema

* review fixes

* fix assertion

* leak fix

* use malloc

* leak actual fix

* don't drop the cache on failed events

* add BG variants to flow tests

* update SDK

* Revert "[TEMP] set CI for new API"

This reverts commit 9d2d165.

* use new API RedisModule_ClusterKeySlotC

* run against redis unstable until 8.4 is released

* fix test flakiness - wait for both shards

* increase node timeout for ASM tests

* try adding a sleep

* increase to 5

(cherry picked from commit 3bd770e)

Co-authored-by: GuyAv46 <[email protected]>
YaacovHazan pushed a commit to redis/redis that referenced this pull request Oct 27, 2025
(RediSearch/RediSearch#7076,
RediSearch/RediSearch#6857) - Introducing
`FT.HYBRID`, a new command that enables hybrid queries combining both
text and vector search, with support for **RRF** and **LINEAR** result
combination. This update enhances performance and reliability through a
more efficient networking layer, smoother query execution, and improved
overall stability.
(RediSearch/RediSearch#7065) - Add
`search-default-scorer` configuration to set the default text scorer
across queries (by default it is BM25).
RediSearch/RediSearch#7022 - Handle Atomic Slot
Migration events upon moving slots from one node to another in a cluster
mode.
(RediSearch/RediSearch#6769,
RediSearch/RediSearch#6828,
RediSearch/RediSearch#6877,
RediSearch/RediSearch#6921) - Introduce query
time memory guardrails by adding a new `search-on-oom` configuration
which defines the query engine behavior when OOM (Out Of Memory) is
reached. OOM checks are applied to `FT.SEARCH`, `FT.AGGREGATE`, and
`FT.HYBRID` commands. The behavior on OOM can be configured to one of
three modes: `IGNORE`, `FAIL`, or `RETURN`.
`IGNORE` - The default behavior, run queries anyway (not recommended for
heavy queries returning a large result set).
`FAIL` - Fail query execution immediately if any of the nodes are in OOM
state when query execution starts.
`RETURN` - A best effort appraoch to return partial results when OOM is
detected in only some of the nodes in a cluster mode.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport 8.4 enforce:coverage Run coverage flow even on draft pull request enforce:sanitize Run sanitizer flow even on draft pull request size:L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants