Skip to content

[8.4] ASM Slots Protocol Communication - [MOD-11988, MOD-12308]#7292

Merged
redisearch-backport-pull-request[bot] merged 4 commits into8.4from
backport-7253-to-8.4
Nov 11, 2025
Merged

[8.4] ASM Slots Protocol Communication - [MOD-11988, MOD-12308]#7292
redisearch-backport-pull-request[bot] merged 4 commits into8.4from
backport-7253-to-8.4

Conversation

@redisearch-backport-pull-request
Copy link
Contributor

@redisearch-backport-pull-request redisearch-backport-pull-request bot commented Nov 10, 2025

Description

Backport of #7253 to 8.4.


Note

Implements end-to-end slot-range serialization, embeds per-shard slots into distributed commands, parses/enforces _SLOTS_INFO for internal queries, and wires slot awareness through topology, pipeline, and tests.

  • Coordinator & Command Building:
    • Add MRCommand_PrepareForSlotInfo/MRCommand_SetSlotInfo to embed _SLOTS_INFO binary into commands; maintain index on insert/copy.
    • Inject slots into _FT.SEARCH/_FT.AGGREGATE/_FT.HYBRID fanouts and iterators per target shard.
    • Require _SLOTS_INFO for internal requests; validate and deserialize in aggregate/hybrid parsers.
  • Topology & Cluster:
    • Track shard slot ranges in MRClusterTopology/MRClusterShard; clone/free safely.
    • Parse slots from CLUSTER SHARDS (OSS) and REDISE CLUSTERSET; pass local slots to MR_UpdateTopology.
  • Request/Pipeline:
    • Store querySlots/slotsVersion in AREQ and pass to RPQueryIterator; free on cleanup.
    • Hybrid optional args accept slots; propagate to subqueries.
  • Slot Ranges API:
    • Add LE serialization/deserialization, clone, and size helpers; cache local slots.
  • Misc:
    • Use uint32_t counters for timeouts/loops; minor type tidy-ups.
  • Tests:
    • Add C++/C/py tests for command slot embedding, slot (de)serialization, hybrid parsing/cursors, ACL/internal command gating, and error cases.

Written by Cursor Bugbot for commit 2e05cb6. This will update automatically on new commits. Configure here.

* initial registration implementation

* Set up most of the notification events

* address review comment and add logs

* implement propagation

* minor improvements

* small fix

* review fixes

* [TEMP] set CI for new API

* dictionary tidy up

* add first test

* fix helper and test for duplicates

* Add missing sub-command

* add unit-tests for coverage

* change workersThreadPool_OnEvent* behavior to handle overlapping events

* rename events after API update

* implement managed and thread-safe slot ranges API

* drop cached API when needed

* rename API and use in the pipeline

* move includes to C file

* skip current flow test, and introduce a stable one

* fix initial reference count

* implement mock and add a test

* add missing key-space notification subscription

* add a test with adding a new node and propagating the schema

* review fixes

* fix assertion

* leak fix

* use malloc

* leak actual fix

* don't drop the cache on failed events

* add BG variants to flow tests

* joan add comments to place relevant code places

* initial registration implementation

* Set up most of the notification events

* address review comment and add logs

* implement propagation

* minor improvements

* small fix

* review fixes

* [TEMP] set CI for new API

* dictionary tidy up

* add first test

* fix helper and test for duplicates

* Add missing sub-command

* add unit-tests for coverage

* change workersThreadPool_OnEvent* behavior to handle overlapping events

* rename events after API update

* implement managed and thread-safe slot ranges API

* drop cached API when needed

* rename API and use in the pipeline

* move includes to C file

* skip current flow test, and introduce a stable one

* fix initial reference count

* implement mock and add a test

* add missing key-space notification subscription

* add a test with adding a new node and propagating the schema

* review fixes

* fix assertion

* leak fix

* use malloc

* leak actual fix

* don't drop the cache on failed events

* add BG variants to flow tests

* add more TODO signaling where code needs to change

* add serialization and deserialization in binary format and testing

* test: add more tests

* test: some adaptation to tests

* add MRCommand method to add slot ranges info

* test: change some tests

* improve the testing of Command

* add parametrization

* test: remove some parts of the command cpp tests

* remvoe some comments

* fix: protect in case no slotRanges

* test: add tests for HumanReadable capacity

* some refactoring

* fix problems merging

* fix problems merging

* fix problems merging

* fix problems merging

* fix problems merging

* refactor do not include num_ranges into the binary serialization

* add coordSlotRanges struct attribute

* code to parse the coordSlotRanges

* add the building block to parse in private Hybrid Requests

* test: fix test

* handle proper parsing

* test: add AREQ_Compile tests to see how slot ranges are properly parsed

* fix: fix serialize issue

* test: fix memory leak in test

* fix: change serialization to little endian

* fix: remove size sending from MR build command

* test: parametrize test to cover more cases

* add parsing from CLUSTER SHARDS output

* fix: fix issue not initialized coordSlotRanges

* handle CLUSTERSET parsing

* fix: avoid sending SLOTS for FT CURSOR and fix hybrid parsing

* fix: fix double free error

* fix: fix issue with DEBUG_COUNT on Debug commands

* fix issues with DEBUG command

* remove leftover TODO

* simplify SlotRangesArray_Serialize and SlotRangesArray_Deserialize

* refactor parsing and handling slots info

* cleanup

* revert unnecessary debug command changes

* verify arg is internal and mandatory

* align cluster slot parsing and make preparations for local refresh

* cleanup cluster topology

* add MRCommand_PrepareForSlotInfo and MRCommand_SetSlotInfo

* prepare search, aggregate, and hybrid

* set slots info for search, aggregate and hybrid

* fix warnings

* fix tests to new API

* fix tests

* fix hybrid slot parsing

* fix error codes after merging master

* improve comments and fix a leak in test

* allocation fix

* add slots info to tests of internal commands

* add asserts

* extend tests

* fix MRCommand_SetSlotInfo not affecting cached commands

* minor test improvements

* move SlotRangesArray_Deserialize validation to before any allocation is made

* rename _SLOTS to _SLOTS_INFO

* add asserts and comments

* wrap memmove logic in a macro

* test slotRanges on cloning test

* rename slotRanges_ to querySlots

* minor cleanup

---------

Co-authored-by: Joan Fontanals Martinez <[email protected]>
(cherry picked from commit e2ea609)
@GuyAv46 GuyAv46 requested a review from JoanFM November 11, 2025 06:58
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Missing Slot Info Breaks Cluster Cursors

The iterCursorMappingCb function creates _FT.CURSOR READ commands but doesn't prepare or set slot information on them, unlike iterStartCb which properly calls MRCommand_PrepareForSlotInfo and MRCommand_SetSlotInfo. Since MRCommand_SetSlotInfo requires slotsInfoArgIndex to be set (asserts cmd->slotsInfoArgIndex > 0), cursor commands need MRCommand_PrepareForSlotInfo called during creation and MRCommand_SetSlotInfo called before sending to shards. Without this, cursor READ commands in cluster mode will lack required slot range information for proper shard routing and validation.

src/coord/rmr/rmr.c#L582-L640

// Separate callback for cursor mapping that creates FT.CURSOR READ commands for each shard
void iterCursorMappingCb(void *p) {
IteratorData *data = (IteratorData *)p;
MRIterator *it = data->it;
StrongRef mappingsRef = WeakRef_Promote(data->privateDataRef);
WeakRef_Release(data->privateDataRef);
CursorMappings *vsimOrSearch = StrongRef_Get(mappingsRef);
if (!vsimOrSearch) {
// Cursor mappings have been freed - cannot proceed with command dispatch.
// Release the iterator to decrement its reference count and trigger cleanup.
// This handles the case where we abort before sending commands to any shards.
MRIterator_Release(it);
rm_free(data);
return;
}
IORuntimeCtx *io_runtime_ctx = it->ctx.ioRuntime;
const size_t numShardsWithMapping = array_len(vsimOrSearch->mappings);
RS_ASSERT(numShardsWithMapping > 0);
it->len = numShardsWithMapping;
it->ctx.pending = numShardsWithMapping;
it->ctx.inProcess = numShardsWithMapping; // Initially all commands are in process
it->cbxs = rm_realloc(it->cbxs, numShardsWithMapping * sizeof(*it->cbxs));
MRCommand *cmd = &it->cbxs->cmd;
cmd->targetShard = vsimOrSearch->mappings[0].targetShard;
char buf[128];
sprintf(buf, "%lld", vsimOrSearch->mappings[0].cursorId);
MRCommand_Append(cmd, buf, strlen(buf));
// Create FT.CURSOR READ commands for each mapping
for (size_t i = 1; i < numShardsWithMapping; i++) {
it->cbxs[i].it = it;
it->cbxs[i].privateData = MRIteratorCallback_GetPrivateData(&it->cbxs[0]);
it->cbxs[i].cmd = MRCommand_Copy(cmd);
it->cbxs[i].cmd.targetShard = vsimOrSearch->mappings[i].targetShard;
it->cbxs[i].cmd.num = 4;
char buf[128];
sprintf(buf, "%lld", vsimOrSearch->mappings[i].cursorId);
MRCommand_ReplaceArg(&it->cbxs[i].cmd, 3, buf, strlen(buf));
}
// Send commands to all shards
for (size_t i = 0; i < it->len; i++) {
if (MRCluster_SendCommand(io_runtime_ctx, &it->cbxs[i].cmd,
mrIteratorRedisCB, &it->cbxs[i]) == REDIS_ERR) {
MRIteratorCallback_Done(&it->cbxs[i], 1);
}
}
//Clean up the StrongRef and allocated memory
StrongRef_Release(mappingsRef);
rm_free(data);
}

Fix in Cursor Fix in Web


JoanFM
JoanFM previously approved these changes Nov 11, 2025
JoanFM
JoanFM previously approved these changes Nov 11, 2025
@codecov
Copy link

codecov bot commented Nov 11, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 85.74%. Comparing base (cd118a2) to head (2e05cb6).
⚠️ Report is 6 commits behind head on 8.4.

Additional details and impacted files
@@            Coverage Diff             @@
##              8.4    #7292      +/-   ##
==========================================
+ Coverage   85.65%   85.74%   +0.09%     
==========================================
  Files         326      329       +3     
  Lines       50934    52112    +1178     
  Branches    10921    12001    +1080     
==========================================
+ Hits        43628    44686    +1058     
- Misses       7140     7259     +119     
- Partials      166      167       +1     
Flag Coverage Δ
flow 84.30% <98.21%> (-0.01%) ⬇️
unit 52.68% <60.71%> (+0.94%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@redisearch-backport-pull-request redisearch-backport-pull-request bot added this pull request to the merge queue Nov 11, 2025
Merged via the queue into 8.4 with commit 3b34aa2 Nov 11, 2025
19 checks passed
@redisearch-backport-pull-request redisearch-backport-pull-request bot deleted the backport-7253-to-8.4 branch November 11, 2025 13:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants