[8.6] [MOD-14732] fix race condition on hybrid #9066

redisearch-backport-pull-request[bot] merged 1 commit into 8.6 from
Conversation
* fix race condition on hybrid
* remove unrelated file
* use Strong/WeakRef instead of refCount
* fix the real issue, count proper numShards
* fix by setting numShards from IO thread
* rephrase
* fix potential problem if shards can be 0
* use initialized flag
* update comment
* remove unused numShards

(cherry picked from commit 3bb5f03)
🛡️ Jit Security Scan Results: ✅ No security findings were detected in this PR
Security scan by Jit
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 845231d.
```c
// Signal so the coordinator can re-check the wait condition.
pthread_cond_signal(ctx->completionCond);
pthread_mutex_unlock(ctx->mutex);
}
```
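For context, this signal is the tail end of the per-shard completion path: each shard reply is recorded under the shared mutex and the waiting coordinator is woken to re-check its condition. A minimal sketch of what that path might look like, with assumed names (`CallbackCtxSketch`, `onShardReplySketch`); this is not the actual RediSearch code:

```c
#include <pthread.h>
#include <stddef.h>

/* Shared state between the coordinator and the per-shard reply callbacks.
 * Field names are illustrative only. */
typedef struct {
    pthread_mutex_t *mutex;
    pthread_cond_t  *completionCond;
    size_t           responseCount;  /* shard replies received so far */
    size_t           numShards;      /* replies the coordinator waits for */
} CallbackCtxSketch;

/* Invoked on an IO thread for each shard reply. */
static void onShardReplySketch(CallbackCtxSketch *ctx) {
    pthread_mutex_lock(ctx->mutex);
    ctx->responseCount++;
    /* Signal so the coordinator can re-check the wait condition. */
    pthread_cond_signal(ctx->completionCond);
    pthread_mutex_unlock(ctx->mutex);
}
```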
Missing GitHub release note for this PR
Low Severity
Per project rules, a GitHub release note is needed. Suggested release note: "Fixed a race condition in hybrid queries that could cause a crash (use-after-free) when the cluster topology changes during a shard migration while a hybrid query is being processed."
Triggered by project rule: Please assist in writing a GitHub release note for this PR, which is concise and focused on the user impact.
Reviewed by Cursor Bugbot for commit 845231d.
Codecov Report

✅ All modified and coverable lines are covered by tests.

```diff
@@            Coverage Diff             @@
##              8.6    #9066      +/-   ##
==========================================
+ Coverage   83.85%   83.89%   +0.03%
==========================================
  Files         367      367
  Lines       55818    55825       +7
  Branches    14322    14322
==========================================
+ Hits        46805    46833      +28
+ Misses       8851     8830      -21
  Partials      162      162
```

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.





Description
Backport of #9029 to 8.6.
The race condition
`ProcessHybridCursorMappings` dispatches async callbacks to shards and waits for all responses. The coordinator reads `numShards` from the topology, allocates a shared `processCursorMappingCallbackContext` (`ctx`), and waits until `responseCount == numShards` before freeing `ctx`.

However, `iterStartCb` runs asynchronously on an IO thread and reads the live topology, which may have a different shard count after a migration. If the live topology has `M > N` shards, `iterStartCb` dispatches commands to all `M` shards, but the coordinator only waits for `N` responses. After receiving `N` responses, the coordinator frees `ctx` via `cleanupCtx`, while the remaining `M - N` callbacks still hold a pointer to the freed memory.

When those late callbacks fire, `MRIteratorCallback_GetPrivateData(ctx)` returns a dangling pointer, and the callback dereferences freed memory, reading `cb_ctx->mutex`, `cb_ctx->searchMappings`, etc.
The ASAN trace confirms this: `processCursorMappingCallback` (thread T4, `search-uv-1`) reads from the freed `ctx` at the very first dereference, while thread T23 (`coord-5081`) had already freed it via `cleanupCtx` → `rm_free`.
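To make the failure mode concrete, here is a minimal sketch of the faulty coordination pattern described above. The names (`CallbackCtxSketch`, `coordinatorWaitAndFree`, freeing via `free`) are simplifications, not the actual RediSearch code:

```c
#include <pthread.h>
#include <stdlib.h>

typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  completionCond;
    size_t          responseCount;  /* incremented by each shard reply callback */
} CallbackCtxSketch;

/* Coordinator side: numShards (N) was read from a topology snapshot, but
 * iterStartCb re-reads the live topology on the IO thread and may dispatch
 * commands to M > N shards. */
static void coordinatorWaitAndFree(CallbackCtxSketch *ctx, size_t numShards) {
    pthread_mutex_lock(&ctx->mutex);
    while (ctx->responseCount < numShards) {   /* waits for only N replies */
        pthread_cond_wait(&ctx->completionCond, &ctx->mutex);
    }
    pthread_mutex_unlock(&ctx->mutex);

    free(ctx);  /* BUG: M - N late callbacks still hold this pointer (use-after-free) */
}
```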
The fix

The existing `privateDataInit` hook in `MR_IterateWithPrivateData` is called from `iterStartCb` on the IO thread after the actual shard count is determined from the live topology, but before any commands are dispatched. We leverage this hook with a new `processCursorMappingInit` callback that:

* locks `ctx->mutex`
* sets `ctx->numShards` to the actual number of shards from the live topology

The coordinator's wait loop now reads `ctx->numShards` (under the mutex) instead of the local `numShards` variable, so it dynamically picks up the updated count and waits for all dispatched callbacks before freeing `ctx`.
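A minimal sketch of the fixed pattern, again with simplified, assumed names (the `initialized` flag mirrors the "use initialized flag" commit in the backported change); this is an illustration of the approach, not the literal RediSearch code:

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  completionCond;
    size_t          responseCount;
    size_t          numShards;    /* authoritative count, set on the IO thread */
    bool            initialized;  /* true once the live shard count is known */
} CallbackCtxSketch;

/* privateDataInit-style hook: runs on the IO thread after the live shard
 * count is determined, before any commands are dispatched. */
static void processCursorMappingInitSketch(CallbackCtxSketch *ctx, size_t liveShards) {
    pthread_mutex_lock(&ctx->mutex);
    ctx->numShards = liveShards;
    ctx->initialized = true;
    pthread_cond_signal(&ctx->completionCond);
    pthread_mutex_unlock(&ctx->mutex);
}

/* Coordinator wait loop: reads ctx->numShards under the mutex instead of a
 * stale local copy, so it waits for every callback that was actually dispatched. */
static void coordinatorWaitAndFreeSketch(CallbackCtxSketch *ctx) {
    pthread_mutex_lock(&ctx->mutex);
    while (!ctx->initialized || ctx->responseCount < ctx->numShards) {
        pthread_cond_wait(&ctx->completionCond, &ctx->mutex);
    }
    pthread_mutex_unlock(&ctx->mutex);
    free(ctx);  /* safe: all dispatched callbacks have replied */
}
```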
Also re-enables `test_add_shard_and_migrate_hybrid_BG`, which was previously skipped due to a flaky hybrid cursor migration crash (SIGSEGV) caused by this race.

Note
Medium Risk
Touches async coordination between the coordinator and shard I/O threads; mistakes could cause deadlocks or incomplete cursor-mapping collection under topology changes. The change is localized to hybrid cursor-mapping initialization and wait logic.
Overview
Prevents rare coordinator crashes in distributed `FT.HYBRID` queries during shard migration/topology changes by making cursor-mapping collection wait on the actual shard count determined on the I/O thread. `ProcessHybridCursorMappings` now initializes `numShards` via a new private-data init callback and waits until all dispatched shard replies arrive before freeing shared callback state, eliminating a use-after-free race.

Reviewed by Cursor Bugbot for commit 845231d. Bugbot is set up for automated code reviews on this repo.