Skip to content

Fix internal cursors not being deleted immedietly in cluster mode - [MOD-12493]#7399

Merged
GuyAv46 merged 6 commits intomasterfrom
guyav-fix_command_rewrite
Nov 20, 2025
Merged

Fix internal cursors not being deleted immedietly in cluster mode - [MOD-12493]#7399
GuyAv46 merged 6 commits intomasterfrom
guyav-fix_command_rewrite

Conversation

@GuyAv46
Copy link
Collaborator

@GuyAv46 GuyAv46 commented Nov 18, 2025

Refresh Command's Cached sds on Every Modification

Since version 8.4, we have started reusing commands to avoid reallocating on every reuse (for example, internal cursor read, which remains constant between calls).
In some cases, a cursor del command is sent to clean leftover cursors at the shards when the user command is finished (or a user cursor was deleted).

Because the command modification was done on the command's argv (strs) directly, if the command was already used, the argv was not parsed again. The cached command (generated when the command is sent) was triggered instead, so if the cursors were not depleted by this read instead of del, the cursor would remain orfand.
This is not too bad, as we eventually will clean these cursors when they are idle for long enough, but they could cause us to reach the cursor limit per index faster, increase memory spikes, and it was not the intended behavior.

In this PR, we add a cache drop on every command modification (through the command API) and use it for the detected scenario. This will also ensure that no similar bugs will happen in the future.

Mark if applicable

  • This PR introduces API changes
  • This PR introduces serialization changes

Note

Invalidate MRCommand's cached sds on every argv mutation and use safe arg replacement so pending shard cursors get proper DELs; add regression test.

  • Coordinator/Command handling:
    • Introduce dropCachedCmdIfNeeded and call it from MRCommand_Free, assignStr, and MRCommand_ReplaceArgNoDup to invalidate cached cmd->cmd after any mutation in src/coord/rmr/command.c.
    • Add assertion in MRCommand_ReplaceArgSubstring to prevent in-place edits after cached command creation.
    • Remove ad-hoc cache drops (e.g., in MRCommand_SetSlotInfo) now covered centrally.
  • Cursor flow:
    • In getCursorCommand and iterator release logic, replace manual string edits (strcpy/lens tweaks) with MRCommand_ReplaceArg to switch READDEL/PROFILE, ensuring cache is refreshed (src/coord/hybrid/dist_utils.c, src/coord/rmr/rmr.c).
  • Tests:
    • Add test_mod_12493 validating internal shard cursors receive exactly one READ then a DEL, and are fully deleted; checks COMMANDSTATS and cursor_stats (tests/pytests/test_issues.py).

Written by Cursor Bugbot for commit da786d2. This will update automatically on new commits. Configure here.

@GuyAv46 GuyAv46 marked this pull request as ready for review November 18, 2025 16:20
@GuyAv46 GuyAv46 changed the title refresh cache on every modification Fix internal cursor not being deleted immedietly in cluster mode - [MOD-12493] Nov 18, 2025
@GuyAv46 GuyAv46 changed the title Fix internal cursor not being deleted immedietly in cluster mode - [MOD-12493] Fix internal cursors not being deleted immedietly in cluster mode - [MOD-12493] Nov 18, 2025
@codecov
Copy link

codecov bot commented Nov 18, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.86%. Comparing base (38408d7) to head (da786d2).
⚠️ Report is 12 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7399      +/-   ##
==========================================
+ Coverage   84.81%   84.86%   +0.04%     
==========================================
  Files         346      347       +1     
  Lines       53420    53684     +264     
  Branches    14039    14292     +253     
==========================================
+ Hits        45309    45558     +249     
- Misses       7915     7930      +15     
  Partials      196      196              
Flag Coverage Δ
flow 84.90% <100.00%> (-0.04%) ⬇️
unit 52.36% <75.00%> (+0.22%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Collaborator

@meiravgri meiravgri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small comment + linking this pr to remember to remove the clean ups in dist_utils once merged
#7345

@GuyAv46 GuyAv46 force-pushed the guyav-fix_command_rewrite branch from fbe64f5 to 151a77e Compare November 20, 2025 06:01
@GuyAv46 GuyAv46 requested a review from meiravgri November 20, 2025 09:48
@JoanFM
Copy link
Collaborator

JoanFM commented Nov 20, 2025

Just for info, should we run a Benchmark scenario?

@fcostaoliveira
Copy link
Contributor

Automated performance analysis summary

This comment was automatically generated given there is performance data available.

In summary:

  • Detected a total of 5 stable tests between versions.

You can check a comparison in detail via the grafana link

##

Details

Tests with No Significant Changes (5 tests)

Tests with No Significant Changes

Test Case Baseline master (median obs. +- std.dev) Comparison guyav-fix_command_rewrite (median obs. +- std.dev) % change (higher-better) Note
ftsb-10K-enwiki_abstract-hashes-term-suffix-withsuffixtrie 16363 +- 1.6% (7 datapoints) 16429 0.4% No Change
ftsb-1M-nyc_taxis-ftadd-load 28739 +- 3.2% (7 datapoints) 27772 -3.4% potential REGRESSION
search-ftsb-10K-enwiki_abstract-hashes-term-withsuffix-trie 14472 +- 1.3% (7 datapoints) 14023 -3.1% potential REGRESSION
vecsim-arxiv-titles-384-angular-filters-m16-ef-128-fulltext-filter 574 +- 3.4% (7 datapoints) 570 -0.8% No Change
vecsim-arxiv-titles-384-angular-filters-m16-ef-128-tag-filter 16061 +- 1.4% (7 datapoints) 15591 -2.9% No Change

@fcostaoliveira
Copy link
Contributor

Automated performance analysis summary

This comment was automatically generated given there is performance data available.

In summary:

  • Detected a total of 6 stable tests between versions.
  • Detected a total of 1 regressions bellow the regression water line 8.0%.

You can check a comparison in detail via the grafana link

Performance Regressions and Issues - Comparison between master and guyav-fix_command_rewrite.

Time Period from 30 days ago. (environment used: oss-standalone)

Test Case Baseline master (median obs. +- std.dev) Comparison guyav-fix_command_rewrite (median obs. +- std.dev) % change (higher-better) Note
vecsim-arxiv-titles-384-angular-filters-m16-ef-128-numeric-filter 156 +- 6.0% (7 datapoints) 114 -26.8% REGRESSION
Tests with No Significant Changes (6 tests)

Tests with No Significant Changes

Test Case Baseline master (median obs. +- std.dev) Comparison guyav-fix_command_rewrite (median obs. +- std.dev) % change (higher-better) Note
ftsb-10K-enwiki_abstract-hashes-term-suffix-withsuffixtrie 16363 +- 1.6% (7 datapoints) 16429 0.4% No Change
ftsb-1M-nyc_taxis-ftadd-load 28739 +- 3.2% (7 datapoints) 27772 -3.4% potential REGRESSION
search-ftsb-10K-enwiki_abstract-hashes-term-withsuffix-trie 14472 +- 1.3% (7 datapoints) 14023 -3.1% potential REGRESSION
search-numeric-sortby-optimize 27 +- 5.6% (7 datapoints) 27 2.6% No Change
vecsim-arxiv-titles-384-angular-filters-m16-ef-128-fulltext-filter 574 +- 3.4% (7 datapoints) 570 -0.8% No Change
vecsim-arxiv-titles-384-angular-filters-m16-ef-128-tag-filter 16061 +- 1.4% (7 datapoints) 15591 -2.9% No Change

@fcostaoliveira
Copy link
Contributor

fcostaoliveira commented Nov 20, 2025

Automated performance analysis summary

This comment was automatically generated given there is performance data available.

In summary:

You can check a comparison in detail via the grafana link

Performance Improvements - Comparison between master and guyav-fix_command_rewrite.

Time Period from 30 days ago. (environment used: oss-standalone)

Test Case Baseline master (median obs. +- std.dev) Comparison guyav-fix_command_rewrite (median obs. +- std.dev) % change (higher-better) Note
ftsb-1M-enwiki_abstract-hashes-load 20855 +- 7.9% (7 datapoints) 23222 11.3% IMPROVEMENT
search-high-cardinality-negation-term-baseline 30 +- 5.8% (7 datapoints) 33 9.8% IMPROVEMENT

Performance Regressions and Issues - Comparison between master and guyav-fix_command_rewrite.

Time Period from 30 days ago. (environment used: oss-standalone)

Test Case Baseline master (median obs. +- std.dev) Comparison guyav-fix_command_rewrite (median obs. +- std.dev) % change (higher-better) Note
ftsb-1M-enwiki_abstract-hashes-fulltext-2word-intersection-query-non-sortable 38 +- 12.7% UNSTABLE (7 datapoints) 23 -40.4% UNSTABLE (baseline high variance); server: FT.SEARCH p50 increased 22.0% (baseline CV=12.0%); client: OverallQuantiles.allCommands.q50 increased 14.9% (baseline CV=13.0%)
vecsim-arxiv-titles-384-angular-filters-m16-ef-128-numeric-filter 156 +- 6.0% (7 datapoints) 114 -26.8% REGRESSION
ftsb-1M-enwiki_abstract-hashes-fulltext-2word-intersection-query 332 +- 11.0% UNSTABLE (7 datapoints) 269 -18.8% UNSTABLE (baseline high variance); server: FT.SEARCH p50 increased 34.3% (baseline CV=16.2%); client: OverallQuantiles.allCommands.q50 increased 21.7% (baseline CV=11.7%)
ftsb-1M-enwiki_abstract-hashes-fulltext-2word-union-query 2740 +- 6.7% (7 datapoints) 2386 -12.9% REGRESSION
search-numeric-sortby-desc 2366 +- 24.7% UNSTABLE (7 datapoints) 2127 -10.1% UNSTABLE (baseline high variance); server: FT.SEARCH p50 increased 9.9% (baseline CV=16.6%); client: Latency increased 11.2% (baseline CV=16.4%); only client side confirms regression (server side stable) - insufficient evidence
ftsb-1M-enwiki_abstract-hashes-fulltext-simple-1word-query 999 +- 18.3% UNSTABLE (7 datapoints) 975 -2.4% UNSTABLE (baseline high variance); server: p50 latency stable; client: client latency stable; neither server nor client side confirms regression
ftsb-1M-enwiki_abstract-hashes-fulltext-2word-union-query-non-sortable 1022 +- 10.0% UNSTABLE (7 datapoints) 1106 8.3% UNSTABLE (baseline high variance); server: p50 latency stable; client: client latency stable; neither server nor client side confirms regression
search-numeric-sortby 2425 +- 29.4% UNSTABLE (7 datapoints) 3703 52.7% UNSTABLE (baseline high variance); server: FT.SEARCH p50 decreased 35.5% (baseline CV=23.9%); client: Latency decreased 34.5% (baseline CV=23.0%); neither server nor client side confirms regression
search-numeric 2390 +- 28.9% UNSTABLE (7 datapoints) 3793 58.7% UNSTABLE (baseline high variance); server: FT.SEARCH p50 decreased 38.4% (baseline CV=21.8%); client: Latency decreased 37.0% (baseline CV=21.0%); neither server nor client side confirms regression
Tests with No Significant Changes (31 tests)

Tests with No Significant Changes

Test Case Baseline master (median obs. +- std.dev) Comparison guyav-fix_command_rewrite (median obs. +- std.dev) % change (higher-better) Note
ftsb-10K-enwiki_abstract-hashes-fulltext-sortby 84 +- 6.3% (7 datapoints) 85.00 0.2% No Change
ftsb-10K-enwiki_abstract-hashes-term-prefix 5824 +- 4.2% (7 datapoints) 5432.00 -6.7% potential REGRESSION
ftsb-10K-enwiki_abstract-hashes-term-suffix 2180 +- 1.6% (7 datapoints) 2185.00 0.2% No Change
ftsb-10K-enwiki_abstract-hashes-term-suffix-withsuffixtrie 16363 +- 1.6% (7 datapoints) 16429.00 0.4% No Change
ftsb-10K-enwiki_abstract-hashes-term-wildcard 8520 +- 4.4% (7 datapoints) 8666.00 1.7% No Change
ftsb-10K-enwiki_pages-hashes-fulltext-mixed_simple-1word-query_write_1_to_read_20.yml 1055 +- 8.4% (7 datapoints) 1121.00 6.2% waterline=8.4%. potential IMPROVEMENT
ftsb-10K-enwiki_pages-hashes-load 64823 +- 7.2% (7 datapoints) 60418.00 -6.8% potential REGRESSION
ftsb-10K-multivalue-numeric-json 1028 +- 1.5% (7 datapoints) 1030.00 0.2% No Change
ftsb-10K-singlevalue-numeric-json 494 +- 1.1% (7 datapoints) 494.00 -0.1% No Change
ftsb-1K-enwiki_abstract-hashes-term-contains 1884 +- 1.8% (7 datapoints) 1907.00 1.2% No Change
ftsb-1M-nyc_taxis-ftadd-load 28739 +- 3.2% (7 datapoints) 27772.00 -3.4% potential REGRESSION
ftsb-1M-nyc_taxis-hashes-load 30426 +- 4.2% (7 datapoints) 31914.00 4.9% potential IMPROVEMENT
search-aggregate-post-filter-simple.yml 17591 +- 1.2% (7 datapoints) 17717.00 0.7% No Change
search-filtering-tag-numeric 262 +- 6.2% (7 datapoints) 260.00 -0.7% No Change
search-filtering-tag-numeric-filter-pipeline 11253 +- 0.6% (7 datapoints) 11586.00 3.0% No Change
search-ftsb-10K-enwiki_abstract-hashes-fulltext-aggregate-sortby-limit-0-100 894 +- 2.0% (7 datapoints) 913.00 2.2% No Change
search-ftsb-10K-enwiki_abstract-hashes-fulltext-search-sortby-limit-0-100 885 +- 2.4% (7 datapoints) 905.00 2.3% No Change
search-ftsb-10K-enwiki_abstract-hashes-term-withoutsuffix-trie 14238 +- 1.3% (7 datapoints) 13963.00 -1.9% No Change
search-ftsb-10K-enwiki_abstract-hashes-term-withsuffix-trie 14472 +- 1.3% (7 datapoints) 14023.00 -3.1% potential REGRESSION
search-ftsb-1700K-docs-union-iterators-q3 8.2 +- 1.4% (7 datapoints) 8.30 1.3% No Change
search-ftsb-1M-enwiki_abstract-hashes-fulltext-simple-1word-query-non-sortable 171 +- 8.5% (7 datapoints) 171.00 -0.4% waterline=8.5%. No Change
search-ftsb-370K-docs-union-iterators-q4 8.5 +- 0.7% (7 datapoints) 8.40 -1.5% No Change
search-ftsb-5200K-docs-union-iterators-q1 0.87 +- 1.1% (7 datapoints) 0.85 -2.3% No Change
search-ftsb-5500K-docs-union-iterators-q2 1.2 +- 2.0% (7 datapoints) 1.20 1.7% No Change
search-geo 217 +- 3.5% (7 datapoints) 221.00 1.7% No Change
search-high-cardinality-negation-term-comparison_union_all_other_terms 13 +- 2.2% (7 datapoints) 13.00 0.6% No Change
search-numeric-optimize 7968 +- 1.2% (7 datapoints) 8114.00 1.8% No Change
search-numeric-sortby-desc-optimize 29 +- 6.9% (7 datapoints) 29.00 -1.0% No Change
search-numeric-sortby-optimize 27 +- 5.6% (7 datapoints) 27.00 2.6% No Change
vecsim-arxiv-titles-384-angular-filters-m16-ef-128-fulltext-filter 574 +- 3.4% (7 datapoints) 570.00 -0.8% No Change
vecsim-arxiv-titles-384-angular-filters-m16-ef-128-tag-filter 16061 +- 1.4% (7 datapoints) 15591.00 -2.9% No Change

@GuyAv46 GuyAv46 added this pull request to the merge queue Nov 20, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 20, 2025
@GuyAv46 GuyAv46 added this pull request to the merge queue Nov 20, 2025
Merged via the queue into master with commit 54f050c Nov 20, 2025
25 checks passed
@GuyAv46 GuyAv46 deleted the guyav-fix_command_rewrite branch November 20, 2025 21:11
redisearch-backport-pull-request bot pushed a commit that referenced this pull request Nov 20, 2025
…MOD-12493] (#7399)

* refresh cache on every modification

* add a failing test

* fix bug

* add assert to cover a future edge case

* remove redundant cleanup

* use helper

(cherry picked from commit 54f050c)
@redisearch-backport-pull-request
Copy link
Contributor

github-merge-queue bot pushed a commit that referenced this pull request Nov 21, 2025
…de - [MOD-12493] (#7455)

Fix internal cursors not being deleted immedietly in cluster mode - [MOD-12493] (#7399)

* refresh cache on every modification

* add a failing test

* fix bug

* add assert to cover a future edge case

* remove redundant cleanup

* use helper

(cherry picked from commit 54f050c)

Co-authored-by: GuyAv46 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants