Skip to content

Fix MRIterator ownership mechanism - [MOD-8108]#5244

Merged
GuyAv46 merged 10 commits intomasterfrom
guyav-fix_cursor_deadlock
Nov 27, 2024
Merged

Fix MRIterator ownership mechanism - [MOD-8108]#5244
GuyAv46 merged 10 commits intomasterfrom
guyav-fix_cursor_deadlock

Conversation

@GuyAv46
Copy link
Collaborator

@GuyAv46 GuyAv46 commented Nov 22, 2024

Implement a referencing mechanism between the reader and writers of an MRIterator, removing the need to wait for replies to arrive before releasing the iterator.

This PR fixes a potential deadlock when using cursors in cluster mode, and sometimes speeds up the freeing flow of an FT.AGGREGATE command (no need to sync with the shards).

Main objects this PR modified

  1. MRIterator now has a simple referencing mechanism (using a boolean flag). When both the reader and the writers give up the iterator, the last one to do so frees it. In the case of pending cursors, FT.CURSOR DEL commands will be sent and the iterator will free itself when all the replies return, without blocking the reader.
  2. Simplify MRChannel, so there is no waiting for it to be closed anymore. It is also possible to push new items to it after it is closed, so MRChannel_Push cannot fail. The channel and its pending items will be freed when the iterator is freed.
  3. Delete CursorList_Expire. The mechanism is no longer needed and we can now free the coordinator's cursors without worrying about deadlocks.

#5268 is a follow-up of this PR.

Mark if applicable

  • This PR introduces API changes
  • This PR introduces serialization changes

@codecov
Copy link

codecov bot commented Nov 22, 2024

Codecov Report

Attention: Patch coverage is 91.48936% with 4 lines in your changes missing coverage. Please review.

Project coverage is 86.71%. Comparing base (17c5e6e) to head (0d2e15b).
Report is 8 commits behind head on master.

Files with missing lines Patch % Lines
src/coord/rmr/rq.c 33.33% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5244      +/-   ##
==========================================
+ Coverage   86.60%   86.71%   +0.11%     
==========================================
  Files         193      193              
  Lines       34772    34671     -101     
==========================================
- Hits        30113    30065      -48     
+ Misses       4659     4606      -53     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@GuyAv46 GuyAv46 marked this pull request as ready for review November 22, 2024 10:44
@GuyAv46 GuyAv46 marked this pull request as draft November 22, 2024 10:44
@GuyAv46 GuyAv46 changed the title Fix MRIterator ownership mechanism - [MOD-] Fix MRIterator ownership mechanism - [MOD-8108] Nov 22, 2024
@GuyAv46 GuyAv46 marked this pull request as ready for review November 22, 2024 11:34
@fcostaoliveira
Copy link
Contributor

fcostaoliveira commented Nov 23, 2024

Automated performance analysis summary

This comment was automatically generated given there is performance data available.

In summary:

  • Detected a total of 29 stable tests between versions.
  • Detected a total of 5 highly unstable benchmarks.
  • Detected a total of 1 improvements above the improvement water line.
  • Detected a total of 10 regressions bellow the regression water line 5.0.

You can check a comparison in detail via the grafana link

Comparison between master and guyav-fix_cursor_deadlock.

Time Period from 30 days ago. (environment used: oss-standalone)

Test Case Baseline master (median obs. +- std.dev) Comparison guyav-fix_cursor_deadlock (median obs. +- std.dev) % change (higher-better) Note
ftsb-10K-enwiki_abstract-hashes-fulltext-sortby 70 +- 1.7% (7 datapoints) 69.00 -1.3% No Change
ftsb-10K-enwiki_abstract-hashes-term-prefix 7586 +- 1.1% (7 datapoints) 7013.00 -7.6% REGRESSION
ftsb-10K-enwiki_abstract-hashes-term-suffix 1945 +- 2.7% (7 datapoints) 2046.00 5.2% IMPROVEMENT
ftsb-10K-enwiki_abstract-hashes-term-suffix-withsuffixtrie 64108 +- 3.1% (7 datapoints) 54336.00 -15.2% REGRESSION
ftsb-10K-enwiki_abstract-hashes-term-wildcard 12627 +- 1.3% (7 datapoints) 12010.00 -4.9% potential REGRESSION
ftsb-10K-enwiki_pages-hashes-fulltext-mixed_simple-1word-query_write_1_to_read_20.yml 1294 +- 1.2% (7 datapoints) 1231.00 -4.9% potential REGRESSION
ftsb-10K-enwiki_pages-hashes-load 45669 +- 3.6% (7 datapoints) 44141.00 -3.3% potential REGRESSION
ftsb-10K-multivalue-numeric-json 821 +- 1.7% (7 datapoints) 794.00 -3.2% potential REGRESSION
ftsb-10K-singlevalue-numeric-json 330 +- 1.7% (7 datapoints) 328.00 -0.5% No Change
ftsb-1K-enwiki_abstract-hashes-term-contains 1686 +- 3.6% (7 datapoints) 1727.00 2.4% No Change
ftsb-1M-enwiki_abstract-hashes-fulltext-2word-intersection-query 385 +- 3.6% (7 datapoints) 307.00 -20.3% REGRESSION
ftsb-1M-enwiki_abstract-hashes-fulltext-2word-intersection-query-non-sortable 42 +- 8.9% (7 datapoints) 42.00 -1.3% waterline=8.9%. No Change
ftsb-1M-enwiki_abstract-hashes-fulltext-2word-intersection-query@100_ops_sec 100 +- 0.0% (7 datapoints) 100.00 0.0% No Change
ftsb-1M-enwiki_abstract-hashes-fulltext-2word-union-query 2752 +- 5.8% (7 datapoints) 2577.00 -6.4% waterline=5.8%. REGRESSION
ftsb-1M-enwiki_abstract-hashes-fulltext-2word-union-query-non-sortable 1137 +- 6.0% (7 datapoints) 1044.00 -8.2% waterline=6.0%. REGRESSION
ftsb-1M-enwiki_abstract-hashes-fulltext-2word-union-query@100_ops_sec 100 +- 0.0% (7 datapoints) 100.00 0.0% No Change
ftsb-1M-enwiki_abstract-hashes-fulltext-simple-1word-query 1072 +- 8.3% (7 datapoints) 976.00 -8.9% waterline=8.3%. REGRESSION
ftsb-1M-enwiki_abstract-hashes-fulltext-simple-1word-query@100_ops_sec 100 +- 0.0% (7 datapoints) 100.00 -0.0% No Change
ftsb-1M-enwiki_abstract-hashes-load 22944 +- 1.3% (7 datapoints) 20884.00 -9.0% REGRESSION
ftsb-1M-nyc_taxis-ftadd-load 20311 +- 2.8% (7 datapoints) 19608.00 -3.5% potential REGRESSION
ftsb-1M-nyc_taxis-hashes-load 21918 +- 1.5% (7 datapoints) 20649.00 -5.8% REGRESSION
search-aggregate-post-filter-simple.yml 80364 +- 2.9% (7 datapoints) 79220.00 -1.4% No Change
search-expire-doc-10-milliseconds 0.60 +- 2.2% (5 datapoints) 0.60 0.0%
search-expired-numeric-field-10-milliseconds 0.61 +- 2.7% (5 datapoints) 0.59 -3.3% potential REGRESSION
search-filtering-tag-numeric 189 +- 27.0% UNSTABLE (7 datapoints) 221.00 17.1% UNSTABLE (very high variance)
search-filtering-tag-numeric-filter-pipeline 19044 +- 1.8% (7 datapoints) 18575.00 -2.5% No Change
search-ftsb-10K-enwiki_abstract-hashes-term-withoutsuffix-trie 39262 +- 2.9% (7 datapoints) 37717.00 -3.9% potential REGRESSION
search-ftsb-10K-enwiki_abstract-hashes-term-withsuffix-trie 37998 +- 1.8% (7 datapoints) 37976.00 -0.1% No Change
search-ftsb-1700K-docs-union-iterators-q3 6.2 +- 1.6% (7 datapoints) 5.90 -4.5% potential REGRESSION
search-ftsb-1M-enwiki_abstract-hashes-fulltext-2word-intersection-query-non-sortable@50_ops_sec 50 +- 0.2% (7 datapoints) 50.00 -0.5% No Change
search-ftsb-1M-enwiki_abstract-hashes-fulltext-2word-union-query-non-sortable@100_ops_sec 100 +- 0.0% (7 datapoints) 100.00 -0.0% No Change
search-ftsb-1M-enwiki_abstract-hashes-fulltext-simple-1word-query-non-sortable 161 +- 5.6% (7 datapoints) 143.00 -11.1% waterline=5.6%. REGRESSION
search-ftsb-370K-docs-union-iterators-q4 6.5 +- 1.7% (7 datapoints) 6.40 -2.7% No Change
search-geo 184 +- 2.1% (7 datapoints) 183.00 -0.4% No Change
search-high-cardinality-negation-term-baseline 20 +- 2.2% (7 datapoints) 20.00 -2.4% No Change
search-high-cardinality-negation-term-comparison_union_all_other_terms 7.0 +- 4.0% (7 datapoints) 7.00 0.7% No Change
search-numeric 3162 +- 28.9% UNSTABLE (7 datapoints) 2675.00 -15.4% UNSTABLE (very high variance)
search-numeric-optimize 9110 +- 1.9% (7 datapoints) 8643.00 -5.1% REGRESSION
search-numeric-sortby 3196 +- 37.9% UNSTABLE (7 datapoints) 2808.00 -12.1% UNSTABLE (very high variance)
search-numeric-sortby-desc 2688 +- 6.9% (7 datapoints) 2572.00 -4.3% waterline=6.9%. potential REGRESSION
search-numeric-sortby-desc-optimize 31 +- 33.5% UNSTABLE (7 datapoints) 48.00 55.1% UNSTABLE (very high variance)
search-numeric-sortby-optimize 20 +- 3.5% (7 datapoints) 19.00 -4.8% potential REGRESSION
vecsim-arxiv-titles-384-angular-filters-m16-ef-128-fulltext-filter 492 +- 8.0% (7 datapoints) 455.00 -7.7% waterline=8.0%. potential REGRESSION
vecsim-arxiv-titles-384-angular-filters-m16-ef-128-numeric-filter 104 +- 13.5% UNSTABLE (7 datapoints) 105.00 1.2% UNSTABLE (very high variance)
vecsim-arxiv-titles-384-angular-filters-m16-ef-128-tag-filter 52561 +- 4.8% (7 datapoints) 52708.00 0.3% No Change

Copy link
Collaborator

@raz-mon raz-mon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AWESOME 🔥
Some small comments

Copy link
Collaborator

@raz-mon raz-mon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💣
Consider naming the blocking variable block instead of wait, a bit more intuitive.
Great addition!

@GuyAv46 GuyAv46 added this pull request to the merge queue Nov 27, 2024
Merged via the queue into master with commit 6f68d69 Nov 27, 2024
@GuyAv46 GuyAv46 deleted the guyav-fix_cursor_deadlock branch November 27, 2024 14:29
@redisearch-backport-pull-request
Copy link
Contributor

Backport failed for 2.8, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 2.8
git worktree add -d .worktree/backport-5244-to-2.8 origin/2.8
cd .worktree/backport-5244-to-2.8
git switch --create backport-5244-to-2.8
git cherry-pick -x 6f68d69d7688807bad5b89eb53673aaaff36c9c9

@redisearch-backport-pull-request
Copy link
Contributor

Backport failed for 2.6, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 2.6
git worktree add -d .worktree/backport-5244-to-2.6 origin/2.6
cd .worktree/backport-5244-to-2.6
git switch --create backport-5244-to-2.6
git cherry-pick -x 6f68d69d7688807bad5b89eb53673aaaff36c9c9

redisearch-backport-pull-request bot pushed a commit that referenced this pull request Nov 27, 2024
* implement release mechanism for MRIterator

* refactor MRChannel

* delete CursorList_Expire

* fix test for async cursor deletion

* minor cleanup

* more cleanup

* remove a log message that is now more likely to be logged

* log about stuck RQ less frequently

* even less noisy

* rename MRChannel_Close

(cherry picked from commit 6f68d69)
@redisearch-backport-pull-request
Copy link
Contributor

Successfully created backport PR for 2.10:

redisearch-backport-pull-request bot pushed a commit that referenced this pull request Nov 27, 2024
* implement release mechanism for MRIterator

* refactor MRChannel

* delete CursorList_Expire

* fix test for async cursor deletion

* minor cleanup

* more cleanup

* remove a log message that is now more likely to be logged

* log about stuck RQ less frequently

* even less noisy

* rename MRChannel_Close

(cherry picked from commit 6f68d69)
@redisearch-backport-pull-request
Copy link
Contributor

Successfully created backport PR for 8.0:

GuyAv46 added a commit that referenced this pull request Nov 27, 2024
* implement release mechanism for MRIterator

* refactor MRChannel

* delete CursorList_Expire

* fix test for async cursor deletion

* minor cleanup

* more cleanup

* remove a log message that is now more likely to be logged

* log about stuck RQ less frequently

* even less noisy

* rename MRChannel_Close

(cherry picked from commit 6f68d69)
GuyAv46 added a commit that referenced this pull request Nov 27, 2024
* implement release mechanism for MRIterator

* refactor MRChannel

* delete CursorList_Expire

* fix test for async cursor deletion

* minor cleanup

* more cleanup

* remove a log message that is now more likely to be logged

* log about stuck RQ less frequently

* even less noisy

* rename MRChannel_Close

(cherry picked from commit 6f68d69)
github-merge-queue bot pushed a commit that referenced this pull request Nov 27, 2024
* Fix MRIterator ownership mechanism - [MOD-8108] (#5244)

* implement release mechanism for MRIterator

* refactor MRChannel

* delete CursorList_Expire

* fix test for async cursor deletion

* minor cleanup

* more cleanup

* remove a log message that is now more likely to be logged

* log about stuck RQ less frequently

* even less noisy

* rename MRChannel_Close

(cherry picked from commit 6f68d69)

* Fix a potential use after free - [MOD-8108] (#5268)

fix a potential use after free

---------

Co-authored-by: GuyAv46 <[email protected]>
github-merge-queue bot pushed a commit that referenced this pull request Nov 27, 2024
* Fix MRIterator ownership mechanism - [MOD-8108] (#5244)

* implement release mechanism for MRIterator

* refactor MRChannel

* delete CursorList_Expire

* fix test for async cursor deletion

* minor cleanup

* more cleanup

* remove a log message that is now more likely to be logged

* log about stuck RQ less frequently

* even less noisy

* rename MRChannel_Close

(cherry picked from commit 6f68d69)

* Fix a potential use after free - [MOD-8108] (#5268)

fix a potential use after free

---------

Co-authored-by: GuyAv46 <[email protected]>
github-merge-queue bot pushed a commit that referenced this pull request Nov 28, 2024
* Fix MRIterator ownership mechanism - [MOD-8108] (#5244)

* implement release mechanism for MRIterator

* refactor MRChannel

* delete CursorList_Expire

* fix test for async cursor deletion

* minor cleanup

* more cleanup

* remove a log message that is now more likely to be logged

* log about stuck RQ less frequently

* even less noisy

* rename MRChannel_Close

(cherry picked from commit 6f68d69)

* additional fixes for 2.6
github-merge-queue bot pushed a commit that referenced this pull request Nov 28, 2024
* Fix MRIterator ownership mechanism - [MOD-8108] (#5244)

* implement release mechanism for MRIterator

* refactor MRChannel

* delete CursorList_Expire

* fix test for async cursor deletion

* minor cleanup

* more cleanup

* remove a log message that is now more likely to be logged

* log about stuck RQ less frequently

* even less noisy

* rename MRChannel_Close

(cherry picked from commit 6f68d69)

* additional fixes for 2.8
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants