Skip to content

[MOD-9023] Replacing freeFlag with a reference counter#5736

Merged
meiravgri merged 7 commits intomasterfrom
meiravg_add_prints_to_test_mod6287
Mar 13, 2025
Merged

[MOD-9023] Replacing freeFlag with a reference counter#5736
meiravgri merged 7 commits intomasterfrom
meiravg_add_prints_to_test_mod6287

Conversation

@meiravgri
Copy link
Collaborator

@meiravgri meiravgri commented Mar 9, 2025

This PR fixes a race condition on freeFlag in the coordinator mechanism.
The issue occurred because freeFlag was expected to be reset by the uv thread when the coordinator reached MRIterator_Release but since the flag is reset from the a BG thread, this was not always the case.
This led to an incorrect sequence of operations, ultimately causing an assertion failure.
To resolve this, freeFlag has been replaced with a reference counter, which:

  • Increases before pushing a new job to the RQ.
  • Decreases when the coordinator no longer requires more replies.
  • Decreases again when all shards have finished pushing replies to the channel.

Logging improvements:

  • Added RS_DEBUG_LOG_FMT and RS_DEBUG_LOG macros for debug logging in deps/rmutil/rm_assert.h.
  • when ENABLE_ASSERT=1 it will call RedisModule_Log with "debug" level

@meiravgri meiravgri requested review from GuyAv46 and alonre24 March 10, 2025 16:22
@codecov
Copy link

codecov bot commented Mar 10, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.11%. Comparing base (d45701e) to head (48e3fa4).
Report is 15 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5736      +/-   ##
==========================================
- Coverage   88.13%   88.11%   -0.03%     
==========================================
  Files         198      200       +2     
  Lines       35999    36050      +51     
==========================================
+ Hits        31727    31764      +37     
- Misses       4272     4286      +14     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

if (timedout && !cmd->forCursor) {
newCmd = MR_NewCommand(4, "_FT.CURSOR", "DEL", idx, buf);
newCmd.depleted = true;
// newCmd.depleted = true;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why commenting out?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wrong. Here the cmd is essentially not depleted as we are going to send the shard another request...

Comment on lines 515 to 516
RS_LOG_ASSERT_FMT(pending >= 0, "Pending (%d) should not reach a negative value. inproc: %d, freeflag: %d, channel size: %d, shard_slot: %d",
pending, ctx->it->ctx.inProcess, ctx->it->ctx.freeFlag, MRChannel_Size(ctx->it->ctx.chan), ctx->cmd.targetSlot);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we always print it in debug mode rather than only when the assertion fails? perhaps it can bring more information when time comes.

…e one side is changing it twice.

The refcount is increased when a new job is pushed to the queue, and decreased when the coordinator doesn't need any replies, or when all readers have done executing the job.

introduce logging capabilities when ENABLE_ASSERT=1 :
RS_DEBUG_LOG and RS_DEBUG_LOG_FMT
to log in debug level
@meiravgri meiravgri changed the title add prints [MOD-9023] Replacing freeFlag with a reference counter Mar 12, 2025
@meiravgri meiravgri requested review from alonre24 and Copilot March 12, 2025 16:08
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

@meiravgri meiravgri requested a review from GuyAv46 March 12, 2025 17:04
@alonre24 alonre24 requested a review from raz-mon March 12, 2025 17:30
Copy link
Collaborator

@GuyAv46 GuyAv46 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What a catch and what a solution 🎣 🥇

Comment on lines +22 to +23
#define RS_DEBUG_LOG_FMT(fmt, ...) // NOP
#define RS_DEBUG_LOG(str) // NOP
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why have the debug logs depend on ENABLE_ASSERT?
Since they are in debug level I think we should always have them

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Their purpose is to be used in a testing env, not in production (even if it's only enabled with debug loglevel)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, why not have it in production?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't know the impact on performance of calling RedisModule_log (even if it's not the configured log level, it still executes some code), so i prefer not to compile it at all.
We can open a ticket to measure that

@redisearch-backport-pull-request
Copy link
Contributor

Only merged pull requests can be backported.

@meiravgri meiravgri added this pull request to the merge queue Mar 13, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 13, 2025
@meiravgri meiravgri added this pull request to the merge queue Mar 13, 2025
Merged via the queue into master with commit 3420e93 Mar 13, 2025
11 checks passed
@meiravgri meiravgri deleted the meiravg_add_prints_to_test_mod6287 branch March 13, 2025 15:05
@meiravgri
Copy link
Collaborator Author

/backport

@redisearch-backport-pull-request
Copy link
Contributor

Backport failed for 8.0, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 8.0
git worktree add -d .worktree/backport-5736-to-8.0 origin/8.0
cd .worktree/backport-5736-to-8.0
git switch --create backport-5736-to-8.0
git cherry-pick -x 3420e93a87a6fbd4b8880075b557cf8e5d4a4ab0

meiravgri added a commit that referenced this pull request Mar 13, 2025
* add prints

* fix

* no depletion in getCursorCommand

* add info to assert

* free flag is now a ref counter to avoid potential race condition where one side is changing it twice.

The refcount is increased when a new job is pushed to the queue, and decreased when the coordinator doesn't need any replies, or when all readers have done executing the job.

introduce logging capabilities when ENABLE_ASSERT=1 :
RS_DEBUG_LOG and RS_DEBUG_LOG_FMT
to log in debug level

* cleanups

* cr

(cherry picked from commit 3420e93)
github-merge-queue bot pushed a commit that referenced this pull request Mar 13, 2025
[MOD-9023] Replacing freeFlag with a reference counter (#5736)

* add prints

* fix

* no depletion in getCursorCommand

* add info to assert

* free flag is now a ref counter to avoid potential race condition where one side is changing it twice.

The refcount is increased when a new job is pushed to the queue, and decreased when the coordinator doesn't need any replies, or when all readers have done executing the job.

introduce logging capabilities when ENABLE_ASSERT=1 :
RS_DEBUG_LOG and RS_DEBUG_LOG_FMT
to log in debug level

* cleanups

* cr

(cherry picked from commit 3420e93)
@meiravgri
Copy link
Collaborator Author

/backport

@redisearch-backport-pull-request
Copy link
Contributor

Backport failed for 2.10, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 2.10
git worktree add -d .worktree/backport-5736-to-2.10 origin/2.10
cd .worktree/backport-5736-to-2.10
git switch --create backport-5736-to-2.10
git cherry-pick -x 3420e93a87a6fbd4b8880075b557cf8e5d4a4ab0

@redisearch-backport-pull-request
Copy link
Contributor

Backport failed for 8.0, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 8.0
git worktree add -d .worktree/backport-5736-to-8.0 origin/8.0
cd .worktree/backport-5736-to-8.0
git switch --create backport-5736-to-8.0
git cherry-pick -x 3420e93a87a6fbd4b8880075b557cf8e5d4a4ab0

@meiravgri
Copy link
Collaborator Author

Backport failed for 8.0, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 8.0
git worktree add -d .worktree/backport-5736-to-8.0 origin/8.0
cd .worktree/backport-5736-to-8.0
git switch --create backport-5736-to-8.0
git cherry-pick -x 3420e93a87a6fbd4b8880075b557cf8e5d4a4ab0

Already backported

meiravgri added a commit that referenced this pull request Mar 18, 2025
* add prints

* fix

* no depletion in getCursorCommand

* add info to assert

* free flag is now a ref counter to avoid potential race condition where one side is changing it twice.

The refcount is increased when a new job is pushed to the queue, and decreased when the coordinator doesn't need any replies, or when all readers have done executing the job.

introduce logging capabilities when ENABLE_ASSERT=1 :
RS_DEBUG_LOG and RS_DEBUG_LOG_FMT
to log in debug level

* cleanups

* cr

(cherry picked from commit 3420e93)
github-merge-queue bot pushed a commit that referenced this pull request Mar 18, 2025
[MOD-9023] Replacing freeFlag with a reference counter (#5736)

* add prints

* fix

* no depletion in getCursorCommand

* add info to assert

* free flag is now a ref counter to avoid potential race condition where one side is changing it twice.

The refcount is increased when a new job is pushed to the queue, and decreased when the coordinator doesn't need any replies, or when all readers have done executing the job.

introduce logging capabilities when ENABLE_ASSERT=1 :
RS_DEBUG_LOG and RS_DEBUG_LOG_FMT
to log in debug level

* cleanups

* cr

(cherry picked from commit 3420e93)
meiravgri added a commit that referenced this pull request Mar 20, 2025
[MOD-9023] Replacing freeFlag with a reference counter (#5736)

* add prints

* fix

* no depletion in getCursorCommand

* add info to assert

* free flag is now a ref counter to avoid potential race condition where one side is changing it twice.

The refcount is increased when a new job is pushed to the queue, and decreased when the coordinator doesn't need any replies, or when all readers have done executing the job.

introduce logging capabilities when ENABLE_ASSERT=1 :
RS_DEBUG_LOG and RS_DEBUG_LOG_FMT
to log in debug level

* cleanups

* cr

(cherry picked from commit 3420e93)
(cherry picked from commit 607a118)
meiravgri added a commit that referenced this pull request Mar 20, 2025
[MOD-9023] Replacing freeFlag with a reference counter (#5736)

* add prints

* fix

* no depletion in getCursorCommand

* add info to assert

* free flag is now a ref counter to avoid potential race condition where one side is changing it twice.

The refcount is increased when a new job is pushed to the queue, and decreased when the coordinator doesn't need any replies, or when all readers have done executing the job.

introduce logging capabilities when ENABLE_ASSERT=1 :
RS_DEBUG_LOG and RS_DEBUG_LOG_FMT
to log in debug level

* cleanups

* cr

(cherry picked from commit 3420e93)
(cherry picked from commit 607a118)
github-merge-queue bot pushed a commit that referenced this pull request Mar 20, 2025
[2.10] [MOD-9023] Replacing freeFlag with a reference counter (#5784)

[MOD-9023] Replacing freeFlag with a reference counter (#5736)

* add prints

* fix

* no depletion in getCursorCommand

* add info to assert

* free flag is now a ref counter to avoid potential race condition where one side is changing it twice.

The refcount is increased when a new job is pushed to the queue, and decreased when the coordinator doesn't need any replies, or when all readers have done executing the job.

introduce logging capabilities when ENABLE_ASSERT=1 :
RS_DEBUG_LOG and RS_DEBUG_LOG_FMT
to log in debug level

* cleanups

* cr

(cherry picked from commit 3420e93)
(cherry picked from commit 607a118)
meiravgri added a commit that referenced this pull request Apr 26, 2025
[2.10] [MOD-9023] Replacing freeFlag with a reference counter (#5784)

[MOD-9023] Replacing freeFlag with a reference counter (#5736)

* add prints

* fix

* no depletion in getCursorCommand

* add info to assert

* free flag is now a ref counter to avoid potential race condition where one side is changing it twice.

The refcount is increased when a new job is pushed to the queue, and decreased when the coordinator doesn't need any replies, or when all readers have done executing the job.

introduce logging capabilities when ENABLE_ASSERT=1 :
RS_DEBUG_LOG and RS_DEBUG_LOG_FMT
to log in debug level

* cleanups

* cr

(cherry picked from commit 3420e93)
(cherry picked from commit 607a118)
(cherry picked from commit d01e71d)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants