Skip to content

[MOD-12069] [MOD-12695] Add active_io_threads metric#7530

Merged
meiravgri merged 8 commits intomasterfrom
meiravg_uv_active_thread_counter
Nov 27, 2025
Merged

[MOD-12069] [MOD-12695] Add active_io_threads metric#7530
meiravgri merged 8 commits intomasterfrom
meiravg_uv_active_thread_counter

Conversation

@meiravgri
Copy link
Collaborator

@meiravgri meiravgri commented Nov 26, 2025

This PR adds a new metric to monitor the number of I/O thread callbacks currently executing in cluster mode, exposed via INFO MODULES under a new multi_threading section.

INFO MODULES Output

New section added:

# search_multi_threading
search_active_io_threads:0

Testing Approach

This metric is tested in C++ (test_cpp_io_runtime_ctx.cpp) rather than pytest because I/O thread callbacks run asynchronously on separate UV event loop threads. Existing debug mechanisms (PAUSE_BEFORE_RP_N, DEBUG WORKERS PAUSE) only control worker threads, not I/O threads, making it impossible to deterministically pause/control I/O thread execution from pytest.

Additional Changes

  • tests/cpptests/common.h: Added RS::WaitForCondition() template utility for timeout-based condition polling (30s default timeout)

Note

Introduce active_io_threads metric tracking active I/O callbacks, exposed under a new multi_threading INFO section, with unit and integration tests.

  • Metrics/Stats:
    • Add MultiThreadingStats with active_io_threads to TotalGlobalStats and APIs: GlobalStats_UpdateActiveIoThreads, GlobalStats_GetMultiThreadingStats.
  • I/O Runtime:
    • Track active I/O callbacks by incrementing/decrementing active_io_threads around rqAsyncCb callback execution in io_runtime_ctx.c.
  • INFO MODULES:
    • New multi_threading section exposing active_io_threads.
  • Tests:
    • C++: ActiveIoThreadsMetric validates counter increments/decrements; add RS::WaitForCondition helper.
    • Pytests: verify multi_threading section exists and active_io_threads initializes to 0.

Written by Cursor Bugbot for commit 6d6656e. This will update automatically on new commits. Configure here.

@meiravgri meiravgri changed the title ]MOD-12695] Add active_io_threads metric to track I/O thread callback execution ]MOD-12695] Add active_io_threads metric Nov 26, 2025
@meiravgri meiravgri changed the title ]MOD-12695] Add active_io_threads metric [MOD-12069] [MOD-12695] Add active_io_threads metric Nov 26, 2025
@meiravgri meiravgri requested a review from JoanFM November 26, 2025 13:15

// Update the number of active io threads.
void GlobalStats_UpdateActiveIoThreads(int toAdd) {
INCR_BY(RSGlobalStats.totalStats.multi_threading.active_io_threads, toAdd);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we add an assert that if toAdd is negative, active_io_threads should be positive?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

possible

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Topology callback not tracked by active_io_threads metric

The topology callback execution at line 107 in topologyAsyncCB is not wrapped with GlobalStats_UpdateActiveIoThreads calls, unlike the regular callbacks in rqAsyncCb at lines 53-55. This causes the active_io_threads metric to undercount active I/O thread callbacks when topology updates are being processed, leading to inaccurate monitoring data.

src/coord/rmr/io_runtime_ctx.c#L106-L107

io_runtime_ctx->uv_runtime.loop_th_ready = false;
task->cb(task->privdata);

Fix in Cursor Fix in Web


@meiravgri meiravgri requested a review from JoanFM November 26, 2025 13:40
RS_LOG_ASSERT_FMT(toAdd > 0 || current > 0,
"Cannot decrease active_io_threads below 0. toAdd: %d, current: %zu", toAdd, current);
#endif
INCR_BY(RSGlobalStats.totalStats.multi_threading.active_io_threads, toAdd);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Race condition in active IO threads assertion

The assertion check reads active_io_threads at line 155 and validates it at line 156-157, but the actual decrement happens at line 159. This creates a Time-of-Check to Time-of-Use race condition when multiple I/O threads execute concurrently. Between the check and the atomic decrement, another thread could decrement the counter, causing it to underflow since size_t is unsigned. The check becomes ineffective at preventing underflow in multi-threaded scenarios.

Fix in Cursor Fix in Web

void GlobalStats_UpdateActiveIoThreads(int toAdd) {
#ifdef ENABLE_ASSERT
RS_LOG_ASSERT(toAdd != 0, "Attempt to change active_io_threads by 0");
size_t current = READ(RSGlobalStats.totalStats.multi_threading.active_io_threads);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isnt' RS_LOG_ASSERT directly checking if ASSERT is enabled and not happening in prod?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes but i want to avoid the READ overhead as well

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and make sure i print the same value i read in the condition

@codecov
Copy link

codecov bot commented Nov 26, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.99%. Comparing base (874ce68) to head (6d6656e).
⚠️ Report is 8 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7530      +/-   ##
==========================================
- Coverage   85.00%   84.99%   -0.01%     
==========================================
  Files         349      349              
  Lines       53877    53892      +15     
  Branches    14384    14384              
==========================================
+ Hits        45797    45807      +10     
- Misses       7884     7890       +6     
+ Partials      196      195       -1     
Flag Coverage Δ
flow 85.15% <100.00%> (-0.04%) ⬇️
unit 52.39% <66.66%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

JoanFM
JoanFM previously approved these changes Nov 26, 2025
Copy link

@jit-ci jit-ci bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ The following Jit checks failed to run:

  • secret-detection-trufflehog

#jit_bypass_commit in this PR to bypass, Jit Admin privileges required.

More info in the Jit platform.

Copy link

@jit-ci jit-ci bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ The following Jit checks failed to run:

  • secret-detection-trufflehog

#jit_bypass_commit in this PR to bypass, Jit Admin privileges required.

More info in the Jit platform.

Copy link

@jit-ci jit-ci bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ The following Jit checks failed to run:

  • secret-detection-trufflehog

#jit_bypass_commit in this PR to bypass, Jit Admin privileges required.

More info in the Jit platform.

@meiravgri meiravgri added this pull request to the merge queue Nov 26, 2025
@meiravgri meiravgri removed this pull request from the merge queue due to a manual request Nov 26, 2025
Added unistd.h header for POSIX compliance.
@meiravgri meiravgri added this pull request to the merge queue Nov 27, 2025
Merged via the queue into master with commit 8bbdafa Nov 27, 2025
98 of 99 checks passed
@meiravgri meiravgri deleted the meiravg_uv_active_thread_counter branch November 27, 2025 10:10
@redisearch-backport-pull-request
Copy link
Contributor

Backport failed for 2.8, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 2.8
git worktree add -d .worktree/backport-7530-to-2.8 origin/2.8
cd .worktree/backport-7530-to-2.8
git switch --create backport-7530-to-2.8
git cherry-pick -x 8bbdafa81c5e4562a357e798d9258289e6ecefa2

@redisearch-backport-pull-request
Copy link
Contributor

Backport failed for 2.10, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 2.10
git worktree add -d .worktree/backport-7530-to-2.10 origin/2.10
cd .worktree/backport-7530-to-2.10
git switch --create backport-7530-to-2.10
git cherry-pick -x 8bbdafa81c5e4562a357e798d9258289e6ecefa2

@redisearch-backport-pull-request
Copy link
Contributor

Backport failed for 8.2, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 8.2
git worktree add -d .worktree/backport-7530-to-8.2 origin/8.2
cd .worktree/backport-7530-to-8.2
git switch --create backport-7530-to-8.2
git cherry-pick -x 8bbdafa81c5e4562a357e798d9258289e6ecefa2

@redisearch-backport-pull-request
Copy link
Contributor

Backport failed for 8.4, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 8.4
git worktree add -d .worktree/backport-7530-to-8.4 origin/8.4
cd .worktree/backport-7530-to-8.4
git switch --create backport-7530-to-8.4
git cherry-pick -x 8bbdafa81c5e4562a357e798d9258289e6ecefa2

meiravgri added a commit that referenced this pull request Nov 27, 2025
* Add multi-threading statistics tracking for active I/O threads

* fix comment

* add cpp test

* fix spelling

* address comment

* remove unnecessary nre line

* test cleanups

* Include unistd.h in common.h

Added unistd.h header for POSIX compliance.

(cherry picked from commit 8bbdafa)
github-merge-queue bot pushed a commit that referenced this pull request Nov 27, 2025
[MOD-12069] [MOD-12695] Add `active_io_threads` metric (#7530)

* Add multi-threading statistics tracking for active I/O threads

* fix comment

* add cpp test

* fix spelling

* address comment

* remove unnecessary nre line

* test cleanups

* Include unistd.h in common.h

Added unistd.h header for POSIX compliance.

(cherry picked from commit 8bbdafa)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants