[MOD-12069] [MOD-12695] Add active_io_threads metric#7530
Conversation
active_io_threads metric
|
|
||
| // Update the number of active io threads. | ||
| void GlobalStats_UpdateActiveIoThreads(int toAdd) { | ||
| INCR_BY(RSGlobalStats.totalStats.multi_threading.active_io_threads, toAdd); |
There was a problem hiding this comment.
should we add an assert that if toAdd is negative, active_io_threads should be positive?
There was a problem hiding this comment.
Bug: Topology callback not tracked by active_io_threads metric
The topology callback execution at line 107 in topologyAsyncCB is not wrapped with GlobalStats_UpdateActiveIoThreads calls, unlike the regular callbacks in rqAsyncCb at lines 53-55. This causes the active_io_threads metric to undercount active I/O thread callbacks when topology updates are being processed, leading to inaccurate monitoring data.
src/coord/rmr/io_runtime_ctx.c#L106-L107
RediSearch/src/coord/rmr/io_runtime_ctx.c
Lines 106 to 107 in ebe7871
| RS_LOG_ASSERT_FMT(toAdd > 0 || current > 0, | ||
| "Cannot decrease active_io_threads below 0. toAdd: %d, current: %zu", toAdd, current); | ||
| #endif | ||
| INCR_BY(RSGlobalStats.totalStats.multi_threading.active_io_threads, toAdd); |
There was a problem hiding this comment.
Bug: Race condition in active IO threads assertion
The assertion check reads active_io_threads at line 155 and validates it at line 156-157, but the actual decrement happens at line 159. This creates a Time-of-Check to Time-of-Use race condition when multiple I/O threads execute concurrently. Between the check and the atomic decrement, another thread could decrement the counter, causing it to underflow since size_t is unsigned. The check becomes ineffective at preventing underflow in multi-threaded scenarios.
| void GlobalStats_UpdateActiveIoThreads(int toAdd) { | ||
| #ifdef ENABLE_ASSERT | ||
| RS_LOG_ASSERT(toAdd != 0, "Attempt to change active_io_threads by 0"); | ||
| size_t current = READ(RSGlobalStats.totalStats.multi_threading.active_io_threads); |
There was a problem hiding this comment.
isnt' RS_LOG_ASSERT directly checking if ASSERT is enabled and not happening in prod?
There was a problem hiding this comment.
yes but i want to avoid the READ overhead as well
There was a problem hiding this comment.
and make sure i print the same value i read in the condition
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #7530 +/- ##
==========================================
- Coverage 85.00% 84.99% -0.01%
==========================================
Files 349 349
Lines 53877 53892 +15
Branches 14384 14384
==========================================
+ Hits 45797 45807 +10
- Misses 7884 7890 +6
+ Partials 196 195 -1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
❌ The following Jit checks failed to run:
- secret-detection-trufflehog
#jit_bypass_commit in this PR to bypass, Jit Admin privileges required.
More info in the Jit platform.
There was a problem hiding this comment.
❌ The following Jit checks failed to run:
- secret-detection-trufflehog
#jit_bypass_commit in this PR to bypass, Jit Admin privileges required.
More info in the Jit platform.
There was a problem hiding this comment.
❌ The following Jit checks failed to run:
- secret-detection-trufflehog
#jit_bypass_commit in this PR to bypass, Jit Admin privileges required.
More info in the Jit platform.
Added unistd.h header for POSIX compliance.
|
Backport failed for Please cherry-pick the changes locally and resolve any conflicts. git fetch origin 2.8
git worktree add -d .worktree/backport-7530-to-2.8 origin/2.8
cd .worktree/backport-7530-to-2.8
git switch --create backport-7530-to-2.8
git cherry-pick -x 8bbdafa81c5e4562a357e798d9258289e6ecefa2 |
|
Backport failed for Please cherry-pick the changes locally and resolve any conflicts. git fetch origin 2.10
git worktree add -d .worktree/backport-7530-to-2.10 origin/2.10
cd .worktree/backport-7530-to-2.10
git switch --create backport-7530-to-2.10
git cherry-pick -x 8bbdafa81c5e4562a357e798d9258289e6ecefa2 |
|
Backport failed for Please cherry-pick the changes locally and resolve any conflicts. git fetch origin 8.2
git worktree add -d .worktree/backport-7530-to-8.2 origin/8.2
cd .worktree/backport-7530-to-8.2
git switch --create backport-7530-to-8.2
git cherry-pick -x 8bbdafa81c5e4562a357e798d9258289e6ecefa2 |
|
Backport failed for Please cherry-pick the changes locally and resolve any conflicts. git fetch origin 8.4
git worktree add -d .worktree/backport-7530-to-8.4 origin/8.4
cd .worktree/backport-7530-to-8.4
git switch --create backport-7530-to-8.4
git cherry-pick -x 8bbdafa81c5e4562a357e798d9258289e6ecefa2 |
* Add multi-threading statistics tracking for active I/O threads * fix comment * add cpp test * fix spelling * address comment * remove unnecessary nre line * test cleanups * Include unistd.h in common.h Added unistd.h header for POSIX compliance. (cherry picked from commit 8bbdafa)
[MOD-12069] [MOD-12695] Add `active_io_threads` metric (#7530) * Add multi-threading statistics tracking for active I/O threads * fix comment * add cpp test * fix spelling * address comment * remove unnecessary nre line * test cleanups * Include unistd.h in common.h Added unistd.h header for POSIX compliance. (cherry picked from commit 8bbdafa)
This PR adds a new metric to monitor the number of I/O thread callbacks currently executing in cluster mode, exposed via
INFO MODULESunder a newmulti_threadingsection.INFO MODULES Output
New section added:
Testing Approach
This metric is tested in C++ (
test_cpp_io_runtime_ctx.cpp) rather than pytest because I/O thread callbacks run asynchronously on separate UV event loop threads. Existing debug mechanisms (PAUSE_BEFORE_RP_N,DEBUG WORKERS PAUSE) only control worker threads, not I/O threads, making it impossible to deterministically pause/control I/O thread execution from pytest.Additional Changes
tests/cpptests/common.h: AddedRS::WaitForCondition()template utility for timeout-based condition polling (30s default timeout)Note
Introduce
active_io_threadsmetric tracking active I/O callbacks, exposed under a newmulti_threadingINFO section, with unit and integration tests.MultiThreadingStatswithactive_io_threadstoTotalGlobalStatsand APIs:GlobalStats_UpdateActiveIoThreads,GlobalStats_GetMultiThreadingStats.active_io_threadsaroundrqAsyncCbcallback execution inio_runtime_ctx.c.multi_threadingsection exposingactive_io_threads.ActiveIoThreadsMetricvalidates counter increments/decrements; addRS::WaitForConditionhelper.multi_threadingsection exists andactive_io_threadsinitializes to0.Written by Cursor Bugbot for commit 6d6656e. This will update automatically on new commits. Configure here.