[2.10] [MOD-12252] [MOD-11658] reduce number of worker threads asynchronously#7535
[2.10] [MOD-12252] [MOD-11658] reduce number of worker threads asynchronously#7535
Conversation
#7410) * change number of threads asynchronously * do some changes to add thread * improve code * cancel old config change if happens * more improvements * destroy when sure noone needsit * simplify implementation drastically * improve async implementation * revert barrier changes * test: do not skip on standalone * test: add new test * fix tests * remove extra tests * remove unneeded code * delay ensure init after * fix: remove weird unlock * test: handle cpp tests * small improvement on the test * improve the checking of thread starting * remove some unneeded functions from thpool * remove unused code from tests * small cleanup of code * update deps/thpool/thpool.c * add checks to thpool stats in test * fix multithread test * fix memory leak and remove unreachable code * assert some invariants * fix: respect adding workers is a lazy operation even after start * small changes * remove double assertion * add another assertion * small change to avoid dangling pointer in stack * changes according to comments to the code * change tests of place and simplify some * small change * try to improve flakiness and test time using TimeLimit * make static and reuse code * Update deps/thpool/thpool.c Co-authored-by: GuyAv46 <[email protected]> * add volatile to started array * remove extra assertion * add warning to ft debug command code comment * change assertion method * remove flakiness from test * change test query while flush test * Update deps/thpool/thpool.c Co-authored-by: GuyAv46 <[email protected]> * Update deps/thpool/thpool.c Co-authored-by: GuyAv46 <[email protected]> --------- Co-authored-by: GuyAv46 <[email protected]> (cherry picked from commit 874ce68)
| } | ||
| // I do not need to verify init since we are actually putting in priority queue, and I do not want to wait on another barrier | ||
| // As per the input, I know i am Initialized, I do not need to verify it (and avoid waiting) | ||
| redisearch_thpool_add_n_work_not_verify_init(thpool_p, jobs, n_threads_to_remove, THPOOL_PRIORITY_ADMIN); |
There was a problem hiding this comment.
Bug: Memory leak and state inconsistency on job creation failure
The return value of redisearch_thpool_add_n_work_not_verify_init is not checked. This function returns -1 when memory allocation fails in create_jobs_chain. When this happens, the barrier and signal_job_arg allocated at lines 1076-1078 are leaked since they're normally freed by the worker threads executing the jobs. Additionally, n_threads has already been decremented at line 1073, and if terminate_when_empty is true, the state is set to THPOOL_UNINITIALIZED at line 1091 even though no termination jobs were added. This leaves the thread pool in an inconsistent state where the configured thread count doesn't match reality.
Additional Locations (1)
There was a problem hiding this comment.
❌ The following Jit checks failed to run:
- secret-detection-trufflehog
#jit_bypass_commit in this PR to bypass, Jit Admin privileges required.
More info in the Jit platform.
| *started = false; | ||
| struct thread_do_args *args = rm_malloc(sizeof(struct thread_do_args)); | ||
| args->thpool_p = thpool_p; | ||
| args->started = started; |
There was a problem hiding this comment.
Bug: Missing NULL check after malloc causes crash
The newly allocated args struct from rm_malloc at line 616 is not checked for NULL before being dereferenced at lines 617-618. If memory allocation fails, this causes a NULL pointer dereference crash when accessing args->thpool_p and args->started. The function signature even documents it returns -1 on failure, but the implementation always returns 0 and never handles the allocation failure case.
There was a problem hiding this comment.
❌ The following Jit checks failed to run:
- secret-detection-trufflehog
#jit_bypass_commit in this PR to bypass, Jit Admin privileges required.
More info in the Jit platform.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## 2.10 #7535 +/- ##
==========================================
- Coverage 89.34% 89.27% -0.08%
==========================================
Files 207 207
Lines 35447 35504 +57
==========================================
+ Hits 31670 31696 +26
- Misses 3777 3808 +31
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Backport #7410 to 2.10
Note
Make thread reduction asynchronous with a new thpool API, improve thread startup handling, adapt workers to use it, and update tests to cover scaling and under-load scenarios.
deps/thpool):redisearch_thpool_schedule_config_reduce_threads_job(...)to asynchronously reduce threads (supports TERMINATE_WHEN_EMPTY/ASAP) and set pool state accordingly.thread_donow receives packed args;thread_initsignals per-thread start; wait for explicit start flags; addredisearch_thpool_push_chain(...)helper.RS_ASSERT.terminate_threadsand expose async reducer API.src/util/workers.c):FT.DEBUG WORKERS DRAINcan deadlock in specific scenarios; minor comment clarification in config.WORKERSunder load; extend workers scaling scenarios.Written by Cursor Bugbot for commit 12bc85e. This will update automatically on new commits. Configure here.