Skip to content

[2.10] [MOD-12252] [MOD-11658] reduce number of worker threads asynchronously#7535

Merged
JoanFM merged 4 commits into2.10from
backport-7410-to-2.10
Nov 26, 2025
Merged

[2.10] [MOD-12252] [MOD-11658] reduce number of worker threads asynchronously#7535
JoanFM merged 4 commits into2.10from
backport-7410-to-2.10

Conversation

@JoanFM
Copy link
Collaborator

@JoanFM JoanFM commented Nov 26, 2025

Backport #7410 to 2.10


Note

Make thread reduction asynchronous with a new thpool API, improve thread startup handling, adapt workers to use it, and update tests to cover scaling and under-load scenarios.

  • Threadpool (deps/thpool):
    • Add redisearch_thpool_schedule_config_reduce_threads_job(...) to asynchronously reduce threads (supports TERMINATE_WHEN_EMPTY/ASAP) and set pool state accordingly.
    • Refactor init/startup: thread_do now receives packed args; thread_init signals per-thread start; wait for explicit start flags; add redisearch_thpool_push_chain(...) helper.
    • Improve add-threads path to synchronously await thread starts; replace raw asserts with RS_ASSERT.
    • Remove synchronous removal APIs from public header; keep terminate_threads and expose async reducer API.
  • Workers (src/util/workers.c):
    • Switch resizing to async reducer for shrinking (incl. full drain-to-zero) with notices; validate add-threads result.
    • Add drain logging.
  • Debug/Config:
    • Warn that FT.DEBUG WORKERS DRAIN can deadlock in specific scenarios; minor comment clarification in config.
  • Tests:
    • Update C++ and Python tests to async shrinking semantics; add robust thread-start waits and helpers.
    • New stress tests to avoid deadlock when reducing WORKERS under load; extend workers scaling scenarios.

Written by Cursor Bugbot for commit 12bc85e. This will update automatically on new commits. Configure here.

#7410)

* change number of threads asynchronously

* do some changes to add thread

* improve code

* cancel old config change if happens

* more improvements

* destroy when sure noone needsit

* simplify implementation drastically

* improve async implementation

* revert barrier changes

* test: do not skip on standalone

* test: add new test

* fix tests

* remove extra tests

* remove unneeded code

* delay ensure init after

* fix: remove weird unlock

* test: handle cpp tests

* small improvement on the test

* improve the checking of thread starting

* remove some unneeded functions from thpool

* remove unused code from tests

* small cleanup of code

* update deps/thpool/thpool.c

* add checks to thpool stats in test

* fix multithread test

* fix memory leak and remove unreachable code

* assert some invariants

* fix: respect adding workers is a lazy operation even after start

* small changes

* remove double assertion

* add another assertion

* small change to avoid dangling pointer in stack

* changes according to comments to the code

* change tests of place and simplify some

* small change

* try to improve flakiness and test time using TimeLimit

* make static and reuse code

* Update deps/thpool/thpool.c

Co-authored-by: GuyAv46 <[email protected]>

* add volatile to started array

* remove extra assertion

* add warning to ft debug command code comment

* change assertion method

* remove flakiness from test

* change test query while flush test

* Update deps/thpool/thpool.c

Co-authored-by: GuyAv46 <[email protected]>

* Update deps/thpool/thpool.c

Co-authored-by: GuyAv46 <[email protected]>

---------

Co-authored-by: GuyAv46 <[email protected]>
(cherry picked from commit 874ce68)
@JoanFM JoanFM requested review from GuyAv46 and alonre24 and removed request for GuyAv46 November 26, 2025 14:10
@JoanFM JoanFM enabled auto-merge November 26, 2025 14:51
}
// I do not need to verify init since we are actually putting in priority queue, and I do not want to wait on another barrier
// As per the input, I know i am Initialized, I do not need to verify it (and avoid waiting)
redisearch_thpool_add_n_work_not_verify_init(thpool_p, jobs, n_threads_to_remove, THPOOL_PRIORITY_ADMIN);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Memory leak and state inconsistency on job creation failure

The return value of redisearch_thpool_add_n_work_not_verify_init is not checked. This function returns -1 when memory allocation fails in create_jobs_chain. When this happens, the barrier and signal_job_arg allocated at lines 1076-1078 are leaked since they're normally freed by the worker threads executing the jobs. Additionally, n_threads has already been decremented at line 1073, and if terminate_when_empty is true, the state is set to THPOOL_UNINITIALIZED at line 1091 even though no termination jobs were added. This leaves the thread pool in an inconsistent state where the configured thread count doesn't match reality.

Additional Locations (1)

Fix in Cursor Fix in Web

@JoanFM JoanFM self-assigned this Nov 26, 2025
GuyAv46
GuyAv46 previously approved these changes Nov 26, 2025
Copy link

@jit-ci jit-ci bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ The following Jit checks failed to run:

  • secret-detection-trufflehog

#jit_bypass_commit in this PR to bypass, Jit Admin privileges required.

More info in the Jit platform.

*started = false;
struct thread_do_args *args = rm_malloc(sizeof(struct thread_do_args));
args->thpool_p = thpool_p;
args->started = started;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Missing NULL check after malloc causes crash

The newly allocated args struct from rm_malloc at line 616 is not checked for NULL before being dereferenced at lines 617-618. If memory allocation fails, this causes a NULL pointer dereference crash when accessing args->thpool_p and args->started. The function signature even documents it returns -1 on failure, but the implementation always returns 0 and never handles the allocation failure case.

Fix in Cursor Fix in Web

Copy link

@jit-ci jit-ci bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ The following Jit checks failed to run:

  • secret-detection-trufflehog

#jit_bypass_commit in this PR to bypass, Jit Admin privileges required.

More info in the Jit platform.

@JoanFM JoanFM added this pull request to the merge queue Nov 26, 2025
@codecov
Copy link

codecov bot commented Nov 26, 2025

Codecov Report

❌ Patch coverage is 97.08738% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.27%. Comparing base (5b47f12) to head (12bc85e).
⚠️ Report is 2 commits behind head on 2.10.

Files with missing lines Patch % Lines
deps/thpool/thpool.c 96.80% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             2.10    #7535      +/-   ##
==========================================
- Coverage   89.34%   89.27%   -0.08%     
==========================================
  Files         207      207              
  Lines       35447    35504      +57     
==========================================
+ Hits        31670    31696      +26     
- Misses       3777     3808      +31     
Flag Coverage Δ
flow 83.84% <95.14%> (-0.19%) ⬇️
unit 42.38% <85.43%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Merged via the queue into 2.10 with commit 6f41f33 Nov 26, 2025
17 checks passed
@JoanFM JoanFM deleted the backport-7410-to-2.10 branch November 26, 2025 16:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants