Dynamic cpu pool #8790

Merged
generall merged 11 commits into dev from dynamic-cpu-pool
Apr 27, 2026
@generall generall commented Apr 24, 2026

Member

Motivation

We need to monitor the CPU usage of the current qdrant process
to decide whether to shrink or grow the search thread pool at any given moment.

The thread pool used to have a size equal to the number of CPUs, but that doesn't work well under high IO.
In that case we want more threads (up to 4x the CPU count). But more threads hurt the fully in-RAM use case (because of CPU contention).
So we want to dynamically adjust the thread pool size based on the current CPU usage of the process.

Implementation

Step 1: Get CPU usage of the current process

We need a function that returns the CPU usage of the qdrant process over the last N seconds (N is a configured constant).
The function must work on Linux; support for other platforms is optional.

Calls to this function should be cheap, so we need "TTL cache" functionality:
the function actually reads CPU usage from the OS only once per N seconds.
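
A minimal sketch of the "TTL cache" idea, assuming a generic wrapper around any expensive read. `TtlCache` and `get_or_refresh` are hypothetical names, not the ones used in this PR, and the clock is passed in explicitly only to keep the example deterministic.

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Hypothetical TTL cache: remembers one value together with the time it was read.
pub struct TtlCache<T> {
    ttl: Duration,
    slot: Mutex<Option<(Instant, T)>>,
}

impl<T: Clone> TtlCache<T> {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, slot: Mutex::new(None) }
    }

    /// Returns the cached value while it is younger than `ttl`;
    /// otherwise calls `refresh` once and stores its result.
    pub fn get_or_refresh(&self, now: Instant, refresh: impl FnOnce() -> T) -> T {
        let mut slot = self.slot.lock().unwrap();
        if let Some((stamp, value)) = slot.as_ref() {
            if now.duration_since(*stamp) < self.ttl {
                return value.clone();
            }
        }
        let value = refresh();
        *slot = Some((now, value.clone()));
        value
    }
}
```

With a 2 second TTL, any number of callers inside the same window share a single read from the OS; only the first call after expiry pays for the refresh.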

Function semantics:

If 2 CPU cores were 100% busy over the last N seconds, the function should return 2.0.

Using procfs to read the process CPU usage is preferred.

The function should be available globally and implemented in the common crate.
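
To illustrate the semantics above, here is a rough sketch of how the value could be derived from procfs. It assumes the standard `/proc/[pid]/stat` layout, where `utime` and `stime` are fields 14 and 15, measured in clock ticks. The function names are hypothetical, and the tick rate is passed in as a parameter: 100 is a common value on Linux, but a real implementation would query it via `sysconf(_SC_CLK_TCK)`.

```rust
use std::fs;

/// Extracts utime + stime (in clock ticks) from the contents of /proc/<pid>/stat.
/// The command name may contain spaces and parentheses, so fields are counted
/// from the last ')' onwards.
pub fn parse_cpu_ticks(stat: &str) -> Option<u64> {
    let (_, rest) = stat.rsplit_once(')')?;
    let mut fields = rest.split_whitespace();
    // After ')' the first field is `state` (field 3), so utime (field 14) has index 11.
    let utime: u64 = fields.nth(11)?.parse().ok()?;
    let stime: u64 = fields.next()?.parse().ok()?;
    Some(utime + stime)
}

/// Converts a tick delta over a time window into "cores used":
/// two fully busy cores over the window yield 2.0.
pub fn cores_used(delta_ticks: u64, elapsed_secs: f64, ticks_per_sec: f64) -> Option<f64> {
    if elapsed_secs <= 0.0 || ticks_per_sec <= 0.0 {
        return None;
    }
    Some(delta_ticks as f64 / ticks_per_sec / elapsed_secs)
}

/// Reads the tick counter of the current process; None where procfs is unavailable.
pub fn read_self_cpu_ticks() -> Option<u64> {
    let stat = fs::read_to_string("/proc/self/stat").ok()?;
    parse_cpu_ticks(&stat)
}
```

Sampling `read_self_cpu_ticks` at the start and end of the window and feeding the difference into `cores_used` gives the value described above: 400 ticks in 2 seconds at 100 ticks per second is 2.0 cores.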

Step 2: telemetry

Additionally, we want the CPU usage value in telemetry, so it is easy to monitor and debug.
If the function is not supported on the platform, it should return None.
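
As an illustration of the `None` contract, a sketch of an optional telemetry field. Only the `cpu_cores_used` name comes from this PR; the `CpuTelemetry` struct, the `collect` helper and the hand-written JSON rendering are assumptions made to keep the example free of dependencies.

```rust
/// Hypothetical telemetry fragment carrying the process CPU usage.
#[derive(Debug, Clone, PartialEq)]
pub struct CpuTelemetry {
    /// Number of cores used over the measurement window, or None if unsupported.
    pub cpu_cores_used: Option<f64>,
}

impl CpuTelemetry {
    /// Runs `measure` only on Linux; every other platform reports None.
    pub fn collect(measure: impl FnOnce() -> Option<f64>) -> Self {
        let cpu_cores_used = if cfg!(target_os = "linux") {
            measure()
        } else {
            None
        };
        Self { cpu_cores_used }
    }

    /// Renders the field as JSON; an unsupported platform shows up as null.
    pub fn to_json(&self) -> String {
        match self.cpu_cores_used {
            Some(v) => format!("{{\"cpu_cores_used\":{v}}}"),
            None => "{\"cpu_cores_used\":null}".to_string(),
        }
    }
}
```

Keeping the field optional lets consumers tell "no load" (0.0) apart from "not measured" (null).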

Step 3: Auto-adjust thread pool

(old version, for historical reasons)

Currently, we use the tokio runtime as a thread pool, and it has a fixed size of max_blocking_threads, which can't be changed dynamically. So we need a second layer of thread pool control.

I propose this:

search_runtime is stored in ShardReplicaSet. It is currently propagated to all search operations and is used to spawn search tasks.

What if we create a wrapper around the Handle type (starting from main.rs) that adds a dynamic check before each spawn_blocking operation?
The check would read the current CPU usage. If it is close to 100% across all available cores (read with the num_cpus function),
we lower the number of available threads by 1 (with an N-second cooldown) until it equals the number of CPU cores.
If it is low (less than 50%), we raise the number of threads by 1, up to 4x the CPU count.

Initially, the pool should have 2x the number of CPU cores as threads, and then adjust based on CPU usage.
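
For the record, the adjustment rule of this old version can be sketched as a small state machine. Everything here is illustrative: `AdaptiveLimit` is a hypothetical name, "close to 100%" is assumed to mean 90% or more, and the cooldown is applied to both directions, although the text only mentions it for decreasing.

```rust
use std::time::{Duration, Instant};

const HIGH_LOAD: f64 = 0.9; // assumed meaning of "close to 100%"
const LOW_LOAD: f64 = 0.5; // "less than 50%" from the description

/// Hypothetical thread limit that moves by 1 between `cpus` and `4 * cpus`.
pub struct AdaptiveLimit {
    cpus: usize,
    limit: usize,
    cooldown: Duration,
    last_change: Option<Instant>,
}

impl AdaptiveLimit {
    /// Starts at 2x the CPU count, as proposed.
    pub fn new(cpus: usize, cooldown: Duration) -> Self {
        let cpus = cpus.max(1);
        Self { cpus, limit: cpus * 2, cooldown, last_change: None }
    }

    pub fn limit(&self) -> usize {
        self.limit
    }

    /// Feeds one CPU usage sample (in cores) and returns the resulting limit.
    pub fn observe(&mut self, cores_used: f64, now: Instant) -> usize {
        if let Some(last) = self.last_change {
            if now.duration_since(last) < self.cooldown {
                return self.limit; // still cooling down after the last change
            }
        }
        let load = cores_used / self.cpus as f64;
        let next = if load >= HIGH_LOAD && self.limit > self.cpus {
            self.limit - 1
        } else if load < LOW_LOAD && self.limit < self.cpus * 4 {
            self.limit + 1
        } else {
            self.limit
        };
        if next != self.limit {
            self.limit = next;
            self.last_change = Some(now);
        }
        self.limit
    }
}
```

Enforcing such a limit in front of spawn_blocking needs something like an async semaphore.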


Step 3: thread pool selection

After initial experiments, it turns out that having a semaphore in spawn-blocking creates overhead:

  • it requires async
  • with it, RPS drops from 4k to 3k under a CPU-heavy workload

The new approach gets rid of it by introducing 2 runtimes instead of 1: high_cpu and high_io.

The choice between runtimes is made the same way, based on CPU usage thresholds. But the check is much cheaper: just a single atomic.
With this approach we have practically no overhead for thread selection.
The downside is that we lose granularity in thread count, but that seems fine.

Note that having more runtimes doesn't actually introduce more threads, as tokio blocking threads are started on demand. There might only be some over-use of threads during the switch from one runtime to the other, but that doesn't seem like a big problem.
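
A sketch of what "just a single atomic" could look like, with the runtimes themselves left out. `PoolSelector`, `PoolKind` and the threshold values are hypothetical, as is the choice to start on the high_io side; the real selection logic lives in this PR's code.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

const HIGH_LOAD: f64 = 0.9; // assumed upper threshold
const LOW_LOAD: f64 = 0.5; // assumed lower threshold

/// Which of the two runtimes a search task should be spawned on.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PoolKind {
    HighCpu,
    HighIo,
}

/// Hypothetical selector: the hot path is one relaxed atomic load.
pub struct PoolSelector {
    cpu_bound: AtomicBool,
}

impl PoolSelector {
    pub const fn new() -> Self {
        Self { cpu_bound: AtomicBool::new(false) }
    }

    /// Called rarely (e.g. once per CPU usage refresh) to move the flag.
    /// Between the two thresholds the previous choice is kept.
    pub fn update(&self, cores_used: f64, cpus: usize) {
        let load = cores_used / cpus.max(1) as f64;
        if load >= HIGH_LOAD {
            self.cpu_bound.store(true, Ordering::Relaxed);
        } else if load < LOW_LOAD {
            self.cpu_bound.store(false, Ordering::Relaxed);
        }
    }

    /// Called before every spawn: no locks, no async.
    pub fn select(&self) -> PoolKind {
        if self.cpu_bound.load(Ordering::Relaxed) {
            PoolKind::HighCpu
        } else {
            PoolKind::HighIo
        }
    }
}
```

The caller would then pick the matching runtime handle based on `select()` and call spawn_blocking on it as before. The gap between the two thresholds keeps the choice from flapping when load hovers around a single cut-off.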


Benchmarks

BQ + rescoring from disk

3-node cluster with 20M 1024D vectors, which don't fit in RAM.
Binary quantization enabled with rescoring.

# Async scorer
# Avg rps: 9.783414976222488
# IOPS: 7.5K / 10K
# CPU: 0.53 / 2

# Async scorer + dynamic thread pool
# Avg rps: 13.985548929909106
# IOPS: 10K / 10K
# CPU: 0.87 / 2

The result: the dynamic thread pool can utilize disk resources better.

Full in-RAM

1 million 128d vectors, 2 CPU limit

Avg rps: 

1.17.1: 4000
dev: 3000
this: 4000

@generall generall requested a review from dancixx April 24, 2026 15:35
@generall generall marked this pull request as ready for review April 26, 2026 15:25
coderabbitai[bot]

This comment was marked as resolved.

- OpenAPI / telemetry: user-facing cpu_cores_used description (2s window, when null).
- process_cpu_usage: backoff after procfs errors; serialize Linux unit tests on CACHE.
- Docs: decouple runtime thread comments from hardcoded 4× multiplier; name search_runtime in test.
- consensus test: replace stale runtime comment.

Made-with: Cursor
coderabbitai[bot]

This comment was marked as resolved.

Replace hand-edited cpu_cores_used description with output from
schema_generator + merge pipeline so openapi_consistency_check passes.

Made-with: Cursor
coderabbitai[bot]

This comment was marked as resolved.

@generall generall merged commit d02ef48 into dev Apr 27, 2026
15 checks passed
@generall generall deleted the dynamic-cpu-pool branch April 27, 2026 10:56
@coszio coszio mentioned this pull request Apr 28, 2026
@qdrant qdrant deleted a comment from coderabbitai Bot Apr 29, 2026
timvisee pushed a commit that referenced this pull request May 8, 2026
* [AI] inptoduce CPU process measurement

* use parking_lot + 4 seconds refresh rate

* [AI] AdaptiveSearchHandle

* fmt

* openapi schema

* keep Runtime field

* fix test

* [AI] instead of async semaphore, use 2 runtimes

* Adjust usage window to 2 seconds

* Address CodeRabbit review comments for dynamic CPU pool

- OpenAPI / telemetry: user-facing cpu_cores_used description (2s window, when null).
- process_cpu_usage: backoff after procfs errors; serialize Linux unit tests on CACHE.
- Docs: decouple runtime thread comments from hardcoded 4× multiplier; name search_runtime in test.
- consensus test: replace stale runtime comment.

Made-with: Cursor

* chore(openapi): regenerate master spec via generate_openapi_models.sh

Replace hand-edited cpu_cores_used description with output from
schema_generator + merge pipeline so openapi_consistency_check passes.

Made-with: Cursor

---------

Co-authored-by: Cursor Agent <[email protected]>
@timvisee timvisee mentioned this pull request May 8, 2026
@cainzhong


This change is really helpful. We ran a benchmark using TurboQuant with rescoring from disk, and the RPS increased from 326 to 480, an improvement of nearly 47%, which closely matches your benchmark results.
#8769

@timvisee

Member

> This change is really helpful. We ran a benchmark using TurboQuant with rescoring from disk, and the RPS increased from 326 to 480, an improvement of nearly 47%, which closely matches your benchmark results. #8769

Glad to hear. Thanks for sharing!
