Dynamic cpu pool by generall · Pull Request #8790 · qdrant/qdrant

generall · 2026-04-24T15:35:35Z

Motivation

We need to monitor the CPU usage of the current qdrant process
to decide whether to shrink or grow the search thread pool at any given moment.

The thread pool used to have a size equal to the number of CPUs, but that doesn't work well under high IO.
In that case we want more threads (up to 4x the CPU count). But more threads hurt the fully in-RAM use case (because of CPU contention).
So we want to dynamically adjust the thread pool size based on the current CPU usage of the process.

Implementation

Step 1: Get CPU usage of the current process

We need a function that returns the CPU usage of the qdrant process over the last N seconds (N is a configured constant).
This function must work on Linux; support for other platforms is optional.

Calls to this function should be cheap, so we need "TTL cache" functionality,
so that it actually reads CPU usage from the OS only once per N seconds.
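A minimal sketch of such a TTL cache, using only the standard library. The `TtlCache` type and its `get_or_refresh` method are hypothetical names chosen for illustration, and `std::sync::Mutex` is used here only to keep the example self-contained (the commits in this PR mention parking_lot):

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Hypothetical TTL cache: keeps one value and re-reads it at most once per `ttl`.
struct TtlCache<T: Clone> {
    ttl: Duration,
    slot: Mutex<Option<(Instant, T)>>,
}

impl<T: Clone> TtlCache<T> {
    fn new(ttl: Duration) -> Self {
        Self { ttl, slot: Mutex::new(None) }
    }

    /// Return the cached value if it is younger than `ttl`,
    /// otherwise call `read` (the expensive OS query) and store the result.
    fn get_or_refresh(&self, read: impl FnOnce() -> T) -> T {
        let mut slot = self.slot.lock().unwrap();
        if let Some((stored_at, value)) = &*slot {
            if stored_at.elapsed() < self.ttl {
                return value.clone();
            }
        }
        let value = read();
        *slot = Some((Instant::now(), value.clone()));
        value
    }
}
```

With this shape, any number of callers within one TTL window share a single read of the underlying source.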

Function semantics:

If 2 CPU cores were fully used (100%) over the last N seconds, the function should return 2.0.

Reading process CPU usage from procfs is preferred.

The function should be available globally and implemented in the common crate.
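The semantics above can be sketched as two pure helpers: one that extracts the process CPU time from the contents of `/proc/self/stat`, and one that turns two samples into "cores used". The function names are hypothetical, and the tick rate is passed in as a parameter; it is commonly 100 on Linux, but that is an assumption here (the real value comes from `sysconf(_SC_CLK_TCK)`).

```rust
use std::time::Duration;

/// Parse utime + stime (in clock ticks) from the contents of /proc/<pid>/stat.
fn parse_cpu_ticks(stat: &str) -> Option<u64> {
    // Field 2 (comm) may contain spaces or parentheses, so skip past the last ')'.
    let rest = &stat[stat.rfind(')')? + 1..];
    let mut fields = rest.split_whitespace();
    // `rest` starts at field 3 (state), so utime (field 14) is at index 11
    // and stime (field 15) is the next one.
    let utime: u64 = fields.nth(11)?.parse().ok()?;
    let stime: u64 = fields.next()?.parse().ok()?;
    Some(utime + stime)
}

/// Average number of fully busy cores between two samples.
/// `ticks_per_sec` is assumed to be supplied by the caller.
fn cores_used(prev_ticks: u64, cur_ticks: u64, elapsed: Duration, ticks_per_sec: f64) -> f64 {
    let secs = elapsed.as_secs_f64();
    if secs <= 0.0 {
        return 0.0;
    }
    let cpu_secs = cur_ticks.saturating_sub(prev_ticks) as f64 / ticks_per_sec;
    cpu_secs / secs
}
```

For example, 400 ticks consumed over 2 seconds at 100 ticks per second is 4 CPU-seconds in a 2-second window, which yields 2.0.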

Step 2: telemetry

Additionally, we want to expose the CPU usage value in telemetry, so it is easy to monitor and debug.
If the function is not supported on the platform, it should return None.

Step 3: Auto-adjust thread pool

(old version, for historical reasons)

Details

Currently, we use a tokio runtime as a thread pool, and it has a fixed max_blocking_threads size, which can't be changed dynamically. So we need a second layer of thread pool control.

I propose this:

search_runtime is stored in ShardReplicaSet; it is currently propagated to all search operations and used to spawn search tasks.

What if we create a wrapper around the Handle type (starting from main.rs) that adds a dynamic check before each spawn_blocking operation?
The check would read the current CPU usage, and if it is close to 100% across all available cores (read with the num_cpus function),
we would lower the number of available threads by 1 (with an N-second cooldown) until it equals the number of CPU cores;
if it is low (less than 50%), we would increase the number of threads by 1, up to 4x the number of CPUs.

Initially, we should have 2x the number of CPU cores as threads, and then adjust based on CPU usage.
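The adjustment rule of this earlier proposal can be sketched as a small state machine. This is an illustration only, not the code from the PR: `ThreadLimit` is a hypothetical name, "close to 100%" is assumed to mean 90% or more, and the cooldown is assumed to apply to changes in both directions.

```rust
use std::time::{Duration, Instant};

const HIGH_LOAD: f64 = 0.9; // assumption: what "close to 100%" means
const LOW_LOAD: f64 = 0.5; // "less than 50%" from the description

/// Hypothetical thread limit that moves one step at a time between 1x and 4x CPUs.
struct ThreadLimit {
    cpus: usize,
    limit: usize,
    cooldown: Duration,
    last_change: Option<Instant>,
}

impl ThreadLimit {
    fn new(cpus: usize, cooldown: Duration) -> Self {
        // Start at 2x the CPU count, as proposed.
        Self { cpus, limit: 2 * cpus, cooldown, last_change: None }
    }

    /// Update the limit from the current "cores used" reading and return it.
    fn adjust(&mut self, cores_used: f64, now: Instant) -> usize {
        if let Some(last) = self.last_change {
            if now.duration_since(last) < self.cooldown {
                return self.limit; // still cooling down, no change
            }
        }
        let load = cores_used / self.cpus as f64;
        let target = if load >= HIGH_LOAD {
            self.limit.saturating_sub(1).max(self.cpus) // shrink, floor at 1x CPUs
        } else if load < LOW_LOAD {
            (self.limit + 1).min(4 * self.cpus) // grow, cap at 4x CPUs
        } else {
            self.limit
        };
        if target != self.limit {
            self.limit = target;
            self.last_change = Some(now);
        }
        self.limit
    }
}
```

On a 4-CPU machine this starts at 8 threads, steps down toward 4 while the process saturates its cores, and steps up toward 16 while usage stays below half.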

Step 3: thread pool selection

After initial experiments, it turned out that having a semaphore in spawn_blocking creates overhead:

- it requires async
- with it, RPS drops from 4k to 3k under a CPU-heavy workload

The new approach gets rid of it by introducing 2 runtimes instead of 1: high_cpu and high_io.

The choice between runtimes is made the same way, based on CPU usage thresholds. But the check is much cheaper: just a single atomic.
With this approach we have practically no overhead for thread selection.
The downside is that we lose granularity in thread count, but that seems fine.

Note that having more runtimes doesn't actually introduce more threads, as tokio blocking threads are started on demand. There might only be some over-use of threads during a switch from one runtime to the other, but that doesn't seem like a big problem.
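A rough sketch of the selection mechanism, using only the standard library. The `Pool` enum stands in for the two tokio runtime handles, and `PoolSelector`, the 90% upper threshold, and the choice to start on the high_cpu pool are all assumptions made for illustration, not details taken from the PR.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

const HIGH_LOAD: f64 = 0.9; // assumption: what "close to 100%" means
const LOW_LOAD: f64 = 0.5; // low-usage threshold from the description

/// Stand-in for the two runtimes (real code would hold two runtime handles).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Pool {
    HighCpu,
    HighIo,
}

/// Hypothetical selector: a sampler updates the flag, spawners only read it.
struct PoolSelector {
    cpus: f64,
    use_high_io: AtomicBool,
}

impl PoolSelector {
    fn new(cpus: usize) -> Self {
        // Assumption: start on the smaller, CPU-oriented pool.
        Self { cpus: cpus as f64, use_high_io: AtomicBool::new(false) }
    }

    /// Called periodically with the latest "cores used" reading.
    fn update(&self, cores_used: f64) {
        let load = cores_used / self.cpus;
        if load >= HIGH_LOAD {
            self.use_high_io.store(false, Ordering::Relaxed);
        } else if load < LOW_LOAD {
            self.use_high_io.store(true, Ordering::Relaxed);
        }
        // Between the thresholds the previous choice is kept.
    }

    /// Hot path: one relaxed atomic load per spawn, no async and no lock.
    fn current(&self) -> Pool {
        if self.use_high_io.load(Ordering::Relaxed) {
            Pool::HighIo
        } else {
            Pool::HighCpu
        }
    }
}
```

Keeping the previous choice between the two thresholds avoids flapping between pools when usage hovers around a single cut-off.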

Benchmarks

BQ + rescoring from disk

3-node cluster with 20M 1024D vectors, which don't fit in RAM.
Binary quantization enabled with rescoring.

# Async scorer
# Avg rps: 9.783414976222488
# IOPS: 7.5K / 10K
# CPU: 0.53 / 2

# Async scorer + dynamic thread pool
# Avg rps: 13.985548929909106
# IOPS: 10K / 10K
# CPU: 0.87 / 2

The result: the dynamic thread pool can utilize disk resources better.

Full in-ram

1 million 128D vectors, 2 CPU limit

Avg rps: 

1.17.1: 4000
dev: 3000
this: 4000

* [AI] inptoduce CPU process measurement
* use parking_lot + 4 seconds refresh rate
* [AI] AdaptiveSearchHandle
* fmt
* openapi schema
* keep Runtime field
* fix test
* [AI] instead of async semaphore, use 2 runtimes
* Adjust usage window to 2 seconds
* Address CodeRabbit review comments for dynamic CPU pool
  - OpenAPI / telemetry: user-facing cpu_cores_used description (2s window, when null).
  - process_cpu_usage: backoff after procfs errors; serialize Linux unit tests on CACHE.
  - Docs: decouple runtime thread comments from hardcoded 4× multiplier; name search_runtime in test.
  - consensus test: replace stale runtime comment.
  Made-with: Cursor
* chore(openapi): regenerate master spec via generate_openapi_models.sh
  Replace hand-edited cpu_cores_used description with output from schema_generator + merge pipeline so openapi_consistency_check passes.
  Made-with: Cursor
---------
Co-authored-by: Cursor Agent <[email protected]>

cainzhong · 2026-05-15T13:31:03Z

This change is really helpful. We ran a benchmark using TurboQuant with rescoring from disk, and the RPS increased from 326 to 480, an improvement of nearly 47%, which closely matches your benchmark results.
#8769

timvisee · 2026-05-15T13:32:22Z

> This change is really helpful. We ran a benchmark using TurboQuant with rescoring from disk, and the RPS increased from 326 to 480, an improvement of nearly 47%, which closely matches your benchmark results. #8769

Glad to hear. Thanks for sharing!

generall added 4 commits April 24, 2026 15:19

[AI] inptoduce CPU process measurement

288a501

use parking_lot + 4 seconds refresh rate

dd60c6c

[AI] AdaptiveSearchHandle

face10f

fmt

46985b4

generall requested a review from dancixx April 24, 2026 15:35

generall added 5 commits April 24, 2026 17:42

openapi schema

e555590

keep Runtime field

186d884

fix test

ec5ed6f

[AI] instead of async semaphore, use 2 runtimes

3ef9869

Adjust usage window to 2 seconds

2c73c88

generall marked this pull request as ready for review April 26, 2026 15:25

chore(openapi): regenerate master spec via generate_openapi_models.sh

ec75aef

Replace hand-edited cpu_cores_used description with output from schema_generator + merge pipeline so openapi_consistency_check passes. Made-with: Cursor

dancixx approved these changes Apr 27, 2026

generall merged commit d02ef48 into dev Apr 27, 2026
15 checks passed

generall deleted the dynamic-cpu-pool branch April 27, 2026 10:56

coszio mentioned this pull request Apr 28, 2026

Fix linux import lints #8829

Merged

coderabbitai Bot mentioned this pull request Apr 28, 2026

[AI] Do not hold shards_holder.read on search #8830

Merged

qdrant deleted a comment from coderabbitai Bot Apr 29, 2026

timvisee mentioned this pull request May 8, 2026

Bump version to 1.18.0 #8959

Merged

generall merged 11 commits into dev from dynamic-cpu-pool

4 participants
