Low Memory mode #8714

Merged
generall merged 7 commits into dev from low-memory-mode
Apr 20, 2026

@generall generall commented Apr 18, 2026

Member

Low memory mode

Motivation

A frequent situation in production is that a customer keeps pushing more data regardless of the machine's capacity.
At some point capacity is exhausted and the machine crashes. In the worst case, the machine goes into a crash loop, and there is no
nice way to recover from this situation, as we can't even change the config because the API is not available.

We need a way to recover from this situation.

Proposal

A special configuration option, low_memory_mode, is added to the config.

It should have 3 options:

  • disabled (default) - no special handling, all collection modules are loaded as usual

  • no-resident
    When it is set, loading of all collection modules should not force anything to RAM if possible:

    • Quantization should be loaded as if always_ram=false and vectors are on disk
    • Payload indexes should be loaded as if on_disk=true
  • no-populate - same as no-resident, but in addition nothing should be populated from disk into RAM. This affects loading of original vectors, the HNSW index, and payload storage
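
The three options above can be sketched as a small enum with predicates. This is only an illustration under stated assumptions, not the code merged in this PR: the enum name `LowMemoryMode`, the `skip_resident` predicate, and the two `effective_*` helpers are hypothetical. Only `skip_populate` appears in the reviewed diff, and its real signature may differ.

```rust
/// Hypothetical mirror of the three proposed `low_memory_mode` values.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum LowMemoryMode {
    /// No special handling, everything is loaded as configured.
    #[default]
    Disabled,
    /// Do not force anything into RAM if it can stay on disk.
    NoResident,
    /// Same as `NoResident`, and also skip populating RAM from disk.
    NoPopulate,
}

impl LowMemoryMode {
    /// True when components should avoid RAM-resident loading.
    /// (Assumed helper name, not taken from the PR.)
    pub fn skip_resident(self) -> bool {
        matches!(self, Self::NoResident | Self::NoPopulate)
    }

    /// True when population (prefaulting) from disk should be skipped.
    pub fn skip_populate(self) -> bool {
        matches!(self, Self::NoPopulate)
    }

    /// Quantization: behave as if `always_ram = false` in low memory mode.
    pub fn effective_always_ram(self, configured: bool) -> bool {
        configured && !self.skip_resident()
    }

    /// Payload indexes: behave as if `on_disk = true` in low memory mode.
    pub fn effective_on_disk(self, configured: bool) -> bool {
        configured || self.skip_resident()
    }
}
```

With this shape, a component configured with `always_ram = true` would see `effective_always_ram(true) == false` under both non-default modes, while the stored configuration itself stays untouched.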

Implementation details

  • Make sure that all components that support loading into RAM or disk have compatible format on disk, so they can be loaded in both modes without any issues.
  • Decide how to propagate the parameter: either use a global variable, or pass it through function parameters. The choice depends on how deep we need to propagate it.
  • Implement parameter check and handling in all relevant components
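
The two propagation options could look roughly like the sketch below. It is a minimal illustration, assuming a process-wide value that is set once at startup. The names `init_low_memory_mode` and `populate_unless_suppressed` are hypothetical, and `low_memory_mode()` is modeled on the call seen in the reviewed diff without claiming to match its real definition.

```rust
use std::sync::OnceLock;

/// Hypothetical enum standing in for the configured mode.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum LowMemoryMode {
    #[default]
    Disabled,
    NoResident,
    NoPopulate,
}

/// Option 1: a global that is written once while reading the config.
static LOW_MEMORY_MODE: OnceLock<LowMemoryMode> = OnceLock::new();

/// Sets the global mode. Returns false if it was already set.
pub fn init_low_memory_mode(mode: LowMemoryMode) -> bool {
    LOW_MEMORY_MODE.set(mode).is_ok()
}

/// Reads the global mode, falling back to `Disabled` when unset.
pub fn low_memory_mode() -> LowMemoryMode {
    LOW_MEMORY_MODE.get().copied().unwrap_or_default()
}

/// Option 2: pass the mode explicitly to the code that would populate.
/// Runs `populate` unless the mode suppresses it, and reports whether it ran.
pub fn populate_unless_suppressed<F>(mode: LowMemoryMode, populate: F) -> std::io::Result<bool>
where
    F: FnOnce() -> std::io::Result<()>,
{
    if mode == LowMemoryMode::NoPopulate {
        // Pages are left to be faulted in on demand.
        return Ok(false);
    }
    populate()?;
    Ok(true)
}
```

The global keeps call sites short when the check sits deep inside storage code, while the explicit parameter is easier to unit test because each call can choose its own mode.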

Testing scenario:

  • load snapshot with all payloads and vector quantizations
  • check the memory reporting API with and without the option enabled

coderabbitai[bot]

This comment was marked as resolved.

@generall generall requested a review from timvisee April 19, 2026 22:12
@qdrant qdrant deleted a comment from coderabbitai Bot Apr 20, 2026
Comment thread config/config.yaml
Comment on lines +104 to +109
// Low-memory mode `no_populate` suppresses mmap prefault globally.
// Pages will be faulted in on demand when queries touch them.
if crate::low_memory::low_memory_mode().skip_populate() {
    return Ok(());
}

Member

It's my understanding the low memory mode should also suppress population of universal IO disk cache.

@xzfc could you also confirm this from your side?

Member Author

I would say yes, because otherwise we can crash when the local disk cache is full

Comment thread lib/common/common/src/low_memory.rs
@qdrant qdrant deleted a comment from coderabbitai Bot Apr 20, 2026

@timvisee timvisee left a comment

Member

Tested locally, works as expected 👌

In fact, it clearly shows how slow loading into memory is for some of our storage components. In my test, loading into memory takes 9 seconds, while starting with no_resident makes it start up in 0.5 seconds. I'm using a local NVMe disk.

@generall generall merged commit f321c9f into dev Apr 20, 2026
29 of 30 checks passed
@generall generall deleted the low-memory-mode branch April 20, 2026 09:33
VainJoker pushed a commit to VainJoker/qdrant that referenced this pull request Apr 21, 2026
* [AI] implement parameter + cover populate + cover quantized vectors

* telemetry OpenAPI schema

* [AI] hook immutable payload indexes

* fmt

* do not populate payload index if we fallback to mmap

* Reformat

* Also suppress universal IO disk cache population

---------

Co-authored-by: timvisee <[email protected]>
timvisee added a commit that referenced this pull request May 8, 2026
@timvisee timvisee mentioned this pull request May 8, 2026