Low Memory mode #8714
Merged
Conversation
caa6170 to
074bd8b
Compare
timvisee
reviewed
Apr 20, 2026
Comment on lines
+104
to
+109
// Low-memory mode `no_populate` suppresses mmap prefault globally.
// Pages will be faulted in on demand when queries touch them.
if crate::low_memory::low_memory_mode().skip_populate() {
    return Ok(());
}
Member
It's my understanding the low memory mode should also suppress population of universal IO disk cache.
@xzfc could you also confirm this from your side?
Member
Author
I would say yes, because otherwise we can crash when the local disk cache is full
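The point agreed on in this thread, that one low-memory check should gate both the mmap prefault and the disk cache population, can be sketched as follows. This is a minimal illustration, not the code from this PR: the `Populate` trait, the `MmapStorage` and `DiskCache` stand-ins, and `populate_unless_skipped` are all hypothetical names, and the stand-ins only count calls instead of touching real storage.

```rust
use std::io;

/// Hypothetical abstraction over anything that can be warmed up from disk.
pub trait Populate {
    fn populate(&mut self) -> io::Result<()>;
}

/// Stand-in for an mmap-backed storage; only counts prefault calls.
#[derive(Default)]
pub struct MmapStorage {
    pub prefaults: usize,
}

/// Stand-in for a local disk cache; only counts fill calls.
#[derive(Default)]
pub struct DiskCache {
    pub fills: usize,
}

impl Populate for MmapStorage {
    fn populate(&mut self) -> io::Result<()> {
        self.prefaults += 1;
        Ok(())
    }
}

impl Populate for DiskCache {
    fn populate(&mut self) -> io::Result<()> {
        self.fills += 1;
        Ok(())
    }
}

/// Runs `populate` unless low-memory mode asks to skip it.
/// Returns whether population actually happened.
pub fn populate_unless_skipped(
    skip_populate: bool,
    target: &mut dyn Populate,
) -> io::Result<bool> {
    if skip_populate {
        // Nothing is loaded eagerly; data is read on demand later.
        return Ok(false);
    }
    target.populate()?;
    Ok(true)
}
```

Routing every populate path through a single helper like this keeps a newly added cache from bypassing the low-memory setting by accident.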
timvisee
approved these changes
Apr 20, 2026
timvisee
left a comment
Member
Tested locally, works as expected 👌
In fact, it clearly shows how slow loading into memory is for some of our storage components. In my test, loading into memory takes 9 seconds, while starting with no_resident makes it start up in 0.5 seconds. I'm using a local NVMe disk.
VainJoker
pushed a commit
to VainJoker/qdrant
that referenced
this pull request
Apr 21, 2026
* [AI] implement parameter + cover populate + cover quantized vectors
* telemetry OpenAPI schema
* [AI] hook immutable payload indexes
* fmt
* do not populate payload index if we fallback to mmap
* Reformat
* Also suppress universal IO disk cache population
---------
Co-authored-by: timvisee <[email protected]>
timvisee
added a commit
that referenced
this pull request
May 8, 2026
* [AI] implement parameter + cover populate + cover quantized vectors
* telemetry OpenAPI schema
* [AI] hook immutable payload indexes
* fmt
* do not populate payload index if we fallback to mmap
* Reformat
* Also suppress universal IO disk cache population
---------
Co-authored-by: timvisee <[email protected]>
Low memory mode
Motivation
It is a frequent situation in production that customers just keep pushing more data regardless of the machine capacity.
At some point capacity is exhausted and the machine crashes. In the worst case, the machine goes into a crash loop, and there is no
nice way to recover from this situation, as we can't even change the config because the APIs are not available.
We need a way to recover from this situation.
Proposal
A special configuration option low_memory_mode is added to the config. It should have 3 options:
disabled (default) - no special handling, all collection modules are loaded as usual
no-resident - when it is set, loading of all collection modules should not force anything to RAM if possible: always_ram=false and vectors are on disk (on_disk=true)
no-populate - same as no-resident, but also no population of RAM from disk should be done. This affects loading of original vectors, HNSW index, payload storage
Implementation details
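The three options above could be modelled as a small enum. The sketch below is illustrative only: the `LowMemoryMode` type, its variant names, the `skip_resident` helper and the string parsing are assumptions made for this example. Only the `skip_populate()` method name appears in the reviewed diff, and its actual definition is not shown there.

```rust
use std::str::FromStr;

/// Hypothetical mirror of the three `low_memory_mode` options from the proposal.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum LowMemoryMode {
    /// No special handling; collection modules load as usual.
    #[default]
    Disabled,
    /// Do not force anything into RAM while loading.
    NoResident,
    /// Like `NoResident`, and also skip populating RAM from disk.
    NoPopulate,
}

impl LowMemoryMode {
    /// True when loading should behave as if `always_ram=false`
    /// and `on_disk=true` (hypothetical helper).
    pub fn skip_resident(self) -> bool {
        matches!(self, Self::NoResident | Self::NoPopulate)
    }

    /// True when populating RAM from disk should be skipped.
    pub fn skip_populate(self) -> bool {
        matches!(self, Self::NoPopulate)
    }
}

impl FromStr for LowMemoryMode {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Accepting both `no-resident` and `no_resident` is an assumption;
        // both spellings appear in this conversation.
        match s.trim().to_ascii_lowercase().replace('_', "-").as_str() {
            "disabled" => Ok(Self::Disabled),
            "no-resident" => Ok(Self::NoResident),
            "no-populate" => Ok(Self::NoPopulate),
            other => Err(format!("unknown low_memory_mode: {other}")),
        }
    }
}
```

Because no-populate is described as a superset of no-resident, `skip_resident` returns true for both variants, while `skip_populate` is true only for the stricter one.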
Testing scenario: