Optimize ImmutablePointToValues: inline single-value points to reduce RAM usage and increase performance#8235
Conversation
- Replace PointMeta with PointValueEntry enum to distinguish single vs multi-value points - Store single values directly in point_entries, reducing container overhead - Only multi-value and empty points use the shared values_container - Update all methods and tests to use new representation - Add tests for edge cases and single-value optimizations
timvisee
left a comment
There was a problem hiding this comment.
Thank you!
I'd like to add an explicit branch for a zero sized value. I'll commit it to this branch right away.
|
We found in our tests on check_values_any that adding the extra branch caused a small performance drop, so we ended up removing it. That said, we totally respect your call on this.
|
I see. Thanks for considering it. I didn't rerun a benchmark on it, so let me do that now. My guess is that it'd just be this branch. |
|
I get more significant differences between the existing and the inline version, so that's good news in terms of what speed we can gain here!
Having the extra branch for zero sized values does indeed appear slightly worse, though it might fall within margin of error: I'll remove the branch anyhow. Thanks once again for the detailed PR and benchmarks! |
… RAM usage and increase performance (#8235) * optimize: performance improve * Optimize ImmutablePointToValues to store single values inline - Replace PointMeta with PointValueEntry enum to distinguish single vs multi-value points - Store single values directly in point_entries, reducing container overhead - Only multi-value and empty points use the shared values_container - Update all methods and tests to use new representation - Add tests for edge cases and single-value optimizations * Explicit branch for zero sized value * Update comment * Remove zero count branch as it appears to make it slightly slower --------- Co-authored-by: 1995chen <[email protected]> Co-authored-by: timvisee <[email protected]>
… RAM usage and increase performance (#8235) * optimize: performance improve * Optimize ImmutablePointToValues to store single values inline - Replace PointMeta with PointValueEntry enum to distinguish single vs multi-value points - Store single values directly in point_entries, reducing container overhead - Only multi-value and empty points use the shared values_container - Update all methods and tests to use new representation - Add tests for edge cases and single-value optimizations * Explicit branch for zero sized value * Update comment * Remove zero count branch as it appears to make it slightly slower --------- Co-authored-by: 1995chen <[email protected]> Co-authored-by: timvisee <[email protected]>
|
We've released Qdrant 1.18.0 includes these changes. You can read more about it in our release blog here. Thank you for picking this up! |
All Submissions:
devbranch. Did you create your branch fromdev?New Feature Submissions:
cargo +nightly fmt --allcommand prior to submission?cargo clippy --workspace --all-featurescommand?Changes to Core Features:
Summary
Refactors
ImmutablePointToValues<N>inlib/segment/src/index/field_index/immutable_point_to_values.rsto store single-value points inline instead of placing them in the sharedvalues_container. This reduces memory indirection and improves performance for the common case where a payload field holds exactly one value per point (e.g. acategoryorstatusfield).Motivation
The original implementation stored each point's values as a
Range<u32>inpoint_to_values, with the actual values being stored in thepoint_to_values_container:For a single-value point this meant:
Range<u32>entry inpoint_to_values.Nstored separately insidepoint_to_values_container.In practice, the majority of payload fields (keyword, integer, float…) store exactly one value per point. Forcing those values through the container adds unnecessary heap indirection and wastes allocations that could be avoided entirely.
Changes
New
PointValueEntry<N>enumReplaces the
Range<u32>with a enum:Range<u32>(empty) + nothing in containerSlice { start: 0, count: 0 }Range<u32>+Nin containerSingle(N)— fully inlineRange<u32>+N…in containerSlice { start, count }+N…in containerUpdated struct fields
point_to_values: Vec<Range<u32>>point_entries: Vec<PointValueEntry<N>>point_to_values_container: Vec<N>values_container: Vec<N>Smarter
values_containerpre-allocationThe constructor now calculates capacity by excluding single-value points, so the container is pre-allocated to exactly the right size:
All public methods updated
new()— dispatches toSinglevsSliceduring construction.check_values_any()— matches on the entry variant;Singleavoids a container lookup entirely.get_values()— returns a uniformstd::slice::Iter<'_, N>for both variants.get_values_count()— returnsSome(1)directly forSingle, no container access.remove_point()— usesstd::mem::takeon the entry;Singleextracts the value directly without touching the container.Impact
For a segment where every point has exactly one value in a given field, the
values_containerwill be completely empty after this change — all values will live inline inpoint_entries. The previously mandatory container entries are completely eliminated.When searching for a field that has exactly one value per point, we no longer need to get a
Range<u32>and then retrieve values from thepoint_to_values_container. Instead, we can get the value directly frompoint_entries, avoiding indirection and improving cache locality.Tests
The existing
test_immutable_point_to_values_removetest is preserved and refactored to use the new sharedcheck_valueshelper. The following new test cases are added:test_single_value_stored_inlinevalues_containertest_all_single_valuesvalues_containeris empty when every point has exactly one valuetest_get_values_counttest_check_values_anytest_remove_single_value_pointSingleentriestest_remove_out_of_boundsVectest_get_values_out_of_boundsNonetest_empty_sourceVecis a no-op with safe default behaviourCompatibility
u32is preserved throughout to maintain the existing RAM-efficiency guarantee for indicesPerformance Test in a Real Scenario
In our test collection with 260 million points, filtered-query RPS improved from 358 to 569 (about a 58% improvement). The test configuration is shown below.
In this test, each filtered field has at most one value per point (some points have no value).
Collection Config
{ "vectors": { "pv": { "size": 768, "distance": "Dot", "hnsw_config": { "m": 48, "ef_construct": 512, "on_disk": false, "payload_m": 48 }, "quantization_config": { "binary": { "always_ram": true, "encoding": "two_bits" } }, "on_disk": true } }, "sharding_method": "auto", "shard_number": 15, "replication_factor": 2, "write_consistency_factor": 2, "on_disk_payload": true, "optimizer_config": { "default_segment_number": 4, "max_segment_size": 16777216 } }Query Sample
{ "query": pv_test, "using": "pv", "params": { "hnsw_ef": 48, "exact": False, "quantization": {"ignore": False, "rescore": True, "oversampling": 2}, }, "limit": 60, "filter": { "must": [ { "key": "A", "range": { "gte": A1, "lte": A2, }, }, {"key": "B", "match": {"any": B_list}}, {"key": "C", "match": {"any": [C1, C2]}}, {"key": "D", "match": {"value": D2}}, {"key": "E", "range": {"gte": E1, "lte": E2}}, { "key": "F", "match": {"except": [F1, F2, F3]}, }, ] }, "with_payload": ["S", "V"], "with_vector": False, }Benches
We extracted the core logic from both structs, focused on
check_values_any, and ran benchmarks. The results are shown below.single-value means every point has exactly one value.
sparse values means every point has at most one value.
multi-values means every point has a random number of values.
And the detail benchmark file as shown below:
common.rs
random.rs
origin.rs
inline.rs