Skip to content

Optimize ImmutablePointToValues: inline single-value points to reduce RAM usage and increase performance#8235

Merged
timvisee merged 5 commits into
qdrant:devfrom
TY0909:optimize/immutable_point_to_values
Feb 26, 2026
Merged

Optimize ImmutablePointToValues: inline single-value points to reduce RAM usage and increase performance#8235
timvisee merged 5 commits into
qdrant:devfrom
TY0909:optimize/immutable_point_to_values

Conversation

@TY0909

@TY0909 TY0909 commented Feb 26, 2026

Copy link
Copy Markdown
Contributor

All Submissions:

  • Contributions should target the dev branch. Did you create your branch from dev?
  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

  1. Does your submission pass tests?
  2. Have you formatted your code locally using cargo +nightly fmt --all command prior to submission?
  3. Have you checked your code using cargo clippy --workspace --all-features command?

Changes to Core Features:

  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your core changes, as applicable?
  • Have you successfully ran tests with your changes locally?

Summary

Refactors ImmutablePointToValues<N> in lib/segment/src/index/field_index/immutable_point_to_values.rs to store single-value points inline instead of placing them in the shared values_container. This reduces memory indirection and improves performance for the common case where a payload field holds exactly one value per point (e.g. a category or status field).

Motivation

The original implementation stored each point's values as a Range<u32> in point_to_values, with the actual values being stored in the point_to_values_container:

// OLD layout
point_to_values: Vec<Range<u32>>,   // 8 bytes per point
point_to_values_container: Vec<N>,  // all values, including single-value points

For a single-value point this meant:

  1. An 8-byte Range<u32> entry in point_to_values.
  2. The actual value N stored separately inside point_to_values_container.

In practice, the majority of payload fields (keyword, integer, float…) store exactly one value per point. Forcing those values through the container adds unnecessary heap indirection and wastes allocations that could be avoided entirely.

Changes

New PointValueEntry<N> enum

Replaces the Range<u32> with a enum:

enum PointValueEntry<N> {
    /// Exactly one value — stored inline, no container access needed.
    Single(N),
    /// Zero or 2+ values — stored as a contiguous slice in `values_container`.
    Slice { start: u32, count: u32 },
}
Cardinality Old storage New storage
0 values Range<u32> (empty) + nothing in container Slice { start: 0, count: 0 }
1 value Range<u32> + N in container Single(N) — fully inline
2+ values Range<u32> + N… in container Slice { start, count } + N… in container

Updated struct fields

Old New
point_to_values: Vec<Range<u32>> point_entries: Vec<PointValueEntry<N>>
point_to_values_container: Vec<N> values_container: Vec<N>

Smarter values_container pre-allocation

The constructor now calculates capacity by excluding single-value points, so the container is pre-allocated to exactly the right size:

let container_capacity = src
    .iter()
    .filter(|values| values.len() != 1)
    .fold(0, |acc, values| acc + values.len());

All public methods updated

  • new() — dispatches to Single vs Slice during construction.
  • check_values_any() — matches on the entry variant; Single avoids a container lookup entirely.
  • get_values() — returns a uniform std::slice::Iter<'_, N> for both variants.
  • get_values_count() — returns Some(1) directly for Single, no container access.
  • remove_point() — uses std::mem::take on the entry; Single extracts the value directly without touching the container.

Impact

For a segment where every point has exactly one value in a given field, the values_container will be completely empty after this change — all values will live inline in point_entries. The previously mandatory container entries are completely eliminated.

When searching for a field that has exactly one value per point, we no longer need to get a Range<u32> and then retrieve values from the point_to_values_container. Instead, we can get the value directly from point_entries, avoiding indirection and improving cache locality.

Tests

The existing test_immutable_point_to_values_remove test is preserved and refactored to use the new shared check_values helper. The following new test cases are added:

Test What it covers
test_single_value_stored_inline Verifies that single-value points do not appear in values_container
test_all_single_values Verifies values_container is empty when every point has exactly one value
test_get_values_count Correct counts for all cardinalities (0, 1, 2+)
test_check_values_any Predicate evaluation across all variants and edge cases
test_remove_single_value_point Removal correctness for inline Single entries
test_remove_out_of_bounds Out-of-bounds removal returns an empty Vec
test_get_values_out_of_bounds Out-of-bounds lookup returns None
test_empty_source Constructing from an empty Vec is a no-op with safe default behaviour

Compatibility

  • Logic is fully backward-compatible (same public API surface)
  • All existing tests pass
  • New tests cover all entry variants and boundary conditions
  • No unsafe code introduced
  • u32 is preserved throughout to maintain the existing RAM-efficiency guarantee for indices

Performance Test in a Real Scenario

In our test collection with 260 million points, filtered-query RPS improved from 358 to 569 (about a 58% improvement). The test configuration is shown below.

In this test, each filtered field has at most one value per point (some points have no value).

Collection Config
{
    "vectors": {
        "pv": {
            "size": 768,
            "distance": "Dot",
            "hnsw_config": {
                "m": 48,
                "ef_construct": 512,
                "on_disk": false,
                "payload_m": 48
            },
            "quantization_config": {
                "binary": {
                    "always_ram": true,
                    "encoding": "two_bits"
                }
            },
            "on_disk": true
        }
    },
    "sharding_method": "auto",
    "shard_number": 15,
    "replication_factor": 2,
    "write_consistency_factor": 2,
    "on_disk_payload": true,
    "optimizer_config": {
        "default_segment_number": 4,
        "max_segment_size": 16777216
    }
}
Query Sample
{
    "query": pv_test,
    "using": "pv",
    "params": {
        "hnsw_ef": 48,
        "exact": False,
        "quantization": {"ignore": False, "rescore": True, "oversampling": 2},
    },
    "limit": 60,
    "filter": {
        "must": [
            {
                "key": "A",
                "range": {
                    "gte": A1,
                    "lte": A2,
                },
            },
            {"key": "B", "match": {"any": B_list}},
            {"key": "C", "match": {"any": [C1, C2]}},
            {"key": "D", "match": {"value": D2}},
            {"key": "E", "range": {"gte": E1, "lte": E2}},
            {
                "key": "F",
                "match": {"except": [F1, F2, F3]},
            },
        ]
    },
    "with_payload": ["S", "V"],
    "with_vector": False,
}

Benches

We extracted the core logic from both structs, focused on check_values_any, and ran benchmarks. The results are shown below.

Cardinality Old storage New storage
single-value [1.7936 ms 1.7941 ms 1.7948 ms] [834.23 µs 835.15 µs 836.41 µs]
sparse values [1.6957 ms 1.6972 ms 1.6993 ms] [864.67 µs 864.78 µs 864.91 µs]
multi-values [3.4935 ms 3.5153 ms 3.5442 ms] [3.2952 ms 3.2977 ms 3.3012 ms]

single-value means every point has exactly one value.

sparse values means every point has at most one value.

multi-values means every point has a random number of values.

And the detail benchmark file as shown below:

common.rs
#![allow(dead_code)]

use std::time::Duration;

use criterion::BenchmarkGroup;
use criterion::measurement::WallTime;
use rand::{RngExt, SeedableRng};

/// Change this alias to switch the value type used by every benchmark.
pub type ValueType = i64;

use qdrant_filter_test::inline;
use qdrant_filter_test::origin;

// ── Shared constants ─────────────────────────────────────────────────────────
// Change these once and every benchmark will pick up the new values.

/// Seed used to generate the point data.
pub const DATA_SEED: u64 = 42;

/// Seed used to generate random query point IDs.
pub const QUERY_SEED: u64 = 123;

/// Total number of points to generate.
pub const POINTS_NUM: usize = 4000_0000;

/// Upper bound (inclusive) for the random value assigned to each point.
pub const MAX_VALUE: ValueType = 2;

/// Number of point IDs to query during the benchmark.
pub const QUERY_COUNT: usize = 10_0000;

/// The value we pass to `check_values_any` as the predicate target.
pub const CHECK_VALUE: ValueType = 1;

/// Ratio of inner `Vec`s that are empty in the sparse-single-value data type.
/// Must be in the range `0.0..=1.0`. For example, `0.3` means 30 % of entries
/// will be empty.
pub const EMPTY_RATIO: f64 = 0.1;

/// Maximum number of elements in each inner `Vec` for the multi-value data
/// type. Each inner `Vec` will have a random length in `1..=MAX_VALUES_PER_POINT`.
pub const MAX_VALUES_PER_POINT: usize = 5;

// ── Data builders ────────────────────────────────────────────────────────────

/// **Type 1 – single value**: every inner `Vec` has exactly one element.
pub fn build_single_value_data(
    seed: u64,
    points_num: usize,
    max_value: ValueType,
) -> Vec<Vec<ValueType>> {
    let mut rng = rand::rngs::StdRng::seed_from_u64(seed);
    let mut values = Vec::with_capacity(points_num);
    for _ in 0..points_num {
        let value: ValueType = rng.random_range(0..=max_value);
        values.push(vec![value]);
    }
    values
}

/// **Type 2 – sparse single value**: every inner `Vec` has *at most* one
/// element. A proportion of `empty_ratio` entries will be empty.
pub fn build_sparse_single_value_data(
    seed: u64,
    points_num: usize,
    max_value: ValueType,
    empty_ratio: f64,
) -> Vec<Vec<ValueType>> {
    use rand::seq::SliceRandom;

    let mut rng = rand::rngs::StdRng::seed_from_u64(seed);
    let empty_count = (points_num as f64 * empty_ratio).round() as usize;
    let non_empty_count = points_num - empty_count;

    let mut values = Vec::with_capacity(points_num);

    // First, push all non-empty entries
    for _ in 0..non_empty_count {
        let value: ValueType = rng.random_range(0..=max_value);
        values.push(vec![value]);
    }

    // Then, push all empty entries
    for _ in 0..empty_count {
        values.push(vec![]);
    }

    // Shuffle to distribute empties randomly
    values.shuffle(&mut rng);

    values
}

/// **Type 3 – multi value**: every inner `Vec` has a random number of elements
/// in `1..=max_values_per_point`.
pub fn build_multi_value_data(
    seed: u64,
    points_num: usize,
    max_value: ValueType,
    max_values_per_point: usize,
) -> Vec<Vec<ValueType>> {
    let mut rng = rand::rngs::StdRng::seed_from_u64(seed);
    let mut values = Vec::with_capacity(points_num);
    for _ in 0..points_num {
        let count = rng.random_range(1..=max_values_per_point);
        let inner: Vec<ValueType> = (0..count)
            .map(|_| rng.random_range(0..=max_value))
            .collect();
        values.push(inner);
    }
    values
}

// ── TestData ─────────────────────────────────────────────────────────────────

/// Pre-built instances of every implementation, ready for benchmarking.
pub struct TestData {
    pub inline: inline::ImmutablePointToValues<ValueType>,
    pub origin: origin::ImmutablePointToValues<ValueType>,
}

impl TestData {
    /// Construct all implementations from the given raw data.
    pub fn from_data(points_values: Vec<Vec<ValueType>>) -> Self {
        let inline = inline::ImmutablePointToValues::new(points_values.clone());
        let origin = origin::ImmutablePointToValues::new(points_values.clone());

        Self { inline, origin }
    }

    /// Convenience: build from the *single-value* data type using the shared
    /// constants.
    pub fn single_value() -> Self {
        Self::from_data(build_single_value_data(DATA_SEED, POINTS_NUM, MAX_VALUE))
    }

    /// Convenience: build from the *sparse-single-value* data type using the
    /// shared constants.
    pub fn sparse_single_value() -> Self {
        Self::from_data(build_sparse_single_value_data(
            DATA_SEED,
            POINTS_NUM,
            MAX_VALUE,
            EMPTY_RATIO,
        ))
    }

    /// Convenience: build from the *multi-value* data type using the shared
    /// constants.
    pub fn multi_value() -> Self {
        Self::from_data(build_multi_value_data(
            DATA_SEED,
            POINTS_NUM,
            MAX_VALUE,
            MAX_VALUES_PER_POINT,
        ))
    }
}

// ── Query-ID helpers ─────────────────────────────────────────────────────────

/// Generate `QUERY_COUNT` random point IDs in `[0, POINTS_NUM)`.
pub fn build_random_point_ids() -> Vec<u32> {
    let mut rng = rand::rngs::StdRng::seed_from_u64(QUERY_SEED);
    (0..QUERY_COUNT)
        .map(|_| rng.random_range(0..POINTS_NUM as u32))
        .collect()
}

// ── Bench Group Config ───────────────────────────────────────────────────────

/// Configure a benchmark group with standardised timing parameters.
pub fn configure_benchmark(group: &mut BenchmarkGroup<WallTime>) {
    group.measurement_time(Duration::from_secs(30));
}
random.rs
mod common;

use criterion::{Criterion, black_box, criterion_group, criterion_main, configure_benchmark};

use crate::common::{CHECK_VALUE, TestData, build_random_point_ids};

// ── Helper: run all implementations against one TestData ─────────────────────

fn bench_all_impls(
    group: &mut criterion::BenchmarkGroup<criterion::measurement::WallTime>,
    data: &TestData,
    point_ids: &[u32],
) {
    configure_benchmark(group);

    group.bench_function("origin", |b| {
        b.iter(|| {
            for &point_id in point_ids {
                black_box(
                    data.origin
                        .check_values_any(black_box(point_id), |v| *v == CHECK_VALUE),
                );
            }
        });
    });

    group.bench_function("inline", |b| {
        b.iter(|| {
            for &point_id in point_ids {
                black_box(
                    data.inline
                        .check_values_any(black_box(point_id), |v| *v == CHECK_VALUE),
                );
            }
        });
    });
}

// ── Benchmarks ───────────────────────────────────────────────────────────────

fn random_single_value(c: &mut Criterion) {
    let data = TestData::single_value();
    let point_ids = build_random_point_ids();

    let mut group = c.benchmark_group("Random single_value");
    bench_all_impls(&mut group, &data, &point_ids);
    group.finish();
}

fn random_sparse_single_value(c: &mut Criterion) {
    let data = TestData::sparse_single_value();
    let point_ids = build_random_point_ids();

    let mut group = c.benchmark_group("Random sparse_single_value");
    bench_all_impls(&mut group, &data, &point_ids);
    group.finish();
}

fn random_multi_value(c: &mut Criterion) {
    let data = TestData::multi_value();
    let point_ids = build_random_point_ids();

    let mut group = c.benchmark_group("Random multi_value");
    bench_all_impls(&mut group, &data, &point_ids);
    group.finish();
}

criterion_group!(
    benches,
    random_single_value,
    random_sparse_single_value,
    random_multi_value,
);
criterion_main!(benches);
origin.rs
use std::ops::Range;

pub struct ImmutablePointToValues<N: Default> {
    point_to_values: Vec<Range<u32>>,
    point_to_values_container: Vec<N>,
}

impl<N: Default> ImmutablePointToValues<N> {
    pub fn new(src: Vec<Vec<N>>) -> Self {
        let mut point_to_values = Vec::with_capacity(src.len());
        let all_values_count = src.iter().fold(0, |acc, values| acc + values.len());
        let mut point_to_values_container = Vec::with_capacity(all_values_count);
        for values in src {
            let container_len = point_to_values_container.len() as u32;
            let range = container_len..container_len + values.len() as u32;
            point_to_values.push(range.clone());
            point_to_values_container.extend(values);
        }
        Self {
            point_to_values,
            point_to_values_container,
        }
    }

    pub fn check_values_any(&self, idx: u32, check_fn: impl FnMut(&N) -> bool) -> bool {
        let Some(range) = self.point_to_values.get(idx as usize) else {
            return false;
        };

        let range = range.start as usize..range.end as usize;
        if let Some(values) = self.point_to_values_container.get(range) {
            values.iter().any(check_fn)
        } else {
            false
        }
    }
}
inline.rs
#[derive(Debug, Clone)]
enum PointValueEntry<N> {
    Single(N),
    Slice { start: u32, count: u32 },
}

impl<N> Default for PointValueEntry<N> {
    fn default() -> Self {
        PointValueEntry::Slice { start: 0, count: 0 }
    }
}

#[derive(Debug, Clone, Default)]
pub struct ImmutablePointToValues<N: Default> {
    point_entries: Vec<PointValueEntry<N>>,
    values_container: Vec<N>,
}

impl<N: Default> ImmutablePointToValues<N> {
    pub fn new(src: Vec<Vec<N>>) -> Self {
        let mut point_entries = Vec::with_capacity(src.len());
        let container_capacity = src
            .iter()
            .filter(|values| values.len() != 1)
            .fold(0, |acc, values| acc + values.len());
        let mut values_container = Vec::with_capacity(container_capacity);

        for values in src {
            if values.len() == 1 {
                let value = values.into_iter().next().expect("length checked above");
                point_entries.push(PointValueEntry::Single(value));
            } else {
                let start = values_container.len() as u32;
                let count = values.len() as u32;
                point_entries.push(PointValueEntry::Slice { start, count });
                values_container.extend(values);
            }
        }

        Self {
            point_entries,
            values_container,
        }
    }

    pub fn check_values_any(&self, idx: u32, check_fn: impl FnMut(&N) -> bool) -> bool {
        let Some(entry) = self.point_entries.get(idx as usize) else {
            return false;
        };

        match entry {
            PointValueEntry::Single(v) => std::iter::once(v).any(check_fn),
            PointValueEntry::Slice { start, count } => {
                let range = *start as usize..(*start + *count) as usize;
                if let Some(values) = self.values_container.get(range) {
                    values.iter().any(check_fn)
                } else {
                    false
                }
            }
        }
    }
}

1995chen and others added 2 commits February 26, 2026 07:59
- Replace PointMeta with PointValueEntry enum to distinguish single vs
  multi-value points
- Store single values directly in point_entries, reducing container
  overhead
- Only multi-value and empty points use the shared values_container
- Update all methods and tests to use new representation
- Add tests for edge cases and single-value optimizations
coderabbitai[bot]

This comment was marked as resolved.

@qdrant qdrant deleted a comment from coderabbitai Bot Feb 26, 2026
@1995chen

Copy link
Copy Markdown
Contributor

Hi @timvisee @generall, we've implemented some minor optimizations to the filter query component. Initial testing in our environment shows performance gains. Would you be available to review it when convenient? Please don't hesitate to contact us if anything is unclear.

@IvanPleshkov IvanPleshkov self-requested a review February 26, 2026 11:23
@timvisee timvisee self-requested a review February 26, 2026 11:25
@generall

Copy link
Copy Markdown
Member

Hey @TY0909, @1995chen thank you for a very high quality contributions so far! We will review your PR shortly and very likely include it into the next patch release comming in a few weeks.

@timvisee timvisee left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

I'd like to add an explicit branch for a zero sized value. I'll commit it to this branch right away.

Comment thread lib/segment/src/index/field_index/immutable_point_to_values.rs Outdated
Comment thread lib/segment/src/index/field_index/immutable_point_to_values.rs Outdated
Comment thread lib/segment/src/index/field_index/immutable_point_to_values.rs
@qdrant qdrant deleted a comment from coderabbitai Bot Feb 26, 2026
@qdrant qdrant deleted a comment from coderabbitai Bot Feb 26, 2026
@1995chen

Copy link
Copy Markdown
Contributor

We found in our tests on check_values_any that adding the extra branch caused a small performance drop, so we ended up removing it. That said, we totally respect your call on this.

Thank you!

I'd like to add an explicit branch for a zero sized value. I'll commit it to this branch right away.

@timvisee

Copy link
Copy Markdown
Member

We found in our tests on check_values_any that adding the extra branch caused a small performance drop, so we ended up removing it. That said, we totally respect your call on this.

Thank you!
I'd like to add an explicit branch for a zero sized value. I'll commit it to this branch right away.

I see. Thanks for considering it.

I didn't rerun a benchmark on it, so let me do that now. My guess is that it'd just be this branch.

@timvisee

Copy link
Copy Markdown
Member

I get more significant differences between the existing and the inline version, so that's good news in terms of what speed we can gain here!

Cardinality Old storage New storage
single-value [879.80 µs 884.48 µs 890.47 µs] [289.52 µs 289.61 µs 289.69 µs]
sparse values [881.96 µs 887.21 µs 893.43 µs] [338.06 µs 339.56 µs 341.12 µs]
multi-values [1.7552 ms 1.7587 ms 1.7625 ms] [1.7858 ms 1.7903 ms 1.7955 ms]

Having the extra branch for zero sized values does indeed appear slightly worse, though it might fall within margin of error:

Random single_value/inline
                        time:   [289.61 µs 291.14 µs 292.53 µs]
                        change: [+0.1420% +0.5605% +0.9003%] (p = 0.00 < 0.05)
                        Change within noise threshold.

Random sparse_single_value/inline
                        time:   [332.71 µs 333.92 µs 335.27 µs]
                        change: [−1.1472% −0.7012% −0.2721%] (p = 0.00 < 0.05)
                        Change within noise threshold.

Random multi_value/inline
                        time:   [1.9469 ms 1.9562 ms 1.9665 ms]
                        change: [+8.7182% +9.4678% +10.246%] (p = 0.00 < 0.05)
                        Performance has regressed.

I'll remove the branch anyhow.

Thanks once again for the detailed PR and benchmarks!

coderabbitai[bot]

This comment was marked as resolved.

@qdrant qdrant deleted a comment from coderabbitai Bot Feb 26, 2026
@timvisee timvisee merged commit ada8108 into qdrant:dev Feb 26, 2026
16 of 17 checks passed
generall pushed a commit that referenced this pull request Mar 26, 2026
… RAM usage and increase performance (#8235)

* optimize: performance improve

* Optimize ImmutablePointToValues to store single values inline

- Replace PointMeta with PointValueEntry enum to distinguish single vs
  multi-value points
- Store single values directly in point_entries, reducing container
  overhead
- Only multi-value and empty points use the shared values_container
- Update all methods and tests to use new representation
- Add tests for edge cases and single-value optimizations

* Explicit branch for zero sized value

* Update comment

* Remove zero count branch as it appears to make it slightly slower

---------

Co-authored-by: 1995chen <[email protected]>
Co-authored-by: timvisee <[email protected]>
KShivendu pushed a commit that referenced this pull request Mar 30, 2026
… RAM usage and increase performance (#8235)

* optimize: performance improve

* Optimize ImmutablePointToValues to store single values inline

- Replace PointMeta with PointValueEntry enum to distinguish single vs
  multi-value points
- Store single values directly in point_entries, reducing container
  overhead
- Only multi-value and empty points use the shared values_container
- Update all methods and tests to use new representation
- Add tests for edge cases and single-value optimizations

* Explicit branch for zero sized value

* Update comment

* Remove zero count branch as it appears to make it slightly slower

---------

Co-authored-by: 1995chen <[email protected]>
Co-authored-by: timvisee <[email protected]>
@timvisee

Copy link
Copy Markdown
Member

We've released Qdrant 1.18.0 includes these changes. You can read more about it in our release blog here.

Thank you for picking this up!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants