Skip to content

feat: avoid payload-index rebuild#9138

Merged
dancixx merged 14 commits into
devfrom
feat/avoid-payload-index-rebuild
Jun 3, 2026
Merged

feat: avoid payload-index rebuild#9138
dancixx merged 14 commits into
devfrom
feat/avoid-payload-index-rebuild

Conversation

@dancixx
Copy link
Copy Markdown
Contributor

@dancixx dancixx commented May 22, 2026

Impl of this plan: #9114

Summary

Re-issuing create_payload_index with only a different on_disk value previously triggered a full index rebuild from payload storage. For non-appendable (mmap) segments the on-disk format is identical between on_disk: false (loaded into RAM) and on_disk: true (mmap, page-cache only) — same files, only a different in-memory wrapper. This PR avoids the rebuild by reusing the existing files and reloading them in the requested mode, integrated into the existing delete → build → apply pipeline (no separate in-place swap). The reload runs off the write lock and the previous index is not dropped, so reads are never blocked and the field is never left unindexed.

  • drop_index_if_incompatible — classifies the schema delta; an on_disk-only change is treated as compatible (the index is kept, files not dropped). Any other change still drops and rebuilds.
  • build_indexreuse_existing_index — for an on_disk-only change, reloads the field's index from its existing files in the new mode (via the normal new_index loader) instead of rebuilding from payload; falls back to a full rebuild if the files can't be loaded.
  • apply_index — swaps the freshly-loaded index in; only this brief step takes the write lock.
  • Appendable Gridstoreon_disk has no storage-layer effect there, so the new value is only persisted in the schema (config) and the live index is left untouched; the optimizer applies it when the segment is converted to non-appendable.

Testsdrop_index_if_incompatible keeps the index on an on_disk-only change; build_index reloads in the new mode and preserves the indexed-point count; classifier unit tests for every PayloadSchemaParams variant.


All Submissions:

⚠️ PRs must target the dev branch. If your PR targets master, please change the base branch to dev before requesting a review.

  • My PR targets the dev branch (not master) and my branch was created from dev.
  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

  1. Does your submission pass tests?
  2. Have you formatted your code locally using cargo +nightly fmt --all command prior to submission?
  3. Have you checked your code using cargo clippy --workspace --all-features command?

Changes to Core Features:

  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your core changes, as applicable?
  • Have you successfully ran tests with your changes locally?

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
lib/segment/src/index/field_index/map_index/tests.rs (1)

477-501: ⚡ Quick win

Add behavior assertions to the round-trip swap test.

int_map_immutable_to_mmap_round_trip_preserves_files currently proves byte-for-byte file stability, but not query/value correctness after each swap. Please also assert representative get_iterator/values_count outputs before and after both transitions to catch in-memory rebuild regressions.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/segment/src/index/field_index/map_index/tests.rs` around lines 477 - 501,
The test int_map_immutable_to_mmap_round_trip_preserves_files currently only
checks file snapshots and swap state; add assertions to validate query
correctness by calling MapIndex methods like get_iterator and values_count on
the loaded index (from data saved via save_map_index/load_map_index) before the
first swap, after swapping to Mmap, after the no-op swap, and after swapping
back to Immutable: pick a few representative keys from the test data (e.g., keys
corresponding to values 1, 4, and 6) and assert that values_count(key) matches
expected counts and that iterating via get_iterator yields the expected Value
items for those keys each time so that in-memory rebuilds preserve semantic
results across swap_on_disk(true/false) transitions.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@lib/segment/src/index/struct_payload_index/mod.rs`:
- Around line 426-441: In update_indexed_schema, avoid leaving
self.config.indices mutated when save_config() fails by capturing the previous
entry for field (e.g., previous = self.config.indices.remove(field) or get
clone), perform the insert of PayloadFieldSchemaWithIndexType, call
save_config(), and if save_config() returns Err restore the previous value
(re-insert previous if Some or ensure the key is removed if it was None) before
returning the error; reference the update_indexed_schema function,
self.config.indices insertion, and save_config() call to locate where to
snapshot and rollback the in-memory change.

---

Nitpick comments:
In `@lib/segment/src/index/field_index/map_index/tests.rs`:
- Around line 477-501: The test
int_map_immutable_to_mmap_round_trip_preserves_files currently only checks file
snapshots and swap state; add assertions to validate query correctness by
calling MapIndex methods like get_iterator and values_count on the loaded index
(from data saved via save_map_index/load_map_index) before the first swap, after
swapping to Mmap, after the no-op swap, and after swapping back to Immutable:
pick a few representative keys from the test data (e.g., keys corresponding to
values 1, 4, and 6) and assert that values_count(key) matches expected counts
and that iterating via get_iterator yields the expected Value items for those
keys each time so that in-memory rebuilds preserve semantic results across
swap_on_disk(true/false) transitions.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: da0805f9-5f3d-4258-9ddd-8ef4fdc794e2

📥 Commits

Reviewing files that changed from the base of the PR and between 965cf4c and 290de24.

📒 Files selected for processing (23)
  • lib/segment/src/entry/entry_point.rs
  • lib/segment/src/index/field_index/field_index_base/field_index.rs
  • lib/segment/src/index/field_index/full_text_index/immutable_text_index/lifecycle.rs
  • lib/segment/src/index/field_index/full_text_index/inverted_index/mmap_inverted_index/mod.rs
  • lib/segment/src/index/field_index/full_text_index/lifecycle.rs
  • lib/segment/src/index/field_index/full_text_index/tests/mod.rs
  • lib/segment/src/index/field_index/geo_index/immutable_geo_index/lifecycle.rs
  • lib/segment/src/index/field_index/geo_index/mod.rs
  • lib/segment/src/index/field_index/geo_index/tests.rs
  • lib/segment/src/index/field_index/map_index/immutable_map_index/lifecycle.rs
  • lib/segment/src/index/field_index/map_index/lifecycle.rs
  • lib/segment/src/index/field_index/map_index/tests.rs
  • lib/segment/src/index/field_index/mod.rs
  • lib/segment/src/index/field_index/numeric_index/immutable_numeric_index/lifecycle.rs
  • lib/segment/src/index/field_index/numeric_index/lifecycle.rs
  • lib/segment/src/index/field_index/numeric_index/storage/lifecycle.rs
  • lib/segment/src/index/field_index/numeric_index/tests.rs
  • lib/segment/src/index/field_index/schema_transition.rs
  • lib/segment/src/index/field_index/swap_in_place.rs
  • lib/segment/src/index/struct_payload_index/mod.rs
  • lib/segment/src/index/struct_payload_index/payload_index.rs
  • lib/segment/src/segment/entry.rs
  • lib/shard/src/update.rs

Comment thread lib/segment/src/index/struct_payload_index/mod.rs Outdated
@dancixx dancixx force-pushed the feat/avoid-payload-index-rebuild branch from 5cc23aa to 00b68c4 Compare May 26, 2026 10:35
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
lib/segment/src/index/field_index/field_index_base/field_index.rs (1)

249-252: ⚡ Quick win

Update stale phase comment to match current behavior.

Line 249–252 says every arm returns Ok(false), but bool/null swaps now return Ok(true). Please adjust this doc to avoid misleading future changes.

Proposed doc-only diff
-    /// Phase 1 scaffolding: every arm returns `Ok(false)` so the wiring is
-    /// in place but the fast path never fires. Each family's real
-    /// implementation will replace its arm in later phases (numeric → map →
-    /// geo → full-text).
+    /// Some families perform real in-place swaps already (for example,
+    /// schema-only families); others still return `Ok(false)` until their
+    /// swap implementation is added.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/segment/src/index/field_index/field_index_base/field_index.rs` around
lines 249 - 252, Update the Phase 1 scaffolding doc comment in field_index.rs
(the "Phase 1 scaffolding" block near FieldIndexBase) to reflect current
behavior: instead of saying every arm returns `Ok(false)`, note that most arms
return `Ok(false)` but bool/null swap arms now return `Ok(true)` (so the fast
path can fire for those cases); rephrase to avoid misleading future maintainers
and mention that concrete family implementations will replace these arms in
later phases.
lib/segment/src/index/field_index/map_index/tests.rs (1)

477-500: ⚡ Quick win

Strengthen swap regression coverage with behavioral assertions.

Please also assert read behavior after each swap direction (and add the swap_on_disk(false) false-path check for mutable) so this test guards semantics, not only file-byte stability.

Proposed test additions
     assert!(index.swap_on_disk(true).unwrap());
     assert!(matches!(index, MapIndex::Mmap(_)));
     assert_eq!(snapshot_files(&index), files_before);
+    let hw_counter = HardwareCounterCell::new();
+    let mut hits: Vec<PointOffsetType> = index.get_iterator(&4, &hw_counter).collect();
+    hits.sort();
+    assert_eq!(hits, vec![1, 2]);

     // No-op
     assert!(index.swap_on_disk(true).unwrap());

     assert!(index.swap_on_disk(false).unwrap());
     assert!(matches!(index, MapIndex::Immutable(_)));
     assert_eq!(snapshot_files(&index), files_before);
+    let hw_counter = HardwareCounterCell::new();
+    let mut hits: Vec<PointOffsetType> = index.get_iterator(&4, &hw_counter).collect();
+    hits.sort();
+    assert_eq!(hits, vec![1, 2]);
@@
     assert!(matches!(index, MapIndex::Mutable(_)));
     assert!(!index.swap_on_disk(true).unwrap());
     assert!(matches!(index, MapIndex::Mutable(_)));
+    assert!(!index.swap_on_disk(false).unwrap());
+    assert!(matches!(index, MapIndex::Mutable(_)));

Also applies to: 503-519

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/segment/src/index/field_index/map_index/tests.rs` around lines 477 - 500,
Add behavioral assertions to
int_map_immutable_to_mmap_round_trip_preserves_files: after loading via
load_map_index and after every swap_on_disk(true/false) call, verify that
reading from the index returns the expected values for the original `data` (use
the same accessors used elsewhere in tests to read mapped values); also add the
missing false-path check for swap_on_disk(false) on the mutable-to-mmap
transition to ensure it returns Ok(false) when no-op is expected. Reference the
test helpers save_map_index, load_map_index, snapshot_files and the MapIndex
variants (MapIndex::Immutable, MapIndex::Mmap) and call swap_on_disk on the
`index` variable, asserting both the boolean return and that lookups into the
`index` match `data` after each swap.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@lib/segment/src/index/field_index/full_text_index/tests/mod.rs`:
- Around line 602-613: The test is flaky because FullTextIndex::new_mmap(...,
false, &empty_deleted) can return Mmap depending on runtime memory heuristics;
change the setup to deterministically create an Immutable index by calling
FullTextIndex::new_mmap with the flag that forces immutable creation (replace
the third argument false with true) or otherwise use the explicit constructor
that returns FullTextIndex::Immutable so the assert!(matches!(index,
FullTextIndex::Immutable(_))) and subsequent snapshot_files checks are
deterministic.

---

Nitpick comments:
In `@lib/segment/src/index/field_index/field_index_base/field_index.rs`:
- Around line 249-252: Update the Phase 1 scaffolding doc comment in
field_index.rs (the "Phase 1 scaffolding" block near FieldIndexBase) to reflect
current behavior: instead of saying every arm returns `Ok(false)`, note that
most arms return `Ok(false)` but bool/null swap arms now return `Ok(true)` (so
the fast path can fire for those cases); rephrase to avoid misleading future
maintainers and mention that concrete family implementations will replace these
arms in later phases.

In `@lib/segment/src/index/field_index/map_index/tests.rs`:
- Around line 477-500: Add behavioral assertions to
int_map_immutable_to_mmap_round_trip_preserves_files: after loading via
load_map_index and after every swap_on_disk(true/false) call, verify that
reading from the index returns the expected values for the original `data` (use
the same accessors used elsewhere in tests to read mapped values); also add the
missing false-path check for swap_on_disk(false) on the mutable-to-mmap
transition to ensure it returns Ok(false) when no-op is expected. Reference the
test helpers save_map_index, load_map_index, snapshot_files and the MapIndex
variants (MapIndex::Immutable, MapIndex::Mmap) and call swap_on_disk on the
`index` variable, asserting both the boolean return and that lookups into the
`index` match `data` after each swap.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 3b60baff-c88f-4fd5-a47a-4d4cb4a16701

📥 Commits

Reviewing files that changed from the base of the PR and between 5cc23aa and 00b68c4.

📒 Files selected for processing (25)
  • lib/segment/src/entry/entry_point.rs
  • lib/segment/src/index/field_index/bool_index/mod.rs
  • lib/segment/src/index/field_index/field_index_base/field_index.rs
  • lib/segment/src/index/field_index/full_text_index/immutable_text_index/lifecycle.rs
  • lib/segment/src/index/field_index/full_text_index/inverted_index/mmap_inverted_index/mod.rs
  • lib/segment/src/index/field_index/full_text_index/lifecycle.rs
  • lib/segment/src/index/field_index/full_text_index/tests/mod.rs
  • lib/segment/src/index/field_index/geo_index/immutable_geo_index/lifecycle.rs
  • lib/segment/src/index/field_index/geo_index/mod.rs
  • lib/segment/src/index/field_index/geo_index/tests.rs
  • lib/segment/src/index/field_index/map_index/immutable_map_index/lifecycle.rs
  • lib/segment/src/index/field_index/map_index/lifecycle.rs
  • lib/segment/src/index/field_index/map_index/tests.rs
  • lib/segment/src/index/field_index/mod.rs
  • lib/segment/src/index/field_index/null_index/mod.rs
  • lib/segment/src/index/field_index/numeric_index/immutable_numeric_index/lifecycle.rs
  • lib/segment/src/index/field_index/numeric_index/lifecycle.rs
  • lib/segment/src/index/field_index/numeric_index/storage/lifecycle.rs
  • lib/segment/src/index/field_index/numeric_index/tests.rs
  • lib/segment/src/index/field_index/schema_transition.rs
  • lib/segment/src/index/field_index/swap_in_place.rs
  • lib/segment/src/index/struct_payload_index/mod.rs
  • lib/segment/src/index/struct_payload_index/payload_index.rs
  • lib/segment/src/segment/entry.rs
  • lib/shard/src/update.rs
✅ Files skipped from review due to trivial changes (2)
  • lib/segment/src/index/field_index/numeric_index/immutable_numeric_index/lifecycle.rs
  • lib/segment/src/index/field_index/full_text_index/inverted_index/mmap_inverted_index/mod.rs

Comment on lines +602 to +613
let mut index = FullTextIndex::new_mmap(
temp_dir.path().to_path_buf(),
config.clone(),
false,
&empty_deleted,
)
.unwrap()
.unwrap();

assert!(matches!(index, FullTextIndex::Immutable(_)));
let files_before = snapshot_files(&index);
assert!(!files_before.is_empty());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Make immutable setup deterministic in swap test

This test currently depends on runtime low-memory mode. FullTextIndex::new_mmap(..., false, ...) can still return FullTextIndex::Mmap, making the Immutable assertion flaky.

Proposed fix
-        // Reload as Immutable
-        let mut index = FullTextIndex::new_mmap(
-            temp_dir.path().to_path_buf(),
-            config.clone(),
-            false,
-            &empty_deleted,
-        )
-        .unwrap()
-        .unwrap();
+        // Reload as Immutable explicitly (avoid low-memory mode auto-selecting Mmap)
+        let mmap_index = crate::index::field_index::full_text_index::mmap_text_index::MmapFullTextIndex::open(
+            temp_dir.path().to_path_buf(),
+            config.clone(),
+            false,
+            &empty_deleted,
+        )
+        .unwrap()
+        .unwrap();
+        let mut index = FullTextIndex::Immutable(
+            crate::index::field_index::full_text_index::immutable_text_index::ImmutableFullTextIndex::open_mmap(mmap_index).unwrap(),
+        );
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/segment/src/index/field_index/full_text_index/tests/mod.rs` around lines
602 - 613, The test is flaky because FullTextIndex::new_mmap(..., false,
&empty_deleted) can return Mmap depending on runtime memory heuristics; change
the setup to deterministically create an Immutable index by calling
FullTextIndex::new_mmap with the flag that forces immutable creation (replace
the third argument false with true) or otherwise use the explicit constructor
that returns FullTextIndex::Immutable so the assert!(matches!(index,
FullTextIndex::Immutable(_))) and subsequent snapshot_files checks are
deterministic.

@qdrant qdrant deleted a comment from coderabbitai Bot May 26, 2026
@dancixx dancixx force-pushed the feat/avoid-payload-index-rebuild branch from 00b68c4 to 87b0638 Compare May 26, 2026 14:24
coderabbitai[bot]

This comment was marked as resolved.

@dancixx dancixx force-pushed the feat/avoid-payload-index-rebuild branch from cd23362 to 84e8b34 Compare May 26, 2026 15:17
@dancixx dancixx marked this pull request as ready for review May 26, 2026 15:36
@dancixx dancixx force-pushed the feat/avoid-payload-index-rebuild branch from 84e8b34 to 1b43a26 Compare May 26, 2026 21:01
@dancixx dancixx requested a review from generall May 27, 2026 07:03
@dancixx dancixx marked this pull request as draft May 29, 2026 13:50
@dancixx dancixx force-pushed the feat/avoid-payload-index-rebuild branch from 1b43a26 to aee2b8f Compare June 1, 2026 10:39
@qdrant qdrant deleted a comment from coderabbitai Bot Jun 1, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 2, 2026

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 949e01f4-789e-458c-826b-326c70e07a04

📥 Commits

Reviewing files that changed from the base of the PR and between 64fc8c0 and 0da355e.

📒 Files selected for processing (2)
  • lib/segment/src/index/struct_payload_index/payload_index.rs
  • lib/segment/src/index/struct_payload_index/tests.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • lib/segment/src/index/struct_payload_index/tests.rs

📝 Walkthrough

Walkthrough

This PR implements smart index reuse when only the on_disk flag changes across schema updates. It introduces SchemaTransition::classify() to distinguish identical schemas, on-disk-only flag changes, and incompatible changes. StructPayloadIndex now uses this classifier to selectively preserve or rebuild field indexes: identical schemas return AlreadyBuilt, on-disk-only flips call reuse_or_build_index() to reload persisted indexes, and incompatible changes return IncompatibleSchema. The PR also boxes mmap-backed indexes early to preserve storage references and refactors error handling in lifecycle methods for full-text, geo, and map indexes, while making MmapInvertedIndex.is_on_disk module-visible for classification logic.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • qdrant/qdrant#8967: Introduces the refactored StructPayloadIndex implementation that this PR builds upon by adding schema transition classification and index reuse logic.

Suggested labels

release:1.18.0

Suggested reviewers

  • generall
  • coszio
  • ffuugoo

Poem

🐰 A box wraps each index with care,
Schemas dance through classify's stair,
On-disk flips don't rebuild anymore—
Just reload what was stored before!
Smart reuse hops through the door. 🎉

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat: avoid payload-index rebuild' clearly and concisely summarizes the main change: avoiding unnecessary rebuilds when only the on_disk flag changes.
Description check ✅ Passed The description is comprehensive and directly related to the changeset, explaining the rationale, implementation details, and testing approach for avoiding payload-index rebuilds.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
lib/segment/src/index/struct_payload_index/mod.rs (1)

348-363: ⚠️ Potential issue | 🟠 Major

Rollback the in-memory schema when save_config() fails.

update_indexed_schema() mutates self.config.indices before persistence. If save_config() errors, later retries in the same process can see the field as already transitioned and return AlreadyBuilt, while the on-disk config is still stale.

Minimal fix
     let index_types: Vec<_> = self
         .field_indexes
         .get(field)
         .map(|indexes| indexes.iter().map(|i| i.get_full_index_type()).collect())
         .unwrap_or_default();
-    self.config.indices.insert(
-        field.clone(),
-        PayloadFieldSchemaWithIndexType::new(new_schema, index_types),
-    );
-    self.save_config()?;
+    let next = PayloadFieldSchemaWithIndexType::new(new_schema, index_types);
+    let prev = self.config.indices.insert(field.clone(), next);
+    if let Err(err) = self.save_config() {
+        match prev {
+            Some(prev) => {
+                self.config.indices.insert(field.clone(), prev);
+            }
+            None => {
+                self.config.indices.remove(field);
+            }
+        }
+        return Err(err);
+    }
     Ok(())
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/segment/src/index/struct_payload_index/mod.rs` around lines 348 - 363,
update_indexed_schema currently mutates self.config.indices before calling
save_config, causing a stale on-disk state if save_config fails; capture the
previous entry (via the insert return or get) for the given field, perform the
insert of PayloadFieldSchemaWithIndexType(new_schema, index_types), call
save_config(), and if save_config() returns Err restore the prior in-memory
value (re-insert the previous value or remove the key if it did not exist) and
propagate the error so callers don’t see a half-applied in-memory state
(referencing update_indexed_schema, self.config.indices,
PayloadFieldSchemaWithIndexType, save_config and the AlreadyBuilt semantics).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@lib/segment/src/index/struct_payload_index/mod.rs`:
- Around line 348-363: update_indexed_schema currently mutates
self.config.indices before calling save_config, causing a stale on-disk state if
save_config fails; capture the previous entry (via the insert return or get) for
the given field, perform the insert of
PayloadFieldSchemaWithIndexType(new_schema, index_types), call save_config(),
and if save_config() returns Err restore the prior in-memory value (re-insert
the previous value or remove the key if it did not exist) and propagate the
error so callers don’t see a half-applied in-memory state (referencing
update_indexed_schema, self.config.indices, PayloadFieldSchemaWithIndexType,
save_config and the AlreadyBuilt semantics).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: b5e960c1-e70e-4b91-a801-2c287db45dca

📥 Commits

Reviewing files that changed from the base of the PR and between 00b68c4 and 5af7163.

📒 Files selected for processing (10)
  • lib/segment/src/index/field_index/full_text_index/immutable_text_index/lifecycle.rs
  • lib/segment/src/index/field_index/full_text_index/inverted_index/mmap_inverted_index/mod.rs
  • lib/segment/src/index/field_index/geo_index/immutable_geo_index/lifecycle.rs
  • lib/segment/src/index/field_index/map_index/immutable_map_index/lifecycle.rs
  • lib/segment/src/index/field_index/mod.rs
  • lib/segment/src/index/field_index/schema_transition.rs
  • lib/segment/src/index/struct_payload_index/build.rs
  • lib/segment/src/index/struct_payload_index/mod.rs
  • lib/segment/src/index/struct_payload_index/payload_index.rs
  • lib/segment/src/index/struct_payload_index/tests.rs
🚧 Files skipped from review as they are similar to previous changes (2)
  • lib/segment/src/index/field_index/full_text_index/inverted_index/mmap_inverted_index/mod.rs
  • lib/segment/src/index/field_index/schema_transition.rs

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
lib/segment/src/index/field_index/geo_index/immutable_geo_index/lifecycle.rs (1)

22-25: ⚡ Quick win

Use explicit OperationError::from(err) conversions here.

These new err.into() calls violate the repo rule for explicit conversions in Rust.

As per coding guidelines, "Prefer explicit conversions: use SomeType::from(x) instead of implicit x.into() in new code."

Also applies to: 28-30, 61-63, 131-133

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/segment/src/index/field_index/geo_index/immutable_geo_index/lifecycle.rs`
around lines 22 - 25, Replace implicit conversions using err.into() with
explicit OperationError::from(err) in the Result error branches: where
counts_per_hash is built from index.storage.counts_per_hash.read_whole() (the
match arm returning Err(err) -> Err(err.into())), and the other similar matches
referenced (the match that constructs counts_per_hash, the ones at the other
occurrences around the code and the ones noted at 28-30, 61-63, 131-133).
Concretely, change each Err(err) => return Err(err.into()) to Err(err) => return
Err(OperationError::from(err)) so the code uses explicit OperationError::from
conversions for those read_whole() error paths and any identical patterns in the
same module.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@lib/segment/src/index/field_index/full_text_index/immutable_text_index/lifecycle.rs`:
- Around line 15-20: open_mmap currently boxes/consumes the MmapFullTextIndex
before calling ImmutableInvertedIndex::try_from, so on failure the mmap-backed
index is dropped; fix by delaying consumption: do not call Box::new(index) until
after ImmutableInvertedIndex::try_from succeeds — create the inverted index from
&index.inverted_index while index is still owned (unboxed), handle Err by
returning the OperationError without dropping/consuming the original index, and
only box (Box::new(index)) after you have an Ok(inverted_index) to preserve the
mmap for rollback.

In
`@lib/segment/src/index/field_index/geo_index/immutable_geo_index/lifecycle.rs`:
- Around line 20-25: The error branches in open_mmap currently return an
OperationError while the boxed StoredGeoMapIndex<MmapFile> is still owned,
causing it to be dropped and losing the mmap-backed index; to fix this, ensure
the boxed index is not dropped on any load failure by leaking or converting the
Box to a raw pointer before returning the error (e.g., use Box::into_raw or
std::mem::forget on the Boxed index) so the mmap file remains intact for
in-place swap rollback; apply the same change to all similar Err branches that
call storage.counts_per_hash.read_whole(), storage.offsets.read_whole(),
storage.hashes.read_whole() (the other reported locations) so
StoredGeoMapIndex<MmapFile> survives failures.

In
`@lib/segment/src/index/field_index/map_index/immutable_map_index/lifecycle.rs`:
- Around line 23-25: open_mmap currently takes ownership of UniversalMapIndex<N>
and drops the mmap-backed index if the internal scan fails, preventing rollback;
change open_mmap (and the analogous function around the other occurrence) so the
mmap index is not lost on error: either accept a reference/Box you can return or
change the error branch of OperationResult to carry back the original
UniversalMapIndex<N> (e.g., return Result<Self, (ScanError,
UniversalMapIndex<N>)>), perform the scan without dropping the owned index, and
return the index alongside Err so the caller can reinstall it during rollback.

---

Nitpick comments:
In
`@lib/segment/src/index/field_index/geo_index/immutable_geo_index/lifecycle.rs`:
- Around line 22-25: Replace implicit conversions using err.into() with explicit
OperationError::from(err) in the Result error branches: where counts_per_hash is
built from index.storage.counts_per_hash.read_whole() (the match arm returning
Err(err) -> Err(err.into())), and the other similar matches referenced (the
match that constructs counts_per_hash, the ones at the other occurrences around
the code and the ones noted at 28-30, 61-63, 131-133). Concretely, change each
Err(err) => return Err(err.into()) to Err(err) => return
Err(OperationError::from(err)) so the code uses explicit OperationError::from
conversions for those read_whole() error paths and any identical patterns in the
same module.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: a7d5409c-6fd2-40c7-a2e2-33a9d65bf7ce

📥 Commits

Reviewing files that changed from the base of the PR and between 5af7163 and 7d1f93c.

📒 Files selected for processing (3)
  • lib/segment/src/index/field_index/full_text_index/immutable_text_index/lifecycle.rs
  • lib/segment/src/index/field_index/geo_index/immutable_geo_index/lifecycle.rs
  • lib/segment/src/index/field_index/map_index/immutable_map_index/lifecycle.rs

Comment thread lib/segment/src/index/field_index/geo_index/immutable_geo_index/lifecycle.rs Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
lib/segment/src/index/field_index/geo_index/immutable_geo_index/lifecycle.rs (1)

105-105: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove/rework the no-op let _ = index; (it shadows intent, even though it compiles)
let _ = index; at line 105 is a move/reassignment-like line, but cargo check reports no error and the code immediately reuses index to build Self { ..., storage: index, ... }. That means index is likely a Copy/handle type (e.g., reference/Arc-like) rather than a unique owning value—so the line is probably redundant and confusing; if it was meant to influence drop/borrow order, it should be replaced with an explicit, targeted pattern (or removed).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/segment/src/index/field_index/geo_index/immutable_geo_index/lifecycle.rs`
at line 105, The line `let _ = index;` is a redundant no-op that confuses
intent; remove that statement from lifecycle.rs and leave the subsequent usage
that builds `Self { ..., storage: index, ... }` unchanged. If the original
intent was to force a drop or control borrow/drop order for `index`, replace the
no-op with an explicit operation: either wrap the earlier scope to let `index`
go out of scope before reuse, or call `std::mem::drop(index)` (and then obtain a
new value) — otherwise simply delete the `let _ = index;` line so `index` is
used directly when constructing the `Self` instance.
♻️ Duplicate comments (2)
lib/segment/src/index/field_index/map_index/immutable_map_index/lifecycle.rs (1)

24-24: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Mmap-backed index is dropped on scan failure, preventing rollback.

The boxed index (line 24) is dropped if for_each_value_map returns an error and the ? operator on line 67 triggers an early return. This makes the original mmap-backed UniversalMapIndex<N> unavailable for rollback during swap operations, as noted in the previous review.

To preserve the index for rollback, either leak it before returning the error (Box::into_raw / mem::forget) or change the return type to include the index on failure: Result<Self, (OperationError, Box<UniversalMapIndex<N>>)>.

Also applies to: 40-67

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/segment/src/index/field_index/map_index/immutable_map_index/lifecycle.rs`
at line 24, The boxed mmap-backed index is dropped when for_each_value_map
returns an error because the function currently returns early with `?`; change
the function to return Result<Self, (OperationError,
Box<UniversalMapIndex<N>>)>, keep the `let index = Box::new(index);` as is, and
update all error propagation around the for_each_value_map call (and any returns
in the 40-67 region) to map/map_err the OperationError into Err((error, index))
so the caller receives the boxed index for rollback; update the function
signature and any call sites accordingly to accept the new Err payload.
lib/segment/src/index/field_index/geo_index/immutable_geo_index/lifecycle.rs (1)

21-29: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Mmap-backed index is still dropped on all error paths, preventing rollback.

The boxed index is dropped when any error path returns early (? on lines 25, 32, 99, 129, or explicit Err on line 63). This prevents the caller from recovering the original mmap-backed index during swap rollback, as flagged in the previous review.

The current code boxes the index on line 21 but does not prevent it from being dropped on error. To support rollback, the index must survive failures—either by using Box::into_raw / mem::forget before returning errors, or by changing the signature to return Result<Self, (OperationError, Box<StoredGeoMapIndex<MmapFile>>)>.

Also applies to: 32-32, 62-64, 98-99, 129-129

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/segment/src/index/field_index/geo_index/immutable_geo_index/lifecycle.rs`
around lines 21 - 29, The boxed mmap-backed index (created as Box::new(index))
is being dropped on early error returns which prevents rollback; update the code
so the boxed index is preserved on all error paths: either convert the Box to a
raw pointer (Box::into_raw or std::mem::forget) before any `?`/Err returns so it
is not dropped, or change the function's error type to return Result<Self,
(OperationError, Box<StoredGeoMapIndex<MmapFile>>)>, and adjust every early
return (the places that currently `?` on storage.counts_per_hash.read_whole()
and other fallible calls and explicit Err returns) to return the error paired
with the boxed index instead of letting the Box be dropped. Ensure every path
that currently returns an error returns the boxed index unchanged so the caller
can perform rollback.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In
`@lib/segment/src/index/field_index/geo_index/immutable_geo_index/lifecycle.rs`:
- Line 105: The line `let _ = index;` is a redundant no-op that confuses intent;
remove that statement from lifecycle.rs and leave the subsequent usage that
builds `Self { ..., storage: index, ... }` unchanged. If the original intent was
to force a drop or control borrow/drop order for `index`, replace the no-op with
an explicit operation: either wrap the earlier scope to let `index` go out of
scope before reuse, or call `std::mem::drop(index)` (and then obtain a new
value) — otherwise simply delete the `let _ = index;` line so `index` is used
directly when constructing the `Self` instance.

---

Duplicate comments:
In
`@lib/segment/src/index/field_index/geo_index/immutable_geo_index/lifecycle.rs`:
- Around line 21-29: The boxed mmap-backed index (created as Box::new(index)) is
being dropped on early error returns which prevents rollback; update the code so
the boxed index is preserved on all error paths: either convert the Box to a raw
pointer (Box::into_raw or std::mem::forget) before any `?`/Err returns so it is
not dropped, or change the function's error type to return Result<Self,
(OperationError, Box<StoredGeoMapIndex<MmapFile>>)>, and adjust every early
return (the places that currently `?` on storage.counts_per_hash.read_whole()
and other fallible calls and explicit Err returns) to return the error paired
with the boxed index instead of letting the Box be dropped. Ensure every path
that currently returns an error returns the boxed index unchanged so the caller
can perform rollback.

In
`@lib/segment/src/index/field_index/map_index/immutable_map_index/lifecycle.rs`:
- Line 24: The boxed mmap-backed index is dropped when for_each_value_map
returns an error because the function currently returns early with `?`; change
the function to return Result<Self, (OperationError,
Box<UniversalMapIndex<N>>)>, keep the `let index = Box::new(index);` as is, and
update all error propagation around the for_each_value_map call (and any returns
in the 40-67 region) to map/map_err the OperationError into Err((error, index))
so the caller receives the boxed index for rollback; update the function
signature and any call sites accordingly to accept the new Err payload.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 8c278820-bbba-4a14-8529-091d422a1ce5

📥 Commits

Reviewing files that changed from the base of the PR and between 7d1f93c and b42b21b.

📒 Files selected for processing (3)
  • lib/segment/src/index/field_index/full_text_index/immutable_text_index/lifecycle.rs
  • lib/segment/src/index/field_index/geo_index/immutable_geo_index/lifecycle.rs
  • lib/segment/src/index/field_index/map_index/immutable_map_index/lifecycle.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • lib/segment/src/index/field_index/full_text_index/immutable_text_index/lifecycle.rs

@dancixx dancixx marked this pull request as ready for review June 2, 2026 10:43
@dancixx dancixx requested a review from coszio June 2, 2026 12:37
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
lib/segment/src/index/struct_payload_index/build.rs (1)

57-85: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Clarify doc comment about on_disk vs the hardcoded false: the false passed to IndexSelector::new_index / new_null_index is create_if_missing, while the on_disk representation is honored by selector(payload_schema) via payload_schema.is_on_disk() feeding IndexSelectorMmap { is_on_disk }. Update the comment wording to avoid implying that the hardcoded boolean controls on_disk.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/segment/src/index/struct_payload_index/build.rs` around lines 57 - 85,
Update the doc comment for reuse_or_build_index to make clear that the hardcoded
false passed to selector.new_index and selector.new_null_index is the
create_if_missing flag and does not control the on_disk representation; state
that on_disk is determined when constructing the selector via
selector(payload_schema) (which uses payload_schema.is_on_disk() to configure
IndexSelectorMmap { is_on_disk }), so rephrase the comment to avoid implying the
boolean toggles on_disk and instead explain that selector honors
payload_schema.is_on_disk().
♻️ Duplicate comments (1)
lib/segment/src/index/field_index/geo_index/immutable_geo_index/lifecycle.rs (1)

20-32: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Error branches still drop the boxed mmap index, removing the swap-rollback recovery path.

After boxing index at Line 21, every fallible step (read_whole()? at Lines 25 and 32, read_batch(...)? at Line 61, collected? at Line 96, and encode_max_precision(...)? at Line 126) propagates only the OperationError and drops the owned StoredGeoMapIndex<MmapFile>. During an in-place Mmap → Immutable swap, a load failure here therefore destroys the original mmap-backed index and prevents restoring it on rollback.

Also applies to: 61-61, 96-96, 126-126

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/segment/src/index/field_index/geo_index/immutable_geo_index/lifecycle.rs`
around lines 20 - 32, In open_mmap, do not box/move the
StoredGeoMapIndex<MmapFile> into heap (the local variable index) before
performing fallible reads; currently boxing at the start (variable index) causes
the mmap-backed index to be dropped on any early error (e.g.,
counts_per_hash.read_whole(), storage.points_map.read_whole(), read_batch(...),
iterator.collect(), encode_max_precision(...)), preventing swap-rollback
recovery. Fix by deferring the Box::new(...) (or move into the final
ImmutableGeoIndex) until after all
read_whole/read_batch/collect/encode_max_precision calls succeed — use
references/borrows to access index.storage for these operations or perform the
reads into temporaries first, then box the StoredGeoMapIndex only once all
fallible operations have completed successfully.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@lib/segment/src/index/struct_payload_index/build.rs`:
- Around line 57-85: Update the doc comment for reuse_or_build_index to make
clear that the hardcoded false passed to selector.new_index and
selector.new_null_index is the create_if_missing flag and does not control the
on_disk representation; state that on_disk is determined when constructing the
selector via selector(payload_schema) (which uses payload_schema.is_on_disk() to
configure IndexSelectorMmap { is_on_disk }), so rephrase the comment to avoid
implying the boolean toggles on_disk and instead explain that selector honors
payload_schema.is_on_disk().

---

Duplicate comments:
In
`@lib/segment/src/index/field_index/geo_index/immutable_geo_index/lifecycle.rs`:
- Around line 20-32: In open_mmap, do not box/move the
StoredGeoMapIndex<MmapFile> into heap (the local variable index) before
performing fallible reads; currently boxing at the start (variable index) causes
the mmap-backed index to be dropped on any early error (e.g.,
counts_per_hash.read_whole(), storage.points_map.read_whole(), read_batch(...),
iterator.collect(), encode_max_precision(...)), preventing swap-rollback
recovery. Fix by deferring the Box::new(...) (or move into the final
ImmutableGeoIndex) until after all
read_whole/read_batch/collect/encode_max_precision calls succeed — use
references/borrows to access index.storage for these operations or perform the
reads into temporaries first, then box the StoredGeoMapIndex only once all
fallible operations have completed successfully.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 188eaae1-622d-4db9-846d-d1251179ffcf

📥 Commits

Reviewing files that changed from the base of the PR and between b42b21b and 5f2ca3d.

📒 Files selected for processing (4)
  • lib/segment/src/index/field_index/geo_index/immutable_geo_index/lifecycle.rs
  • lib/segment/src/index/field_index/schema_transition.rs
  • lib/segment/src/index/struct_payload_index/build.rs
  • lib/segment/src/index/struct_payload_index/payload_index.rs
🚧 Files skipped from review as they are similar to previous changes (2)
  • lib/segment/src/index/struct_payload_index/payload_index.rs
  • lib/segment/src/index/field_index/schema_transition.rs

// optimizer applies `on_disk` when converting to non-appendable).
SchemaTransition::OnlyOnDiskFlipped { .. } => {
if matches!(self.storage_type, StorageType::GridstoreAppendable) {
self.update_indexed_schema(field, new_payload_schema.clone())?;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to update the schema during drop_if_incompatible, shouldn't it be during apply?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed in chat: It may be more intuitive to do during apply, but it also requires more changes than just doing what this PR does.


Appendable Gridstore ignores on_disk at the storage layer and is unsafe to reload while mutable

Gridstore-backed indexes behave the same regardless of on-disk parameter, yes. But it should be safe to reload if we are holding an upgradable read lock.

Making the decision here that mutable indexes are not affected by on-disk parameter feels like it might shoot us in the foot if the concrete implementations change.

wdyt about separating this edge case from the main optimization of this PR?

@dancixx dancixx force-pushed the feat/avoid-payload-index-rebuild branch from 5f2ca3d to 64fc8c0 Compare June 3, 2026 08:05
@dancixx dancixx merged commit 2f57f97 into dev Jun 3, 2026
16 checks passed
@dancixx dancixx deleted the feat/avoid-payload-index-rebuild branch June 3, 2026 12:44
timvisee pushed a commit that referenced this pull request Jun 3, 2026
* feat: add swap on_disk methods

* fix: linter

* feat: add swap in disk for indexes

* fix: linter

* fix: std::fs::read -> fs_err::read

* feat: add on_disk swap bool & null indexes

* refactor: use macro instead boilerplate

* feat: add better error propagation

* fix: review comments -> remove try_swap.. and mut lock

* fix: linter

* chore: remove try_open_mmap

* fix: match -> ?

* review nits

* fix: seperate edge case for appendable segment

---------

Co-authored-by: Luis Cossío <[email protected]>
@timvisee timvisee mentioned this pull request Jun 3, 2026
@coderabbitai coderabbitai Bot mentioned this pull request Jun 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants