Plan: in-place morph between Immutable and Mmap payload index variants#9114
Plan: in-place morph between Immutable and Mmap payload index variants#9114qdrant-cloud-bot wants to merge 1 commit into
Conversation
Proposal for avoiding a full payload index rebuild when only the `on_disk` flag changes. The on-disk format is identical between the two variants — only the in-memory wrapper differs — so an in-place swap is safe and orders of magnitude cheaper than the current drop + rebuild. This file lives under `docs/plans/` which is gitignored; it is added here for review only and will be removed before merging the implementation PR. Co-authored-by: Cursor <[email protected]>
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (3)
docs/plans/payload-index-on-disk-morph.md (3)
54-60: 💤 Low valueClarify the cost-benefit tradeoff for
Mmap → Immutablemorphing.While the document correctly states this is cheaper than a full rebuild, the "one scan of the existing mmap files to populate derived structures" (line 57) could still be expensive for large indices. The claim of "Orders of magnitude cheaper" (line 60) depends on:
- Size of the index vs size of payload storage
- Complexity of derived structure construction vs full rebuild
Consider adding:
- Rough quantitative guidance on when this optimization provides meaningful benefit (e.g., index size relative to payload storage)
- Expected wall-clock time ranges for typical index sizes
- Threshold guidance for when the scan cost might still be prohibitive
This will help implementers decide whether to add heuristics that skip the morph and fall through to rebuild for certain size ranges.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/plans/payload-index-on-disk-morph.md` around lines 54 - 60, Clarify the cost-benefit tradeoff for the "Mmap → Immutable" morph by expanding the text around the Mmap → Immutable section to include quantitative guidance: state expected wall-clock ranges (e.g., seconds/minutes/hours) for small/medium/large index sizes, describe cost factors (index file size vs payload storage size and derived-structure complexity), and add a recommended threshold heuristic (e.g., index size or scan-time estimate beyond which to prefer full rebuild) and mention any warm-up options like madvise(WILLNEED)/populate() and the ImmutableX::open_mmap(inner) scan step so implementers can use these numbers to decide whether to morph or rebuild.
223-223: ⚡ Quick winClarify crash-recovery analysis for mid-morph crashes.
Line 223 states: "morph never modifies index files, only in-memory wrappers, and
save_config()is the last step. On restart, the stored config is the pre-morph config and the files match it — fully consistent."However, the flow in lines 166-168 shows:
if self.try_morph_on_disk(field, new_on_disk)? { self.update_config_only(field, &payload_schema)?; return Ok(()); }If
try_morph_on_disksucceeds (in-memory structures are now morphed) but the system crashes before or duringupdate_config_only→save_config(), the on-restart state is:
- Index files: unchanged (correct)
- Persisted config: old schema (pre-morph)
- In-memory structures: don't exist (restart loads from disk)
On restart, the segment loads with the old config and builds in-memory structures according to that config, which is correct. So the analysis is sound, but the explanation could be clearer:
Suggested rewording:
"Morph never modifies index files, only in-memory wrappers.save_config()is the commit point. If we crash after morph succeeds but beforesave_config()persists the new schema, on restart the old schema is loaded and in-memory structures are rebuilt according to the old schema from the unchanged files — fully consistent. The morph will be retried on next WAL replay of theCreateIndexoperation."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/plans/payload-index-on-disk-morph.md` at line 223, Reword the crash-recovery paragraph to explicitly state that try_morph_on_disk only changes in-memory wrappers, that update_config_only (which calls save_config()) is the durable commit point, and that if a crash occurs after try_morph_on_disk succeeds but before update_config_only/save_config persists the new schema, on restart the system will load the old persisted config and rebuild in-memory structures from the unchanged index files; also add that the CreateIndex WAL entry will cause the morph to be retried during replay. Include references to try_morph_on_disk, update_config_only, save_config, and the CreateIndex WAL replay to make the flow clear.
84-98: 💤 Low valueClarify pseudo-code for
morph_on_diskimplementation.The sketch has a few gaps that reduce clarity for design review:
Line 86: "construct a tiny empty variant or use take()" is underspecified. Rust enums don't have default empty variants. Options include:
- Using
Option<Self>andOption::take()- Defining a minimal variant like
Self::Placeholder(but this pollutes the enum)- Using a temporary valid variant (but which one?)
Lines 88-94: The match arms cover
Immutable ↔ Mmaptransitions and thealready_matchingcase, but theMutable(Gridstore) case mentioned in the comment (lines 92-94) isn't shown explicitly. Should be:(Self::Mutable(m), _) => { *self = Self::Mutable(m); return Ok(false); }Line 101: Failure handling says "restore the original variant" but the
currentvalue was consumed bystd::mem::replace. The parenthetical suggests wrapping the operation in a Result-returning closure, but this isn't reflected in the sketch.Consider refining the pseudo-code to show a complete, compilable pattern (even if simplified) so reviewers can validate the approach end-to-end.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/plans/payload-index-on-disk-morph.md` around lines 84 - 98, The pseudo-code for NumericIndexInner::morph_on_disk is incomplete: replace the placeholder approach with a safe take pattern (e.g., wrap self in Option<Self> and use Option::take or std::mem::take) so you can move out the current variant without inventing a dummy variant; add an explicit arm for the mutable case (match (Self::Mutable(m), _) => { *self = Self::Mutable(m); return Ok(false); }) to restore and report not-applicable; and ensure error-prone conversions (like ImmutableNumericIndex::open_mmap and imm.into_inner_mmap()) are performed inside a Result-returning block so on Err you restore the original variant into *self before returning the error, thereby avoiding lost state after std::mem::replace.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/plans/payload-index-on-disk-morph.md`:
- Line 182: Implement a clear partial-failure rollback in try_morph_on_disk: as
you iterate self.field_indexes[field] and call FieldIndex::morph_on_disk, record
successful indices (e.g., morphed_indices), and if any call returns Ok(false) or
Err(e) perform rollback by iterating morphed_indices in reverse and calling
morph_on_disk with the opposite direction (!new_on_disk) for each; if any
reverse morph fails, propagate that error (or at minimum log and return Err) so
the caller knows rollback didn't fully succeed; ensure you reference
try_morph_on_disk, FieldIndex, morph_on_disk and morphed_indices in the
implementation and return Ok(false) when a non-applicable variant is encountered
after successful rollbacks.
- Around line 148-184: Add a "Concurrency & Atomicity" subsection describing the
locking and failure semantics around morphing: state the current synchronization
points (e.g., whether StructPayloadIndex and SegmentEntry already hold any locks
when set_indexed() runs), document that try_morph_on_disk() iterates
field_indexes[] and calls morph_on_disk() and that update_config_only() calls
save_config() after recomputing types from get_full_index_type(), then specify
the desired atomicity (treat save_config() as the commit point or roll back
in-memory morph on save failure) and required locking (e.g., acquire a write
lock for morph operations on a field while readers use a read lock, serialize
concurrent set_indexed() calls per field/segment) and the expected behavior for
concurrent readers during Mmap→Immutable scans (block, snapshot, or allow stale
reads) so reviewers know whether existing locks suffice or new synchronization
is needed.
---
Nitpick comments:
In `@docs/plans/payload-index-on-disk-morph.md`:
- Around line 54-60: Clarify the cost-benefit tradeoff for the "Mmap →
Immutable" morph by expanding the text around the Mmap → Immutable section to
include quantitative guidance: state expected wall-clock ranges (e.g.,
seconds/minutes/hours) for small/medium/large index sizes, describe cost factors
(index file size vs payload storage size and derived-structure complexity), and
add a recommended threshold heuristic (e.g., index size or scan-time estimate
beyond which to prefer full rebuild) and mention any warm-up options like
madvise(WILLNEED)/populate() and the ImmutableX::open_mmap(inner) scan step so
implementers can use these numbers to decide whether to morph or rebuild.
- Line 223: Reword the crash-recovery paragraph to explicitly state that
try_morph_on_disk only changes in-memory wrappers, that update_config_only
(which calls save_config()) is the durable commit point, and that if a crash
occurs after try_morph_on_disk succeeds but before
update_config_only/save_config persists the new schema, on restart the system
will load the old persisted config and rebuild in-memory structures from the
unchanged index files; also add that the CreateIndex WAL entry will cause the
morph to be retried during replay. Include references to try_morph_on_disk,
update_config_only, save_config, and the CreateIndex WAL replay to make the flow
clear.
- Around line 84-98: The pseudo-code for NumericIndexInner::morph_on_disk is
incomplete: replace the placeholder approach with a safe take pattern (e.g.,
wrap self in Option<Self> and use Option::take or std::mem::take) so you can
move out the current variant without inventing a dummy variant; add an explicit
arm for the mutable case (match (Self::Mutable(m), _) => { *self =
Self::Mutable(m); return Ok(false); }) to restore and report not-applicable; and
ensure error-prone conversions (like ImmutableNumericIndex::open_mmap and
imm.into_inner_mmap()) are performed inside a Result-returning block so on Err
you restore the original variant into *self before returning the error, thereby
avoiding lost state after std::mem::replace.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 822d2ab4-8118-4e23-8871-e9e22489df27
📒 Files selected for processing (1)
docs/plans/payload-index-on-disk-morph.md
|
|
||
| `lib/segment/src/index/struct_payload_index/payload_index.rs:61`: | ||
|
|
||
| ```rust | ||
| fn set_indexed(&mut self, field, payload_schema, hw_counter) -> OperationResult<()> { | ||
| let payload_schema = payload_schema.into(); | ||
|
|
||
| if let Some(prev) = self.config.indices.get(field).cloned() { | ||
| match classify(&prev.schema, &payload_schema) { | ||
| SchemaTransition::Identical => return Ok(()), | ||
| SchemaTransition::OnlyOnDiskFlipped { new_on_disk } => { | ||
| // Gridstore segments don't honor on_disk at storage layer. | ||
| // Update config so the next non-appendable segment built from | ||
| // this one (via optimizer) picks it up; nothing to morph now. | ||
| if matches!(self.storage_type, StorageType::GridstoreAppendable) { | ||
| self.update_config_only(field, &payload_schema)?; | ||
| return Ok(()); | ||
| } | ||
| if self.try_morph_on_disk(field, new_on_disk)? { | ||
| self.update_config_only(field, &payload_schema)?; | ||
| return Ok(()); | ||
| } | ||
| // morph failed — fall through to drop+rebuild | ||
| } | ||
| SchemaTransition::Incompatible => {} | ||
| } | ||
| } | ||
|
|
||
| // Existing path: drop_index_if_incompatible -> build_index -> apply_index | ||
| self.drop_index_if_incompatible(field, &payload_schema)?; | ||
| // ... unchanged | ||
| } | ||
| ``` | ||
|
|
||
| `try_morph_on_disk(field, new_on_disk)` walks `self.field_indexes[field]` and calls `morph_on_disk` on each entry. If any returns `Ok(false)` because the variant doesn't support morphing, restore the morphed ones and fall through to the legacy path. | ||
|
|
||
| `update_config_only` updates `self.config.indices[field]` to the new schema and re-computes `types` from the (now morphed) `FieldIndex`es' `get_full_index_type()` so the persisted `StorageType::Mmap { is_on_disk }` flag is up to date, then calls `save_config()`. |
There was a problem hiding this comment.
Add discussion of concurrency and atomicity guarantees.
The plan doesn't address critical concurrency and atomicity concerns:
Atomicity:
- Line 167:
update_config_onlyis called after morph succeeds, which callssave_config()(line 184). - If
save_config()fails after in-memory morph succeeds, the in-memory state is morphed but the persisted config still has the old schema. - On restart, the segment loads with the old config but the in-memory structures don't match.
- Mitigation: Should
save_config()be the commit point? If it fails, should the morph be rolled back?
Concurrency:
- What if multiple threads call
set_indexedconcurrently on the same or different fields? - What if queries are reading the index during the morph operation (especially
Mmap → Immutablewhich scans)? - Are there existing locks at the
StructPayloadIndexlevel that provide mutual exclusion? - Should the morph be behind a write lock while queries use read locks?
Add a "Concurrency & Atomicity" subsection to the risks section (or a new section) addressing:
- Existing locking/synchronization guarantees at the
StructPayloadIndexandSegmentEntrylevels - Whether morph operations are naturally serialized by existing locks or need new synchronization
- The atomicity contract between in-memory morph and
save_config() - Impact on concurrent readers during morph (especially the scan-heavy
Mmap → Immutable)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/plans/payload-index-on-disk-morph.md` around lines 148 - 184, Add a
"Concurrency & Atomicity" subsection describing the locking and failure
semantics around morphing: state the current synchronization points (e.g.,
whether StructPayloadIndex and SegmentEntry already hold any locks when
set_indexed() runs), document that try_morph_on_disk() iterates field_indexes[]
and calls morph_on_disk() and that update_config_only() calls save_config()
after recomputing types from get_full_index_type(), then specify the desired
atomicity (treat save_config() as the commit point or roll back in-memory morph
on save failure) and required locking (e.g., acquire a write lock for morph
operations on a field while readers use a read lock, serialize concurrent
set_indexed() calls per field/segment) and the expected behavior for concurrent
readers during Mmap→Immutable scans (block, snapshot, or allow stale reads) so
reviewers know whether existing locks suffice or new synchronization is needed.
| } | ||
| ``` | ||
|
|
||
| `try_morph_on_disk(field, new_on_disk)` walks `self.field_indexes[field]` and calls `morph_on_disk` on each entry. If any returns `Ok(false)` because the variant doesn't support morphing, restore the morphed ones and fall through to the legacy path. |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Specify the partial-failure rollback mechanism for try_morph_on_disk.
Line 182 describes try_morph_on_disk as walking multiple FieldIndex entries and morphing each, with a note that "if any returns Ok(false)... restore the morphed ones" (implied by line 182's description and line 222's risk mitigation). However, the rollback mechanism isn't specified:
- How do you track which entries were successfully morphed?
- Do you morph them back by calling
morph_on_diskwith the opposite direction? If so, what if the reverse morph fails? - Should the rollback happen in reverse order of the forward morph?
Consider adding a subsection or note clarifying:
fn try_morph_on_disk(&mut self, field: &str, new_on_disk: bool) -> OperationResult<bool> {
let mut morphed_indices = Vec::new();
for (idx, field_index) in self.field_indexes[field].iter_mut().enumerate() {
match field_index.morph_on_disk(new_on_disk) {
Ok(true) => morphed_indices.push(idx),
Ok(false) => {
// Not applicable; rollback and return false
for idx in morphed_indices.iter().rev() {
self.field_indexes[field][*idx].morph_on_disk(!new_on_disk)?;
}
return Ok(false);
}
Err(e) => {
// Morph failed; rollback and propagate error
for idx in morphed_indices.iter().rev() {
self.field_indexes[field][*idx].morph_on_disk(!new_on_disk)?;
}
return Err(e);
}
}
}
Ok(true)
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/plans/payload-index-on-disk-morph.md` at line 182, Implement a clear
partial-failure rollback in try_morph_on_disk: as you iterate
self.field_indexes[field] and call FieldIndex::morph_on_disk, record successful
indices (e.g., morphed_indices), and if any call returns Ok(false) or Err(e)
perform rollback by iterating morphed_indices in reverse and calling
morph_on_disk with the opposite direction (!new_on_disk) for each; if any
reverse morph fails, propagate that error (or at minimum log and return Err) so
the caller knows rollback didn't fully succeed; ensure you reference
try_morph_on_disk, FieldIndex, morph_on_disk and morphed_indices in the
implementation and return Ok(false) when a non-applicable variant is encountered
after successful rollbacks.
Summary
Proposal-only PR. Adds a planning document under
docs/plans/describing how to skip the full payload-index rebuild when only theon_diskflag changes.PayloadFieldSchema(including juston_disk) hitsdrop_index_if_incompatible→wipe()→ full rebuild from payload storage.on_disk: false(Immutablewrapper, derived in-RAM structures) andon_disk: true(Mmap/Storagewrapper, no derived structures). The existinglow_memory_modepath already swaps between them at load time without rebuilding.Immutable ⇄ Mmapin-place morph and aSchemaTransitionclassifier soset_indexedtakes the fast path when onlyon_diskdiffers.See
docs/plans/payload-index-on-disk-morph.mdfor the full plan: morph semantics, per-family implementation outline, classifier shape, wiring, phased rollout (numeric → map → geo → text), risks, and test plan.Notes
docs/plans/is gitignored — the file was force-added. Will be removed before merging the eventual implementation PR(s).Test plan
Made with Cursor