Skip to content

fix: propagate flush cancellation through payload index flushers#8651

Merged
timvisee merged 2 commits into
devfrom
fix/propagate-flush-cancelled-through-payload-index
Apr 13, 2026
Merged

fix: propagate flush cancellation through payload index flushers#8651
timvisee merged 2 commits into
devfrom
fix/propagate-flush-cancelled-through-payload-index

Conversation

@qdrant-cloud-bot
Copy link
Copy Markdown
Contributor

Summary

  • Fixes a bug where GridstoreError::FlushCancelled (a benign condition during payload index deletion) was converted into a fatal ServiceError instead of being propagated as OperationError::Cancelled
  • The Cancelled variant was lost at two intermediate layers that unconditionally wrapped errors with service_error(format!(...)): all 4 mutable index flushers (map, text, numeric, geo) and the struct_payload_index flusher
  • After this fix, the cancellation correctly reaches the handler in entry.rs which gracefully skips the flush with a debug log

Root cause

When a payload index is deleted, the gridstore backing it is dropped. A concurrent background flush detects this (weak Arc refs fail to upgrade) and returns GridstoreError::FlushCancelled. There is already:

  1. A From<GridstoreError> for OperationError impl that maps FlushCancelledOperationError::Cancelled
  2. A handler in entry.rs that gracefully handles Cancelled by skipping the flush

But the Cancelled variant was destroyed before reaching the handler because:

  1. All 4 mutable index flushers used OperationError::service_error(format!(...)) to convert the gridstore error, instead of using the Into<OperationError> conversion
  2. struct_payload_index flusher also unconditionally wrapped sub-flusher errors as service_error

This caused the error: "last background flush failed: ... Failed to flush mutable map index gridstore: Flush was cancelled"

Fix

  • In the 4 mutable index flushers: convert GridstoreError via Into<OperationError> first, then only re-wrap non-Cancelled variants
  • In struct_payload_index flusher: propagate Cancelled errors as-is, matching the pattern already used for vector storage, quantization, and id tracker flushers in entry.rs

Test plan

  • cargo check -p segment passes
  • CI passes

Made with Cursor

When a payload index is deleted, the gridstore backing it is dropped.
A concurrent background flush detects this (weak Arc refs fail to upgrade)
and returns GridstoreError::FlushCancelled. The From<GridstoreError> impl
correctly maps this to OperationError::Cancelled, and entry.rs gracefully
handles Cancelled by skipping the flush.

However, two intermediate layers unconditionally wrapped all errors as
ServiceError, destroying the Cancelled variant before it reached the
handler in entry.rs:

1. All 4 mutable index flushers (map, text, numeric, geo) used
   OperationError::service_error(format!(...)) instead of converting
   via Into<OperationError> first.

2. struct_payload_index flusher also unconditionally wrapped sub-flusher
   errors as service_error.

This caused the benign FlushCancelled to surface as a fatal
"last background flush failed" ServiceError.

Fix both layers to propagate Cancelled errors as-is, matching the
pattern already used for vector storage, quantization, and id tracker
flushers in entry.rs.

Made-with: Cursor
coderabbitai[bot]

This comment was marked as resolved.

@qdrant qdrant deleted a comment from coderabbitai Bot Apr 13, 2026
@timvisee timvisee merged commit c6b7446 into dev Apr 13, 2026
15 checks passed
@timvisee timvisee deleted the fix/propagate-flush-cancelled-through-payload-index branch April 13, 2026 10:37
timvisee pushed a commit that referenced this pull request May 8, 2026
* fix: propagate flush cancellation through payload index flushers

When a payload index is deleted, the gridstore backing it is dropped.
A concurrent background flush detects this (weak Arc refs fail to upgrade)
and returns GridstoreError::FlushCancelled. The From<GridstoreError> impl
correctly maps this to OperationError::Cancelled, and entry.rs gracefully
handles Cancelled by skipping the flush.

However, two intermediate layers unconditionally wrapped all errors as
ServiceError, destroying the Cancelled variant before it reached the
handler in entry.rs:

1. All 4 mutable index flushers (map, text, numeric, geo) used
   OperationError::service_error(format!(...)) instead of converting
   via Into<OperationError> first.

2. struct_payload_index flusher also unconditionally wrapped sub-flusher
   errors as service_error.

This caused the benign FlushCancelled to surface as a fatal
"last background flush failed" ServiceError.

Fix both layers to propagate Cancelled errors as-is, matching the
pattern already used for vector storage, quantization, and id tracker
flushers in entry.rs.

Made-with: Cursor

* simplify error conversion

---------

Co-authored-by: Cursor Agent <[email protected]>
Co-authored-by: generall <[email protected]>
@timvisee timvisee mentioned this pull request May 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants