Skip to content

Make resharding operations (shard holder) idempotent#8789

Merged
timvisee merged 11 commits into
devfrom
resharding-shard-holder-idempotent
Apr 30, 2026
Merged

Make resharding operations (shard holder) idempotent#8789
timvisee merged 11 commits into
devfrom
resharding-shard-holder-idempotent

Conversation

@timvisee
Copy link
Copy Markdown
Member

@timvisee timvisee commented Apr 24, 2026

Fixes #8709
Extends #8522

In all resharding operations, make each individual step idempotent. If a resharding operation has been partially applied, we fall through applied steps and still apply the remaining steps.

This fixes resharding on consensus WAL replay. We've seen resharding consensus state inconsistencies in chaos testing. I have a strong signal these are happening because of broken consensus WAL replay.

This also prevents creating too many shards, losing a shard, or generally breaking the shard holder when (re)applying resharding consensus operations. Now this works with exact shard IDs (idempotent) rather than adding or subtracting (not idempotent) a shard.

I'd like to extensively test this on chaos testing before we include this in a new release. Resharding is very complicated, and it's hard to fully comprehend all changes and understand all the possible side effects.

I recommend to review per commit.

Tasks

  • Make sure custom sharding is also idempotent in shard holder

All Submissions:

  • Contributions should target the dev branch. Did you create your branch from dev?
  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

  1. Does your submission pass tests?
  2. Have you formatted your code locally using cargo +nightly fmt --all command prior to submission?
  3. Have you checked your code using cargo clippy --workspace --all-features command?

Changes to Core Features:

  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your core changes, as applicable?
  • Have you successfully ran tests with your changes locally?

timvisee and others added 7 commits April 24, 2026 16:50
Why: consensus entries may be re-applied after a crash (partial state
on disk) or during raft recovery. The unchecked state-transition
helpers used `debug_assert!` to require a specific starting state, so
a replay would panic in debug or silently overwrite in release.

How to apply: use write_optional so the state file is only touched
when the in-memory state needs to change. This also avoids unnecessary
fsyncs when a replay is a no-op.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Why: on replay, `create_replica_set` would call `create_shard_dir`,
which removes and recreates the existing shard directory -- wiping
any shard contents that had already been migrated or written since
the first apply.

How to apply: check `contains_shard(shard_id)` before creating the
replica set and pass `None` to `start_resharding_unchecked` when a
replica set with the target shard id is already present. Also
relaxes `check_start_resharding` to return `Ok` (instead of a
swallowed `bad_request`) when a matching resharding state is
persisted, so the caller can fall through each idempotent step.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Why: the start/finish/abort paths used ++/-- on the persisted shard
number with a `debug_assert` pinning the expected starting value. On
replay (e.g. after a crash between the shard holder mutation and the
config save) this either panics or produces a wrong count.

Since resharding always targets the last shard (auto sharding assigns
contiguous ids from zero), the target count is a pure function of the
shard id: `shard_id + 1` for start up, `shard_id` for finish down and
abort up. Set it directly and skip the save when it already matches,
so replay converges to the same value without touching the file.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Why: the early return on `resharding_state.is_none()` assumed that a
missing state means the operation was fully applied. But a crash can
leave the state cleared while the shard drop, key mapping removal, or
shard count update are still pending. Short-circuiting then skips the
reconciling work those replays need to do.

How to apply: drop the early return so every step runs; each step is
already individually idempotent (check_*, drop_and_remove_shard,
remove_shard_from_key_mapping, the set-based shard count update).
The abort_resharding down-invalidation path now reads nodes from the
router regardless of its variant, so a replay over an already-rolled-
back ring doesn't trip the removed debug_assert!s.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Why: `write_optional` returns `Some` whenever the shard key is
present, even if the shard id we're removing is already gone. That
still triggers a JSON rewrite and a cache notification on every
replay of a finished finish/abort.

How to apply: check that the shard id is actually in the set before
returning `Some`. If the set is missing or already doesn't contain
the id, return `None` so the on-disk file and in-memory data are
left untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Covers the paths that must not error or panic on replay:
- `check_start_resharding` when matching state is already present
- `start_resharding_unchecked` preserves matching state verbatim
- `finish_resharding_unchecked` when state is already cleared

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Replace a manual `match` on `Option` with `as_ref().map(...)` to
satisfy clippy::manual_map.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Copy link
Copy Markdown
Contributor

@tellet-q tellet-q left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@timvisee timvisee marked this pull request as ready for review April 29, 2026 13:23
@qdrant qdrant deleted a comment from coderabbitai Bot Apr 29, 2026
@timvisee
Copy link
Copy Markdown
Member Author

timvisee commented Apr 29, 2026

This now includes the test from #8839

coderabbitai[bot]

This comment was marked as resolved.

@qdrant qdrant deleted a comment from coderabbitai Bot Apr 29, 2026
@coderabbitai

This comment was marked as resolved.

coderabbitai[bot]

This comment was marked as resolved.

Copy link
Copy Markdown
Contributor

@ffuugoo ffuugoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems Ok.

Comment thread lib/collection/src/shards/shard_holder/resharding.rs Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
lib/collection/src/shards/shard_holder/resharding.rs (1)

997-1011: ⚡ Quick win

Make this replay test seed the ring into resharding mode too.

A real replay after a successful start_resharding has both persisted ReshardState and an already-resharding router. This test seeds only the state, so it would still pass even if a second ring.start_resharding(...) were not replay-safe.

Suggested test adjustment
         // Pre-seed matching state to simulate a replay after a successful
         // start had already persisted the resharding state.
+        holder
+            .rings
+            .get_mut(&key.shard_key)
+            .unwrap()
+            .start_resharding(key.shard_id, key.direction);
         holder
             .resharding_state
             .write(|state| {
                 *state = Some(expected_state.clone());
             })
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/collection/src/shards/shard_holder/resharding.rs` around lines 997 -
1011, The test currently seeds only holder.resharding_state with expected_state
but doesn't mark the router/ring as already in resharding mode, so replay safety
of ring.start_resharding isn't exercised; after writing the resharding_state
(the block that sets *state = Some(expected_state.clone())), also put the ring
into resharding mode to mirror a real replay — e.g. invoke the ring's resharding
entrypoint (call the same method used in normal start, such as
holder.ring.start_resharding(...) or the ring's equivalent API) with arguments
that match the persisted state so the ring is already marked as resharding
before calling holder.start_resharding_unchecked(key.clone(), None). This
ensures the test validates idempotence when both persisted ReshardState and an
already-resharding router exist.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@lib/collection/src/shards/shard_holder/resharding.rs`:
- Around line 997-1011: The test currently seeds only holder.resharding_state
with expected_state but doesn't mark the router/ring as already in resharding
mode, so replay safety of ring.start_resharding isn't exercised; after writing
the resharding_state (the block that sets *state =
Some(expected_state.clone())), also put the ring into resharding mode to mirror
a real replay — e.g. invoke the ring's resharding entrypoint (call the same
method used in normal start, such as holder.ring.start_resharding(...) or the
ring's equivalent API) with arguments that match the persisted state so the ring
is already marked as resharding before calling
holder.start_resharding_unchecked(key.clone(), None). This ensures the test
validates idempotence when both persisted ReshardState and an already-resharding
router exist.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ebe56e04-22dc-4c90-9f22-168aebb8cea7

📥 Commits

Reviewing files that changed from the base of the PR and between a52e6e5 and ecae489.

📒 Files selected for processing (1)
  • lib/collection/src/shards/shard_holder/resharding.rs

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
lib/collection/src/shards/shard_holder/resharding.rs (1)

167-170: Make match expressions exhaustive by replacing catch-all _ arms with explicit patterns.

Lines 167–170 and 480–483 use catch-all arms when matching Option<T>. Expand these to explicitly handle Some(_) and None branches for clarity and consistency with repository guidelines on exhaustive pattern matching.

Proposed changes
         self.resharding_state.write_optional(|state| {
             let new_state =
                 ReshardState::new(uuid, direction, peer_id, shard_id, shard_key.clone());
             match state {
                 Some(existing) if *existing == new_state => None,
-                _ => Some(Some(new_state)),
+                Some(_) => Some(Some(new_state)),
+                None => Some(Some(new_state)),
             }
         })?;
         self.resharding_state.write_optional(|state| match state {
             Some(state) if state.matches(&resharding_key) => Some(None),
-            _ => None,
+            Some(_) => None,
+            None => None,
         })?;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/collection/src/shards/shard_holder/resharding.rs` around lines 167 - 170,
The match on state in resharding.rs should avoid a catch-all `_` arm; update the
expression in the function containing the fragment (the match that currently
does `match state { Some(existing) if *existing == new_state => None, _ =>
Some(Some(new_state)), }`) to explicitly handle `Some(_)` and `None` (e.g.,
`Some(existing) if *existing == new_state => None, Some(_) =>
Some(Some(new_state)), None => Some(Some(new_state))`) and make the analogous
change for the other match at the second occurrence around the `lines 480–483`
so both matches are exhaustive and use explicit `Some(_)` and `None` patterns
instead of `_`.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@lib/collection/src/shards/shard_holder/resharding.rs`:
- Around line 167-170: The match on state in resharding.rs should avoid a
catch-all `_` arm; update the expression in the function containing the fragment
(the match that currently does `match state { Some(existing) if *existing ==
new_state => None, _ => Some(Some(new_state)), }`) to explicitly handle
`Some(_)` and `None` (e.g., `Some(existing) if *existing == new_state => None,
Some(_) => Some(Some(new_state)), None => Some(Some(new_state))`) and make the
analogous change for the other match at the second occurrence around the `lines
480–483` so both matches are exhaustive and use explicit `Some(_)` and `None`
patterns instead of `_`.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 52ddf8d2-a83e-43e4-bc92-2d71d62bcd88

📥 Commits

Reviewing files that changed from the base of the PR and between ecae489 and 8f105de.

📒 Files selected for processing (1)
  • lib/collection/src/shards/shard_holder/resharding.rs

@timvisee timvisee merged commit 4702c5d into dev Apr 30, 2026
15 checks passed
@timvisee timvisee deleted the resharding-shard-holder-idempotent branch April 30, 2026 12:20
timvisee added a commit that referenced this pull request May 8, 2026
* Make resharding state transitions idempotent on replay

Why: consensus entries may be re-applied after a crash (partial state
on disk) or during raft recovery. The unchecked state-transition
helpers used `debug_assert!` to require a specific starting state, so
a replay would panic in debug or silently overwrite in release.

How to apply: use write_optional so the state file is only touched
when the in-memory state needs to change. This also avoids unnecessary
fsyncs when a replay is a no-op.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* Skip replica set creation on resharding start replay

Why: on replay, `create_replica_set` would call `create_shard_dir`,
which removes and recreates the existing shard directory -- wiping
any shard contents that had already been migrated or written since
the first apply.

How to apply: check `contains_shard(shard_id)` before creating the
replica set and pass `None` to `start_resharding_unchecked` when a
replica set with the target shard id is already present. Also
relaxes `check_start_resharding` to return `Ok` (instead of a
swallowed `bad_request`) when a matching resharding state is
persisted, so the caller can fall through each idempotent step.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* Derive resharding shard count from shard id for idempotency

Why: the start/finish/abort paths used ++/-- on the persisted shard
number with a `debug_assert` pinning the expected starting value. On
replay (e.g. after a crash between the shard holder mutation and the
config save) this either panics or produces a wrong count.

Since resharding always targets the last shard (auto sharding assigns
contiguous ids from zero), the target count is a pure function of the
shard id: `shard_id + 1` for start up, `shard_id` for finish down and
abort up. Set it directly and skip the save when it already matches,
so replay converges to the same value without touching the file.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* Fall through resharding finish/abort steps on replay

Why: the early return on `resharding_state.is_none()` assumed that a
missing state means the operation was fully applied. But a crash can
leave the state cleared while the shard drop, key mapping removal, or
shard count update are still pending. Short-circuiting then skips the
reconciling work those replays need to do.

How to apply: drop the early return so every step runs; each step is
already individually idempotent (check_*, drop_and_remove_shard,
remove_shard_from_key_mapping, the set-based shard count update).
The abort_resharding down-invalidation path now reads nodes from the
router regardless of its variant, so a replay over an already-rolled-
back ring doesn't trip the removed debug_assert!s.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* Skip shard key mapping write when nothing to remove

Why: `write_optional` returns `Some` whenever the shard key is
present, even if the shard id we're removing is already gone. That
still triggers a JSON rewrite and a cache notification on every
replay of a finished finish/abort.

How to apply: check that the shard id is actually in the set before
returning `Some`. If the set is missing or already doesn't contain
the id, return `None` so the on-disk file and in-memory data are
left untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* Add tests for resharding replay idempotency

Covers the paths that must not error or panic on replay:
- `check_start_resharding` when matching state is already present
- `start_resharding_unchecked` preserves matching state verbatim
- `finish_resharding_unchecked` when state is already cleared

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* Simplify finish_resharding_unchecked closure

Replace a manual `match` on `Option` with `as_ref().map(...)` to
satisfy clippy::manual_map.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* test: add a test for resharding replay (#8839)

* [ai] add a debug_assert on shard_number (#8846)

* Update lib/collection/src/shards/shard_holder/resharding.rs

Co-authored-by: Roman Titov <[email protected]>

* Reformat

---------

Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
Co-authored-by: tellet-q <[email protected]>
Co-authored-by: Roman Titov <[email protected]>
@timvisee timvisee mentioned this pull request May 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants