Make resharding operations (shard holder) idempotent#8789
Conversation
Why: consensus entries may be re-applied after a crash (partial state on disk) or during raft recovery. The unchecked state-transition helpers used `debug_assert!` to require a specific starting state, so a replay would panic in debug or silently overwrite in release. How to apply: use write_optional so the state file is only touched when the in-memory state needs to change. This also avoids unnecessary fsyncs when a replay is a no-op. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Why: on replay, `create_replica_set` would call `create_shard_dir`, which removes and recreates the existing shard directory -- wiping any shard contents that had already been migrated or written since the first apply. How to apply: check `contains_shard(shard_id)` before creating the replica set and pass `None` to `start_resharding_unchecked` when a replica set with the target shard id is already present. Also relaxes `check_start_resharding` to return `Ok` (instead of a swallowed `bad_request`) when a matching resharding state is persisted, so the caller can fall through each idempotent step. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Why: the start/finish/abort paths used ++/-- on the persisted shard number with a `debug_assert` pinning the expected starting value. On replay (e.g. after a crash between the shard holder mutation and the config save) this either panics or produces a wrong count. Since resharding always targets the last shard (auto sharding assigns contiguous ids from zero), the target count is a pure function of the shard id: `shard_id + 1` for start up, `shard_id` for finish down and abort up. Set it directly and skip the save when it already matches, so replay converges to the same value without touching the file. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Why: the early return on `resharding_state.is_none()` assumed that a missing state means the operation was fully applied. But a crash can leave the state cleared while the shard drop, key mapping removal, or shard count update are still pending. Short-circuiting then skips the reconciling work those replays need to do. How to apply: drop the early return so every step runs; each step is already individually idempotent (check_*, drop_and_remove_shard, remove_shard_from_key_mapping, the set-based shard count update). The abort_resharding down-invalidation path now reads nodes from the router regardless of its variant, so a replay over an already-rolled- back ring doesn't trip the removed debug_assert!s. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Why: `write_optional` returns `Some` whenever the shard key is present, even if the shard id we're removing is already gone. That still triggers a JSON rewrite and a cache notification on every replay of a finished finish/abort. How to apply: check that the shard id is actually in the set before returning `Some`. If the set is missing or already doesn't contain the id, return `None` so the on-disk file and in-memory data are left untouched. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Covers the paths that must not error or panic on replay: - `check_start_resharding` when matching state is already present - `start_resharding_unchecked` preserves matching state verbatim - `finish_resharding_unchecked` when state is already cleared Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Replace a manual `match` on `Option` with `as_ref().map(...)` to satisfy clippy::manual_map. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
|
This now includes the test from #8839 |
This comment was marked as resolved.
This comment was marked as resolved.
Co-authored-by: Roman Titov <[email protected]>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
lib/collection/src/shards/shard_holder/resharding.rs (1)
997-1011: ⚡ Quick winMake this replay test seed the ring into resharding mode too.
A real replay after a successful
start_reshardinghas both persistedReshardStateand an already-resharding router. This test seeds only the state, so it would still pass even if a secondring.start_resharding(...)were not replay-safe.Suggested test adjustment
// Pre-seed matching state to simulate a replay after a successful // start had already persisted the resharding state. + holder + .rings + .get_mut(&key.shard_key) + .unwrap() + .start_resharding(key.shard_id, key.direction); holder .resharding_state .write(|state| { *state = Some(expected_state.clone()); })🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/collection/src/shards/shard_holder/resharding.rs` around lines 997 - 1011, The test currently seeds only holder.resharding_state with expected_state but doesn't mark the router/ring as already in resharding mode, so replay safety of ring.start_resharding isn't exercised; after writing the resharding_state (the block that sets *state = Some(expected_state.clone())), also put the ring into resharding mode to mirror a real replay — e.g. invoke the ring's resharding entrypoint (call the same method used in normal start, such as holder.ring.start_resharding(...) or the ring's equivalent API) with arguments that match the persisted state so the ring is already marked as resharding before calling holder.start_resharding_unchecked(key.clone(), None). This ensures the test validates idempotence when both persisted ReshardState and an already-resharding router exist.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@lib/collection/src/shards/shard_holder/resharding.rs`:
- Around line 997-1011: The test currently seeds only holder.resharding_state
with expected_state but doesn't mark the router/ring as already in resharding
mode, so replay safety of ring.start_resharding isn't exercised; after writing
the resharding_state (the block that sets *state =
Some(expected_state.clone())), also put the ring into resharding mode to mirror
a real replay — e.g. invoke the ring's resharding entrypoint (call the same
method used in normal start, such as holder.ring.start_resharding(...) or the
ring's equivalent API) with arguments that match the persisted state so the ring
is already marked as resharding before calling
holder.start_resharding_unchecked(key.clone(), None). This ensures the test
validates idempotence when both persisted ReshardState and an already-resharding
router exist.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: ebe56e04-22dc-4c90-9f22-168aebb8cea7
📒 Files selected for processing (1)
lib/collection/src/shards/shard_holder/resharding.rs
There was a problem hiding this comment.
🧹 Nitpick comments (1)
lib/collection/src/shards/shard_holder/resharding.rs (1)
167-170: Make match expressions exhaustive by replacing catch-all_arms with explicit patterns.Lines 167–170 and 480–483 use catch-all arms when matching
Option<T>. Expand these to explicitly handleSome(_)andNonebranches for clarity and consistency with repository guidelines on exhaustive pattern matching.Proposed changes
self.resharding_state.write_optional(|state| { let new_state = ReshardState::new(uuid, direction, peer_id, shard_id, shard_key.clone()); match state { Some(existing) if *existing == new_state => None, - _ => Some(Some(new_state)), + Some(_) => Some(Some(new_state)), + None => Some(Some(new_state)), } })?;self.resharding_state.write_optional(|state| match state { Some(state) if state.matches(&resharding_key) => Some(None), - _ => None, + Some(_) => None, + None => None, })?;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/collection/src/shards/shard_holder/resharding.rs` around lines 167 - 170, The match on state in resharding.rs should avoid a catch-all `_` arm; update the expression in the function containing the fragment (the match that currently does `match state { Some(existing) if *existing == new_state => None, _ => Some(Some(new_state)), }`) to explicitly handle `Some(_)` and `None` (e.g., `Some(existing) if *existing == new_state => None, Some(_) => Some(Some(new_state)), None => Some(Some(new_state))`) and make the analogous change for the other match at the second occurrence around the `lines 480–483` so both matches are exhaustive and use explicit `Some(_)` and `None` patterns instead of `_`.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@lib/collection/src/shards/shard_holder/resharding.rs`:
- Around line 167-170: The match on state in resharding.rs should avoid a
catch-all `_` arm; update the expression in the function containing the fragment
(the match that currently does `match state { Some(existing) if *existing ==
new_state => None, _ => Some(Some(new_state)), }`) to explicitly handle
`Some(_)` and `None` (e.g., `Some(existing) if *existing == new_state => None,
Some(_) => Some(Some(new_state)), None => Some(Some(new_state))`) and make the
analogous change for the other match at the second occurrence around the `lines
480–483` so both matches are exhaustive and use explicit `Some(_)` and `None`
patterns instead of `_`.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 52ddf8d2-a83e-43e4-bc92-2d71d62bcd88
📒 Files selected for processing (1)
lib/collection/src/shards/shard_holder/resharding.rs
* Make resharding state transitions idempotent on replay Why: consensus entries may be re-applied after a crash (partial state on disk) or during raft recovery. The unchecked state-transition helpers used `debug_assert!` to require a specific starting state, so a replay would panic in debug or silently overwrite in release. How to apply: use write_optional so the state file is only touched when the in-memory state needs to change. This also avoids unnecessary fsyncs when a replay is a no-op. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> * Skip replica set creation on resharding start replay Why: on replay, `create_replica_set` would call `create_shard_dir`, which removes and recreates the existing shard directory -- wiping any shard contents that had already been migrated or written since the first apply. How to apply: check `contains_shard(shard_id)` before creating the replica set and pass `None` to `start_resharding_unchecked` when a replica set with the target shard id is already present. Also relaxes `check_start_resharding` to return `Ok` (instead of a swallowed `bad_request`) when a matching resharding state is persisted, so the caller can fall through each idempotent step. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> * Derive resharding shard count from shard id for idempotency Why: the start/finish/abort paths used ++/-- on the persisted shard number with a `debug_assert` pinning the expected starting value. On replay (e.g. after a crash between the shard holder mutation and the config save) this either panics or produces a wrong count. Since resharding always targets the last shard (auto sharding assigns contiguous ids from zero), the target count is a pure function of the shard id: `shard_id + 1` for start up, `shard_id` for finish down and abort up. Set it directly and skip the save when it already matches, so replay converges to the same value without touching the file. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> * Fall through resharding finish/abort steps on replay Why: the early return on `resharding_state.is_none()` assumed that a missing state means the operation was fully applied. But a crash can leave the state cleared while the shard drop, key mapping removal, or shard count update are still pending. Short-circuiting then skips the reconciling work those replays need to do. How to apply: drop the early return so every step runs; each step is already individually idempotent (check_*, drop_and_remove_shard, remove_shard_from_key_mapping, the set-based shard count update). The abort_resharding down-invalidation path now reads nodes from the router regardless of its variant, so a replay over an already-rolled- back ring doesn't trip the removed debug_assert!s. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> * Skip shard key mapping write when nothing to remove Why: `write_optional` returns `Some` whenever the shard key is present, even if the shard id we're removing is already gone. That still triggers a JSON rewrite and a cache notification on every replay of a finished finish/abort. How to apply: check that the shard id is actually in the set before returning `Some`. If the set is missing or already doesn't contain the id, return `None` so the on-disk file and in-memory data are left untouched. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> * Add tests for resharding replay idempotency Covers the paths that must not error or panic on replay: - `check_start_resharding` when matching state is already present - `start_resharding_unchecked` preserves matching state verbatim - `finish_resharding_unchecked` when state is already cleared Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> * Simplify finish_resharding_unchecked closure Replace a manual `match` on `Option` with `as_ref().map(...)` to satisfy clippy::manual_map. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> * test: add a test for resharding replay (#8839) * [ai] add a debug_assert on shard_number (#8846) * Update lib/collection/src/shards/shard_holder/resharding.rs Co-authored-by: Roman Titov <[email protected]> * Reformat --------- Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]> Co-authored-by: tellet-q <[email protected]> Co-authored-by: Roman Titov <[email protected]>
Fixes #8709
Extends #8522
In all resharding operations, make each individual step idempotent. If a resharding operation has been partially applied, we fall through applied steps and still apply the remaining steps.
This fixes resharding on consensus WAL replay. We've seen resharding consensus state inconsistencies in chaos testing. I have a strong signal these are happening because of broken consensus WAL replay.
This also prevents creating too many shards, losing a shard, or generally breaking the shard holder when (re)applying resharding consensus operations. Now this works with exact shard IDs (idempotent) rather than adding or subtracting (not idempotent) a shard.
I'd like to extensively test this on chaos testing before we include this in a new release. Resharding is very complicated, and it's hard to fully comprehend all changes and understand all the possible side effects.
I recommend to review per commit.
Tasks
All Submissions:
devbranch. Did you create your branch fromdev?New Feature Submissions:
cargo +nightly fmt --allcommand prior to submission?cargo clippy --workspace --all-featurescommand?Changes to Core Features: