Skip to content

Allow peer to bootstrap with used URI if empty#8301

Merged
timvisee merged 6 commits into
devfrom
consensus-join-reuse-uri
Mar 23, 2026
Merged

Allow peer to bootstrap with used URI if empty#8301
timvisee merged 6 commits into
devfrom
consensus-join-reuse-uri

Conversation

@timvisee

@timvisee timvisee commented Mar 5, 2026

Copy link
Copy Markdown
Member

Implemented by AI.

Allow a new peer to bootstrap with a used URI as long as no shards belong to that URI. This allows peers to rejoin the cluster if it somehow failed to de-register itself from consensus. This normally isn't easily possible if storage was already wiped.

Note this only allows a peer to rejoin if it currently holds no shards. If it does have shards, bootstrapping is aborted. It's important not to lose any shards.

Specifically rejoining does:

  1. check if peer is empty
  2. remove peer ID and URI for old peer
  3. add new peer ID and URI

Some tests are included to assert behavior.

All Submissions:

  • Contributions should target the dev branch. Did you create your branch from dev?
  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

  1. Does your submission pass tests?
  2. Have you formatted your code locally using cargo +nightly fmt --all command prior to submission?
  3. Have you checked your code using cargo clippy --workspace --all-features command?

Changes to Core Features:

  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your core changes, as applicable?
  • Have you successfully ran tests with your changes locally?

timvisee added 5 commits March 5, 2026 17:14
Prompt:

I have a Qdrant cluster in distributed mode with some nodes registered
in consensus. If I bootstrap a new node with an URL that is already used
it is rejected and an error is returned. I would like to change this
behavior and allow this to happen. This would effectively replace a peer
because internally we'd drop the existing peer first, and then we'd add
the new peer so we can reuse the same peer URL. We must still reject
bootstrapping with the same URL if the peer that used the URL before us
still has any shards on it.
Prompt:

Add two tests to assert the new behavior.

The first test should:
- create a cluster
- create a collection
- bootstrap a new peer
- kill and delete the local data for this peer without removing it from consensus
- bootstrap a new peer but reuse the URI of the peer we just killed to rejoin
- bootstrapping is expected to succeed

The second test should:
- create a cluster
- create a collection
- bootstrap a new peer but reuse the URI of the last node
- bootstrapping is expected to fail because the node being replaced has data on it
@timvisee timvisee force-pushed the consensus-join-reuse-uri branch from 94ca810 to 10151e8 Compare March 9, 2026 10:12
@timvisee timvisee marked this pull request as ready for review March 9, 2026 10:13

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/consensus_tests/test_cluster_rejoin.py (1)

415-518: Cover both URI-source branches in these new regressions.

Unlike the existing same-URI regression earlier in this file, these two tests are not parameterized over uris_in_env, so they exercise only one add_peer_to_known path. Running the new replace/reject cases in both modes would give much better coverage for the new behavior.

🧪 Minimal coverage update
+@pytest.mark.parametrize("uris_in_env", [False, True])
-def test_replace_peer_without_shards_same_uri(tmp_path: pathlib.Path):
+def test_replace_peer_without_shards_same_uri(tmp_path: pathlib.Path, uris_in_env):
@@
-    peer_api_uris, peer_dirs, bootstrap_uri = start_cluster(tmp_path, N_PEERS, port_seed=10000)
+    peer_api_uris, _peer_dirs, bootstrap_uri = start_cluster(
+        tmp_path,
+        N_PEERS,
+        port_seed=10000,
+        uris_in_env=uris_in_env,
+    )
@@
-    extra_peer_api_uri = start_peer(extra_peer_dir, "peer_extra.log", bootstrap_uri, port=extra_peer_port)
+    extra_peer_api_uri = start_peer(
+        extra_peer_dir,
+        "peer_extra.log",
+        bootstrap_uri,
+        port=extra_peer_port,
+        uris_in_env=uris_in_env,
+    )
@@
-    new_extra_peer_api_uri = start_peer(new_extra_peer_dir, "peer_extra_replaced.log", bootstrap_uri, port=extra_peer_port)
+    new_extra_peer_api_uri = start_peer(
+        new_extra_peer_dir,
+        "peer_extra_replaced.log",
+        bootstrap_uri,
+        port=extra_peer_port,
+        uris_in_env=uris_in_env,
+    )

+@pytest.mark.parametrize("uris_in_env", [False, True])
-def test_reject_replace_peer_with_shards_same_uri(tmp_path: pathlib.Path):
+def test_reject_replace_peer_with_shards_same_uri(tmp_path: pathlib.Path, uris_in_env):
@@
-    peer_api_uris, peer_dirs, bootstrap_uri = start_cluster(tmp_path, N_PEERS, port_seed=10000)
+    peer_api_uris, peer_dirs, bootstrap_uri = start_cluster(
+        tmp_path,
+        N_PEERS,
+        port_seed=10000,
+        uris_in_env=uris_in_env,
+    )
@@
-    new_url = start_peer(peer_dirs[-1], "peer_replaced_rejected.log", bootstrap_uri, port=broken_peer_port)
+    start_peer(
+        peer_dirs[-1],
+        "peer_replaced_rejected.log",
+        bootstrap_uri,
+        port=broken_peer_port,
+        uris_in_env=uris_in_env,
+    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/consensus_tests/test_cluster_rejoin.py` around lines 415 - 518, Add
coverage for both URI-source branches by running each test case with uris_in_env
set to both True and False; wrap the existing logic in
test_replace_peer_without_shards_same_uri and
test_reject_replace_peer_with_shards_same_uri in a small loop or
parameterization that toggles the environment mode and reboots the cluster for
each mode, so both add_peer_to_known code paths are exercised when calling
start_cluster/start_peer; ensure you restore/clear the environment between
iterations and reference the test functions
(test_replace_peer_without_shards_same_uri,
test_reject_replace_peer_with_shards_same_uri) and helper functions
(start_cluster, start_peer, add_peer_to_known) when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@tests/consensus_tests/test_cluster_rejoin.py`:
- Around line 415-518: Add coverage for both URI-source branches by running each
test case with uris_in_env set to both True and False; wrap the existing logic
in test_replace_peer_without_shards_same_uri and
test_reject_replace_peer_with_shards_same_uri in a small loop or
parameterization that toggles the environment mode and reboots the cluster for
each mode, so both add_peer_to_known code paths are exercised when calling
start_cluster/start_peer; ensure you restore/clear the environment between
iterations and reference the test functions
(test_replace_peer_without_shards_same_uri,
test_reject_replace_peer_with_shards_same_uri) and helper functions
(start_cluster, start_peer, add_peer_to_known) when making the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ec8c8b22-44d9-4c79-9ad5-5cc787fbe8ee

📥 Commits

Reviewing files that changed from the base of the PR and between 0fc5bbf and 10151e8.

📒 Files selected for processing (3)
  • lib/storage/src/content_manager/consensus_manager.rs
  • src/tonic/api/raft_api.rs
  • tests/consensus_tests/test_cluster_rejoin.py

Comment thread src/tonic/api/raft_api.rs
Comment on lines +95 to +129
let existing_peer_id = self
.consensus_state
.peer_address_by_id()
.into_iter()
.find(|(id, peer_uri)| *peer_uri == uri && *id != peer.id)
.map(|(id, _)| id);

if let Some(old_peer_id) = existing_peer_id {
let consensus_state = self.consensus_state.clone();
let has_shards = tokio::task::spawn_blocking(move || {
// Must use spawn_blocking for peer_has_shards
consensus_state.peer_has_shards(old_peer_id)
})
.await
.map_err(|err| Status::internal(format!("Failed to check shards: {err}")))?;

if has_shards {
return Err(Status::failed_precondition(format!(
"peer URI {uri} already used by peer {old_peer_id} which still has shards, \
remove its shards first or use a different URI",
)));
}

log::info!(
"Peer {} is replacing peer {old_peer_id} which has no shards ({uri})",
peer.id,
);

self.consensus_state
.propose_consensus_op_with_await(ConsensusOperations::RemovePeer(old_peer_id), None)
.await
.map_err(|err| {
Status::internal(format!("Failed to remove old peer {old_peer_id}: {err}"))
})?;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Move the replace-if-empty guard into a single consensus-side mutation.

This precondition is checked on the API side and then enforced later via separate RemovePeer and subsequent AddPeer proposals. Because identical in-flight ops are coalesced by propose_consensus_op_with_await, two concurrent joiners for the same URI can both ride the same RemovePeer(old_peer_id) and then each send their own AddPeer; shard ownership can also change after peer_has_shards returns but before the removal is committed. Please linearize the URI + shard-empty check on the Raft side (single replace op or apply-time recheck), otherwise correctness depends on timing.

@qdrant qdrant deleted a comment from coderabbitai Bot Mar 9, 2026
coderabbitai[bot]

This comment was marked as resolved.

@qdrant qdrant deleted a comment from coderabbitai Bot Mar 9, 2026
@timvisee timvisee merged commit 8b1ca25 into dev Mar 23, 2026
18 checks passed
@timvisee timvisee deleted the consensus-join-reuse-uri branch March 23, 2026 08:59
generall pushed a commit that referenced this pull request Mar 26, 2026
* [ai] Allow peer to bootstrap with existing URI if peer has no shards

Prompt:

I have a Qdrant cluster in distributed mode with some nodes registered
in consensus. If I bootstrap a new node with an URL that is already used
it is rejected and an error is returned. I would like to change this
behavior and allow this to happen. This would effectively replace a peer
because internally we'd drop the existing peer first, and then we'd add
the new peer so we can reuse the same peer URL. We must still reject
bootstrapping with the same URL if the peer that used the URL before us
still has any shards on it.

* [ai] Add test to assert new behavior, can rejoin if empty

Prompt:

Add two tests to assert the new behavior.

The first test should:
- create a cluster
- create a collection
- bootstrap a new peer
- kill and delete the local data for this peer without removing it from consensus
- bootstrap a new peer but reuse the URI of the peer we just killed to rejoin
- bootstrapping is expected to succeed

The second test should:
- create a cluster
- create a collection
- bootstrap a new peer but reuse the URI of the last node
- bootstrapping is expected to fail because the node being replaced has data on it

* [ai] Attempt to fix new tests

* Reformat

* [ai] Fix deadlock when rejoining with same peer URI

* [ai] Add test to ensure existing peer stops consensus on replace
KShivendu pushed a commit that referenced this pull request Mar 30, 2026
* [ai] Allow peer to bootstrap with existing URI if peer has no shards

Prompt:

I have a Qdrant cluster in distributed mode with some nodes registered
in consensus. If I bootstrap a new node with an URL that is already used
it is rejected and an error is returned. I would like to change this
behavior and allow this to happen. This would effectively replace a peer
because internally we'd drop the existing peer first, and then we'd add
the new peer so we can reuse the same peer URL. We must still reject
bootstrapping with the same URL if the peer that used the URL before us
still has any shards on it.

* [ai] Add test to assert new behavior, can rejoin if empty

Prompt:

Add two tests to assert the new behavior.

The first test should:
- create a cluster
- create a collection
- bootstrap a new peer
- kill and delete the local data for this peer without removing it from consensus
- bootstrap a new peer but reuse the URI of the peer we just killed to rejoin
- bootstrapping is expected to succeed

The second test should:
- create a cluster
- create a collection
- bootstrap a new peer but reuse the URI of the last node
- bootstrapping is expected to fail because the node being replaced has data on it

* [ai] Attempt to fix new tests

* Reformat

* [ai] Fix deadlock when rejoining with same peer URI

* [ai] Add test to ensure existing peer stops consensus on replace
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants