Don't lock shard holder for a long time in stream records transfer#8374
Merged
timvisee merged 4 commits intoMar 13, 2026
Merged
Conversation
9 tasks
timvisee
commented
Mar 12, 2026
Comment on lines
+137
to
+162
| // Read batch under shard holder lock | ||
| let prepared_batch = { | ||
| let shard_holder = shard_holder.read().await; | ||
|
|
||
| let Some(replica_set) = shard_holder.get_shard(shard_id) else { | ||
| // Forward proxy gone?! | ||
| // That would be a programming error. | ||
| return Err(CollectionError::service_error(format!( | ||
| "Shard {shard_id} is not found" | ||
| ))); | ||
| }; | ||
|
|
||
| replica_set | ||
| .read_transfer_batch(offset, TRANSFER_BATCH_SIZE, None, merge_points) | ||
| .await? | ||
|
|
||
| // Shard holder lock is dropped here, but the forward proxy update lock is still | ||
| // held inside the prepared batch, preventing concurrent updates between reading | ||
| // and sending. | ||
| }; | ||
|
|
||
| let result = replica_set | ||
| .transfer_batch(offset, TRANSFER_BATCH_SIZE, None, merge_points) | ||
| .await?; | ||
| // Send batch to remote shard without holding the shard holder lock. | ||
| // This is important because sending can take a very long time (especially the last | ||
| // batch which waits for the remote to fully process). Holding the shard holder lock | ||
| // during this time would block all other operations on the collection. | ||
| let result = prepared_batch.send(&remote_shard).await?; |
Member
Author
There was a problem hiding this comment.
Critical section is here.
First we get a prepared_batch which is quick.
Then we transfer the batch and wait for application. This can take a very long time. And now we don't hold the shard holder lock anymore.
468aadf to
bd8beac
Compare
bd8beac to
bde82cd
Compare
generall
approved these changes
Mar 12, 2026
agourlay
approved these changes
Mar 12, 2026
timvisee
added a commit
that referenced
this pull request
Mar 13, 2026
* Add basic test that shows shard holder is locked for too long * Rework test, move into stream records test * Fix typos * Don't lock shard holder for a long time in stream records transfer (#8374) * [ai] Detach shard holder lock from sending update batch * Fix clippy warning * Move stream records delay up to sleep in the middle of the batch * remove unused code --------- Co-authored-by: Andrey Vasnetsov <[email protected]> --------- Co-authored-by: Andrey Vasnetsov <[email protected]>
generall
added a commit
that referenced
this pull request
Mar 26, 2026
* Add basic test that shows shard holder is locked for too long * Rework test, move into stream records test * Fix typos * Don't lock shard holder for a long time in stream records transfer (#8374) * [ai] Detach shard holder lock from sending update batch * Fix clippy warning * Move stream records delay up to sleep in the middle of the batch * remove unused code --------- Co-authored-by: Andrey Vasnetsov <[email protected]> --------- Co-authored-by: Andrey Vasnetsov <[email protected]>
KShivendu
pushed a commit
that referenced
this pull request
Mar 30, 2026
* Add basic test that shows shard holder is locked for too long * Rework test, move into stream records test * Fix typos * Don't lock shard holder for a long time in stream records transfer (#8374) * [ai] Detach shard holder lock from sending update batch * Fix clippy warning * Move stream records delay up to sleep in the middle of the batch * remove unused code --------- Co-authored-by: Andrey Vasnetsov <[email protected]> --------- Co-authored-by: Andrey Vasnetsov <[email protected]>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fix #8373
Splits transferring batches in stream records transfers into two. This ensures the shard holder read lock is only held for a very short time.
Stages:
The changes are a little bit invasive, but I don't see a nicer way of handling this problem.
All Submissions:
devbranch. Did you create your branch fromdev?New Feature Submissions:
cargo +nightly fmt --allcommand prior to submission?cargo clippy --workspace --all-featurescommand?