Queue proxy shard transfer uses pipelining by agourlay · Pull Request #8656 · qdrant/qdrant

agourlay · 2026-04-13T15:26:04Z

Summary

During shard snapshot (and WAL delta) transfer, the backfill phase drains queued WAL operations to the remote via transfer_all_missed_updates.

The current implementation is fully sequential:

read a batch from WAL
send it over the network
wait for ACK
then read the next batch.

This PR overlaps WAL reads with network sends using tokio::join!; while the current batch is being sent to the remote, the next batch is prefetched from the WAL concurrently.

This eliminates the WAL read latency between network round-trips.

Changes

Split transfer_wal_batch into read_wal_batch and send_wal_batch
Restructure transfer_all_missed_updates to pipeline non-last batches
Last batch still acquires update_lock and re-reads WAL for correctness
Cancel safety preserved (idempotent re-sends on retry)

Design

Before: READ₁ → SEND₁ → READ₂ → SEND₂ → ... → READ_N → SEND_N(last)
After: READ₁ → [SEND₁ + READ₂] → [SEND₂ + READ₃] → ... → SEND_N(last)

The last batch is always handled under update_lock with a fresh WAL read to guarantee no writes are missed.

Prefetched batches that appear to be "last" are discarded and re-read under the lock.

Peak memory usage increases from ~1 to ~2 batch buffers (~64 MiB) during pipelined transfers.

Test plan

cargo build -p collection
cargo test -p collection transfer
Consensus integration tests (test_shard_snapshot_transfer.py)
Performance benchmark

timvisee · 2026-04-14T14:34:52Z

+                // is guaranteed to return reached_end = true.
+                let _update_lock = self.update_lock.lock().await;
+                let transfer_from = self.transfer_from.load(Ordering::Relaxed);
+                let wal_batch = self.read_wal_batch(transfer_from).await?;


This does not guarantee we read everything.

If we've accumulated a lot more operations before taking the update lock, the batch we get here may be missing the tail.

timvisee · 2026-04-14T14:47:51Z

+            if is_last {
+                // For the last batch, acquire update_lock to prevent new writes from accumulating
+                // on the WAL, then re-read to ensure we capture everything.
+                // Since update_lock blocks all update() calls, the WAL is frozen and the re-read
+                // is guaranteed to return reached_end = true.
+                let _update_lock = self.update_lock.lock().await;
+                let transfer_from = self.transfer_from.load(Ordering::Relaxed);
+                let wal_batch = self.read_wal_batch(transfer_from).await?;
+                self.send_wal_batch(&wal_batch, true).await?;
+                break;
+            }


I find this separate send branch a bit confusing.

I have the following proposal to simplify it: #8675

Please feel free to pick the one you like best.

qdrant-cloud-bot

Good optimization — the pipelining approach is clean and the second commit properly addresses Tim's concern about the re-read under lock needing to loop to completion.

A few observations:

1. Spurious WaitUntil::Segment on a non-final batch

When a batch read without the lock has reached_end = true, the code acquires update_lock and sends it with is_last = true → WaitUntil::Segment. But if writes arrived between that unlocked read and lock acquisition, this isn't actually the last batch. The subsequent batches (under lock) will use WaitUntil::Wal until the true last one gets Segment again.

This is harmless (one extra segment-level wait, rarely triggered) but worth a comment to avoid confusion on future reads. Consider deriving is_last as batch.reached_end && update_lock.is_some() instead — that way only batches read under the lock can be treated as truly final, and you avoid the unnecessary Segment wait.

2. Prefetch on the truly-last batch is a no-op read

When update_lock is held and reached_end = true, the tokio::join! still fires read_wal_batch(next_from) which will return an empty batch that's immediately used to exit the loop on the next iteration. This is cheap, so probably not worth optimizing away, but a short comment like // prefetch will return empty; handled on next iteration would help readability.

3. send_wal_batch takes is_last but also uses wal_batch.total and transfer_from

send_wal_batch reads self.transfer_from at the top, but this value could have been updated by the previous call's self.transfer_from.store(idx + 1, ...) inside the same function. This works correctly because each send updates it on success and the next send reads the new value, but the "Load → use for is_first" pattern is a bit non-obvious. The is_first check works because only the very first call has transfer_from == self.started_at. Fine as-is, just noting the implicit coupling.

4. Minor: WalBatch.total can go stale (CodeRabbit's nitpick)

Agree this is cosmetic only — progress may briefly wobble but can't cause a bug. Not worth fixing unless you want to.

Overall LGTM — the lock management is correct and the pipelining design is sound. The key invariant — "once update_lock is held, no new writes accumulate, and we drain all remaining batches under the lock" — holds.

timvisee

One remark:

* Queue proxy shard transfer uses pipelining * make sure last WAL read is made under update_lock * simplify

Queue proxy shard transfer uses pipelining

8f6f680

agourlay force-pushed the perf-queue-proxy-shard-transfer-pipeline branch from 7c68cde to 8f6f680 Compare April 14, 2026 07:18

agourlay marked this pull request as ready for review April 14, 2026 13:59

agourlay requested a review from timvisee April 14, 2026 13:59

This comment was marked as resolved.

Sign in to view

timvisee mentioned this pull request Apr 14, 2026

Use single read/send in loop, first iteration only reads #8675

Closed

3 tasks

timvisee reviewed Apr 14, 2026

View reviewed changes

qdrant deleted a comment from coderabbitai Bot Apr 14, 2026

make sure last WAL read is made under update_lock

aa2161f

qdrant-cloud-bot reviewed Apr 15, 2026

View reviewed changes

timvisee mentioned this pull request Apr 15, 2026

Send batches from futures stream, use buffering to prefetch reads #8687

Closed

3 tasks

qdrant deleted a comment from coderabbitai Bot Apr 16, 2026

timvisee approved these changes Apr 16, 2026

View reviewed changes

Comment thread lib/collection/src/shards/queue_proxy_shard.rs Outdated

simplify

09d1007

This comment was marked as resolved.

Sign in to view

qdrant deleted a comment from coderabbitai Bot Apr 16, 2026

agourlay merged commit c01d16f into dev Apr 16, 2026
15 checks passed

agourlay deleted the perf-queue-proxy-shard-transfer-pipeline branch April 16, 2026 10:32

timvisee pushed a commit that referenced this pull request May 8, 2026

Queue proxy shard transfer uses pipelining (#8656)

85aceef

* Queue proxy shard transfer uses pipelining * make sure last WAL read is made under update_lock * simplify

timvisee mentioned this pull request May 8, 2026

Bump version to 1.18.0 #8959

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Queue proxy shard transfer uses pipelining#8656

Queue proxy shard transfer uses pipelining#8656
agourlay merged 3 commits into
devfrom
perf-queue-proxy-shard-transfer-pipeline

agourlay commented Apr 13, 2026 •

edited

Loading

Uh oh!

This comment was marked as resolved.

Uh oh!

timvisee Apr 14, 2026

Uh oh!

timvisee Apr 14, 2026

Uh oh!

qdrant-cloud-bot left a comment

Uh oh!

timvisee left a comment

Uh oh!

Uh oh!

This comment was marked as resolved.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

agourlay commented Apr 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Design

Test plan

Uh oh!

This comment was marked as resolved.

Uh oh!

timvisee Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

timvisee Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

qdrant-cloud-bot left a comment

Choose a reason for hiding this comment

Uh oh!

timvisee left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

This comment was marked as resolved.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

agourlay commented Apr 13, 2026 •

edited

Loading