fix: piece state tracking and peer connection handling improvement #532
Merged
Conversation
Use init_logging from tracing_subscriber_config_utils to write DEBUG-level logs to /tmp/rqbit-simulate-traffic/testserver.log by default. Configurable via TESTSERVER_LOG_FILE and TESTSERVER_LOG_FILE_RUST_LOG env vars. Also document log file locations for both devserver and testserver in CLAUDE.md with guidance to always filter logs with rg instead of reading directly.
When a piece is stolen from a peer, on_steal sent Cancel messages but did not remove the chunks from the victim's inflight_requests. This caused "we already requested ChunkInfo" warnings when the stealer died and the original peer re-reserved the same piece. Now retain() both sends the cancels and removes the chunks in a single pass, which is also more efficient (no intermediate Vec allocation).
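The single-pass retain() pattern described above can be sketched as follows. The type and function names here are illustrative stand-ins, not rqbit's actual API:

```rust
use std::collections::HashSet;

// Hypothetical stand-in for rqbit's chunk descriptor; fields are illustrative.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct ChunkInfo {
    piece_index: u32,
    chunk_index: u32,
}

fn cancel_stolen_chunks(
    inflight_requests: &mut HashSet<ChunkInfo>,
    stolen_piece: u32,
    mut send_cancel: impl FnMut(ChunkInfo),
) {
    // One pass over the victim's inflight set: send Cancel for every chunk
    // of the stolen piece AND drop it from the set, with no intermediate Vec.
    inflight_requests.retain(|chunk| {
        if chunk.piece_index == stolen_piece {
            send_cancel(*chunk);
            false // remove from inflight_requests
        } else {
            true // keep chunks belonging to other pieces
        }
    });
}
```

Because the chunk is removed in the same pass that emits the Cancel, a later re-reservation of the piece cannot collide with stale entries in the victim's inflight set.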
When an outgoing peer connection dies, we set the peer to Dead state and spawn a wait_for_peer task that sleeps for the backoff duration before requeuing. If an incoming connection from the same peer arrives during this sleep, the peer transitions from Dead to Live. When wait_for_peer wakes up and finds the peer Live instead of Dead, this is not a bug but a valid race: the peer reconnected to us. Previously this produced an error log. Now we gracefully handle all non-Dead states by simply skipping the requeue, since the peer is either already connected, already queued, or no longer needed.
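A minimal sketch of that wake-up logic, with an illustrative state enum whose variant names only approximate rqbit's:

```rust
// Illustrative peer-state machine; variant names approximate rqbit's states.
#[derive(Debug, PartialEq)]
enum PeerState {
    Dead,
    Live,
    Queued,
    NotNeeded,
}

/// Called when the backoff sleep ends. Returns true if the peer was
/// requeued for a new outgoing connection attempt.
fn requeue_after_backoff(state: &mut PeerState) -> bool {
    match state {
        // Expected case: the peer stayed dead for the whole backoff, requeue it.
        PeerState::Dead => {
            *state = PeerState::Queued;
            true
        }
        // Valid race: something (e.g. an incoming connection) changed the
        // state while we slept. Not an error; just skip the requeue.
        _ => false,
    }
}
```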
Documents the shared state structures (ChunkTracker, inflight_pieces, inflight_requests), their intended invariants, and state transitions.

The previous fix (f61c1bd - remove stolen chunks from peer's inflight_requests) only partially addressed the "already requested ChunkInfo" warning. It helps when:

- Peer A steals from live peer B
- Peer A dies
- Peer B re-reserves the piece

But it doesn't help when the victim peer DIES (not just loses pieces):

- When peer B dies, its LivePeerState is consumed (taken)
- on_steal's with_live_mut(B, ...) returns None - nothing to clean up
- Pieces stay in inflight_pieces owned by dead peer B
- mark_piece_broken_if_not_have adds pieces to queue_pieces
- INVARIANT VIOLATED: pieces now in BOTH inflight_pieces AND queue_pieces
- Peer A steals from dead B, requests chunks
- reserve_next_needed_piece returns same piece (still in queue_pieces)
- Warning fires when trying to insert already-present chunks

The root cause is that on_peer_dead() doesn't clean up inflight_pieces. This doc captures the full picture before deciding on the proper fix approach.
Add PieceTracker abstraction that wraps ChunkTracker + inflight_pieces to encapsulate piece state transitions and maintain invariants.

Key changes:

- New piece_tracker.rs with PieceTracker struct and unit tests
- Consolidated reserve_next_needed_piece + try_steal_old_slow_piece into a single acquire_piece method with steal(10x) -> reserve -> steal(3x)
- Fixed the bug where peer death left pieces in both queue_pieces AND inflight_pieces by adding a release_pieces_owned_by() method
- Fixed a steal bug where the stealing peer was not checked for having the piece
- Updated STATE.md to reflect the new architecture

The invariant that pieces are in exactly one state (HAVE, IN_FLIGHT, QUEUED, or NOT_NEEDED) is now enforced by PieceTracker methods.
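The steal(10x) -> reserve -> steal(3x) ordering can be sketched as below. All names, fields, and the speed-ratio heuristic details are illustrative assumptions, not rqbit's real PieceTracker API:

```rust
// Sketch of the consolidated acquisition order, under assumed data shapes.
struct PieceTracker {
    // Pieces still waiting to be assigned to any peer.
    queue_pieces: Vec<u32>,
    // (piece, owner_download_speed) pairs currently in flight.
    inflight: Vec<(u32, f64)>,
}

impl PieceTracker {
    /// Steal a piece whose current owner is at least `ratio` times slower
    /// than us. (Re-registering the piece under the new owner is omitted.)
    fn try_steal(&mut self, my_speed: f64, ratio: f64) -> Option<u32> {
        let pos = self
            .inflight
            .iter()
            .position(|&(_, owner_speed)| my_speed > owner_speed * ratio)?;
        Some(self.inflight.remove(pos).0)
    }

    fn acquire_piece(&mut self, my_speed: f64) -> Option<u32> {
        // 1. Aggressive steal: only from owners at least 10x slower than us.
        if let Some(p) = self.try_steal(my_speed, 10.0) {
            return Some(p);
        }
        // 2. Otherwise reserve the next queued piece.
        if let Some(p) = self.queue_pieces.pop() {
            return Some(p);
        }
        // 3. Queue empty (endgame): steal at the looser 3x threshold.
        self.try_steal(my_speed, 3.0)
    }
}
```

Putting the three steps behind one acquire_piece entry point is what lets the tracker enforce the single-state invariant: a piece leaves exactly one collection per call.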
…n testserver

- Torrent names are now unique by using index-based selection with cycle suffix
- Each session (peers and main) now has explicit peer_id passed to SessionOptions
- Root spans include peer_id for easier log tracing across sessions
The acquire_next_piece function was calling on_steal inside a with_live_mut closure, which holds a DashMap write lock on the peer. Since on_steal accesses other peers (with_peer, with_live_mut), this created a nested lock pattern that could deadlock when multiple threads access different peers' shards simultaneously. Fix by storing the steal info and processing it after the closure returns, when no peer lock is held. Also expanded STATE.md lock ordering documentation with the critical rule about never accessing other peers while holding a peer lock.
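The shape of the fix can be sketched with a simplified model: one std Mutex per peer stands in for a DashMap shard lock, and all names are illustrative rather than rqbit's:

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Simplified model of per-peer sharded state; a DashMap shard lock is
// approximated by one Mutex per peer. Names are illustrative.
struct Peers {
    peers: HashMap<u64, Mutex<Vec<u32>>>, // peer id -> inflight piece list
}

impl Peers {
    /// Deadlock-free shape: decide what to steal while holding only the
    /// acquiring peer's lock, then release it BEFORE touching the victim.
    fn acquire_next_piece(&self, me: u64, victim: u64, piece: u32) {
        // Phase 1: record the steal decision under `me`'s lock only.
        let steal_info = {
            let mut my_state = self.peers[&me].lock().unwrap();
            my_state.push(piece);
            Some((victim, piece)) // carried out of the critical section
        }; // `me`'s lock is released here
        // Phase 2: with no peer lock held, it is safe to lock the victim.
        if let Some((victim, piece)) = steal_info {
            let mut victim_state = self.peers[&victim].lock().unwrap();
            victim_state.retain(|&p| p != piece);
        }
    }
}
```

The key property is that the two lock scopes never overlap, so no ordering between peer locks is ever required.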
Main session listens on port 50000, peer sessions on 50001, 50002, etc. Also includes port in tracing spans for log filtering.
When a peer disconnects while their upload requests are waiting in the rate-limited queue, we now detect this via tx.closed() and skip those requests instead of waiting forever for rate limit clearance. This helps reduce wasted time on dead peers but doesn't fully solve the queue starvation problem where one peer can monopolize the upload queue while others wait.
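The real code awaits tokio's channel closed-notification; this self-contained std-mpsc sketch (all names hypothetical) shows the same idea: before spending rate-limit budget on a queued upload request, drop requests whose peer channel is gone:

```rust
use std::sync::mpsc;

// Hypothetical queued upload request: the piece to serve and the channel
// back to the requesting peer's connection task.
struct QueuedRequest {
    piece: u32,
    tx: mpsc::Sender<u32>,
}

/// Drain the upload queue, skipping requests whose receiver has been
/// dropped (peer disconnected) instead of burning rate-limit budget on
/// data nobody will receive. Returns the number of requests served.
fn drain_upload_queue(
    queue: Vec<QueuedRequest>,
    mut spend_rate_limit: impl FnMut(),
) -> usize {
    let mut served = 0;
    for req in queue {
        // With std mpsc, a dropped receiver surfaces as a send error;
        // tokio instead lets you observe closure before sending.
        if req.tx.send(req.piece).is_err() {
            continue; // peer is gone, skip without touching the limiter
        }
        spend_rate_limit();
        served += 1;
    }
    served
}
```

As the commit message notes, this only trims wasted work on dead peers; it does not by itself prevent one live peer from monopolizing the queue.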
The delete modal was showing "Torrent #1" instead of the actual name because TorrentActions was using detailsResponse (which may not be loaded) instead of the torrent name from the list.

Also simplified component props throughout:

- TorrentActions now takes torrent object instead of separate props
- Removed unused detailsResponse from TorrentCardContent and TorrentCard
- PeersTab now takes torrent object instead of separate id/stats
- Removed redundant TorrentToDelete type from DeleteTorrentModal
Various fixes