Forward sync columns by root #7946
Conversation
| for (peer, columns) in batch_peers.data_columns.iter() { | ||
| for faulty_column in faulty_columns { | ||
| if columns.contains(faulty_column) { | ||
| network.report_peer(*peer, *penalty, "faulty_batch"); |
This may cause an insta-ban on a supernode peer if we request all columns from them and if they fail to serve them.
Is it intentional to penalise the peer once per faulty column?
| // Penalize the peer appropriately. | ||
| match faulty_component { | ||
| Some(FaultyComponent::Blocks) | Some(FaultyComponent::Blobs) => { | ||
| network.report_peer( |
It looks like we're penalising the block peer again below?
https://github.com/sigp/lighthouse/blob/5c562c6543353aea0e4c71cf496cc3f362d7d47e/beacon_node/network/src/sync/backfill_sync/mod.rs#L652-L656
I don't see where we are inserting into participating_peers, maybe the field can be removed too?
I would add a function that takes penalty, faulty_component, batch_peers and returns something like Vec<(PeerId, Penalty)>.
This function would first accumulate at most one penalty (or penalty type) per peer, so we prevent the issue that Jimmy points out, and you can dedup this logic between range sync and backfill sync.
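For illustration, a minimal sketch of what such a helper could look like (the types below are stand-ins for the real libp2p `PeerId` and sync penalty types, and the function name is hypothetical, not part of the Lighthouse API):

```rust
use std::collections::{HashMap, HashSet};

// Stand-ins for the real types (libp2p PeerId, sync penalty enum, column index).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct PeerId(u64);

#[derive(Clone, Copy, Debug)]
enum Penalty {
    LowTolerance,
    MidTolerance,
}

/// Accumulate at most one penalty per peer: a peer that served several faulty
/// columns is reported once, instead of once per faulty column.
fn penalties_for_faulty_columns(
    penalty: Penalty,
    batch_column_peers: &HashMap<PeerId, Vec<u64>>, // peer -> column indices it served
    faulty_columns: &HashSet<u64>,
) -> Vec<(PeerId, Penalty)> {
    batch_column_peers
        .iter()
        .filter(|(_, columns)| columns.iter().any(|c| faulty_columns.contains(c)))
        .map(|(peer, _)| (*peer, penalty))
        .collect()
}
```

The caller would then report each returned `(peer, penalty)` pair exactly once, so a supernode serving all columns cannot be insta-banned by a single bad batch.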
| columns: columns.clone(), | ||
| }) | ||
| .collect(); | ||
| assert!(ids.len() <= 32); |
Why do we need this assertion? (should probably remove or use debug_assert! instead)
Seems to be a remnant of a test; this line must be removed.
| .items | ||
| .values() | ||
| .flatten() | ||
| .any(|d| d.index == data_column.index && d.block_root() == block_root) |
I think we can do something like this instead of iterating through all the values?
if self
    .items.get(&block_root).is_some_and(|d| d.iter().any(|d| d.index == data_column.index))
| let resp = self.blocks_by_range_requests.on_response(id, rpc_event); | ||
| match &resp { | ||
| Some(Ok((blocks, _))) => { | ||
| // On receving a successful response for a blocks by range request, |
| // On receving a successful response for a blocks by range request, | |
| // On receiving a successful response for a blocks by range request, |
| error: format!("No columns for block {block_root:?} with data"), | ||
| faulty_peers: responsible_peers, | ||
| action: PeerAction::LowToleranceError, | ||
| // The block peer might be malcicious so don't downscore the column peer too bad |
typo
| // The block peer might be malcicious so don't downscore the column peer too bad | |
| // The block peer might be malicious so don't downscore the column peer too bad |
| error: format!("Peers did not return column for block_root {block_root:?} {naughty_peers:?}"), | ||
| faulty_peers: naughty_peers, | ||
| action: PeerAction::LowToleranceError, | ||
| // The block peer might be malcicious so don't downscore the column peer too bad |
typo
| // The block peer might be malcicious so don't downscore the column peer too bad | |
| // The block peer might be malicious so don't downscore the column peer too bad |
| &custody_indices, | ||
| &synced_peers, | ||
| active_requests.clone(), | ||
| &HashSet::new(), |
Do we deprioritise previously failed peers? If we only penalise but don't deprioritise, we may end up retrying the peer until we ban them, right?
| AwaitingProcessing(BatchPeers, Vec<RpcBlock<E>>, Instant), | ||
| /// The batch is being processed. | ||
| Processing(Attempt), | ||
| Processing(Attempt, BatchPeers), |
It seems like we're currently using attempt.peer_id for prioritising block and column peers when sending requests, and we ignore other BatchPeers that we've attempted. Would it be useful to replace the peer_id field with batch_peers, so we can prioritise based on attempts with column peers too?
Note that we penalise the block peer (attempt.peer_id) of failed attempts quite harshly, even if the retry was caused by a bad column peer. I think we may want to track batch peers in attempts:
lighthouse/beacon_node/network/src/sync/range_sync/chain.rs
Lines 677 to 715 in 5c562c6
| BatchState::AwaitingValidation(processed_attempt) => { | |
| for attempt in batch.attempts() { | |
| // The validated batch has been re-processed | |
| if attempt.hash != processed_attempt.hash { | |
| // The re-downloaded version had a different block peer | |
| if processed_attempt.peer_id != attempt.peer_id { | |
| // A different peer sent the correct batch, the previous peer did not | |
| // We negatively score the original peer. | |
| let action = PeerAction::LowToleranceError; | |
| debug!( | |
| batch_epoch = %id, score_adjustment = %action, | |
| original_peer = %attempt.peer_id, new_peer = %processed_attempt.peer_id, | |
| "Re-processed batch validated. Scoring original peer" | |
| ); | |
| network.report_peer( | |
| attempt.peer_id, | |
| action, | |
| "batch_reprocessed_original_peer", | |
| ); | |
| } else { | |
| // The same peer corrected it's previous mistake. There was an error, so we | |
| // negative score the original peer. | |
| let action = PeerAction::MidToleranceError; | |
| debug!( | |
| batch_epoch = %id, | |
| score_adjustment = %action, | |
| original_peer = %attempt.peer_id, | |
| new_peer = %processed_attempt.peer_id, | |
| "Re-processed batch validated by the same peer" | |
| ); | |
| network.report_peer( | |
| attempt.peer_id, | |
| action, | |
| "batch_reprocessed_same_peer", | |
| ); | |
| } | |
| } | |
| } | |
| } |
| BatchState::AwaitingDownload | ||
| | BatchState::Failed | ||
| | BatchState::Downloading(..) | ||
| | BatchState::AwaitingValidation(..) => None, |
Would the responsible_peers be useful when the batch state is AwaitingValidation?
Commit history:
9b2de09 Rethink peer scoring
5c562c6 Fix some more issues
15df3d2 Merge branch 'unstable' into blocks-then-columns
c491856 Merge branch 'unstable' into blocks-then-columns
826a06e Fix variant name
421e954 Revert "Revert type change in UnexpectedRequestId"
3f8998f Only mark block lookups as pending if block is importing from gossip (sigp#8112)
d99df0a Only send data coumn subnet discovery requests after peerdas is scheduled (sigp#8109)
cf46d10 Fix issues from review
c2aa4ae Revert type change in UnexpectedRequestId
7488755 De-duplicate data columns by root request type
7650032 Rename DataColumnsFromRange
4b0b655 Fix nits
aa6a1bc Create a custom penalize_sync_peer method for clarity
27195ca Merge branch 'unstable' into blocks-then-columns
4a59d35 Merge branch 'unstable' into blocks-then-columns
2f35c36 Add some metrics
b3b3756 Fix tests
e3aed89 Remove retry test that we do not use anymore
baee27a Merge branch 'unstable' into blocks-then-columns
9db4c30 Fix small bug
08bba3f fmt
cffbd34 Reduce code duplication
bf09d57 Fix some issues from lion's review
04398ad Fix some more issues from review
4f62a9c Merge branch 'unstable' into blocks-then-columns
e259ecd More renamings
6a2a33d Fix some issues from review
e0d8f04 Tests compile
7e91eeb Merge branch 'unstable' into blocks-then-columns
29c2f83 Cleanup SyncingChain
8458df6 Attempt sending awaitingDownload batches when restarting sync
7a6d0d9 Force processing_target request
4f60e86 Add logs to debug stuck range sync
b07bc6d Force requests if batch buffer is full under certain conditions
19b0a5c Merge branch 'unstable' into blocks-then-columns
8e1337d Merge branch 'unstable' into blocks-then-columns
da1aaba Revise error tolerance
7331323 AwaitingValidation state only needs block peer
8386bd9 Merge branch 'unstable' into blocks-then-columns
b4bc7fe Remove more debug logs
05adb71 Fix bug with partial column responses before all column requests sent
a97cf88 Add docs
27d0b36 Remove debug statements
90d319f Merge branch 'unstable' into blocks-then-columns
7c214f5 Merge branch 'unstable' into blocks-then-columns
52762b9 Handle 0 blobs per epoch case
da27441 Merge branch 'unstable' into blocks-then-columns
521778b Random logs
4540195 Request columns from global peer pool
fdce537 Implement responsible peer tracking
17c4e34 Avoid root requests for backfill sync
1a0df30 Also penalize all batch peers for availability check errors
6da924b Fix bug in initialization code
68cce37 Try to avoid chains failing for rpc errors
ca9cfd5 Small fixes
5337e46 Add a function to retry column requests that could not be made
9455153 Without retries
6bd8944 Reduce backfill buffer size
156449c Increase columns_by_root quota
836f9c6 Priorotize status v2
490b627 Penalize if invalid EL block
| batch_epoch = %batch_id, | ||
| batch_state = ?batch.state(), | ||
| %peer_id, | ||
| ?batch_peers, |
Just a note that the debug string for this could be quite long - if we eventually get trace to log integration, we would be able to easily navigate to all the requests in the traces.
Right now we could also see the peer id of individual requests in the spans.
Yeah, it's not necessary. We can work out what peer served what both with traces and with the request IDs that contain the parent request ID
Same for the Batch download error log
| }); | ||
| } | ||
| self.send_batch(network, batch_id) | ||
| self.attempt_send_awaiting_download_batches(network, "injecting error") |
Why do we attempt all awaiting download batches here?
| }, | ||
| } | ||
| } else { | ||
| debug!(?self.to_be_downloaded, ?self.processing_target, "Did not get batch"); |
These are both already logged, given we enter the span at the beginning of the function - all span fields here will be logged
lighthouse/beacon_node/network/src/sync/range_sync/chain.rs
Lines 137 to 143 in 5c562c6
| fields( | |
| chain_id = %id, | |
| start_epoch = %start_epoch, | |
| target_head_slot = %target_head_slot, | |
| target_head_root = %target_head_root, | |
| chain_type = ?chain_type, | |
| ) |
| @@ -1,21 +1,25 @@ | |||
| use crate::sync::network_context::MAX_COLUMN_RETRIES; | |||
It's weird that MAX_COLUMN_RETRIES is placed in network_context; maybe we can move it to this file (the only place it's used)?
| DataColumnsFromRoot { | ||
| requests: HashMap< | ||
| DataColumnsByRootRequestId, | ||
| ByRangeRequest<DataColumnsByRootRequestId, DataColumnSidecarList<E>>, |
It reads a bit confusing to have ByRangeRequest under DataColumnsFromRoot; should we rename ByRangeRequest to something like DataColumnBatchRequest?
| request_to_column_indices, | ||
| expected_custody_columns, | ||
| } => { | ||
| for (request, indices) in column_requests { |
I think calling this request_id is less confusing here?
| RangeBlockDataRequest::DataColumnsFromRoot { requests, .. } => { | ||
| let req = requests | ||
| .get_mut(&req_id) | ||
| .ok_or(format!("unknown data columns by range req_id {req_id}"))?; |
| .ok_or(format!("unknown data columns by range req_id {req_id}"))?; | |
| .ok_or(format!("unknown data columns by root req_id {req_id}"))?; |
| } | ||
| #[test] | ||
| fn max_retries_exceeded_behavior() { |
Are we no longer able to test retry behaviour?
| expect_max_responses, | ||
| DataColumnsByRootRequestItems::new(request), | ||
| // Span is tracked in `self.custody_columns_by_root_requests` in the | ||
| // `ActiveCustodyRequest` struct. |
The comment is still accurate - we track custody column by root requests in self.custody_columns_by_root_requests, therefore we don't try to track another span here. Do we track the range by root request anywhere?
| ?data_column, | ||
| existing_items=?self.items, | ||
| "Duplicated data", | ||
| ); |
Do we need this trace after completing the testing of the PR?
| DataColumnsByRootRequestItems::new(request), | ||
| // Span is tracked in `self.custody_columns_by_root_requests` in the | ||
| // `ActiveCustodyRequest` struct. | ||
| Span::none(), |
I think this argument is ultimately unused and can be dropped from the struct.
Hi @pawanjay176, this pull request has been closed automatically due to 30 days of inactivity. If you’d like to continue working on it, feel free to reopen at any time.
Issue Addressed
N/A
The Problem
Our current strategy of syncing blocks + columns by range works roughly as follows for each batch:
1. Select a peer from the `SyncingChain` to fetch blocks from and send a `BlocksByRange` request.
2. Select column peers and send a `DataColumnsByRange` request at the same time.
3. Couple the received blocks and columns, checking that the `block_root` and the `kzg_commitment` matches. If the coupling failed, try to re-request the failed columns.

This strategy works decently well when the chain is finalizing, as most of our peers are on the same chain. However, in times of non-finality we potentially need to sync multiple head chains.
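To make the coupling step in (3) concrete, here is a minimal sketch of the idea; the `Block`/`Column` types and the `uncoupled_columns` function are illustrative stand-ins, not Lighthouse's actual types:

```rust
use std::collections::HashMap;

// Illustrative stand-ins, not Lighthouse's actual types.
type Root = [u8; 32];
type KzgCommitment = [u8; 48];

struct Block {
    root: Root,
    expected_kzg_commitments: Vec<KzgCommitment>,
}

struct Column {
    block_root: Root,
    index: u64,
    kzg_commitments: Vec<KzgCommitment>,
}

/// Return the indices of columns that could not be coupled to a block in this
/// batch (unknown root or mismatching commitments) and so need re-requesting.
fn uncoupled_columns(blocks: &[Block], columns: &[Column]) -> Vec<u64> {
    let blocks_by_root: HashMap<Root, &Block> =
        blocks.iter().map(|b| (b.root, b)).collect();
    columns
        .iter()
        .filter(|column| {
            !blocks_by_root
                .get(&column.block_root)
                .is_some_and(|block| block.expected_kzg_commitments == column.kzg_commitments)
        })
        .map(|column| column.index)
        .collect()
}
```

The failure mode described below is exactly when this coupling returns many uncoupled columns: the block peer and the column peers answered from different chains.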
This leads to issues with our current approach because the block peer and the data column peers might have a different view of the canonical chain due to multiple heads. So when we use the above approach, it is possible that the block peer returns us a batch of blocks for chain A while some or all data column peers send us the batch of data columns for a different chain B. Different data column peers might also be following different chains.
We initially tried to get around this problem by selecting column peers only from within the current `SyncingChain`. Each `SyncingChain` represents a `head_root` that we are trying to sync to, and we group peers based on the same `head_root`. That way, we know for sure that the block and column peers are on the same chain. This works in theory, but in practice, during long periods of non-finality, we tend to create multiple head chains based on the `head_root` and split the global peerset. Pre-fulu, this isn't a big deal since all peers are supposed to have all the blob data.

But splitting peers with peerdas is a big challenge due to not all peers having the full data available. There are supernodes, but during bad network conditions, supernodes would be getting way too many requests and not even have any incoming peer slots. As we saw on fusaka devnets, this strategy leads to sync getting stalled and not progressing.
Proposed Changes
1. Use `DataColumnsByRoot` instead of `DataColumnsByRange` to fetch columns for forward sync
This is the main change. The new strategy would go as follows:
1. Select a peer from the `SyncingChain` to fetch blocks from and send a `BlocksByRange` request.
2. Compute the `block_roots` from the blocks response and trigger a `DataColumnsByRoot` request for every block in the response that has any blobs, based on the `expected_kzg_commitments` field.

(4) kinda assumes that most synced/advanced peers would have different chains in their fork choice to be able to serve specific by-root requests. My hunch is that this is true, but we should validate this in a devnet-4-like chain split scenario.
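As a rough sketch of step 2 (the `BlockSummary` and `ColumnsByRootRequest` types below are illustrative stand-ins; the real block and request types in Lighthouse differ):

```rust
// Illustrative stand-ins; the real block and request types in Lighthouse differ.
type Root = [u8; 32];

struct BlockSummary {
    root: Root,
    expected_kzg_commitments: usize, // number of blob commitments in the block
}

struct ColumnsByRootRequest {
    block_root: Root,
    columns: Vec<u64>, // the column indices we custody and still need
}

/// For every block in a blocks-by-range response that actually has blobs,
/// build a columns-by-root request for our custody columns.
fn columns_by_root_requests(
    blocks: &[BlockSummary],
    custody_columns: &[u64],
) -> Vec<ColumnsByRootRequest> {
    blocks
        .iter()
        .filter(|block| block.expected_kzg_commitments > 0)
        .map(|block| ColumnsByRootRequest {
            block_root: block.root,
            columns: custody_columns.to_vec(),
        })
        .collect()
}
```

Because the requests are keyed by block root rather than by slot range, any peer that has those blocks in its fork choice can serve them, regardless of which head chain it follows.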
Note: we currently use this by-root strategy only for forward sync, not for backfill. Backfill has to deal with only a single canonical chain, so by-range requests should work well there.
2. `ResponsiblePeers` to attribute peer fault correctly
Adds the `ResponsiblePeers` struct which stores the block and the column peers that we made the download requests to.
For most of our peer-attributable errors, the processing error indicates whether the block peer or the column peer was at fault.
We now communicate this information back to sync and downscore specific peers based on the fault type. This, imo, is an improvement over current unstable, where most of the time we attribute fault to the peer that "completed" the request by being the last peer to respond.
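To make the fault attribution concrete, here is a rough sketch of the idea; the fields and the `FaultyComponent` variants below are illustrative guesses, not the actual Lighthouse definitions:

```rust
use std::collections::HashMap;

// Stand-in for the libp2p PeerId.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct PeerId(u64);

// Illustrative: remember which peer served the blocks and which peers served
// which columns for a batch.
struct ResponsiblePeers {
    block_peer: PeerId,
    // column peer -> column indices it served
    column_peers: HashMap<PeerId, Vec<u64>>,
}

// Illustrative fault classification reported back from processing.
enum FaultyComponent {
    Blocks,
    Columns(Vec<u64>), // the column indices that failed verification
}

impl ResponsiblePeers {
    /// Peers to penalise for a given processing failure.
    fn peers_to_downscore(&self, fault: &FaultyComponent) -> Vec<PeerId> {
        match fault {
            FaultyComponent::Blocks => vec![self.block_peer],
            FaultyComponent::Columns(bad_columns) => self
                .column_peers
                .iter()
                .filter(|(_, cols)| cols.iter().any(|c| bad_columns.contains(c)))
                .map(|(peer, _)| *peer)
                .collect(),
        }
    }
}
```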
Due to this ambiguity in fault attribution, we weren't downscoring pretty serious processing errors like `InvalidKzgProofs`, `InvalidExecutionPayload` etc. I think this PR attributes the errors to the right peers. Reviewers, please check that this claim is actually true.

3. Make `AwaitingDownload` an allowable in-between state
Note: This has been extracted to its own PR here and merged: #7984
Prior to peerdas, a batch should never have been in the `AwaitingDownload` state, because we immediately try to move from `AwaitingDownload` to the `Downloading` state by sending batches. This was always possible as long as we had peers in the `SyncingChain` in the pre-peerdas world.

However, this is no longer the case, as a batch can be stuck waiting in the `AwaitingDownload` state if we have no peers to request the columns from. This PR makes `AwaitingDownload` an allowable in-between state. If a batch is found to be in this state, then we attempt to send the batch instead of erroring like before.

Note to reviewer: We need to make sure that this doesn't lead to a bunch of batches stuck in the `AwaitingDownload` state if the chain can be progressed.
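As a toy illustration of the intended behaviour change (the enum and function below are simplified stand-ins; the real `BatchState` carries data and the real retry logic lives on the syncing chain):

```rust
// Illustrative only: `AwaitingDownload` is retried, not treated as an error.
enum BatchState {
    AwaitingDownload,
    Downloading,
}

/// Called whenever peers become available: try to send every batch that is
/// still waiting for a download, leaving it untouched if no peer can serve it.
fn attempt_send_awaiting_download_batches(
    batches: &mut [BatchState],
    mut try_send: impl FnMut() -> bool,
) {
    for state in batches.iter_mut() {
        if matches!(state, BatchState::AwaitingDownload) && try_send() {
            *state = BatchState::Downloading;
        }
    }
}
```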