Skip to content

Conversation

@jimmygchen
Copy link
Member

@jimmygchen jimmygchen commented Oct 16, 2025

Issue Addressed

This PR reverts #8179.

It turns out that the fix was invalid because an unknown root is always not a finalized descendant:

// An unknown root is not a finalized descendant. This line can only
// be reached if the user supplies a root that is not known to fork
// choice.
return false;

so for any data columns with unknown parents, it will always penalise the gossip peer and disconnect it pretty quickly. On a small network, the node may lose all of its peers.

The impact is pretty obvious when the peer count is small and sync speed is slow, and is therefore easily reproducible by running a fresh supernode on devnet-3.

This isn't as obvious on a live testnet like holesky / sepolia, we haven't noticed this, probably due to its high peer count and sync speed - the nodes might be able to reach head quickly before losing too many peers.

Proposed Changes

The previous behaviour isn't ideal but safe: triggering unknown parent lookup and penalise the bad peer if it happens to be malicious or faulty. So for now it's safer to revert the change and plan for a proper fix after the v8 release.

@jimmygchen jimmygchen added bug Something isn't working ready-for-review The code is ready for review v8.0.0 Q4 2025 Fusaka Mainnet Release labels Oct 16, 2025
Copy link
Member

@michaelsproul michaelsproul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I think this is the best we can do on short notice.

Would appreciate another set of eyes on this, as I missed the issue with the original PR!

// Do not process a column that does not descend from the finalized root.
// We just loaded the parent_block, so we can be sure that it exists in fork choice.
if !fork_choice.is_finalized_checkpoint_or_descendant(block_parent_root) {
return Err(GossipDataColumnError::NotFinalizedDescendant { block_parent_root });
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is unreachable, but I'm ok with keeping the PR as a straight-up revert, and tracking the fix here:

@mergify
Copy link

mergify bot commented Oct 16, 2025

Some required checks have failed. Could you please take a look @jimmygchen? 🙏

@mergify mergify bot added waiting-on-author The reviewer has suggested changes and awaits thier implementation. and removed ready-for-review The code is ready for review labels Oct 16, 2025
@jimmygchen jimmygchen added ready-for-merge This PR is ready to merge. and removed waiting-on-author The reviewer has suggested changes and awaits thier implementation. labels Oct 16, 2025
@mergify mergify bot added the queued label Oct 16, 2025
mergify bot added a commit that referenced this pull request Oct 16, 2025
@mergify mergify bot merged commit 76a37a0 into sigp:unstable Oct 16, 2025
59 of 60 checks passed
@mergify mergify bot removed the queued label Oct 16, 2025
jchavarri pushed a commit to jchavarri/lighthouse that referenced this pull request Oct 21, 2025
This PR reverts sigp#8179.

It turns out that the fix was invalid because an unknown root is always not a finalized descendant:

https://github.com/sigp/lighthouse/blob/522bd9e9c6ac167f2231525e937c9ebbcb86cf6e/consensus/proto_array/src/proto_array.rs#L976-L979

so for any data columns with unknown parents, it will always penalise the gossip peer and disconnect it pretty quickly. On a small network, the node may lose all of its peers.

The impact is pretty obvious when the peer count is small and sync speed is slow, and is therefore easily reproducible by running a fresh supernode on devnet-3.

This isn't as obvious on a live testnet like holesky / sepolia, we haven't noticed this, probably due to its high peer count and sync speed - the nodes might be able to reach head quickly before losing too many peers.


  The previous behaviour isn't ideal but safe:  triggering unknown parent lookup and penalise the bad peer if it happens to be malicious or faulty. So for now it's safer to revert the change and plan for a proper fix after the v8 release.


Co-Authored-By: Jimmy Chen <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working ready-for-merge This PR is ready to merge. v8.0.0 Q4 2025 Fusaka Mainnet Release

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants