-
Notifications
You must be signed in to change notification settings - Fork 957
Revert incorrect fix made in #8179 #8215
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
michaelsproul
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I think this is the best we can do on short notice.
Would appreciate another set of eyes on this, as I missed the issue with the original PR!
| // Do not process a column that does not descend from the finalized root. | ||
| // We just loaded the parent_block, so we can be sure that it exists in fork choice. | ||
| if !fork_choice.is_finalized_checkpoint_or_descendant(block_parent_root) { | ||
| return Err(GossipDataColumnError::NotFinalizedDescendant { block_parent_root }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this is unreachable, but I'm ok with keeping the PR as a straight-up revert, and tracking the fix here:
|
Some required checks have failed. Could you please take a look @jimmygchen? 🙏 |
This PR reverts sigp#8179. It turns out that the fix was invalid because an unknown root is always not a finalized descendant: https://github.com/sigp/lighthouse/blob/522bd9e9c6ac167f2231525e937c9ebbcb86cf6e/consensus/proto_array/src/proto_array.rs#L976-L979 so for any data columns with unknown parents, it will always penalise the gossip peer and disconnect it pretty quickly. On a small network, the node may lose all of its peers. The impact is pretty obvious when the peer count is small and sync speed is slow, and is therefore easily reproducible by running a fresh supernode on devnet-3. This isn't as obvious on a live testnet like holesky / sepolia, we haven't noticed this, probably due to its high peer count and sync speed - the nodes might be able to reach head quickly before losing too many peers. The previous behaviour isn't ideal but safe: triggering unknown parent lookup and penalise the bad peer if it happens to be malicious or faulty. So for now it's safer to revert the change and plan for a proper fix after the v8 release. Co-Authored-By: Jimmy Chen <[email protected]>
Issue Addressed
This PR reverts #8179.
It turns out that the fix was invalid because an unknown root is always not a finalized descendant:
lighthouse/consensus/proto_array/src/proto_array.rs
Lines 976 to 979 in 522bd9e
so for any data columns with unknown parents, it will always penalise the gossip peer and disconnect it pretty quickly. On a small network, the node may lose all of its peers.
The impact is pretty obvious when the peer count is small and sync speed is slow, and is therefore easily reproducible by running a fresh supernode on devnet-3.
This isn't as obvious on a live testnet like holesky / sepolia, we haven't noticed this, probably due to its high peer count and sync speed - the nodes might be able to reach head quickly before losing too many peers.
Proposed Changes
The previous behaviour isn't ideal but safe: triggering unknown parent lookup and penalise the bad peer if it happens to be malicious or faulty. So for now it's safer to revert the change and plan for a proper fix after the v8 release.