-
Notifications
You must be signed in to change notification settings - Fork 957
Fix bug in fork calculation at fork boundaries #8121
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Kurtosis config I was using to test the change: participants_matrix:
el:
- el_type: besu
el_image: ethpandaops/besu:main
cl:
- cl_type: grandine
cl_image: ethpandaops/grandine:develop
supernode: true
cl_log_level: debug
count: 3
- cl_type: lighthouse
cl_image: lighthouse-local:latest
supernode: true
cl_log_level: debug
- cl_type: lighthouse
cl_image: ethpandaops/lighthouse:unstable
supernode: true
cl_log_level: debug
network_params:
fulu_fork_epoch: 1
additional_services:
- dora
- spamoor
global_log_level: debug
spamoor_params:
image: ethpandaops/spamoor:master
max_mem: 4000
spammers:
- scenario: eoatx
config:
throughput: 200
- scenario: blobs
config:
throughput: 20The unstable lighthouse node stalls at slot 32 (fulu slot) while this branch progresses fine |
|
I was also able to repro the bug on current unstable, with pawans fix we are able to progress the chain just fine |
|
Nice find! I'll have to think about the fix a bit more. Just to understand this scenario, this didn't happen on other paths (blob / column), because we lighthouse/beacon_node/beacon_chain/src/block_verification.rs Lines 957 to 977 in a08fd93
In gossip block verification > So does the failure here mean the proposer indices for fork (epoch |
|
|
||
| // Advance the state into the same epoch as the block. Use the "partial" method since state | ||
| // roots are not important for proposer/attester shuffling. | ||
| partial_state_advance(state, Some(state_root), target_slot, spec) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this is required in all cases as we have 2 epochs of proposer lookahead from Fulu.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this might only be required at fork boundaries, like when the target_epoch fork > state.current_fork
or maybe only at the fulu fork transition, i'm not fully across the fulu changes to proposer shuffling yet tbh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO it should only be required at the boundary, but I am fine with the "safer" fix or just always doing the state advance. We can come back to optimise it later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The optimised version would look something like setting minimum epoch to max(minimum_epoch, fulu_fork_epoch) when target_epoch >= fulu_fork_eppch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pawan mentioned that the proposer index was correct but the fork used for signing was incorrect (electra).
Can we simply return self.spec.fork_at_epoch(proposal_epoch) here instead of state.fork()?
lighthouse/beacon_node/beacon_chain/src/beacon_chain.rs
Lines 6609 to 6614 in 4ceccd2
| let proposers = state.get_beacon_proposer_indices(proposal_epoch, &self.spec)?; | |
| Ok::<_, E>(EpochBlockProposers::new( | |
| proposal_epoch, | |
| state.fork(), | |
| proposers, | |
| )) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think that would work. Do we actually have a fork_at_epoch method that takes into account multiple forks activating at the same epoch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think advancing the state is the more conservative approach. We can optimise further later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe there's a few other places where we are inserting into the beacon proposer cache which might also potentially need to use the fork_at_epoch method if we don't advance the state
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also to confirm, we don't need to build the committee cache after advancing the state like here right?
lighthouse/beacon_node/beacon_chain/src/block_verification.rs
Lines 2065 to 2066 in c754234
| state.build_committee_cache(RelativeEpoch::Previous, spec)?; | |
| state.build_committee_cache(RelativeEpoch::Current, spec)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also to confirm, we don't need to build the committee cache after advancing the state like here right?
Hmm good point. These caches are not required in general for the proposer cache, but we may need them in block verification. It wouldn't hurt to add them back here (I removed them by not using cheap_state_advance):
lighthouse/beacon_node/beacon_chain/src/block_verification.rs
Lines 962 to 975 in c754234
| // The proposer index was *not* cached and we must load the parent in order to | |
| // determine the proposer index. | |
| let (mut parent, _) = load_parent(block.clone(), chain)?; | |
| let parent_state_root = if let Some(state_root) = parent.beacon_state_root { | |
| state_root | |
| } else { | |
| // This is potentially a little inefficient, although we are likely to need | |
| // the state's hash eventually (if the block is valid), and we are also likely | |
| // to already have the hash cached (if fetched from the state cache). | |
| parent.pre_state.canonical_root()? | |
| }; | |
| let parent_state = parent.pre_state.clone(); | |
| opt_parent = Some(parent); | |
| Ok((parent_state_root, parent_state)) |
Although I think we end up doing it here anyway:
lighthouse/beacon_node/beacon_chain/src/block_verification.rs
Lines 1187 to 1192 in c754234
| let state = cheap_state_advance_to_obtain_committees::<_, BlockError>( | |
| &mut parent.pre_state, | |
| parent.beacon_state_root, | |
| block.slot(), | |
| &chain.spec, | |
| )?; |
|
Good catch! I think another fix might be just to compute what the fork "will be" at some future epoch? So use something like But yeah maybe just advancing the state fully into the new epoch is safer and clearer. We probably don't need the optimisation where we use the lookahead to advance less |
@pawanjay176 pointed out that the proposer lookahead was correct, and it was to do with the proposer.fork
|
jimmygchen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @pawanjay176 @eserilev
Great turnaround 👏 as discussed let’s merge this safer fix and optimise later!
Follow-up to the bug fixed in: - #8121 This fixes the root cause of that bug, which was introduced by me in: - #8101 Lion identified the issue here: - #8101 (comment) In the methods that compute the proposer shuffling decision root, ensure we don't use lookahead for the Fulu fork epoch itself. This is accomplished by checking if Fulu is enabled at `epoch - 1`, i.e. if `epoch > fulu_fork_epoch`. I haven't updated the methods that _compute_ shufflings to use these new corrected bounds (e.g. `BeaconState::compute_proposer_indices`), although we could make this change in future. The `get_beacon_proposer_indices` method already gracefully handles the Fulu boundary case by using the `proposer_lookahead` field (if initialised). Co-Authored-By: Michael Sproul <[email protected]>
N/A In sigp#8101 , when we modified the logic to get the proposer index post fulu, we seem to have missed advancing the state at the fork boundaries to get the right `Fork` for signature verification. This led to lighthouse failing all gossip verification right after transitioning to fulu that was observed on the holesky shadow fork ``` Sep 26 14:24:00.088 DEBUG Rejected gossip block error: "InvalidSignature(ProposerSignature)", graffiti: "grandine-geth-super-1", slot: 640 Sep 26 14:24:00.099 WARN Could not verify block for gossip. Rejecting the block error: InvalidSignature(ProposerSignature) ``` I'm not completely sure this is the correct fix, but this fixes the issue with `InvalidProposerSignature` on the holesky shadow fork. Thanks to @eserilev for helping debug this Co-Authored-By: Pawan Dhananjay <[email protected]>
Follow-up to the bug fixed in: - sigp#8121 This fixes the root cause of that bug, which was introduced by me in: - sigp#8101 Lion identified the issue here: - sigp#8101 (comment) In the methods that compute the proposer shuffling decision root, ensure we don't use lookahead for the Fulu fork epoch itself. This is accomplished by checking if Fulu is enabled at `epoch - 1`, i.e. if `epoch > fulu_fork_epoch`. I haven't updated the methods that _compute_ shufflings to use these new corrected bounds (e.g. `BeaconState::compute_proposer_indices`), although we could make this change in future. The `get_beacon_proposer_indices` method already gracefully handles the Fulu boundary case by using the `proposer_lookahead` field (if initialised). Co-Authored-By: Michael Sproul <[email protected]>
Issue Addressed
N/A
Proposed Changes
In #8101 , when we modified the logic to get the proposer index post fulu, we seem to have missed advancing the state at the fork boundaries to get the right
Forkfor signature verification.This led to lighthouse failing all gossip verification right after transitioning to fulu that was observed on the holesky shadow fork
I'm not completely sure this is the correct fix, but this fixes the issue with
InvalidProposerSignatureon the holesky shadow fork.Thanks to @eserilev for helping debug this