
Conversation

@pawanjay176 (Member) commented Sep 26, 2025

Issue Addressed

N/A

Proposed Changes

In #8101, when we modified the logic to get the proposer index post-Fulu, we seem to have missed advancing the state at the fork boundaries to get the right `Fork` for signature verification.
This led to Lighthouse failing all gossip block verification right after transitioning to Fulu, as observed on the Holesky shadow fork:

```
Sep 26 14:24:00.088 DEBUG Rejected gossip block                         error: "InvalidSignature(ProposerSignature)", graffiti: "grandine-geth-super-1", slot: 640
Sep 26 14:24:00.099 WARN  Could not verify block for gossip. Rejecting the block  error: InvalidSignature(ProposerSignature)
```

I'm not completely sure this is the correct fix, but it fixes the `InvalidProposerSignature` issue on the Holesky shadow fork.

Thanks to @eserilev for helping debug this

@pawanjay176 added ready-for-review and v8.0.0-rc.0 labels Sep 26, 2025
@pawanjay176 (Member Author)

Kurtosis config I was using to test the change:

```yaml
participants_matrix:
  el:
    - el_type: besu
      el_image: ethpandaops/besu:main
  cl:
    - cl_type: grandine
      cl_image: ethpandaops/grandine:develop
      supernode: true
      cl_log_level: debug
      count: 3
    - cl_type: lighthouse
      cl_image: lighthouse-local:latest
      supernode: true
      cl_log_level: debug
    - cl_type: lighthouse
      cl_image: ethpandaops/lighthouse:unstable
      supernode: true
      cl_log_level: debug

network_params:
  fulu_fork_epoch: 1

additional_services:
  - dora
  - spamoor

global_log_level: debug
spamoor_params:
  image: ethpandaops/spamoor:master
  max_mem: 4000
  spammers:
    - scenario: eoatx
      config:
        throughput: 200
    - scenario: blobs
      config:
        throughput: 20
```

The unstable Lighthouse node stalls at slot 32 (the Fulu fork slot), while this branch progresses fine.

@eserilev (Member)

I was also able to repro the bug on current unstable; with Pawan's fix we are able to progress the chain just fine.

@jimmygchen (Member)

Nice find! I'll have to think about the fix a bit more.

Just to understand this scenario: this didn't happen on the other paths (blob / column), because there the `state_provider` function returns the state advanced to the slot, instead of the parent state as here:

```rust
let proposer = chain.with_proposer_cache::<_, BlockError>(
    proposer_shuffling_decision_block,
    block_epoch,
    |proposers| proposers.get_slot::<T::EthSpec>(block_slot),
    || {
        // The proposer index was *not* cached and we must load the parent in order to
        // determine the proposer index.
        let (mut parent, _) = load_parent(block.clone(), chain)?;
        let parent_state_root = if let Some(state_root) = parent.beacon_state_root {
            state_root
        } else {
            // This is potentially a little inefficient, although we are likely to need
            // the state's hash eventually (if the block is valid), and we are also likely
            // to already have the hash cached (if fetched from the state cache).
            parent.pre_state.canonical_root()?
        };
        let parent_state = parent.pre_state.clone();
        opt_parent = Some(parent);
        Ok((parent_state_root, parent_state))
    },
)?;
```

In gossip block verification, `with_proposer_cache` > `ensure_state_can_determine_proposers_for_epoch` didn't try to advance the Electra parent state into the `target_epoch` (the Fulu epoch), because the state should have sufficient lookahead.

So does the failure here mean the proposer indices for the fork (epoch N), calculated from the parent state (epoch N-1), are different from the `proposer_lookahead` computed during the Fulu fork transition? And that we can't rely on the lookahead computed from the Electra parent state?


```rust
// Advance the state into the same epoch as the block. Use the "partial" method since state
// roots are not important for proposer/attester shuffling.
partial_state_advance(state, Some(state_root), target_slot, spec)
```
Member

I'm not sure if this is required in all cases as we have 2 epochs of proposer lookahead from Fulu.

Member

This might only be required at fork boundaries, like when the `target_epoch` fork > `state.current_fork`.

Or maybe only at the Fulu fork transition; I'm not fully across the Fulu changes to proposer shuffling yet, tbh.

Member

IMO it should only be required at the boundary, but I am fine with the "safer" fix or just always doing the state advance. We can come back to optimise it later

Member

The optimised version would look something like setting the minimum epoch to `max(minimum_epoch, fulu_fork_epoch)` when `target_epoch >= fulu_fork_epoch`.
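
A rough sketch of that optimisation (hypothetical helper, not the actual Lighthouse API; plain `u64` epochs stand in for the `Epoch` type):

```rust
// Hypothetical sketch of the optimisation described above: once the target
// epoch is at or past the Fulu fork epoch, clamp the minimum epoch the state
// must reach to the fork epoch itself, so a state advance is only forced
// across the fork boundary.
fn required_minimum_epoch(
    minimum_epoch: u64,
    target_epoch: u64,
    fulu_fork_epoch: Option<u64>,
) -> u64 {
    match fulu_fork_epoch {
        Some(fulu_epoch) if target_epoch >= fulu_epoch => minimum_epoch.max(fulu_epoch),
        _ => minimum_epoch,
    }
}
```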

Member

Pawan mentioned that the proposer index was correct but the fork used for signing was incorrect (Electra).

Can we simply return `self.spec.fork_at_epoch(proposal_epoch)` here instead of `state.fork()`?

```rust
let proposers = state.get_beacon_proposer_indices(proposal_epoch, &self.spec)?;
Ok::<_, E>(EpochBlockProposers::new(
    proposal_epoch,
    state.fork(),
    proposers,
))
```
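
A sketch of that alternative, assuming a `fork_at_epoch`-style method exists on the spec (as discussed below, it may not, or may need to handle multiple forks activating at the same epoch):

```rust
// Hypothetical variant of the snippet above: use the fork scheduled for the
// proposal epoch rather than the fork of the (possibly un-advanced) state.
let proposers = state.get_beacon_proposer_indices(proposal_epoch, &self.spec)?;
Ok::<_, E>(EpochBlockProposers::new(
    proposal_epoch,
    self.spec.fork_at_epoch(proposal_epoch),
    proposers,
))
```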

Member

Yeah I think that would work. Do we actually have a `fork_at_epoch` method that takes into account multiple forks activating at the same epoch?

Member

Yeah I think advancing the state is the more conservative approach. We can optimise further later

@eserilev (Member) Sep 26, 2025

I believe there are a few other places where we are inserting into the beacon proposer cache which might also potentially need to use the `fork_at_epoch` method if we don't advance the state.

Member Author

Also to confirm, we don't need to build the committee cache after advancing the state, like here, right?

```rust
state.build_committee_cache(RelativeEpoch::Previous, spec)?;
state.build_committee_cache(RelativeEpoch::Current, spec)?;
```

Member

> Also to confirm, we don't need to build the committee cache after advancing the state, like here, right?

Hmm, good point. These caches are not required in general for the proposer cache, but we may need them in block verification. It wouldn't hurt to add them back here (I removed them by not using `cheap_state_advance`):

```rust
// The proposer index was *not* cached and we must load the parent in order to
// determine the proposer index.
let (mut parent, _) = load_parent(block.clone(), chain)?;
let parent_state_root = if let Some(state_root) = parent.beacon_state_root {
    state_root
} else {
    // This is potentially a little inefficient, although we are likely to need
    // the state's hash eventually (if the block is valid), and we are also likely
    // to already have the hash cached (if fetched from the state cache).
    parent.pre_state.canonical_root()?
};
let parent_state = parent.pre_state.clone();
opt_parent = Some(parent);
Ok((parent_state_root, parent_state))
```

Although I think we end up doing it here anyway:

```rust
let state = cheap_state_advance_to_obtain_committees::<_, BlockError>(
    &mut parent.pre_state,
    parent.beacon_state_root,
    block.slot(),
    &chain.spec,
)?;
```
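
Putting the two snippets together, "adding them back" would look roughly like this (a sketch combining the code quoted above; whether it's actually needed is exactly what's being discussed):

```rust
// Sketch: advance the state partially (no state roots needed for shuffling),
// then rebuild the committee caches that block verification may rely on.
partial_state_advance(state, Some(state_root), target_slot, spec)?;
state.build_committee_cache(RelativeEpoch::Previous, spec)?;
state.build_committee_cache(RelativeEpoch::Current, spec)?;
```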

@michaelsproul (Member) commented Sep 26, 2025

Good catch! I think another fix might be just to compute what the fork "will be" at some future epoch? So use something like `state.fork_at_epoch(epoch)` rather than `state.fork()`?

But yeah maybe just advancing the state fully into the new epoch is safer and clearer. We probably don't need the optimisation where we use the lookahead to advance less

@jimmygchen (Member)

> So does the failure here mean the proposer indices for the fork (epoch N), calculated from the parent state (epoch N-1), are different from the `proposer_lookahead` computed during the Fulu fork transition? And that we can't rely on the lookahead computed from the Electra parent state?

@pawanjay176 pointed out that the proposer lookahead was correct, and that it was to do with the `proposer.fork`:

> - the proposer index that was calculated is correct.
> - the `Fork` for the signature verification that we pushed in the beacon proposers cache is incorrect
> - we retrieved it here: https://github.com/sigp/lighthouse/blob/unstable/beacon_node/beacon_chain/src/block_verification.rs#L979

@jimmygchen (Member) left a comment

Thanks @pawanjay176 @eserilev
Great turnaround 👏 as discussed, let's merge this safer fix and optimise later!

@jimmygchen added ready-for-merge and removed ready-for-review labels Sep 28, 2025
mergify bot added the queued label Sep 28, 2025
mergify bot merged commit edcfee6 into sigp:unstable Sep 28, 2025
37 checks passed
mergify bot removed the queued label Sep 28, 2025
mergify bot pushed a commit that referenced this pull request Sep 29, 2025
Follow-up to the bug fixed in:

- #8121

This fixes the root cause of that bug, which was introduced by me in:

- #8101

Lion identified the issue here:

- #8101 (comment)


In the methods that compute the proposer shuffling decision root, ensure we don't use lookahead for the Fulu fork epoch itself. This is accomplished by checking if Fulu is enabled at `epoch - 1`, i.e. if `epoch > fulu_fork_epoch`.

I haven't updated the methods that _compute_ shufflings to use these new corrected bounds (e.g. `BeaconState::compute_proposer_indices`), although we could make this change in future. The `get_beacon_proposer_indices` method already gracefully handles the Fulu boundary case by using the `proposer_lookahead` field (if initialised).
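
A minimal illustration of that boundary check (hypothetical helper; plain `u64` epochs stand in for the `Epoch` type):

```rust
// Lookahead is only usable when Fulu was already enabled in the previous
// epoch, i.e. strictly after the fork epoch, so the Fulu fork epoch itself
// falls back to computing the shuffling without lookahead.
fn lookahead_usable(epoch: u64, fulu_fork_epoch: Option<u64>) -> bool {
    fulu_fork_epoch.map_or(false, |fork_epoch| epoch > fork_epoch)
}
```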


Co-Authored-By: Michael Sproul <[email protected]>
michaelsproul added a commit to michaelsproul/lighthouse that referenced this pull request Sep 30, 2025
lmnzx pushed a commit to lmnzx/lighthouse that referenced this pull request Nov 4, 2025
lmnzx pushed a commit to lmnzx/lighthouse that referenced this pull request Nov 4, 2025