Skip to content

Conversation

@michaelsproul
Copy link
Member

Proposed Changes

Merge the release-v7.0.0 branch back into unstable. This unblocks CI on unstable.

There were a few merge conflicts resolved related to sync-tolerance-epochs, so I will wait for CI to pass on this branch before merging it manually (to preserve history and make future merges easier).

Do not merge this PR using mergify.

jimmygchen and others added 8 commits February 26, 2025 18:29
…ing (#7030)

Related to #6880, an issue that's usually observed on local devnets with small number of nodes.

When testing range sync, I usually shutdown a node for some period of time and restart it again. However, if it's within `SYNC_TOLERANCE_EPOCHS` (8), Lighthouse would consider the node as synced, and if it may attempt to produce a block if requested by a validator - on a local devnet, nodes frequently produce blocks - when this happens, the node ends up producing a block that would revert finality and would get disconnected from peers immediately.

NOTE: This is PR#7030 cherry-picked from `unstable` to `release-v7.0.0`.

Run Lighthouse BN with this flag to override:

```
--sync-tolerance--epoch 0
```
Cleaned up and isolated version of the `--disable-attesting` flag for the VC, from the `holesky-rescue` branch:

- #7041

I figured we don't need the `--disable-attesting` flag on the BN for now, and it was a much more invasive impl.
Adds the `--long-timeouts-multiplier` flag.
Allows granular control for VC timeouts which has proved useful in Holesky.
This change makes the `total_difficulty` field in `ExecutionBlock` an `Option<Uint256>` since newer clients are no longer including the `totalDifficulty` field.

I think this will fix #6937 but I was actually more focused on the builder registration case described below.

In our [builder-playground](https://github.com/flashbots/builder-playground) we setup a local devnet using lighthouse, reth, and mev-boost-relay.  After upgrading to reth 1.2.0 and lighthouse v7.0.0.beta.0 for Pectra, we noticed that the validator registration process was _sometimes_ failing with:

```
Feb 25 23:35:25.038 ERRO Unable to publish proposer preparation to all beacon nodes, error: Some endpoints failed, num_failed: 1 http://localhost:3500/ => RequestFailed(ServerMessage(ErrorMessage { code: 400, message: "BAD_REQUEST: error updating proposer preparations: ForkchoiceUpdate(EngineError(Api { error: Json(Error(\"missing field `totalDifficulty`\", line: 0, column: 0)) }))", stacktraces: [] })), service: preparation
Feb 25 23:35:25.099 WARN Unable to publish validator registrations to the builder network, error: Some endpoints failed, num_failed: 1 http://localhost:3500/ => RequestFailed(ServerMessage(ErrorMessage { code: 400, message: "BAD_REQUEST: error updating proposer preparations: ForkchoiceUpdate(EngineError(Api { error: Json(Error(\"missing field `totalDifficulty`\", line: 0, column: 0)) }))", stacktraces: [] })), service: preparation
```

What was even more confusing, was that it was sometimes working, which actually led to a wild goose chase thinking it was a networking issue.  However, when tracing through the LH code, I came across this comment:

https://github.com/sigp/lighthouse/blob/70194dfc6a3f4d10c9059610f889ff5a4e863a6a/beacon_node/beacon_chain/src/beacon_chain.rs#L6048-L6049

This explained why it sometimes worked, in our playground we run lighthouse with `--prepare-payload-lookahead 8000` thus there was always a 4-second window where the call wasn't made.

But, if the call was made, then this code would 100% fail with updated reth:

https://github.com/sigp/lighthouse/blob/unstable/beacon_node/execution_layer/src/lib.rs#L1688-L1692

Which would then mapped to a `Error::ForkchoiceUpdate` in `update_execution_engine_forkchoice`.


  Anyways, the fix was to make `total_difficulty` Optional, and then to update any code paths where it was used.  In doing so, I assume that if the EL doesn't include total difficulty then the chain is already post-merge.
#7044)

Replace the `2 + 2 == 5` hacks from `holesky-rescue` and use the existing `sync_tolerance_epochs` flag to control the proposer prep routines.
Unblock CI by ignoring cargo audit failures. IMO we need to merge some stuff to get our PR backlog under control and can't wait for audit fixes. I've opened issues to address the actual audit failures:

- #7090
- #7091
@michaelsproul michaelsproul added ready-for-review The code is ready for review do-not-merge labels Mar 10, 2025
@michaelsproul michaelsproul merged commit b4e79ed into unstable Mar 10, 2025
30 checks passed
@michaelsproul michaelsproul deleted the back-merge-v7-to-unstable branch March 10, 2025 09:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-review The code is ready for review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants