Skip to content

Conversation

@michaelsproul
Copy link
Member

Issue Addressed

Closes:

This should reduce Lighthouse's block proposal times on Holesky and prevent us getting reorged.

Proposed Changes

  • Allow the head state to be advanced further than 1 slot. This lets us avoid epoch processing on hot paths including block production, by having new epoch boundaries pre-computed and available in the state cache.
  • Use the finalized state to prune the op pool. We were previously using the head state and trying to infer slashing/exit relevance based on exit_epoch. However some exit epochs are far in the future, despite occurring recently.

@michaelsproul michaelsproul added optimization Something to make Lighthouse run more efficiently. v8.0.0 Q4 2025 Fusaka Mainnet Release labels Oct 2, 2025
@michaelsproul
Copy link
Member Author

Combined into one PR for ease of testing (and I've already been testing the state advance as part of testing the proposer shuffling cleanup). Can separate into 2 PRs if desired.

Copy link
Member

@jimmygchen jimmygchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty good to me, just a few comments and a question to help me understand this better.

|_, validator| validator.exit_epoch <= head_state.finalized_checkpoint().epoch,
head_state,
|_, validator| {
validator.slashed || validator.exit_epoch <= finalized_state.current_epoch()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch here

// Protect against advancing a state more than a single slot.
//
// Advancing more than one slot without storing the intermediate state would corrupt the
// database. Future works might store intermediate states inside this function.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand this comment about corrupting the database, could you explain this and why is this no longer an issue?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I think "corruption" is not quite the right word here (even though I wrote this comment long ago 🤣). Our database pruning used to only be capable of pruning states that were ancestors of some head state. This means that advanced states that would later have blocks applied to them (e.g. advanced state for slot 35 before it has block 35 applied) would not be considered by the pruning algorithm. So they would stay on disk forever taking up space.

Now our pruning constructs a dag based on the state summaries, so it sees these abandoned advanced states and deletes them

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pruning seems to be working well and keeping the number of states on disk bounded:

Oct 07 03:07:13.232 DEBUG Extra pruning information new_finalized_checkpoint: Checkpoint { epoch: Epoch(166396), root: 0xd4883165eb3c4517578e3cef0090d7d9539841363c89b8e535a72b835db71840 }, new_finalized_state_root: 0x066c326a4d9f1ce3f9aab95799cab7accb3c07be00ff27391ec95d4db0a46f44, split_prior_to_migration: Split { slot: Slot(5324640), state_root: 0x79d2d1444290075d7346f1b5751df60a1fffef517dfc309bfdad09da86b00c1a, block_root: 0x20f2824c49d82d7735de9ec1d31805c8db8eb796c8ceaa4b17644bb466b2b559 }, newly_finalized_blocks: 33, newly_finalized_state_roots: 33, newly_finalized_states_min_slot: 5324640, required_finalized_diff_state_slots: [Slot(5324672), Slot(5324288), Slot(5322752), Slot(5316608), Slot(5308416), Slot(5242880), Slot(4194304)], kept_summaries_for_hdiff: [(0x202c4218875820e2b660bb5b4006cd70675c941ff95aa0ead165412e3bc0db2b, Slot(4194304)), (0x1d18004bf5fbf21e0d36d78f73242e12f9a4a8377ce14f83f4735101eb0dcfbc, Slot(5242880)), (0x6c96c8f0cd07df7abd7c6de27246a4b473542307f1e996df8d7f99d3bdc570d2, Slot(5308416)), (0xf9a30950d10706cd6f6e473372c0ecd54c11de457823c8baf88efa5e0de47b42, Slot(5316608)), (0xb6326f522e323dea61109403950277853b70cad81c35102fc41d71b9679b563b, Slot(5322752)), (0x60d6523dc58f09bde2507762b384ad8417f32225d121d9164d60c49968b3f744, Slot(5324288))], state_summaries_count: 176, state_summaries_dag_roots: [(0xb6326f522e323dea61109403950277853b70cad81c35102fc41d71b9679b563b, DAGStateSummary { slot: Slot(5322752), latest_block_root: 0x4aed3a69a36b7c8a3c126422b0757b70f530cf412fa996d416bacc0f374fb6af, latest_block_slot: Slot(5322750), previous_state_root: 0x21d1041bb81ba8f287448bf0e8d2e79f0b7d330f067928c70765f37c00e943d3 }), (0x1d18004bf5fbf21e0d36d78f73242e12f9a4a8377ce14f83f4735101eb0dcfbc, DAGStateSummary { slot: Slot(5242880), latest_block_root: 0xb1861ab5afce868a483d5bded463b0b485a29340dbaa2e2c4112ec2a2e33190b, latest_block_slot: Slot(5242880), previous_state_root: 0x1e5b11982292867f894c4186fbea86d2d6f9a3ead7de0beb19c8f846e3fe0283 }), (0x202c4218875820e2b660bb5b4006cd70675c941ff95aa0ead165412e3bc0db2b, DAGStateSummary { slot: Slot(4194304), latest_block_root: 0xedb6e6a0f982ad53c7a343f939fb0bfecdef0e466dd526be7905b24b583e28ce, latest_block_slot: Slot(4194302), previous_state_root: 0x8d7edcd77a1e4b13358624aa2a725f7e0125998f5f1d897fce273a03f96a8edc }), (0x6c96c8f0cd07df7abd7c6de27246a4b473542307f1e996df8d7f99d3bdc570d2, DAGStateSummary { slot: Slot(5308416), latest_block_root: 0x41c6db8c7ec97303c7f3a47d5e28d06205d0021334735721a0fe113567da045c, latest_block_slot: Slot(5308416), previous_state_root: 0xec3e4406f735f395187b92cd669809b4a0b1086e14c053afa30a30b897d3db77 }), (0x79d2d1444290075d7346f1b5751df60a1fffef517dfc309bfdad09da86b00c1a, DAGStateSummary { slot: Slot(5324640), latest_block_root: 0x20f2824c49d82d7735de9ec1d31805c8db8eb796c8ceaa4b17644bb466b2b559, latest_block_slot: Slot(5324640), previous_state_root: 0x78eaf0323a2c6bcc24c86df213478753d8ddccf0e3ee7a6b28ab2496aae1dc2e }), (0xf9a30950d10706cd6f6e473372c0ecd54c11de457823c8baf88efa5e0de47b42, DAGStateSummary { slot: Slot(5316608), latest_block_root: 0xf40285e8ca23f92bac183c5b0a7a9d2f445fc0f971b106e48a5f6e59bebbe53f, latest_block_slot: Slot(5316607), previous_state_root: 0xdfa0919e52a05aba02cfe3c302db82ab15f464d590f17e10c28971072096f9dc }), (0x60d6523dc58f09bde2507762b384ad8417f32225d121d9164d60c49968b3f744, DAGStateSummary { slot: Slot(5324288), latest_block_root: 0xcc7d8ebf20ab724e39618afff3408b130e71da7c467465961094896ed244c696, latest_block_slot: Slot(5324287), previous_state_root: 0xe88783c0e1e742daaf12240c0d37d5daf0f82ebaa79a6498d21c7b9b287d76ef })], finalized_and_descendant_state_roots_of_finalized_checkpoint: 112, blocks_to_prune: 0, states_to_prune: 58

We are pruning down to 112 states here, which is about the right number (around ~2 per slot for the 2 epochs of unfinalized chain). We get ~2 states because of the advanced and unadvanced versions. The majority of these are tiny because they are just stored using the block-reply strategy - the only data on disk for them is the block itself and the state summary.

@jimmygchen jimmygchen added the waiting-on-author The reviewer has suggested changes and awaits thier implementation. label Oct 2, 2025
@michaelsproul michaelsproul added ready-for-review The code is ready for review and removed waiting-on-author The reviewer has suggested changes and awaits thier implementation. labels Oct 7, 2025
Copy link
Member

@jimmygchen jimmygchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic looks correct to me and the block production metrics on testnet nodes look pretty good 🎉

@jimmygchen jimmygchen added ready-for-merge This PR is ready to merge. and removed ready-for-review The code is ready for review labels Oct 8, 2025
@mergify mergify bot added the queued label Oct 8, 2025
mergify bot added a commit that referenced this pull request Oct 8, 2025
@mergify mergify bot merged commit 13dfa92 into sigp:unstable Oct 8, 2025
37 checks passed
@mergify mergify bot removed the queued label Oct 8, 2025
jchavarri pushed a commit to jchavarri/lighthouse that referenced this pull request Oct 21, 2025
Closes:

- sigp#4412

This should reduce Lighthouse's block proposal times on Holesky and prevent us getting reorged.


  - [x] Allow the head state to be advanced further than 1 slot. This lets us avoid epoch processing on hot paths including block production, by having new epoch boundaries pre-computed and available in the state cache.
- [x] Use the finalized state to prune the op pool. We were previously using the head state and trying to infer slashing/exit relevance based on `exit_epoch`. However some exit epochs are far in the future, despite occurring recently.


Co-Authored-By: Michael Sproul <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

optimization Something to make Lighthouse run more efficiently. ready-for-merge This PR is ready to merge. v8.0.0 Q4 2025 Fusaka Mainnet Release

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants