Skip to content

Conversation

@dapplion
Copy link
Collaborator

@dapplion dapplion commented Nov 15, 2025

Issue Addressed

We want to not require checkpoint sync starts to include the required custody data columns, and instead fetch them from p2p.

Closes #6837

Proposed Changes

The checkpoint sync slot can:

  1. Be the first slot in the epoch, such that the epoch of the block == the start checkpoint epoch
  2. Be in an epoch prior to the start checkpoint epoch

In both cases backfill sync already fetches that epoch worth of blocks with current code. This PR modifies the backfill import filter function to allow to re-importing the oldest block slot in the DB.

I feel this solution is sufficient unless I'm missing something. I have not tested this yet! Michael has tested this and it works.

@dapplion dapplion requested a review from jxs as a code owner November 15, 2025 01:05
@michaelsproul michaelsproul added the v8.0.1 Cheeky patch release for Fulu label Nov 17, 2025
@michaelsproul michaelsproul self-assigned this Nov 18, 2025
@michaelsproul michaelsproul changed the base branch from unstable to release-v8.0 November 18, 2025 00:18
@michaelsproul michaelsproul force-pushed the reimport-checkpoint-sync-block branch from 8183e08 to a6607e5 Compare November 18, 2025 00:20
@michaelsproul
Copy link
Member

Retargeting at release-v8.0 so we can include this in the v8.0.1 release and prevent gappy DBs

@michaelsproul michaelsproul force-pushed the reimport-checkpoint-sync-block branch from aeefac0 to e1adef7 Compare November 18, 2025 04:13
@michaelsproul
Copy link
Member

michaelsproul commented Nov 18, 2025

The only downside to this change that I can see is that we end up reimporting blocks which do not need importing:

Nov 17 08:06:55.062 INFO Lighthouse started version: "Lighthouse/v8.0.0-0341fd6"
Nov 18 14:36:32.830 INFO Lighthouse started version: "Lighthouse/v8.0.0-226609c"
Nov 18 14:39:30.480 DEBUG Re-importing historic block block_root: 0xdbf8afdfbc3be1c714bcbc451f9569c14bb68e6bd6a85dea820bf6d2758f66cd, slot: 1768544
Nov 18 14:57:30.481 DEBUG Re-importing historic block block_root: 0xf1f00e894410b571727b00b79bce5b2c1c953773ff7ff66373cc764b7b1e1760, slot: 1760896
Nov 18 14:57:42.476 DEBUG Re-importing historic block block_root: 0xf1f00e894410b571727b00b79bce5b2c1c953773ff7ff66373cc764b7b1e1760, slot: 1760896
Nov 18 14:57:54.469 DEBUG Re-importing historic block block_root: 0xf1f00e894410b571727b00b79bce5b2c1c953773ff7ff66373cc764b7b1e1760, slot: 1760896
Nov 18 14:58:54.475 DEBUG Re-importing historic block block_root: 0x3e4bf54454eba9ee1c3210bb6266aad08277b1c95ce5b21d55fe19327ceddd51, slot: 1760576
Nov 18 15:25:05.419 INFO Lighthouse started version: "Lighthouse/v8.0.0-e1adef7"
Nov 18 15:28:06.410 DEBUG Re-importing historic block block_root: 0xd828ed672bc8190c602562c376d0649e8c254d7eea786cb32cbdd51ac0332fb3, slot: 1768800
Nov 18 15:31:18.547 DEBUG Re-importing historic block block_root: 0x1c6c4e6c96f2e86206ce3ee955527163d2b567a4b092d955b82371a9a57feaa9, slot: 1767424

These logs are from two checkpoint syncs on Hoodi. We can see backfill reimporting the first anchor block at slot 1768544, and then continuing to re-import epoch boundary blocks as it backfills further. It even reimported slot 1760896 three times.

The second sync was from slot 1768800 which we see getting reimported, and then again backfill reimports another block 1767424.

I think this is probably fine for now, it might just have a small impact on disk I/O. It doesn't seem we are reimporting on every backfill batch, it must be something related to how batches are handled and what we get from peers.

This is not straight-forward to fix because we don't have easy access to the slot of the anchor block. The AnchorInfo includes the anchor_slot which is the slot of the checkpoint state. I.e. in the case of skipped slots, it is ahead of the checkpoint block.

@michaelsproul michaelsproul added the ready-for-review The code is ready for review label Nov 18, 2025
@michaelsproul
Copy link
Member

michaelsproul commented Nov 18, 2025

Confirmed working in the skipped slot case as well, just needed to fix an unrelated bug in unaligned checkpoint sync (commits: 173b719 ebf151f). IDK why this wasn't caught by the unaligned checkpoint sync test, I've opened an issue for that here: #8426.

Logs:

Nov 18 15:55:48.713 INFO Lighthouse started version: "Lighthouse/v8.0.0-173b719"
Nov 18 16:00:58.148 DEBUG Re-importing historic block block_root: 0xa8d0dc0b55435377bc8d2739dd7a7870811f8bce0435548f1c01fc8e8e1e8bfb, slot: 1768670

This was using --checkpoint-state state_1768672.ssz --checkpoint-block block_1768670.ssz on Hoodi.

Copy link
Member

@michaelsproul michaelsproul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to merge whenever

@michaelsproul
Copy link
Member

The unaligned checkpoint sync fix broke some tests, will fix.

@mergify
Copy link

mergify bot commented Nov 18, 2025

Some required checks have failed. Could you please take a look @dapplion? 🙏

@mergify mergify bot added waiting-on-author The reviewer has suggested changes and awaits thier implementation. and removed ready-for-review The code is ready for review labels Nov 18, 2025
@michaelsproul
Copy link
Member

Re-tested again on latest commit, for both aligned+unaligned cases. All working well.

@michaelsproul michaelsproul added ready-for-review The code is ready for review and removed waiting-on-author The reviewer has suggested changes and awaits thier implementation. labels Nov 19, 2025
Copy link
Member

@jimmygchen jimmygchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me 🎉

@jimmygchen jimmygchen added ready-for-merge This PR is ready to merge. and removed ready-for-review The code is ready for review labels Nov 19, 2025
@mergify mergify bot added the queued label Nov 19, 2025
mergify bot added a commit that referenced this pull request Nov 19, 2025
@mergify
Copy link

mergify bot commented Nov 19, 2025

This pull request has been removed from the queue for the following reason: checks failed.

The merge conditions cannot be satisfied due to failing checks:

You can check the last failing draft PR here: #8433.

You may have to fix your CI before adding the pull request to the queue again.
If you update this pull request, to fix the CI, it will automatically be requeued once the queue conditions match again.
If you think this was a flaky issue instead, you can requeue the pull request, without updating it, by posting a @mergifyio requeue comment.

@mergify mergify bot added dequeued and removed queued labels Nov 19, 2025
@michaelsproul
Copy link
Member

@mergify requeue

@mergify
Copy link

mergify bot commented Nov 19, 2025

requeue

✅ The queue state of this pull request has been cleaned. It can be re-embarked automatically

@mergify mergify bot added queued and removed dequeued labels Nov 19, 2025
mergify bot added a commit that referenced this pull request Nov 19, 2025
@mergify mergify bot merged commit 74b8c02 into release-v8.0 Nov 19, 2025
36 checks passed
@mergify mergify bot deleted the reimport-checkpoint-sync-block branch November 19, 2025 11:00
@mergify mergify bot removed the queued label Nov 19, 2025
jimmygchen added a commit to jimmygchen/lighthouse that referenced this pull request Nov 26, 2025
  Verify that checkpoint sync works when blobs are not provided during
  initialization. Backfill should fetch blobs/columns via P2P as enabled
  by sigp#8417.
mergify bot pushed a commit that referenced this pull request Nov 26, 2025
Since merging this PR, we don't need `--checkpoint-blobs`, even prior to Fulu:

- #8417

This PR removes the mandatory check for blobs prior to Fulu, enabling simpler manual checkpoint sync.


Co-Authored-By: Michael Sproul <[email protected]>

Co-Authored-By: Jimmy Chen <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-merge This PR is ready to merge. v8.0.1 Cheeky patch release for Fulu

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants