Skip to content

Conversation

@jimmygchen
Copy link
Member

@jimmygchen jimmygchen commented Sep 15, 2025

Issue Addressed

This PR consolidates the reqresp_pre_import_cache into the data_availability_checker for the following reasons:

This PR also maintains the following functionalities

  • Serving pre-executed blocks over RPC, and they're now served from the data_availability_checker instead.
  • Using the cache for de-duplicating lookup requests.

@jimmygchen jimmygchen requested a review from jxs as a code owner September 15, 2025 07:55
@jimmygchen jimmygchen added ready-for-review The code is ready for review code-quality bug Something isn't working labels Sep 15, 2025
…e block around for longer and serving non-executed blocks isn't really necessary.

/// Remove any block components from the *processing cache* if we no longer require them. If the
/// block was imported full or erred, we no longer require them.
fn remove_notified(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still need to handle the case where block import was errored and remove them from cache.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The damage to leave the invalid block in the cache is that we might serve the block over RPC after knowing it's invalid. Consider the case where we remove the block from the da_checker after noticing it's invalid and then someone triggers lookup sync and we fetch it again. We will serve the block over RPC while verification is happening for the second time.

I don't see that big of an issue with leaving the invalid block in the da_checker until LRU evicts it. Otherwise, we can leave it there but mark it as "INVALID"

Copy link
Member Author

@jimmygchen jimmygchen Sep 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking out loud -

If the block is invalid we will downscore the peer.

If we leave the invalid block in the cache

  • we might serve the invalid block over RPC (we already are doing it before execution) - however, we are unlikely to get downscored because we will never attest or build on top of thisblock.
  • we avoid downloading the invalid block again, even if a peer sends us a child block / attestation

If we remove the invalid block from the cache

  • we still serve the invalid block over RPC before execution, but NOT after it failed validation
  • a peer might get us to fetch again and and serve us the invalid block, we will downscore them after execution
  • it doesn't' really give us much because we could still serve the invalid block over RPC before execution, and even if we remove it, it could come back again

I don't see that big of an issue with leaving the invalid block in the da_checker until LRU evicts it.

Yes agree. I actually think leaving it in the cache to prevent the lookup is better.

Otherwise, we can leave it there but mark it as "INVALID"

A block that does not transition into executed block will never become available so we don't really need to mark them I believe.

Copy link
Member Author

@jimmygchen jimmygchen Sep 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for raising these good points, I'm now inclined to leave it as it is, as it doesn't seem to help us that much by removing it, as we don't really benefit from doing it for extra code and complexity.

Copy link
Member Author

@jimmygchen jimmygchen Sep 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding more thoughts after seeing the failed block_in_processing_cache_becomes_invalid test.

If the block fails execution initially for whatever reason (e.g. engine offline), and we keep it in the cache, then there's a chance that the chain might get stuck, because we will never perform lookup until the block is evicted from the cache. I think this is not great, I'll add code to remove it so we re-trigger a new lookup in this case.

if it is indeed an invalid block, peer will get penalised for getting us to do the lookup.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing failed execution block in 5f2e85c

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we will never perform lookup until the block is evicted from the cache

Just to be clear, is this because we cannot re-trigger the execution of a block sitting in the da checker? not fully clear on this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes exactly - because lookup will not re-download the block if it's in the DA checker:

match self.chain.get_block_process_status(&block_root) {
// Unknown block, continue request to download
BlockProcessStatus::Unknown => {}
// Block is known are currently processing, expect a future event with the result of
// processing.
BlockProcessStatus::NotValidated { .. } => {
// Lookup sync event safety: If the block is currently in the processing cache, we
// are guaranteed to receive a `SyncMessage::GossipBlockProcessResult` that will
// make progress on this lookup
return Ok(LookupRequestResult::Pending("block in processing cache"));
}
// Block is fully validated. If it's not yet imported it's waiting for missing block
// components. Consider this request completed and do nothing.
BlockProcessStatus::ExecutionValidated { .. } => {
return Ok(LookupRequestResult::NoRequestNeeded(
"block execution validated",
));
}
}

so if it fails the first time because of unexpected error, e.g. EL timeout, then it doesn't gets re-downloaded, until we fallback to range sync

@mergify
Copy link

mergify bot commented Sep 15, 2025

Some required checks have failed. Could you please take a look @jimmygchen? 🙏

@mergify mergify bot added waiting-on-author The reviewer has suggested changes and awaits thier implementation. and removed ready-for-review The code is ready for review labels Sep 15, 2025
@jimmygchen jimmygchen added work-in-progress PR is a work-in-progress v8.0.0-rc.0 Q3 2025 release for Fusaka on Holesky and removed waiting-on-author The reviewer has suggested changes and awaits thier implementation. labels Sep 15, 2025
Copy link
Collaborator

@dapplion dapplion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice simplification! Can't find any issues with it after mutliple passes

}

self.data_availability_checker
.put_pre_execution_block(block_root, unverified_block.block_cloned())?;
Copy link
Collaborator

@dapplion dapplion Sep 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noting that if the block is unverified, so it can be 10 MB. The current max size of the cache is 32 blocks so there's no risk of OOM.

}

/// Inserts a pre-execution block into the cache.
/// This does NOT override an existing executed block.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noting that this will override a PreExecution block, but it is fine since the block root is check so the data is exactly the same. Do we care about controlling more the state transitions of this enum?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this - it only insert it if the block is None, so if it's already set it will not override. Am I missing something?

I considered more controlled state transition but i think it's enough to ensure one direction in following order:

None -> Pre execution -> Executed

We can never override an executed block

@jimmygchen jimmygchen added ready-for-review The code is ready for review and removed work-in-progress PR is a work-in-progress labels Sep 17, 2025
@jimmygchen jimmygchen requested a review from dapplion September 18, 2025 07:40
Copy link
Member

@pawanjay176 pawanjay176 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good to me. Just a question.
Also, would be good to test this on the devnet with stopping the EL node and checking if it recovers as expected.

// chain to get stuck temporarily if the block is canonical. Therefore we remove
// it from the cache if execution fails.
self.data_availability_checker
.remove_block_on_execution_error(&block_root);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree its safer to just chuck it out of the cache and let lookup fetch the block regardless of the type of the execution error.

@jimmygchen
Copy link
Member Author

This is looking good to me. Just a question. Also, would be good to test this on the devnet with stopping the EL node and checking if it recovers as expected.

Yep I was able to see a block lookup immediately sent after engine is restarted

[cl-1-lighthouse-geth] Sep 19 04:08:48.641 INFO  New block received                            slot: 61844, root: 0x1405fffdae52a9caff2ffd5f75fa13a2ec98804ce9a14b3445e3cf0c1f341c8a
[cl-1-lighthouse-geth] Sep 19 04:08:54.001 INFO  Synced                                        peers: "206", exec_hash: "0x785f3ae2c9317692a6cc3184d7a9e8a60c6c0a6a92a0502ec1dc480ddc07b9ea (verified)", finalized_root: 0xfaec29bf7d6f46d6800dba641bf72eea1cbd810a209c2974fe8c9ab6538987cf, finalized_epoch: 1930, epoch: 1932, block: "   …  empty", slot: 61844
[cl-1-lighthouse-geth] Sep 19 04:09:06.000 INFO  Synced                                        peers: "200", exec_hash: "0x785f3ae2c9317692a6cc3184d7a9e8a60c6c0a6a92a0502ec1dc480ddc07b9ea (verified)", finalized_root: 0xfaec29bf7d6f46d6800dba641bf72eea1cbd810a209c2974fe8c9ab6538987cf, finalized_epoch: 1930, epoch: 1932, block: "   …  empty", slot: 61845
[cl-1-lighthouse-geth] Sep 19 04:09:10.869 INFO  Execution engine online
[cl-1-lighthouse-geth] Sep 19 04:09:10.869 INFO  Issuing forkchoiceUpdated                     forkchoice_state: ForkchoiceState { head_block_hash: 0x785f3ae2c9317692a6cc3184d7a9e8a60c6c0a6a92a0502ec1dc480ddc07b9ea, safe_block_hash: 0x2ef47dbb5c2a4ea7910e8a99567cb8c18e874ffa72c3d3aa161a8bfa0a118f4c, finalized_block_hash: 0x6b024dc3cba5a5aa69c80f5bcd468512a27e88caafa6098e2c6e8fb6a3c6edd6 }
[cl-1-lighthouse-geth] Sep 19 04:09:13.350 INFO  New RPC block received                        slot: 61844, hash: 0x1405fffdae52a9caff2ffd5f75fa13a2ec98804ce9a14b3445e3cf0c1f341c8a, block_root: 0x1405fffdae52a9caff2ffd5f75fa13a2ec98804ce9a14b3445e3cf0c1f341c8a

@jimmygchen jimmygchen added ready-for-merge This PR is ready to merge. and removed ready-for-review The code is ready for review labels Sep 19, 2025
mergify bot added a commit that referenced this pull request Sep 19, 2025
mergify bot added a commit that referenced this pull request Sep 19, 2025
mergify bot added a commit that referenced this pull request Sep 19, 2025
@mergify
Copy link

mergify bot commented Sep 19, 2025

This pull request has been removed from the queue for the following reason: checks failed.

The merge conditions cannot be satisfied due to failing checks:

You can check the last failing draft PR here: #8089.

You may have to fix your CI before adding the pull request to the queue again.
If you update this pull request, to fix the CI, it will automatically be requeued once the queue conditions match again.
If you think this was a flaky issue instead, you can requeue the pull request, without updating it, by posting a @mergifyio requeue comment.

mergify bot added a commit that referenced this pull request Sep 19, 2025
@mergify mergify bot merged commit 78d330e into sigp:unstable Sep 19, 2025
37 checks passed
mergify bot pushed a commit that referenced this pull request Sep 25, 2025
…8112)

- PR #8045 introduced a regression of how lookup sync interacts with the da_checker.

Now in unstable block import from the HTTP API also insert the block in the da_checker while the block is being execution verified. If lookup sync finds the block in the da_checker in `NotValidated` state it expects a `GossipBlockProcessResult` message sometime later. That message is only sent after block import in gossip.

I confirmed in our node's logs for 4/4 cases of stuck lookups are caused by this sequence of events:
- Receive block through API, insert into da_checker in fn process_block in put_pre_execution_block
- Create lookup and leave in AwaitingDownload(block in processing cache) state
- Block from HTTP API finishes importing
- Lookup is left stuck

Closes #8104


  - #8110 was my initial solution attempt but we can't send the `GossipBlockProcessResult` event from the `http_api` crate without adding new channels, which seems messy.

For a given node it's rare that a lookup is created at the same time that a block is being published. This PR solves #8104 by allowing lookup sync to import the block twice in that case.


Co-Authored-By: dapplion <[email protected]>
pawanjay176 pushed a commit to pawanjay176/lighthouse that referenced this pull request Sep 25, 2025
…igp#8112)

- PR sigp#8045 introduced a regression of how lookup sync interacts with the da_checker.

Now in unstable block import from the HTTP API also insert the block in the da_checker while the block is being execution verified. If lookup sync finds the block in the da_checker in `NotValidated` state it expects a `GossipBlockProcessResult` message sometime later. That message is only sent after block import in gossip.

I confirmed in our node's logs for 4/4 cases of stuck lookups are caused by this sequence of events:
- Receive block through API, insert into da_checker in fn process_block in put_pre_execution_block
- Create lookup and leave in AwaitingDownload(block in processing cache) state
- Block from HTTP API finishes importing
- Lookup is left stuck

Closes sigp#8104

  - sigp#8110 was my initial solution attempt but we can't send the `GossipBlockProcessResult` event from the `http_api` crate without adding new channels, which seems messy.

For a given node it's rare that a lookup is created at the same time that a block is being published. This PR solves sigp#8104 by allowing lookup sync to import the block twice in that case.

Co-Authored-By: dapplion <[email protected]>
jimmygchen added a commit to jimmygchen/lighthouse that referenced this pull request Oct 10, 2025
commit 9b2de09
Author: Pawan Dhananjay <[email protected]>
Date:   Mon Oct 6 17:22:44 2025 -0700

    Rethink peer scoring

commit 5c562c6
Author: Pawan Dhananjay <[email protected]>
Date:   Mon Oct 6 11:17:57 2025 -0700

    Fix some more issues

commit 15df3d2
Merge: c491856 26575c5
Author: Pawan Dhananjay <[email protected]>
Date:   Thu Oct 2 13:15:59 2025 -0700

    Merge branch 'unstable' into blocks-then-columns

commit c491856
Merge: 826a06e ffa7b2b
Author: Pawan Dhananjay <[email protected]>
Date:   Thu Sep 25 16:23:27 2025 -0700

    Merge branch 'unstable' into blocks-then-columns

commit 826a06e
Author: Pawan Dhananjay <[email protected]>
Date:   Thu Sep 25 16:21:55 2025 -0700

    Fix variant name

commit 421e954
Author: Pawan Dhananjay <[email protected]>
Date:   Thu Sep 25 16:05:45 2025 -0700

    Revert "Revert type change in UnexpectedRequestId"

    This reverts commit 6ea14016f3d164456bc4c3cae0355ab532fe1a86.

commit 3f8998f
Author: Lion - dapplion <[email protected]>
Date:   Thu Sep 25 05:52:27 2025 +0200

    Only mark block lookups as pending if block is importing from gossip (sigp#8112)

    - PR sigp#8045 introduced a regression of how lookup sync interacts with the da_checker.

    Now in unstable block import from the HTTP API also insert the block in the da_checker while the block is being execution verified. If lookup sync finds the block in the da_checker in `NotValidated` state it expects a `GossipBlockProcessResult` message sometime later. That message is only sent after block import in gossip.

    I confirmed in our node's logs for 4/4 cases of stuck lookups are caused by this sequence of events:
    - Receive block through API, insert into da_checker in fn process_block in put_pre_execution_block
    - Create lookup and leave in AwaitingDownload(block in processing cache) state
    - Block from HTTP API finishes importing
    - Lookup is left stuck

    Closes sigp#8104

      - sigp#8110 was my initial solution attempt but we can't send the `GossipBlockProcessResult` event from the `http_api` crate without adding new channels, which seems messy.

    For a given node it's rare that a lookup is created at the same time that a block is being published. This PR solves sigp#8104 by allowing lookup sync to import the block twice in that case.

    Co-Authored-By: dapplion <[email protected]>

commit d99df0a
Author: Jimmy Chen <[email protected]>
Date:   Thu Sep 25 12:52:07 2025 +1000

    Only send data coumn subnet discovery requests after peerdas is scheduled (sigp#8109)

    sigp#8105 (to be confirmed)

    I noticed a large number of failed discovery requests after deploying latest `unstable` to some of our testnet and mainnet nodes. This is because of a recent PeerDAS change to attempt to maintain sufficient peers across data column subnets - this shouldn't be enabled on network without peerdas scheduled, otherwise it will keep retrying discovery on these subnets and never succeed.

    Also removed some unused files.

    Co-Authored-By: Jimmy Chen <[email protected]>

    Co-Authored-By: Jimmy Chen <[email protected]>

commit cf46d10
Author: Pawan Dhananjay <[email protected]>
Date:   Thu Sep 25 15:54:32 2025 -0700

    Fix issues from review

commit c2aa4ae
Author: dapplion <[email protected]>
Date:   Thu Sep 25 23:58:31 2025 +0200

    Revert type change in UnexpectedRequestId

commit 7488755
Author: dapplion <[email protected]>
Date:   Thu Sep 25 22:09:06 2025 +0200

    De-duplicate data columns by root request type

commit 7650032
Author: dapplion <[email protected]>
Date:   Thu Sep 25 23:52:47 2025 +0200

    Rename DataColumnsFromRange

commit 4b0b655
Author: Pawan Dhananjay <[email protected]>
Date:   Wed Sep 24 12:31:06 2025 -0700

    Fix nits

commit aa6a1bc
Author: Pawan Dhananjay <[email protected]>
Date:   Wed Sep 24 12:16:05 2025 -0700

    Create a custom penalize_sync_peer method for clarity

commit 27195ca
Merge: 4a59d35 af27402
Author: Pawan Dhananjay <[email protected]>
Date:   Wed Sep 24 11:16:27 2025 -0700

    Merge branch 'unstable' into blocks-then-columns

commit 4a59d35
Merge: 2f35c36 5928407
Author: Pawan Dhananjay <[email protected]>
Date:   Wed Sep 17 15:15:44 2025 -0700

    Merge branch 'unstable' into blocks-then-columns

commit 2f35c36
Author: Pawan Dhananjay <[email protected]>
Date:   Wed Sep 17 15:13:40 2025 -0700

    Add some metrics

commit b3b3756
Author: Pawan Dhananjay <[email protected]>
Date:   Wed Sep 17 15:01:46 2025 -0700

    Fix tests

commit e3aed89
Author: Pawan Dhananjay <[email protected]>
Date:   Tue Sep 16 17:46:21 2025 -0700

    Remove retry test that we do not use anymore

commit baee27a
Merge: 9db4c30 191570e
Author: Pawan Dhananjay <[email protected]>
Date:   Tue Sep 16 16:39:30 2025 -0700

    Merge branch 'unstable' into blocks-then-columns

commit 9db4c30
Author: Pawan Dhananjay <[email protected]>
Date:   Tue Sep 16 16:38:19 2025 -0700

    Fix small bug

commit 08bba3f
Author: Pawan Dhananjay <[email protected]>
Date:   Thu Sep 11 18:24:25 2025 -0700

    fmt

commit cffbd34
Author: Pawan Dhananjay <[email protected]>
Date:   Thu Sep 11 18:19:16 2025 -0700

    Reduce code duplication

commit bf09d57
Author: Pawan Dhananjay <[email protected]>
Date:   Thu Sep 11 15:06:07 2025 -0700

    Fix some issues from lion's review

commit 04398ad
Author: Pawan Dhananjay <[email protected]>
Date:   Mon Sep 8 15:37:26 2025 -0700

    Fix some more issues from review

commit 4f62a9c
Merge: e259ecd 8ec2640
Author: Pawan Dhananjay <[email protected]>
Date:   Fri Sep 5 13:00:47 2025 -0700

    Merge branch 'unstable' into blocks-then-columns

commit e259ecd
Author: Pawan Dhananjay <[email protected]>
Date:   Fri Sep 5 12:58:49 2025 -0700

    More renamings

commit 6a2a33d
Author: Pawan Dhananjay <[email protected]>
Date:   Fri Sep 5 12:48:05 2025 -0700

    Fix some issues from review

commit e0d8f04
Author: Pawan Dhananjay <[email protected]>
Date:   Thu Sep 4 18:09:07 2025 -0700

    Tests compile

commit 7e91eeb
Merge: 29c2f83 9d2f55a
Author: Pawan Dhananjay <[email protected]>
Date:   Thu Sep 4 18:08:52 2025 -0700

    Merge branch 'unstable' into blocks-then-columns

commit 29c2f83
Author: Pawan Dhananjay <[email protected]>
Date:   Tue Sep 2 13:29:19 2025 -0700

    Cleanup SyncingChain

commit 8458df6
Author: Pawan Dhananjay <[email protected]>
Date:   Mon Sep 1 12:53:01 2025 -0700

    Attempt sending awaitingDownload batches when restarting sync

commit 7a6d0d9
Author: Pawan Dhananjay <[email protected]>
Date:   Sun Aug 31 19:07:39 2025 -0700

    Force processing_target request

commit 4f60e86
Author: Pawan Dhananjay <[email protected]>
Date:   Sun Aug 31 14:54:44 2025 -0700

    Add logs to debug stuck range sync

commit b07bc6d
Author: Pawan Dhananjay <[email protected]>
Date:   Fri Aug 29 16:18:53 2025 -0700

    Force requests if batch buffer is full under certain conditions

commit 19b0a5c
Merge: 8e1337d 438fb65
Author: Pawan Dhananjay <[email protected]>
Date:   Thu Aug 28 23:25:14 2025 -0700

    Merge branch 'unstable' into blocks-then-columns

commit 8e1337d
Merge: da1aaba a134d43
Author: Pawan Dhananjay <[email protected]>
Date:   Thu Aug 28 18:03:44 2025 -0700

    Merge branch 'unstable' into blocks-then-columns

commit da1aaba
Author: Pawan Dhananjay <[email protected]>
Date:   Wed Aug 27 17:34:27 2025 -0700

    Revise error tolerance

commit 7331323
Author: Pawan Dhananjay <[email protected]>
Date:   Wed Aug 27 17:30:46 2025 -0700

    AwaitingValidation state only needs block peer

commit 8386bd9
Merge: b4bc7fe d235f2c
Author: Pawan Dhananjay <[email protected]>
Date:   Wed Aug 27 17:08:42 2025 -0700

    Merge branch 'unstable' into blocks-then-columns

commit b4bc7fe
Author: Pawan Dhananjay <[email protected]>
Date:   Wed Aug 27 14:30:46 2025 -0700

    Remove more debug logs

commit 05adb71
Author: Pawan Dhananjay <[email protected]>
Date:   Wed Aug 27 14:26:28 2025 -0700

    Fix bug with partial column responses before all column requests sent

commit a97cf88
Author: Pawan Dhananjay <[email protected]>
Date:   Tue Aug 26 18:53:51 2025 -0700

    Add docs

commit 27d0b36
Author: Pawan Dhananjay <[email protected]>
Date:   Tue Aug 26 16:06:27 2025 -0700

    Remove debug statements

commit 90d319f
Merge: 7c214f5 3e78034
Author: Pawan Dhananjay <[email protected]>
Date:   Tue Aug 26 15:38:47 2025 -0700

    Merge branch 'unstable' into blocks-then-columns

commit 7c214f5
Merge: 52762b9 daf1c7c
Author: Pawan Dhananjay <[email protected]>
Date:   Mon Aug 25 11:01:40 2025 -0700

    Merge branch 'unstable' into blocks-then-columns

commit 52762b9
Author: Pawan Dhananjay <[email protected]>
Date:   Fri Aug 22 11:52:56 2025 -0700

    Handle 0 blobs per epoch case

commit da27441
Merge: 521778b cee30d8
Author: Pawan Dhananjay <[email protected]>
Date:   Thu Aug 21 11:02:22 2025 -0700

    Merge branch 'unstable' into blocks-then-columns

commit 521778b
Author: Pawan Dhananjay <[email protected]>
Date:   Thu Aug 21 07:03:58 2025 -0700

    Random logs

commit 4540195
Author: Pawan Dhananjay <[email protected]>
Date:   Wed Aug 13 17:33:47 2025 -0700

    Request columns from global peer pool

commit fdce537
Author: Pawan Dhananjay <[email protected]>
Date:   Wed Aug 20 18:37:44 2025 -0700

    Implement responsible peer tracking

commit 17c4e34
Author: Pawan Dhananjay <[email protected]>
Date:   Wed Aug 20 14:18:15 2025 -0700

    Avoid root requests for backfill sync

commit 1a0df30
Author: Pawan Dhananjay <[email protected]>
Date:   Wed Aug 20 14:14:48 2025 -0700

    Also penalize all batch peers for availability check errors

commit 6da924b
Author: Pawan Dhananjay <[email protected]>
Date:   Wed Aug 20 14:14:26 2025 -0700

    Fix bug in initialization code

commit 68cce37
Author: Pawan Dhananjay <[email protected]>
Date:   Wed Aug 20 14:14:11 2025 -0700

    Try to avoid chains failing for rpc errors

commit ca9cfd5
Author: Pawan Dhananjay <[email protected]>
Date:   Tue Aug 19 12:17:59 2025 -0700

    Small fixes

commit 5337e46
Author: Pawan Dhananjay <[email protected]>
Date:   Mon Aug 18 22:59:38 2025 -0700

    Add a function to retry column requests that could not be made

commit 9455153
Author: Pawan Dhananjay <[email protected]>
Date:   Mon Aug 18 14:48:44 2025 -0700

    Without retries

commit 6bd8944
Author: Pawan Dhananjay <[email protected]>
Date:   Fri Aug 15 14:23:48 2025 -0700

    Reduce backfill buffer size

commit 156449c
Author: Pawan Dhananjay <[email protected]>
Date:   Thu Aug 14 09:10:28 2025 -0700

    Increase columns_by_root quota

commit 836f9c6
Author: Pawan Dhananjay <[email protected]>
Date:   Wed Aug 13 16:37:12 2025 -0700

    Priorotize status v2

commit 490b627
Author: Pawan Dhananjay <[email protected]>
Date:   Wed Aug 13 16:36:56 2025 -0700

    Penalize if invalid EL block
lmnzx pushed a commit to lmnzx/lighthouse that referenced this pull request Nov 4, 2025
…igp#8112)

- PR sigp#8045 introduced a regression of how lookup sync interacts with the da_checker.

Now in unstable block import from the HTTP API also insert the block in the da_checker while the block is being execution verified. If lookup sync finds the block in the da_checker in `NotValidated` state it expects a `GossipBlockProcessResult` message sometime later. That message is only sent after block import in gossip.

I confirmed in our node's logs for 4/4 cases of stuck lookups are caused by this sequence of events:
- Receive block through API, insert into da_checker in fn process_block in put_pre_execution_block
- Create lookup and leave in AwaitingDownload(block in processing cache) state
- Block from HTTP API finishes importing
- Lookup is left stuck

Closes sigp#8104


  - sigp#8110 was my initial solution attempt but we can't send the `GossipBlockProcessResult` event from the `http_api` crate without adding new channels, which seems messy.

For a given node it's rare that a lookup is created at the same time that a block is being published. This PR solves sigp#8104 by allowing lookup sync to import the block twice in that case.


Co-Authored-By: dapplion <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working code-quality ready-for-merge This PR is ready to merge. v8.0.0-rc.0 Q3 2025 release for Fusaka on Holesky

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants