execution/builder: keep polling dry txnpool for txns until we are interrupted #16412

taratorio · 2025-08-01T15:55:58Z

Test "Blob Transaction Ordering, Multiple Clients (Cancun) (erigon)" is flaky (example run - https://github.com/erigontech/erigon/actions/runs/16647657985/job/47112094838)

the test setup is:

2 EL clients
some blob txns are sent to client A others to client B
after they are all sent, payloads are built by client A
expectation is that each block will have 6 blob txns included

we flake because:

we stop block building when the txpool dries out
in the test the time between FCU and GetPayload is 2 seconds
we do not wait the full 2 seconds
sometimes we get the blob txns in time, other times we dont (depends on txpool propagation delay which can vary depending on hardware/resource consumption on the machine, note we run with --sim.parallelism 8 which can also affect this)
if we keep on poking the txpool within the 2 seconds (driven by the CL) we will give more time for the txpool propagations to reach the builder

coincidentally, this may be one reason why we sometimes see Erigon produce emptier blocks on devnets as compared to other clients

…errupted

somnergy · 2025-08-01T20:00:15Z

But then it's gonna keep polling till the end of the slot if txpool is honestly empty. The execution should build a block ideally within the first 2 seconds or so.

Edit: After reading your desc. again, i think you don't mean to pass config.secondsPerSlot as the wait time?

taratorio · 2025-08-04T08:19:13Z

But then it's gonna keep polling till the end of the slot if txpool is honestly empty. The execution should build a block ideally within the first 2 seconds or so.

Edit: After reading your desc. again, i think you don't mean to pass config.secondsPerSlot as the wait time?

The maxBuildTimeSecs is a max upper bound to guarantee goroutine termination in case GetPayload is never called by the CL.

For the 2 sec example - the CL will call GetPayload at 2secs and that will interrupt block building.

somnergy · 2025-08-04T09:58:35Z

But then it's gonna keep polling till the end of the slot if txpool is honestly empty. The execution should build a block ideally within the first 2 seconds or so.
Edit: After reading your desc. again, i think you don't mean to pass config.secondsPerSlot as the wait time?

The maxBuildTimeSecs is a max upper bound to guarantee goroutine termination in case GetPayload is never called by the CL.

For the 2 sec example - the CL will call GetPayload at 2secs and that will interrupt block building.

It may be worth sparing 1 extra second (or half of it) towards the end, as it's unlikely that a block building request came in before the current slot, and block built in the last second is useless.
The timeout could coincide with the FCU timeout.
The timeout could also be counted for the seconds remaining in the slot: i.e. strictly not to exceed param.Timestamp + secondsPerSlot in NewBlockBuilder

taratorio · 2025-08-04T10:55:48Z

But then it's gonna keep polling till the end of the slot if txpool is honestly empty. The execution should build a block ideally within the first 2 seconds or so.
Edit: After reading your desc. again, i think you don't mean to pass config.secondsPerSlot as the wait time?

The maxBuildTimeSecs is a max upper bound to guarantee goroutine termination in case GetPayload is never called by the CL.
For the 2 sec example - the CL will call GetPayload at 2secs and that will interrupt block building.

It may be worth sparing 1 extra second (or half of it) towards the end, as it's unlikely that a block building request came in before the current slot, and block built in the last second is useless.

The timeout could coincide with the FCU timeout.

The timeout could also be counted for the seconds remaining in the slot: i.e. strictly not to exceed param.Timestamp + secondsPerSlot in NewBlockBuilder

The only reason I've added an upper bound is to preserve compute resources and guarantee that the builder goroutines terminate in the case the txpool never dries out and GetPayload is never called. Note that GetPayload not getting called is a rare scenario, so treat this as protection for an extreme case.

Regarding your points:

that's unnecessary complication which brings very little value - 1 sec more or less doesn't matter much - GetPayload will be called at the right time in 99.999% of the time
block building is an async process that terminates when you call GetPayload - don't see how that can be connected with FCU timeout?
this is similar to the first point - yes, we can make the timeout a bit stricter/narrower but I see very little value in doing this as this is just a mechanism to put an upper bound for an extreme and highly unlikely case (since GetPayload will be called in 99.999% of times)

somnergy · 2025-08-04T11:18:58Z

But then it's gonna keep polling till the end of the slot if txpool is honestly empty. The execution should build a block ideally within the first 2 seconds or so.
Edit: After reading your desc. again, i think you don't mean to pass config.secondsPerSlot as the wait time?

The maxBuildTimeSecs is a max upper bound to guarantee goroutine termination in case GetPayload is never called by the CL.
For the 2 sec example - the CL will call GetPayload at 2secs and that will interrupt block building.

It may be worth sparing 1 extra second (or half of it) towards the end, as it's unlikely that a block building request came in before the current slot, and block built in the last second is useless.

The timeout could coincide with the FCU timeout.

The timeout could also be counted for the seconds remaining in the slot: i.e. strictly not to exceed param.Timestamp + secondsPerSlot in NewBlockBuilder

The only reason I've added an upper bound is to preserve compute resources and guarantee that the builder goroutines terminate in the case the txpool never dries out and GetPayload is never called. Note that GetPayload not getting called is a rare scenario, so treat this as protection for an extreme case.

Regarding your points:
* that's unnecessary complication which brings very little value - 1 sec more or less doesn't matter much - GetPayload will be called at the right time in 99.999% of the time

* block building is an async process that terminates when you call GetPayload - don't see how that can be connected with FCU timeout?

* this is similar to the first point - yes, we can make the timeout a bit stricter/narrower but I see very little value in doing this as this is just a mechanism to put an upper bound for an extreme and highly unlikely case (since GetPayload will be called in 99.999% of times)

Well, first of all, 99.999% - that's not a number based on some data i can cross-check. So, I presume that's a crude measure of conviction for the argument. But I agree that in most cases getPayload should be called.
No, I think it's not going to bring a complication here to add a more precise timeout, since you are doing 50ms round trips, or 20 rounds per sec, and potentially 240 more rounds that previously were not there!
But, okay.

taratorio · 2025-08-04T11:36:16Z

and potentially 240 more rounds that previously were not there!

they still won't be there - because GetPayload will be called

Continuation of PR #16412 Co-authored-by: taratorio <[email protected]>

taratorio added 3 commits August 1, 2025 16:26

execution/builder: keep polling dry txnpool for txns until we are int…

1114d59

…errupted

execution/builder: keep polling dry txnpool for txns until we are int…

aaa375d

…errupted

execution/builder: keep polling dry txnpool for txns until we are int…

8da6c98

…errupted

taratorio requested review from eastorski, mh0lt, somnergy and yperbasis August 1, 2025 15:55

tidy

4b116c1

somnergy approved these changes Aug 4, 2025

View reviewed changes

taratorio merged commit 4e4f700 into main Aug 4, 2025
15 checks passed

taratorio deleted the block-building-use-full-interrupt-time branch August 4, 2025 11:33

yperbasis mentioned this pull request Jan 6, 2026

block builder: enhance polling dry txnpool until interrupted #18574

Merged

taratorio added a commit that referenced this pull request Jan 7, 2026

block builder: enhance polling dry txnpool until interrupted (#18574)

ebe427d

Continuation of PR #16412 Co-authored-by: taratorio <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

execution/builder: keep polling dry txnpool for txns until we are interrupted #16412

execution/builder: keep polling dry txnpool for txns until we are interrupted #16412

Uh oh!

taratorio commented Aug 1, 2025

Uh oh!

somnergy commented Aug 1, 2025 •

edited

Loading

Uh oh!

taratorio commented Aug 4, 2025 •

edited

Loading

Uh oh!

somnergy commented Aug 4, 2025

Uh oh!

taratorio commented Aug 4, 2025

Uh oh!

somnergy commented Aug 4, 2025

Uh oh!

Uh oh!

taratorio commented Aug 4, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

execution/builder: keep polling dry txnpool for txns until we are interrupted #16412

execution/builder: keep polling dry txnpool for txns until we are interrupted #16412

Uh oh!

Conversation

taratorio commented Aug 1, 2025

Uh oh!

somnergy commented Aug 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

taratorio commented Aug 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

somnergy commented Aug 4, 2025

Uh oh!

taratorio commented Aug 4, 2025

Uh oh!

somnergy commented Aug 4, 2025

Uh oh!

Uh oh!

taratorio commented Aug 4, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

somnergy commented Aug 1, 2025 •

edited

Loading

taratorio commented Aug 4, 2025 •

edited

Loading