Skip to content

Conversation

@taratorio
Copy link
Member

@taratorio taratorio commented Sep 12, 2025

depends on #17103

relates to #17025 and #17041

fixes repetetive error:

EROR[09-12|19:15:35.479] Failed to build PoS block                err="[1/3 MiningCreateBlock] wrong head block: 07ac2d701bd19780fc614cb658170f600ce8dd5dc9f4d7558226f43b315ed6fc (current) vs b2d30a3b6226d335a3786207b63ea029f59339551b06bbc528eeb563fbd50f96 (requested)"

we get into this state when (reproduced in the accompanying test):

  1. we get a FCU for canonical block B at height H
  2. and then get a NewPayload for a side chain block B' at block height H
  3. and then get a FCU with head=B and payload attributes to start building block at H+1

the reason for that is:

  • when we process 2. we incorrectly unwind Erigon back to H-1 in order to validate B' at H (validating a chain as part of NewPayload should not affect the canonical chain, ie it should be "in-memory" or in disposable tmp space)
  • when we process 3. we get a quick payload status "VALID" for B at H (because the fork choice stored in our DB still points to B at H) and we then start building the new payload, however due to the incorrect unwind our chaindata DB is at H-1 and so we get the wrong head block error from the mining loop

the same error is seen in the issue with Prysm we had on fusaka-devnet-3 as described in #17025 (although this doesn't fully explain the wrong trie root we get when trying to processes subsequent canonical payloads, so there might be more unwind related bugs hidden elsewhere - tbd)

the same error fixes 1 out of 2 consistently failing tests (the /GetPayloadBodiesByRange \(Sidechain\) \(Paris\) one) that we have in the Hive ethereum/enginge withdrawals suite which we want to fix as per #17041

Ive also seen this error happen sporadically in our Kurtosis tests during reorgs making the tests flaky. This will stabilise those further too.

@taratorio taratorio marked this pull request as ready for review September 12, 2025 18:30
@taratorio taratorio marked this pull request as draft September 12, 2025 18:30
defer tx.Rollback()

tx, err := e.db.BeginRwNosync(ctx)
err = e.unwindToCommonCanonical(tx, header)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the fix is to do this in the disposable tx so the changes are not applied to the db - note tx.Rollback() is explicitly called few lines below

// Throw away the tx and start a new one (do not persist changes to the canonical chain)
tx.Rollback()

taratorio added a commit that referenced this pull request Sep 15, 2025
this is preparation for following PRs which add engine api tests (in
particular for reorg scenarios - relates to
#17025)

it extracts the MockCl from the shutter block building integration test
and generalises it a bit more so it is re-usable in more generic engine
api test scenarios (e.g. used in
#17105)

it also extracts the initialisation logic into a new EngineApiTester
which can also be re-used for different generic engine api test
scenarios (e.g. it can be used for the new "blockchain engine x test
format" - #16562)
Base automatically changed from mock-cl-engine-api-tests to main September 15, 2025 11:30
@taratorio taratorio changed the title [DO-NOT-MERGE] execution: fix incorrect unwinding when validating chain execution: fix incorrect unwinding when validating chain Sep 15, 2025
@taratorio taratorio marked this pull request as ready for review September 15, 2025 11:32
@taratorio taratorio merged commit bab18fe into main Sep 15, 2025
25 of 32 checks passed
@taratorio taratorio deleted the fix-unwinds-on-new-payload branch September 15, 2025 12:48
NazariiDenha pushed a commit that referenced this pull request Oct 24, 2025
this is preparation for following PRs which add engine api tests (in
particular for reorg scenarios - relates to
#17025)

it extracts the MockCl from the shutter block building integration test
and generalises it a bit more so it is re-usable in more generic engine
api test scenarios (e.g. used in
#17105)

it also extracts the initialisation logic into a new EngineApiTester
which can also be re-used for different generic engine api test
scenarios (e.g. it can be used for the new "blockchain engine x test
format" - #16562)
NazariiDenha pushed a commit that referenced this pull request Oct 24, 2025
depends on #17103

relates to #17025 and
#17041

fixes repetetive error:
```
EROR[09-12|19:15:35.479] Failed to build PoS block                err="[1/3 MiningCreateBlock] wrong head block: 07ac2d701bd19780fc614cb658170f600ce8dd5dc9f4d7558226f43b315ed6fc (current) vs b2d30a3b6226d335a3786207b63ea029f59339551b06bbc528eeb563fbd50f96 (requested)"
```

we get into this state when (reproduced in the accompanying test):
1. we get a FCU for canonical block B at height H
2. and then get a NewPayload for a side chain block B' at block height H
3. and then get a FCU with head=B and payload attributes to start
building block at H+1

the reason for that is:
- when we process 2. we incorrectly unwind Erigon back to H-1 in order
to validate B' at H (validating a chain as part of NewPayload should not
affect the canonical chain, ie it should be "in-memory" or in disposable
tmp space)
- when we process 3. we get a quick payload status "VALID" for B at H
(because the fork choice stored in our DB still points to B at H) and we
then start building the new payload, however due to the incorrect unwind
our chaindata DB is at H-1 and so we get the `wrong head block` error
from the mining loop

the same error is seen in the issue with Prysm we had on fusaka-devnet-3
as described in #17025
(although this doesn't fully explain the wrong trie root we get when
trying to processes subsequent canonical payloads, so there might be
more unwind related bugs hidden elsewhere - tbd)

the same error fixes 1 out of 2 consistently failing tests (the
`/GetPayloadBodiesByRange \(Sidechain\) \(Paris\)` one) that we have in
the Hive ethereum/enginge withdrawals suite which we want to fix as per
#17041

Ive also seen this error happen sporadically in our Kurtosis tests
during reorgs making the tests flaky. This will stabilise those further
too.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants