
Conversation

@taratorio (Member) commented Jul 24, 2025

ProcessNewBlocks runs 10,000 blocks on startup after rm chaindata (it doesn't take sync.loop.block.limit into account), which causes quite a lot of bloat on bloatnet.
Mark also reported a similar issue for UpdateForkChoice while working on parallel exec.
Leaving this in draft for now - need to run some more tests.
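
As a rough illustration of the fix's intent (not the actual Erigon code - all identifiers below are stand-ins), the exec loop should stop once the per-cycle limit is hit and report that more work remains:

```go
// Illustrative sketch only: it shows the intended behaviour of respecting a
// per-cycle block limit (sync.loop.block.limit) and signalling that more work
// remains, instead of executing an unbounded range in one go.
package main

import "fmt"

type syncCfg struct {
	LoopBlockLimit uint64 // hypothetical stand-in for sync.loop.block.limit
}

// processBlocks executes at most cfg.LoopBlockLimit blocks and reports whether
// more blocks remain for the next cycle (so commit/prune can run in between).
func processBlocks(cfg syncCfg, startBlock, targetBlock uint64) (lastExecuted uint64, hasMore bool) {
	for blockNum := startBlock; blockNum <= targetBlock; blockNum++ {
		// ... execute blockNum ...
		lastExecuted = blockNum
		if cfg.LoopBlockLimit > 0 && blockNum-startBlock+1 >= cfg.LoopBlockLimit {
			hasMore = blockNum < targetBlock
			break
		}
	}
	return lastExecuted, hasMore
}

func main() {
	last, more := processBlocks(syncCfg{LoopBlockLimit: 10_000}, 1, 50_000)
	fmt.Println(last, more) // 10000 true - stop, commit/prune, then continue next cycle
}
```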

@taratorio taratorio changed the title execution/stagedsync: handle sync loop block limit exhaustion [DO-NOT-MERGE] execution/stagedsync: handle sync loop block limit exhaustion Jul 24, 2025
@taratorio taratorio marked this pull request as ready for review July 24, 2025 15:11

Contributor commented:

Maybe, since this is about db preservation, we should use this to break the loop:

executor.readState().SizeEstimate() >= commitThreshold

rather than a hard block count:

if blockNum-startBlockNum+1 >= uint64(cfg.syncCfg.LoopBlockLimit) {
	loopBlockLimitExhausted = true
	break
}

I think that otherwise we'll need to adjust this by circumstance - which means we'll end up breaking more often than we need to for small blocks.
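
For illustration, a minimal sketch of the size-based break being suggested here, assuming hypothetical stand-in types around the names used in this thread (executor, readState, SizeEstimate, commitThreshold):

```go
package sketch

// Hypothetical sketch of the size-based break suggested above; the names mirror
// the snippets in this thread, but the surrounding types are stand-ins, not
// Erigon's real ones.

type stateReader interface {
	// SizeEstimate returns the estimated size in bytes of buffered, uncommitted state updates.
	SizeEstimate() uint64
}

type execEnv struct{ rs stateReader }

func (e *execEnv) readState() stateReader { return e.rs }

// shouldBreak stops the loop once buffered state crosses the commit threshold,
// regardless of how many (small or large) blocks produced it.
func shouldBreak(executor *execEnv, commitThreshold uint64) bool {
	return executor.readState().SizeEstimate() >= commitThreshold
}
```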

@taratorio (Member Author) replied:

That's something we can improve in the future; I don't think it should be done in this PR.

@mh0lt (Contributor) commented Jul 25, 2025:

Does this mean we're lifting this:

err = e.executionPipeline.RunPrune(e.db, tx, initialCycle)

If so, doesn't it imply that this loop belongs there rather than here?

One other thing I'm not sure of: is the implication here that we always run execution with an external tx? If that is the case, we can probably remove the tx handling from exec. (See my comment below around the loop break arbitrator.)

One thing that occurs to me is that if we do this we could possibly move flushing up to this level as well, so we do flush and prune in the same place. I think this leads to better resource control. It may also mean that we want to pass a shared domain into the exec process - as that is where updates are currently cached prior to flushing.

One thing to note about that - we could move flush/write to a separate goroutine, as all the other actions in exec apart from flushing only need an RO transaction, so there is no need for them to run in the same routine.

I think that the SD could now be contained in a TemporalRO tx - but not completely sure.
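
As a purely illustrative sketch of the separate flush goroutine idea above (the types are stand-ins, not Erigon's actual SharedDomains/tx API):

```go
package sketch

import "sync"

// The shape of the idea: execution only needs read access, so the flush of
// buffered updates can run on its own goroutine.

type update struct{ key, val []byte }

type roTx struct{} // read-only view used by execution

// execBlocks executes blocks against a read-only view and streams buffered
// state updates to the flusher, then signals completion by closing the channel.
func execBlocks(tx *roTx, updates chan<- update) {
	// ... execute blocks, producing state updates ...
	updates <- update{key: []byte("k"), val: []byte("v")}
	close(updates)
}

// flusher drains updates and writes them out - the only part needing RW access.
func flusher(updates <-chan update, wg *sync.WaitGroup) {
	defer wg.Done()
	for u := range updates {
		_ = u // ... write u via an RW tx / shared domains ...
	}
}

// runCycle wires the two together so flushing overlaps with execution.
func runCycle() {
	updates := make(chan update, 1024)
	var wg sync.WaitGroup
	wg.Add(1)
	go flusher(updates, &wg)
	execBlocks(&roTx{}, updates)
	wg.Wait()
}
```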

@taratorio (Member Author) replied:

> Does this mean we're lifting this:
>
> err = e.executionPipeline.RunPrune(e.db, tx, initialCycle)
>
> If so, doesn't it imply that this loop belongs there rather than here?

Not sure what you mean by this - we're already calling e.executionPipeline.RunPrune in FCU (it is done at the end of the request processing, in a background goroutine), so nothing is really lifted.

If we've had a case where hasMore=true, it means we've exhausted the exec loop and should run prune after it.
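
For illustration only, the flow being described might be sketched like this (not the real execution client code):

```go
package sketch

// Illustrative flow of the point above: when the bounded exec loop reports
// hasMore, prune runs at this level and the cycle repeats until the fork
// choice target is reached.

type pipeline interface {
	Run() (hasMore bool, err error)
	RunPrune() error
}

func updateForkChoice(p pipeline) error {
	for {
		hasMore, err := p.Run() // bounded by the sync loop block limit
		if err != nil {
			return err
		}
		if err := p.RunPrune(); err != nil { // prune after each bounded cycle
			return err
		}
		if !hasMore {
			return nil // caught up to the fork choice target
		}
	}
}
```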

@taratorio (Member Author) replied Jul 30, 2025:

> One other thing I'm not sure of: is the implication here that we always run execution with an external tx? If that is the case, we can probably remove the tx handling from exec. (See my comment below around the loop break arbitrator.)

We should probably keep the internal/external tx concept in the loop. I think FCU at the moment always calls with an external tx. However, that is easy to improve/change (in a future PR). For example, if we are processing only 1 block (at chain tip) we can pass tx=db.RwTx (external tx) to the exec loop; if we are not at chain tip we can pass tx=nil (internal tx). But we first need to analyse whether that is necessary.
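
A minimal sketch of that external/internal tx choice, with hypothetical stand-in types rather than Erigon's actual API:

```go
package sketch

// Illustrative only: a single block at chain tip can reuse the caller's RwTx
// (external tx); otherwise pass nil and let the exec loop open and commit its
// own transactions (internal tx).

type rwTx struct{}

func chooseExecTx(callerTx *rwTx, atChainTip bool) *rwTx {
	if atChainTip {
		return callerTx // external tx: one block at the tip, committed by the caller
	}
	return nil // internal tx: the exec loop manages its own transactions per batch
}
```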

@sudeepdino008 (Member) commented Jul 26, 2025:

fyi 1e97169 stops ExecV3 from terminating earlier.

@taratorio (Member Author) replied:

> fyi 1e97169 stops ExecV3 from terminating earlier.

Not really - that commit just delays the early termination by +32 blocks (changesetSafeRange).
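
Just to illustrate the arithmetic (the value 32 comes from this comment, not from the code):

```go
package sketch

// Illustrative only: the referenced commit does not remove the early break,
// it only moves it later by changesetSafeRange blocks.

const changesetSafeRange = 32 // assumed value, taken from the comment above

// breakBlock returns the block at which the loop would still stop early,
// before and after the referenced commit.
func breakBlock(startBlock, loopBlockLimit uint64, withSafeRange bool) uint64 {
	stop := startBlock + loopBlockLimit
	if withSafeRange {
		stop += changesetSafeRange // +32 blocks later, but still an early stop
	}
	return stop
}
```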

@taratorio taratorio changed the title [DO-NOT-MERGE] execution/stagedsync: handle sync loop block limit exhaustion execution/stagedsync: handle sync loop block limit exhaustion Jul 30, 2025
@taratorio taratorio merged commit dc3413d into main Aug 1, 2025
14 of 15 checks passed
@taratorio taratorio deleted the loop-block-lim-exhausted branch August 1, 2025 14:39
taratorio added a commit that referenced this pull request Aug 5, 2025
we recently merged #16268 which should fix the flaky test - let's give it a try and see how it goes
taratorio added a commit that referenced this pull request Aug 7, 2025
taratorio added a commit that referenced this pull request Aug 7, 2025
#16484)

cherry-pick dc3413d