execution/stagedsync: handle sync loop block limit exhaustion #16268
Conversation
Maybe, since this is about DB preservation, we should break the loop on:

```go
executor.readState().SizeEstimate() >= commitThreshold
```

rather than on a hard block count:

```go
if blockNum-startBlockNum+1 >= uint64(cfg.syncCfg.LoopBlockLimit) {
	loopBlockLimitExhausted = true
	break
}
```
I think that otherwise we'll need to adjust this by circumstance, which means we'll end up breaking more often than we need to for small blocks.
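As a minimal sketch of how the two break conditions could be combined, assuming the names from the snippets above (`SizeEstimate`, `commitThreshold`, `LoopBlockLimit`) and reducing the decision to a pure helper; none of this is the actual erigon code:

```go
// breakExecLoop reports whether the exec loop should stop. The size-based
// check is primary, so commits track real DB pressure; the block-count cap
// stays as a backstop. All names here are illustrative assumptions.
func breakExecLoop(stateSizeEstimate, commitThreshold,
	blockNum, startBlockNum, loopBlockLimit uint64) bool {
	// Primary: buffered state updates have reached the commit threshold.
	if stateSizeEstimate >= commitThreshold {
		return true
	}
	// Backstop: honour the hard block-count limit if one is configured.
	return loopBlockLimit > 0 && blockNum-startBlockNum+1 >= loopBlockLimit
}
```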
That's something we can improve in the future; I don't think it should be done in this PR.
Does this mean we're lifting this:

```go
err = e.executionPipeline.RunPrune(e.db, tx, initialCycle)
```

If so, doesn't it imply that this loop belongs there rather than here?

One other thing I'm not sure of: the implication here is that we always run execution with an external tx? If that is the case we can probably remove the tx handling from exec. (See my comment below around the loop break arbitrator.)

One thing that occurs to me is that if we do this we could possibly move flushing up to this level as well, so we do flush and prune in the same place. I think this leads to better resource control. It may also mean that we want to pass a shared domain into the exec process, as that is where updates are currently cached prior to flushing.

One thing to note about that: we could move flush/write to a separate goroutine, since all the other actions in exec apart from flushing only need a RO transaction, so there is no need for them to run in the same routine.

I think that the SD could now be contained in a TemporalRO tx, but I'm not completely sure.
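A rough sketch of that split, under the assumption that execution produces update batches while a single writer goroutine flushes them; `update`, `produce`, and `flush` are illustrative stand-ins, not erigon's actual types:

```go
package sketch

import "golang.org/x/sync/errgroup"

// update stands in for a batch of cached state updates.
type update struct{ data []byte }

// execWithAsyncFlush runs the producer (exec, RO work) and the flusher
// (RW work) in separate goroutines, as suggested above.
func execWithAsyncFlush(produce func(chan<- update) error, flush func(update) error) error {
	g := new(errgroup.Group)
	batches := make(chan update, 8)
	g.Go(func() error {
		defer close(batches)
		return produce(batches)
	})
	g.Go(func() error {
		for b := range batches {
			if err := flush(b); err != nil {
				// Drain so the producer doesn't block on a full channel;
				// real code would cancel via a context instead.
				for range batches {
				}
				return err
			}
		}
		return nil
	})
	return g.Wait()
}
```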
> Does this mean we're lifting this: `err = e.executionPipeline.RunPrune(e.db, tx, initialCycle)`
> If so, doesn't it imply that this loop belongs there rather than here?

Not sure what you mean by this; we're already calling e.executionPipeline.RunPrune in FCU (it is done at the end of the request processing, in a background goroutine), so nothing is really lifted.

If we hit the case where hasMore=true, it means that we've exhausted the exec loop and we should run prune after it.
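To make the intended control flow concrete (hasMore and RunPrune come from the thread; the wrapper below is an assumption, not the real FCU code):

```go
// syncUntilDone re-runs the exec loop, pruning whenever the loop reports
// it stopped early. runLoop and prune are illustrative stand-ins.
func syncUntilDone(runLoop func() (hasMore bool, err error), prune func() error) error {
	for {
		hasMore, err := runLoop()
		if err != nil {
			return err
		}
		if !hasMore {
			return nil // caught up; FCU prunes in the background as usual
		}
		// The loop exhausted its budget; prune before resuming.
		if err := prune(); err != nil {
			return err
		}
	}
}
```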
> One other thing I'm not sure of: the implication here is that we always run execution with an external tx? If that is the case we can probably remove the tx handling from exec. (See my comment below around the loop break arbitrator.)

We should probably keep the internal/external tx concept in the loop. I think FCU at the moment always calls with an external tx. However, that is easy to improve/change (in a future PR). For example, if we are processing only 1 block (at chain tip) we can pass tx=db.RwTx (external tx) to the exec loop. If we are not on chain tip we can pass tx=nil (internal tx). But we first need to analyse whether that is actually necessary.
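A small sketch of that tip-vs-backlog decision, assuming tx=nil makes the exec loop open and manage its own internal transaction (the RwTx type here is an illustrative stand-in):

```go
// RwTx is an illustrative stand-in for a read-write database transaction.
type RwTx interface{ Commit() error }

// chooseTx picks the transaction to hand to the exec loop: at chain tip,
// reuse the caller's external tx; during backlog sync, return nil so the
// loop manages its own commit boundaries.
func chooseTx(atChainTip bool, externalTx RwTx) RwTx {
	if atChainTip {
		return externalTx
	}
	return nil
}
```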
fyi 1e97169 stops ExecV3 from terminating earlier.

Not really; that commit just delays the early termination by +32 blocks (changesetSafeRange).
We recently merged #16268, which should fix the flaky test. Let's give it a try and see how it goes.
cherry-pick dc3413d (#16484)
ProcessNewBlocks runs 10,000 blocks on startup after rm chaindata (it doesn't take sync.loop.block.limit into account), which causes quite a lot of bloat on bloatnet.

Also, Mark reported a similar issue while working on parallel exec, but for UpdateForkChoice.

Leaving this in draft for now; need to run some more tests.
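As a sketch of the kind of clamp being discussed, assuming it is applied wherever the batch target is chosen (names are illustrative, not the actual change):

```go
// clampBatchEnd caps how far a single batch may run so that start-up sync
// honours sync.loop.block.limit. Purely illustrative.
func clampBatchEnd(startBlock, targetBlock, loopBlockLimit uint64) uint64 {
	if loopBlockLimit == 0 {
		return targetBlock // limit disabled
	}
	if end := startBlock + loopBlockLimit - 1; end < targetBlock {
		return end
	}
	return targetBlock
}
```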