ETCM-290: Invalid branches built by fetcher #787
Conversation
@ntallar As you can see in the code, blacklisting is disabled when receiving a headers response. I think I would:
WDYT?
It seems that the current state of this branch introduces yet another reason for forks - a scenario where:
I see two solutions to that problem; the question is which one seems better:
I'm leaning toward the latter, as it could possibly also be used in other scenarios where the fetcher gets stuck for some reason.
Wouldn't that scenario evolve into:
The only problem I see is that blocks being mined in the middle are quite disruptive, as they invalidate the invalidation of the
WDYT? I wouldn't go for your alternatives; this code is already too complicated to add more workarounds, and they seem too hacky IMHO:
@ntallar Not really. In my case the fetcher got stuck because it had a header from a non-existing fork and was unable to make any progress. As for approach #2 (cleaning up state / restarting): indeed, it doesn't solve the problem, but we should still be able to monitor such cases (error log level + monitoring) without ending up with forking nodes. Please note that this approach to handling irrecoverable errors is at the core of actors (which we use here) and is quite effective at keeping systems operational.
I'm sorry, but I'm still not convinced. It's definitely a good last resort in case we don't manage to fix an issue, but this hasn't become a blocker yet.
I see now the issue you mentioned! Shouldn't we handle our BlockBodies response better to address it? A simple alternative is dropping headers from waitingHeaders when a body of a header is requested but not received. Maybe it can be a next step for #771
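The alternative suggested above could be sketched roughly like this (a hypothetical simplification; `FetcherState`, `Header`, `Body` and the field names are stand-ins, not the actual fetcher types):

```scala
import scala.collection.immutable.Queue

// Hypothetical sketch: when bodies were requested for some headers but
// not all of them arrived, drop the unmatched headers from
// `waitingHeaders` so the fetcher can re-request them later instead of
// waiting forever on a body that will never come.
final case class Header(number: BigInt, hash: String)
final case class Body(headerHash: String)

final case class FetcherState(waitingHeaders: Queue[Header]) {
  def handleBodiesResponse(requested: Seq[Header], received: Seq[Body]): FetcherState = {
    val receivedHashes = received.map(_.headerHash).toSet
    // Hashes of headers whose bodies were requested but never received.
    val missing = requested.collect { case h if !receivedHashes.contains(h.hash) => h.hash }.toSet
    copy(waitingHeaders = waitingHeaders.filterNot(h => missing.contains(h.hash)))
  }
}
```

Under this sketch a lost body causes its header to be dropped and re-fetched rather than blocking the queue.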
PeersClient.Response(fakePeer, BlockBodies(alternativeSecondBlocksBatch.drop(6).map(_.body)))
)
importer.send(blockFetcher, PickBlocks(100))
PickBlocks(100) sounds like a magic number
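One way to address the review comment, as a sketch (the object and constant names here are hypothetical, not from the PR): lift the value into a named constant so the test reads as intent rather than a magic number.

```scala
// Hypothetical: give the magic number a name describing what it bounds.
object FetcherTestConstants {
  // Upper bound on how many blocks the importer asks the fetcher for at once.
  val MaxBlocksPerPick: Int = 100
}

// The test line would then read:
// importer.send(blockFetcher, PickBlocks(FetcherTestConstants.MaxBlocksPerPick))
```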
case PeersClient.Request(msg, _, _) if msg == firstGetBlockBodiesRequest => peersClient.lastSender
}
// Block 5 is mined (we could have reached this stage due to invalidation messages sent to the fetcher)
I think minedBlock.number == 16
…ding requests check when checking for top on fetch side
…ess response, improve peer message logging
… and current state of fetcher
waitingHeaders = waiting
)
/**
 * Matches bodies with headers in the queue, adding the matched bodies to the blocks.
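The matching that the docstring describes can be sketched as follows (a simplified stand-in with hypothetical types; the real method also has to validate each body against its header, which is omitted here):

```scala
import scala.collection.immutable.Queue

// Hypothetical sketch: pair received bodies with the headers waiting in
// the queue, in order, producing full blocks; headers without a body
// stay in the queue.
final case class Header(number: BigInt)
final case class Body(transactions: Seq[String])
final case class Block(header: Header, body: Body)

def matchBodies(waiting: Queue[Header], bodies: Seq[Body]): (Seq[Block], Queue[Header]) = {
  val blocks = waiting.zip(bodies).map { case (h, b) => Block(h, b) }
  (blocks, waiting.drop(bodies.length))
}
```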
ntallar
left a comment
LGTM! Love the comments ❤️
Description
One of the reasons for forks on cluster deployments is the fact that BlockFetcher builds block sequences which are invalid chains, making them effectively impossible to import; thus only the local miner is able to provide blocks which are valid chain extensions.
Proposed Solution
Validate header chains as they are appended to the queue, to ensure they can be imported.
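A minimal sketch of the kind of continuity check this implies (the types and names here are hypothetical stand-ins for the actual implementation): each appended header must extend its predecessor by number and parent hash.

```scala
// Hypothetical sketch: a batch of headers is only appended to the queue
// if every header directly extends the one before it (consecutive
// number, matching parent hash), starting from the last queued header.
final case class BlockHeader(number: BigInt, hash: String, parentHash: String)

object HeaderChainValidation {
  private def directlyExtends(parent: BlockHeader, child: BlockHeader): Boolean =
    child.number == parent.number + 1 && child.parentHash == parent.hash

  def isValidExtension(lastQueued: Option[BlockHeader], batch: Seq[BlockHeader]): Boolean = {
    val chain = lastQueued.toSeq ++ batch
    // Every consecutive pair in (lastQueued ++ batch) must be a valid extension.
    chain.zip(chain.drop(1)).forall { case (p, c) => directlyExtends(p, c) }
  }
}
```

A batch failing this check would be rejected rather than queued, so the fetcher never accumulates a sequence that cannot be imported.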
Important Changes Introduced
One thing that was out of scope but seemed to affect the whole syncing story a lot: when receiving new blocks, the check for fetching was removed.
Testing
Already tested a regular sync of the whole chain on Mordor - no issues at all.
Spun up a local cluster with 5 miners a couple of times:
Update
After resolving the issue with the fetcher being stuck on headers from a rejected fork: