Skip to content

Etcm 290 invalid branches built by fetcher#787

Merged
mirkoAlic merged 9 commits intodevelopfrom
etcm-290-invalid-branches-built-by-fetcher
Nov 18, 2020
Merged

Etcm 290 invalid branches built by fetcher#787
mirkoAlic merged 9 commits intodevelopfrom
etcm-290-invalid-branches-built-by-fetcher

Conversation

@kapke
Copy link
Copy Markdown
Contributor

@kapke kapke commented Nov 9, 2020

Description

One of reasons for forks on cluster deployments is the fact that BlockFetcher builds block sequences which are invalid chanins, making them effectively impossible to import and thus - only local miner is able to provide blocks which are valid chain extension.

Proposed Solution

Validate header chains as they are appended to queue, to ensure they are possible to be imported.

Important Changes Introduced

Thing that was out of scope, but seemed to be affecting whole syncing story a lot - when receiving new blocks remove check for fetching.

Testing

Already tested regular sync of whole chain on Mordor - no issues at all.
Spun up a local cluster with 5 miners couple of times:

  • got one miner fork around block 400, later OS killed 2 containers due to lack of resources (including the forked one), rest 3 continued to work without any problems or forks for over 4000 blocks
  • another time - local cluster with 5 miners - no issues at all for ~200 blocks

Update

After resolving issue with fetcher being stuck on headers from rejected fork:

  • local cluster of 5 miners - no issues at all for ~2500 blocks
  • sync to Mordor - caught top and stayed on it over a night without visible issues

@kapke kapke requested review from mirkoAlic and ntallar November 9, 2020 12:40
@kapke kapke force-pushed the etcm-290-invalid-branches-built-by-fetcher branch from 30df020 to 6515c7f Compare November 9, 2020 12:41
Comment thread src/main/scala/io/iohk/ethereum/blockchain/sync/regular/BlockFetcherState.scala Outdated
@kapke kapke force-pushed the etcm-290-invalid-branches-built-by-fetcher branch from 83da7e5 to 0b54ea8 Compare November 10, 2020 12:09
Comment thread src/main/scala/io/iohk/ethereum/blockchain/sync/regular/BlockFetcher.scala Outdated
@kapke
Copy link
Copy Markdown
Contributor Author

kapke commented Nov 12, 2020

@ntallar As you can see in the code, there is blacklisting disabled when receiving headers response. I think I would:

  • create a JIRA ticket to revise whole response validation story and its relation to blacklisting
  • make blacklisting tests ignored and marked with that ticket
  • remove blacklisting line

WDYT?

Comment thread src/main/scala/io/iohk/ethereum/network/PeerActor.scala Outdated
@ntallar
Copy link
Copy Markdown

ntallar commented Nov 12, 2020

@ntallar As you can see in the code, there is blacklisting disabled when receiving headers response. I think I would:

I agree with that plan! Maybe something like #771 but for the headers

@kapke
Copy link
Copy Markdown
Contributor Author

kapke commented Nov 12, 2020

It seems that current state of this branch introduces yet another reason for forks - a scenario where:

  • fetcher receives a branch
  • some of these blocks get rejected by rest of the network
  • fetcher gets stuck because it's unable to fetch a branch which follows chain in queue

I see 2 solutions to that problem, the question is which seems to be better:

  • just make sure that the chain is contiguous in terms of block numbers - ledger will import whatever possible and then force to fetcher to clear state up in case of no longer continued branch (at cost of blacklisting honest peer)
  • count number of rejected responses and if it passes some threshold - reset fetcher state (or even restart the actor) so that invalid state is being invalidated and the accepted fork should be automatically picked

I'm leaning to the latter as it possibly could be used also in other scenarios where fetcher gets stuck for some reason

@ntallar
Copy link
Copy Markdown

ntallar commented Nov 12, 2020

Wouldn't that scenario evolve into:

  1. BlockImporter imports the blocks from fetcher so he removes it from there
  2. Fetcher would then be able to fetch the blocks from the fork (though they would be unconcatenable)
  3. When the blocks fail to be imported the InvalidateBlocksFrom message would kick in and result in the fetcher going back... eventually being able to solve the fork

The only problem I see is that blocks being mined in the middle are quite disruptive, as they invalidate the invalidation of the InvalidateBlocksFrom message. Maybe the fetcher should keep both

  • a last block: with the last block number imported
  • a block number to fetch

wdyt?

I wouldn't go for your alternatives, this code is already too complicated to add more workarounds, they seem too hacky imho:

  1. I'm not sure which of the components from the ledger rely on the blocks passed being a branch, something could break in the middle if you break that assumption.
  2. This is the safest of the two, though it only hides the problem as we are only restarting periodically

Comment thread src/main/scala/io/iohk/ethereum/blockchain/sync/regular/BlockFetcher.scala Outdated
Comment thread src/main/scala/io/iohk/ethereum/blockchain/sync/regular/BlockFetcher.scala Outdated
@kapke
Copy link
Copy Markdown
Contributor Author

kapke commented Nov 13, 2020

@ntallar Not really. In my case fetcher got stuck because of having a header from non-existing fork and being unable to make any progress.

As for approach #2 (cleaning up state/restarting) - indeed it doesn't solve the problem, but we still should be able to monitor such cases (log level error + monitoring) without leading to forking nodes. Please note that such approach to handling irrecoverable errors is at core of actors (which we use here) as well as is quite effective at keeping system operational.

@ntallar
Copy link
Copy Markdown

ntallar commented Nov 13, 2020

Please note that such approach to handling irrecoverable errors is at core of actors (which we use here) as well as is quite effective at keeping system operational.

I'm sorry but I'm still not convinced. It's definitely a good last resource in case we don't manage to fix an issue, but this hasn't yet become a blocker

@ntallar Not really. In my case fetcher got stuck because of having a header from non-existing fork and being unable to make any progress.

I see now the issue you mentioned! Shouldn't we do a better handling of our BlockBodies response to address it? A simple alternative is dropping headers from the waitingHeaders when a body of a header is requested but not received. Maybe it can be a next step to #771

@kapke kapke force-pushed the etcm-290-invalid-branches-built-by-fetcher branch 3 times, most recently from 1cd357b to a75e7cd Compare November 16, 2020 10:39
@kapke kapke marked this pull request as ready for review November 16, 2020 10:40
@kapke kapke requested review from mirkoAlic and ntallar November 16, 2020 10:45
@jmendiola222 jmendiola222 added bug Something isn't working high priority This PRs should be reviewed and merged ASAP labels Nov 17, 2020
@kapke kapke force-pushed the etcm-290-invalid-branches-built-by-fetcher branch from a75e7cd to a2839bf Compare November 17, 2020 11:55
Comment thread src/main/scala/io/iohk/ethereum/blockchain/sync/regular/BlockFetcher.scala Outdated
Comment thread src/main/scala/io/iohk/ethereum/blockchain/sync/regular/BlockFetcherState.scala Outdated
Comment thread src/main/scala/io/iohk/ethereum/blockchain/sync/regular/BlockFetcherState.scala Outdated
PeersClient.Response(fakePeer, BlockBodies(alternativeSecondBlocksBatch.drop(6).map(_.body)))
)

importer.send(blockFetcher, PickBlocks(100))
Copy link
Copy Markdown
Contributor

@mirkoAlic mirkoAlic Nov 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PickBlocks(100) sounds like magic number

case PeersClient.Request(msg, _, _) if msg == firstGetBlockBodiesRequest => peersClient.lastSender
}

// Block 5 is mined (we could have reached this stage due to invalidation messages sent to the fetcher)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think minedBlock.number == 16

@kapke kapke force-pushed the etcm-290-invalid-branches-built-by-fetcher branch 4 times, most recently from 2014d4c to 1d585e6 Compare November 18, 2020 13:53
@kapke kapke force-pushed the etcm-290-invalid-branches-built-by-fetcher branch from 1d585e6 to 02a4b1d Compare November 18, 2020 14:16
waitingHeaders = waiting
)
/**
* Matches bodies with headers in queue and adding matched bodies to the blocks.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🥇

Copy link
Copy Markdown
Contributor

@mirkoAlic mirkoAlic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Copy Markdown

@ntallar ntallar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Love the comments ❤️

 .\\            //.
. \ \          / /.
.\  ,\     /` /,.-
 -.   \  /'/ /  .
 ` -   `-'  \  -
   '.       /.\`
      -    .-
      :`//.'
      .`.'
      .' BP 

@mirkoAlic mirkoAlic merged commit 7a1323e into develop Nov 18, 2020
@mirkoAlic mirkoAlic deleted the etcm-290-invalid-branches-built-by-fetcher branch November 18, 2020 19:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working high priority This PRs should be reviewed and merged ASAP

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants