Call ProcessNewBlock() asynchronously #16323

TheBlueMatt · 2019-07-02T15:47:38Z

This is a bit of a rework of #16175, which is a rework of #12934. Built on top of #16279.

This is a roughly-minimal patchset to get PNB processing blocks in the background in a clean way, however it doesn't actually enable any material parallelism yet - it only sets up a structure for significantly untangling cs_main in coming work.

Next steps:

Notably, ProcessMessages will now call PNB, get a future for the block being fully processed, and then immediately block on cs_main waiting to process reject messages and other CNodeState data. This is rather easy to resolve using the new CPeerState (see comment about how we should slowly move things into CPeerState to reduce cs_main contention).
In parallel, PNB (and, thus, CheckBlock, and the pre-store AcceptBlock bits) should start losing cs_main, this is primarily doable by splitting mapBlockIndex out of cs_main (probably via something like mempool.cs, where a cs_blockindex acts as a read mutex while writers take cs_main. Given the prevalence of cs_main calls just to look up entries in mapBlockIndex, this should improve things pretty significantly.

DrahtBot · 2019-07-02T16:38:10Z

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#16839 (Replace Connman and BanMan globals with Node local by ryanofsky)
#16834 (Fetch Headers over DNS by TheBlueMatt)
#16762 (Rust-based Backup over-REST block downloader by TheBlueMatt)
#16757 (doc: CChainState return values by MarcoFalke)
#16688 (log: Add validation interface logging by jkczyz)
#16442 (Serve BIP 157 compact filters by jimpo)
#16411 (Signet support by kallewoof)
#16333 (test: Set BIP34Height = 2 for regtest by MarcoFalke)
#16279 (Return the AcceptBlock CValidationState directly in ProcessNewBlock by TheBlueMatt)
#16202 (Refactor network message deserialization by jonasschnelli)
#15206 (Immediately disconnect on invalid net message checksum by jonasschnelli)
#14053 (Add address-based index (attempt 4?) by marcinja)
#8994 (Testchains: Introduce custom chain whose constructor... by jtimon)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

practicalswift · 2019-07-10T10:19:52Z

src/net_processing.cpp

Can PeerState(pfrom->GetId()) return nullptr here?

practicalswift · 2019-07-10T10:20:06Z

src/net_processing.cpp

Can PeerState(pfrom->GetId()) return nullptr here?

practicalswift · 2019-07-10T10:20:48Z

src/validation.h

Should be "completes" :-)

laanwj · 2019-07-10T10:39:43Z

This is a huge structural change. Please, let's discuss concepts and concept ACK here first before jumping to small nits on the current implementation.

TheBlueMatt · 2019-07-18T21:19:59Z

Working on a big structural cleanup, will reopen something new once I have a new PR series.

TheBlueMatt · 2019-07-30T18:07:03Z

Actually I was somewhat confused by some existing bugs triggering on travis. I think this is good as-is, original description applies.

promag

Concept ACK, at least the idea to separate net and validation sounds reasonable to me.

promag · 2019-08-06T01:17:10Z

src/rpc/mining.cpp

A wild block could be mined in between?

Right, but most likely something was mined before the new block got accepted, which is the right error for that. That said, I think there's a bug here that ChainActive().Tip() needs cs_main, which isn't held here.

9fdf05d resolved some lock inversion warnings in denialofservice_tests, but left in a number of cs_main locks that are unnecessary (introducing lock inversion warnings in future changes).

In practice this means that CheckBlock+ContextualCheckBlock are called with a passed-in CValidationState before we move onto connecting the best chain. This makes conceptual sense as these calls represent the DoS checks on a block (ie PoW and malleability) which the caller almost certainly wants to know about right away and shouldn't have to wait on a callback for (and other validationinterface clients shouldn't care about someone submitting bogus malleated blocks to PNB). This also makes it much, much easier to move the best chain activation logic to a background thread as it implies that if PNB returns with a IsValid() CValidationState we don't need to care about trying to process (non-malleated) copies of the block from other peers.

This prepares for making best-chain-activation and disk writes happen in a separate thread from the caller, even though all callsites currently block on the return value immediately.

CNodeState was added for validation-state-tracking, and thus, logically, was protected by cs_main. However, as it has grown to include non-validation state (taking state from CNode), and as we've reduced cs_main usage for other unrelated things, CNodeState is left with lots of cs_main locking in net_processing. In order to ease transition to something new, this adds only a dummy CPeerState which is held as a reference for the duration of message processing. Note that moving things is somewhat tricky pre validation-thread as a consistent lockorder must be kept - we can't take a lock on the new cs_peerstate in anything that's called directly from validation.

Essentially, our goal is to not process anything for the given peer until the block finishes processing (emulating the previous behavior) without actually blocking the ProcessMessages loops. Obviously, in most cases, we'll just go on to the next peer and immediately hit a cs_main lock, blocking us anyway, but this we can slowly improve that state over time by moving things from CNodeState to CPeerState.

Spawn a background thread at startup which validates each block as it comes in from ProcessNewBlock, taking advantage of the new std::future return value to keep tests simple (and the new net_processing handling of such values async already). This makes introducing subtle validationinterface deadlocks much harder as any locks held going into ProcessNewBlock do not interact with (in the form of lockorder restrictions) locks taken in validationinterface callbacks. Note that after this commit, feature_block and feature_assumevalid tests time out due to increased latency between block processing when those blocks do not represent a new best block. This will be resolved in the next commit.

This resolves the performance regression introduced in the previous commit by always waking the message processing thread after each block future resolves. Sadly, this is somewhat awkward - all other validationinterface callbacks represent an actual change to the global validation state, whereas this callback indicates only that a call which one validation "client" made has completed. After going back and forth for some time I didn't see a materially better way to resolve this issue, and luckily its a rather simple change, but its far from ideal. Note that because we absolutely do not want to ever block on a ProcessNewBlock-returned-future, the callback approach is critical.

To keep the API the same (and for simplicity of clients, ie net_processing), this splits AcceptBlock into the do-I-want-this stage, the checking stage, and the writing stage. ProcessNewBlock calls the do-I-want-this and checking (ie malleability checking) stuff, and then dumps blocks that pass into the background thread. In the background, we re-test the do-I-want-this logic but skip the checking stuff, before writing the block to disk and activating the best chain.

As reject messages are required to go out in-order (ie before any further messages are processed), this sadly requires that we further delay re-enabling a peer after a block has been processed by waiting for current validationinterface callbacks to drain. This commit enables further reduction of cs_main in net_processing by allowing us to lock cs_peerstate before cs_main in BlockChecked (ie allows us to move things which are accessed in BlockChecked, including DoS state and rejects into CPeerState and out of CNodeState).

This technically resolves a race where entries are added to mapBlockSource before we know that they're non-malleated and then removed only after PNB returns, though in practice this wasn't an issue since all access to mapBlockSource already held cs_peerstate.

DrahtBot · 2019-09-16T11:36:25Z

Needs rebase

ryanofsky · 2019-11-13T16:55:52Z

Marking up for grabs

http://www.erisian.com.au/bitcoin-core-dev/log-2019-11-12.html#l-394

<BlueMatt> #16323 and #16324 are up for grabs if anyone wants to work on them, but there seems to be ~zero interest in reviewing them, cause they have wonderfully scary titles (despite the code actually being pretty simple) :)

Also

http://www.erisian.com.au/bitcoin-core-dev/log-2019-11-13.html#l-30

<BlueMatt> jnewbery: note that "Add new peer state tracking class to reduce cs_main contention" only vaguely makes sense on its own - the vast majority of the contension is around cs_main for a reason...witout the rest of the patch to make ProcessNewBlock and the like not take cs_main it is a lot of code movement for not very much performance gain.

TheBlueMatt mentioned this pull request Jul 2, 2019

Get cs_main out of the critical path in ProcessMessages #16324

Closed

laanwj added the Validation label Jul 2, 2019

TheBlueMatt force-pushed the 2019-07-background-pnb branch 6 times, most recently from 2f64e4a to 5517e78 Compare July 3, 2019 22:46

practicalswift reviewed Jul 10, 2019

View reviewed changes

src/net_processing.cpp Outdated

Copy link

Contributor

practicalswift Jul 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can PeerState(pfrom->GetId()) return nullptr here?

practicalswift reviewed Jul 10, 2019

View reviewed changes

src/net_processing.cpp Outdated

Copy link

Contributor

practicalswift Jul 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can PeerState(pfrom->GetId()) return nullptr here?

fanquake added the Needs Conceptual Review label Jul 10, 2019

practicalswift reviewed Jul 10, 2019

View reviewed changes

src/validation.h Outdated

Copy link

Contributor

practicalswift Jul 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be "completes" :-)

DrahtBot added the Needs rebase label Jul 16, 2019

TheBlueMatt closed this Jul 18, 2019

TheBlueMatt reopened this Jul 30, 2019

TheBlueMatt force-pushed the 2019-07-background-pnb branch from 5517e78 to a7871c8 Compare July 30, 2019 18:03

DrahtBot removed the Needs rebase label Jul 30, 2019

promag reviewed Aug 6, 2019

View reviewed changes

DrahtBot added the Needs rebase label Aug 15, 2019

TheBlueMatt added 5 commits August 19, 2019 19:39

Remove unnecessary cs_mains in denialofservice_tests

688592f

9fdf05d resolved some lock inversion warnings in denialofservice_tests, but left in a number of cs_main locks that are unnecessary (introducing lock inversion warnings in future changes).

Make ProcessNewBlock return a future instead of an immediate bool

269a060

This prepares for making best-chain-activation and disk writes happen in a separate thread from the caller, even though all callsites currently block on the return value immediately.

TheBlueMatt force-pushed the 2019-07-background-pnb branch from a7871c8 to e11110b Compare August 20, 2019 00:07

DrahtBot removed the Needs rebase label Aug 20, 2019

TheBlueMatt added 5 commits August 19, 2019 23:52

TheBlueMatt force-pushed the 2019-07-background-pnb branch from e11110b to 3a3b13e Compare August 20, 2019 03:54

DrahtBot added the Needs rebase label Sep 16, 2019

TheBlueMatt mentioned this pull request Oct 15, 2019

Return the AcceptBlock CValidationState directly in ProcessNewBlock #16279

Closed

TheBlueMatt closed this Nov 12, 2019

ryanofsky added the Up for grabs label Nov 13, 2019

dongcarl mentioned this pull request May 12, 2020

[WIP] rebase: Call ProcessNewBlock() asynchronously #18963

Closed

2 tasks

fanquake removed Needs rebase Up for grabs labels May 13, 2020

jnewbery mentioned this pull request Aug 6, 2020

Move remaining application layer data to net processing #19398

Closed

16 tasks

bitcoin locked as resolved and limited conversation to collaborators Feb 15, 2022

Call ProcessNewBlock() asynchronously #16323

Call ProcessNewBlock() asynchronously #16323

Uh oh!

Conversation

TheBlueMatt commented Jul 2, 2019

Uh oh!

DrahtBot commented Jul 2, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Conflicts

Uh oh!

practicalswift Jul 10, 2019

Choose a reason for hiding this comment

Uh oh!

practicalswift Jul 10, 2019

Choose a reason for hiding this comment

Uh oh!

practicalswift Jul 10, 2019

Choose a reason for hiding this comment

Uh oh!

laanwj commented Jul 10, 2019

Uh oh!

TheBlueMatt commented Jul 18, 2019

Uh oh!

TheBlueMatt commented Jul 30, 2019

Uh oh!

promag left a comment

Choose a reason for hiding this comment

Uh oh!

promag Aug 6, 2019

Choose a reason for hiding this comment

Uh oh!

TheBlueMatt Aug 6, 2019

Choose a reason for hiding this comment

Uh oh!

DrahtBot commented Sep 16, 2019

Uh oh!

ryanofsky commented Nov 13, 2019

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

DrahtBot commented Jul 2, 2019 •

edited

Loading