Prepare block connection logic for headers-first #3370

sipa · 2013-12-08T19:36:16Z

This is a preparation for headers-first, which is useful on its own, but doesn't change the downloading or verification semantics yet.

It does change how the block connection logic works: it always makes sure the oldest valid chain with the most work is connected. Instead of atomically doing a "connect the received block now" while processing that block, there is just a generic loop that disconnects and connects blocks to aim for the assumed best chain, reverting if necessary.
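The generic loop described above can be sketched roughly as follows. This is a toy model under stated assumptions, not the code from this pull request: `Block`, `Chain`, `Step`, and `StepTowards` are hypothetical names, and full block validation is reduced to a `valid` flag.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy block index entry (hypothetical; not Bitcoin's CBlockIndex).
struct Block {
    const Block* prev;  // parent block, nullptr for genesis
    int height;         // distance from genesis
    bool valid;         // stand-in for the outcome of full validation
};

// The active chain, indexed by height.
using Chain = std::vector<const Block*>;

// True if `b` is part of the active chain.
bool Contains(const Chain& chain, const Block* b) {
    return static_cast<std::size_t>(b->height) < chain.size() && chain[b->height] == b;
}

// Last block shared by the active chain and the branch ending in `b`.
const Block* FindFork(const Chain& chain, const Block* b) {
    while (b != nullptr && !Contains(chain, b)) b = b->prev;
    return b;
}

// Ancestor of `b` (or `b` itself) at the given height.
const Block* AncestorAt(const Block* b, int height) {
    while (b != nullptr && b->height > height) b = b->prev;
    return b;
}

enum class Step { Done, Disconnected, Connected, Failed };

// Perform one atomic step towards `target`: disconnect the tip or connect
// a single block. The caller repeats this until Done or Failed.
Step StepTowards(Chain& chain, const Block* target) {
    const Block* fork = FindFork(chain, target);
    if (!chain.empty() && chain.back() != fork) {
        chain.pop_back();  // the tip is not on the path to the target
        return Step::Disconnected;
    }
    if (!chain.empty() && chain.back() == target) return Step::Done;
    const Block* next = AncestorAt(target, static_cast<int>(chain.size()));
    assert(next != nullptr && next->height == static_cast<int>(chain.size()));
    if (!next->valid) return Step::Failed;  // caller would pick a new target
    chain.push_back(next);
    return Step::Connected;
}
```

In this sketch a `Failed` result leaves the chain wherever it currently is; a caller would mark the offending block as invalid, choose the next-best target, and keep stepping, which corresponds to the "reverting if necessary" part.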

Let's see if pulltester likes this...

laanwj · 2013-12-11T06:18:05Z

I've synced both the testnet and mainnet chain with this patch applied (from the network), I've had no problems, and no apparent difference in speed.

sipa · 2013-12-11T10:49:02Z

@laanwj It's not the ability to sync that I'm worried about. It must, however, also deal well with invalid blocks and orphans. I'll do a local pulltester run next weekend to see what goes wrong.

sipa · 2013-12-14T14:38:19Z

Rebased, and fixed two bugs that PullTester found (thanks, @TheBlueMatt!):

- Blocks with potential corruption weren't removed from mapAlreadyAskedFor, so they weren't downloaded again.
- When checking for a new most-work chain, only blocks that weren't in the previous most-work chain were checked for errors, instead of all blocks that weren't already active.

sipa · 2013-12-20T21:39:50Z

Can I have some reviews? @gavinandresen @laanwj @gmaxwell @jgarzik

laanwj · 2013-12-21T09:45:18Z

src/main.cpp

You're dereferencing a pointer and putting it into a reference, making it impossible to check for NULL, which can be returned from State(). Could this ever be an issue?

The main-specific node state is created when a CNode is created, and destroyed when it is deleted (called from constructor and destructor). This means that for an existing CNode (pto in this case), state will always exist. When using a stored NodeId, the chance exists that it has been deleted in the meantime.
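The distinction drawn in this exchange can be illustrated with a small sketch. It assumes a map-backed lookup; the names `NodeState`, `g_node_states`, `InitializeNode`, `FinalizeNode`, and `PunishStoredPeer` are used here for illustration only and are not claimed to match the signatures in this pull request.

```cpp
#include <cassert>
#include <map>

using NodeId = int;  // hypothetical stand-in for the real node identifier type

struct NodeState {
    int misbehavior = 0;
};

// Hypothetical global registry of per-peer state.
std::map<NodeId, NodeState> g_node_states;

// Assumed to be called when the peer object is constructed / destroyed.
void InitializeNode(NodeId id) { g_node_states.emplace(id, NodeState{}); }
void FinalizeNode(NodeId id) { g_node_states.erase(id); }

// Returns nullptr when no state exists for this id.
NodeState* State(NodeId id) {
    auto it = g_node_states.find(id);
    return it == g_node_states.end() ? nullptr : &it->second;
}

// Case 1: a live peer object. Its state exists by construction, so binding
// the dereferenced pointer to a reference is safe.
int MisbehaviorOfLivePeer(NodeId id) {
    NodeState* p = State(id);
    assert(p != nullptr);
    NodeState& state = *p;
    return state.misbehavior;
}

// Case 2: an id stored earlier. The peer may have disconnected since, so
// the pointer must be checked before use.
bool PunishStoredPeer(NodeId id, int howmuch) {
    NodeState* state = State(id);
    if (state == nullptr) return false;  // peer is gone, nothing to do
    state->misbehavior += howmuch;
    return true;
}
```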

sipa · 2013-12-22T16:48:02Z

Update: mapBlockSource now also stores source information for orphan blocks. That means that we can potentially assign them a DoS score if their block is found to be invalid later on. It also means we can potentially send them a "reject" message for it.
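A minimal sketch of the idea, assuming a plain map from block hash to peer id. The names `block_source`, `dos_score`, `RecordSource`, and `OnBlockChecked`, and the use of strings as hashes, are hypothetical and only illustrate the bookkeeping, not the actual mapBlockSource code.

```cpp
#include <cassert>
#include <map>
#include <string>

using NodeId = int;             // hypothetical peer identifier
using BlockHash = std::string;  // hypothetical stand-in for a 256-bit hash

// Which peer sent us which block (orphans included).
std::map<BlockHash, NodeId> block_source;
// Accumulated penalty per peer.
std::map<NodeId, int> dos_score;

// Remember the sender at receipt time, before the block can be validated.
void RecordSource(const BlockHash& hash, NodeId from) {
    block_source[hash] = from;
}

// Called once validation finishes, possibly long after receipt.
void OnBlockChecked(const BlockHash& hash, bool valid, int penalty) {
    auto it = block_source.find(hash);
    if (it == block_source.end()) return;  // sender unknown
    if (!valid) {
        // A real implementation would also queue a "reject" message here.
        dos_score[it->second] += penalty;
    }
    block_source.erase(it);
}
```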

sipa · 2013-12-23T22:43:20Z

Reverted to an older version... let's see if pulltester likes it again :(

TheBlueMatt · 2013-12-23T23:09:11Z

Are you sure it's not pull tester's fault?

sipa · 2013-12-23T23:14:04Z

I think it is nondeterminism. I've seen it report 1, 6 and 8 blocks which differ. Maybe I was just lucky the first time that it worked as expected.

TheBlueMatt · 2013-12-23T23:15:57Z

That doesn't mean pull tester isn't broken... It certainly doesn't take kindly to changing the download algorithm.

sipa · 2013-12-23T23:19:01Z

Right now, the rule this branch should be following is that in case of multiple valid equal-work chains, the one where the tip block was received first is chosen.

TheBlueMatt · 2013-12-23T23:26:07Z

I'm not sure that's the same as master in all cases, though I doubt pull tester would identify the difference.

sipa · 2013-12-23T23:30:10Z

I'm not sure that's the same as master either, though I can't immediately come up with a scenario where it differs.

The first question is: is this rule acceptable (imho, if it differs from the current behaviour, I prefer this clearer rule)? If so, can we have pulltester tolerate it? :)

TheBlueMatt · 2013-12-23T23:34:44Z

If the rule differs in anything more than nonexistent cases, it's a different network consensus, which could be problematic. Ideally pull tester should identify any difference in network consensus unless we can come up with a proof that it can never matter.

sipa · 2013-12-23T23:43:55Z

Right, agree. Though it feels like we're not actually understanding well enough what the behaviour of the current code is, in non-trivial reorganization cases. My intuition says that as long as you select one of the valid highest-pow chains, and there is no way to make a node switch to an equal-work chain when it already has one, we should always converge - but perhaps we need to think harder about that.

In any case, I like the rule "the earliest seen of all highest-pow valid chains". Perhaps this needs to be discussed outside of this pullreq though.

TheBlueMatt · 2013-12-24T00:03:06Z

Yes, no question this needs simulation and study. Until then, I'm not comfortable with any, even subtle, changes to the best chain selection.

laanwj · 2013-12-24T07:52:34Z

Does headers-first download need a behavior change there, or is it just by accident and can it be corrected?

sipa · 2013-12-25T20:04:49Z

@laanwj I'm pretty sure it's possible to recreate the old behavior in a headers-first compatible implementation, but I'd rather not. I like having a clear rule about what the intended behavior is, and have that implemented.

After having thought a bit about this, this is a case where the old and new implementation differ, IIRC:

The following setup exists: A->B and A->C. Now an orphan E is announced (with unknown parent D that builds on B), D is requested, but in the meantime block F (building on C) arrives. We switch to chain A->C->F in any case, but if now D arrives, and E turns out to be invalid, the old implementation would remain on A->C->F, while the new implementation will switch to A->B->D as D arrived before F.
I doubt this is the simplest scenario to trigger a difference, and it would surprise me if that was what PullTester observes.

I still like the "the earliest received of all valid maximal-PoW chains" rule, as it's easy to implement and stateless. The current rules depend on what the current best chain is.

TheBlueMatt · 2013-12-25T20:26:21Z

Yes, actually I believe this (and similar cases) are what pull tester is tickling. The latest run tickles a few cases:

b8 is an invalid block built on an equal-work fork (b7). pull-tester expects to stay on b6 (the previous best chain), but this pull appears to switch to b7.
b11 is the same case with b11 being invalid in a different way (too much fee vs double-spend).
b13/b14 is:

        //     genesis -> b1 (0) -> b2 (1) -> b5 (2) -> b6  (3)
        //                                          \-> b12 (3) -> b13 (4) -> b14 (5)
        //                                              (b12 added last)

Only b14 is invalid, so pull tester expects to select b13, but this pull differs somehow (it's not entirely clear from the logs what the difference is, and this one is very likely to be pull tester responding incorrectly to getblocks/getheaders, confusing the download process).

sipa · 2013-12-25T22:41:10Z

Seems there was a bug in my code - the Comparator for CBlockIndex objects compared pb->nSequenceId with itself rather than with pa->nSequenceId. I remember fixing this bug before - perhaps I lost some git commit. Let's see.
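For illustration, a comparator implementing the "most work, then earliest received" rule might look like the sketch below. The struct and field names (`BlockIndex`, `chain_work`, `sequence_id`, `WorkLess`) are assumptions made for this example; the actual comparator in the pull request may be structured differently.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <set>

// Hypothetical, simplified block index entry.
struct BlockIndex {
    uint64_t chain_work;   // total work up to and including this block
    uint32_t sequence_id;  // order of receipt; lower means received earlier
};

// Strict weak ordering in which the "greatest" element is the preferred tip.
struct WorkLess {
    bool operator()(const BlockIndex* pa, const BlockIndex* pb) const {
        // First sort by total work: more work sorts later.
        if (pa->chain_work != pb->chain_work) return pa->chain_work < pb->chain_work;
        // Then by receipt order: the earlier-received block sorts later.
        // The bug described above corresponds to comparing pb's id with
        // itself here, which makes this branch never fire.
        if (pa->sequence_id != pb->sequence_id) return pa->sequence_id > pb->sequence_id;
        // Fall back to the pointer address so distinct entries never tie.
        return std::less<const BlockIndex*>()(pa, pb);
    }
};

// Candidate tips, ordered so that *rbegin() is the best one.
using CandidateSet = std::set<const BlockIndex*, WorkLess>;
```

With equal work, the entry with the lower `sequence_id` ends up at `rbegin()`, so a later-arriving equal-work block does not displace the current choice.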

sipa · 2013-12-25T22:56:18Z

Ha...

TheBlueMatt · 2013-12-25T23:05:44Z

Works for me

sipa · 2013-12-27T15:40:38Z

Merged two commits, and added some comments.

Also some earlier changes, which got lost because they were commented on by @mikehearn in commits that have since been rebased:

- Changed to a global sequence number for received blocks instead of a timestamp.
- Removed interruption points that could result in a non-best-known chain being observed externally.
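A global sequence number of this kind can be sketched as a counter that is read and incremented under a lock, so that every received block gets a distinct, strictly increasing value (which a timestamp does not guarantee). The names `sequence_mutex`, `next_sequence_id`, and `AssignSequenceId` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical globals guarding and holding the next id to hand out.
std::mutex sequence_mutex;
uint32_t next_sequence_id = 1;

// Atomically read and increment the counter; each caller gets a unique id.
uint32_t AssignSequenceId() {
    std::lock_guard<std::mutex> lock(sequence_mutex);
    return next_sequence_id++;
}
```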

This changes the block processing logic from "try to atomically switch to a new block" to a continuous "(dis)connect a block, aiming for the assumed best chain". This means the smallest atomic operations on the chainstate become individual block connections or disconnections, instead of entire reorganizations. It may mean that we try to reorganize to one block, fail, and reorganize back to the old block. This is slower, but doesn't require unbounded RAM. It also means that a ConnectBlock which fails may no longer be called from the ProcessBlock which knows which node sent it. To deal with that, a mapBlockSource is kept, and invalid blocks cause asynchronous "reject" messages and banning (if necessary).

BitcoinPullTester · 2014-01-27T21:35:36Z

Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/75f51f2a63e0ebe34ab290c2b7141dd240b98c3b for binaries and test log.
This test script verifies pulls every time they are updated. It, however, dies sometimes and fails to test properly. If you are waiting on a test, please check timestamps to verify that the test.log is moving at http://jenkins.bluematt.me/pull-tester/current/
Contact BlueMatt on freenode if something looks broken.

gavinandresen · 2014-01-29T21:48:44Z

ACK.

gavinandresen · 2014-01-29T21:49:12Z

Merging; got ACKs from me and @jgarzik in #3514

0bd882b refactor: remove RecursiveMutex cs_nBlockSequenceId (Sebastian Falbesoner)

Pull request description:

The RecursiveMutex `cs_nBlockSequenceId` is only used at one place in `CChainState::ReceivedBlockTransactions()` to atomically read-and-increment the nBlockSequenceId member: https://github.com/bitcoin/bitcoin/blob/83daf47898f8a79cb20d20316c64becd564cf54c/src/validation.cpp#L2973-L2976

~~For this simple use-case, we can make the member `std::atomic` instead to achieve the same result (see https://en.cppreference.com/w/cpp/atomic/atomic/operator_arith).~~

~~This is related to #19303. As suggested in the issue, I first planned to change the `RecursiveMutex` to `Mutex` (still possible if the change doesn't get Concept ACKs), but using a Mutex for this simple operation seems to be overkill. Note that at the time when this mutex was introduced (PR #3370, commit 75f51f2) `std::atomic` were not used in the codebase yet -- according to `git log -S std::atomic` they have first appeared in 2016 (commit 7e908c7), probably also because the compilers didn't support them properly earlier.~~

At this point, the cs_main lock is set, hence we can use a plain int for the member and mark it as guarded by cs_main.

ACKs for top commit:
Zero-1729: ACK 0bd882b
promag: Code review ACK 0bd882b.
hebasto: ACK 0bd882b

Tree-SHA512: 435271ac8f877074099ddb31436665b500e555f7cab899e5c8414af299b154d1249996be500e8fdeff64e4639bcaf7386e12510b738ec6f20e415e7e35afaea9

laanwj reviewed Dec 21, 2013

sipa mentioned this pull request Jan 11, 2014

Per-peer block tracking, stalled block download detection, orphan pool limiting #3514

Merged

sipa added 2 commits January 27, 2014 21:13

Move only: extract WriteChainState and UpdatedTip from SetBestChain.

0ec16f3

gavinandresen added a commit that referenced this pull request Jan 29, 2014

Merge pull request #3370 from sipa/headersfirst3

3581abd

Prepare block connection logic for headers-first

gavinandresen merged commit 3581abd into bitcoin:master Jan 29, 2014

sipa mentioned this pull request Mar 18, 2014

Speed limit / throttle network usage #273

Closed

sipa mentioned this pull request Jul 6, 2014

Bugfix: send rejects and apply DoS scoring for errors in direct block validation. #4471

Merged

Bushstar pushed a commit to Bushstar/omnicore that referenced this pull request Apr 8, 2020

Remove logging for waking of select() (bitcoin#3370)

f2ece10

This was always quite spammy and so far never useful in debugging.

theStack mentioned this pull request Aug 28, 2021

refactor: remove RecursiveMutex cs_nBlockSequenceId #22824

Merged

bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021
