Skip to content

Conversation

@sipa
Copy link
Member

@sipa sipa commented Dec 8, 2013

This is a preparation for headers-first, which is on itself useful, but doesn't change the downloading or verification semantics yet.

It does change how the block connection logic works: it always makes sure the oldest valid chain with most work is connected. Instead of doing a "connect the received block now" atomically during process of that block, there is just a generic loop that disconnects and connects block to aim for the assumed best chain - reverting if necessary.

Let's see if pulltester likes this...

@laanwj
Copy link
Member

laanwj commented Dec 11, 2013

I've synced both the testnet and mainnet chain with this patch applied (from the network), I've had no problems, and no apparent difference in speed.

@sipa
Copy link
Member Author

sipa commented Dec 11, 2013

@laanwj It's not being able to sync I'm worried about. It must however also deal well with invalid blocks and orphans. I'll do a local pulltester run next weekend to see what goes wrong.

@sipa
Copy link
Member Author

sipa commented Dec 14, 2013

Rebased, and fixed two bugs that PullTester found (thanks, @TheBlueMatt!):

  • Blocks with potential corruption weren't removed from mapAlreadyAskedFor, so they weren't downloaded again.
  • When checking for a new most-work chain, only blocks that weren't in the previous most-work chain were checked for errors, instead of all blocks that weren't already active.

@sipa
Copy link
Member Author

sipa commented Dec 20, 2013

Can I have some reviews? @gavinandresen @laanwj @gmaxwell @jgarzik

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're dereferencing a pointer and putting it into a reference, making it impossible to check for NULL, which can be returned from State(). Could this ever be an issue?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main-specific node state is created when a CNode is created, and destroyed when it is deleted (called from constructor and destructor). This means that for an existing CNode (pto in this case), state will always exist. When using a stored NodeId, the chance exists that it has been deleted in the mean time.

@sipa
Copy link
Member Author

sipa commented Dec 22, 2013

Update: mapBlockSource now also stores source information for orphan blocks. That means that we can potentially assign them a DoS score if their block is found to be invalid later on. It also means we can potentially send them a "reject" message for it.

@sipa
Copy link
Member Author

sipa commented Dec 23, 2013

Reverted to an older version... let's see if pulltester likes it again :(

@TheBlueMatt
Copy link
Contributor

Are you sure it's not pull testers fault?

@sipa
Copy link
Member Author

sipa commented Dec 23, 2013

I think it is indeterminism. I've seen report 1, 6 and 8 blocks which differ. Maybe I was just lucky the first time that it worked as expected.

@TheBlueMatt
Copy link
Contributor

doesn't mean it's not pull testers not broken... It certainly doesn't take kindly to changing the download algorithm.

@sipa
Copy link
Member Author

sipa commented Dec 23, 2013

Right the now, the rule this branch should be following is that in case of multiple valid equal-work chains, the one where the tip block was received first is chosen.

@TheBlueMatt
Copy link
Contributor

I'm not sure that that's the same as master in all cases? though I doubt pull tester would identify the difference.

@sipa
Copy link
Member Author

sipa commented Dec 23, 2013

I'm not sure that's the same as master either, though I can't immediately comeup with a scenario where it differs.

The question is first: is this rule acceptable (imho, if it differs with the current behaviour, i prefer this clearer rule). If so, can we have pulltester tolerate it? :)

@TheBlueMatt
Copy link
Contributor

If the rule is different on any more than non existent cases it's a different network consensus which could be problematic. Ideally pull tester should identify any difference in network consensus unless we can come up with a proof that it can never matter.

@sipa
Copy link
Member Author

sipa commented Dec 23, 2013

Right, agree. Though it feels like we're not actually understanding well enough what the behaviour of the current code is, in non-trivial reorganization cases. My intuition says that as long as you select one of the valid highest-pow chains, and there is no way to make a node switch to an equal-work chain when it already has one, we should always converge - but perhaps we need to think harder about that.

In any case, I like the rule "the earliest seen of all highest-pow valid chains". Perhaps this needs to be discussed outside of this pullreq though.

@TheBlueMatt
Copy link
Contributor

Yes, no question this needs simulation and study. Until then, I'm not comfortable with any, even subtle, changes to the best chain selection.

@laanwj
Copy link
Member

laanwj commented Dec 24, 2013

Does headers-first download need a behavior change there, or is it just by accident and can it be corrected?

@sipa
Copy link
Member Author

sipa commented Dec 25, 2013

@laanwj I'm pretty sure it's possible to recreate the old behavior in a headers-first compatible implementation, but I'd rather not. I like having a clear rule about what the intended behavior is, and have that implemented.

After having thought a bit about this, this is a case where the old and new implementation differ, IIRC:

  • The following setup exists: A->B and A->C. Now an orphan E is announced (with unknown parent D that builds on B), D is requested, but in the mean time block F (building on C) arrives. We switch to chain A->C->F in any case, but if now D arrives, and E turns out to be invalid, the old implementation would remain on A->C->F, while the new implementation will switch to A->B->D as D arrived before F.
    I doubt this is the simplest scenario to trigger a difference, and it would surprise me if that was what PullTester observes.

I still like the "the earliest received of all valid maximal-PoW chains" rule, as it's easy to implement and stateless. The current rules depend on what the current best chain is.

@TheBlueMatt
Copy link
Contributor

Yes, actually I believe this (and similar cases) are what pull tester is tickling. The latest run tickles a few cases:

  • b8 is an invalid block built on an equal-work fork (b7). pull-tester expects to stay on b6 (the previous best chain), but this pull appears to switch to b7.
  • b11 is the same case with b11 being invalid in a different way (too much fee vs double-spend).
  • b13/b14 is:
        //     genesis -> b1 (0) -> b2 (1) -> b5 (2) -> b6  (3)
        //                                          \-> b12 (3) -> b13 (4) -> b14 (5)
        //                                              (b12 added last)

Only b14 is invalid so pull tester expects to select b13 but this pull differs somehow (its not entirely clear from the logs what the difference is, and this one is very likely to be pull-tester responding incorrectly to getblocks/getheaders, confusing the download process to be confused)

@sipa
Copy link
Member Author

sipa commented Dec 25, 2013

Seems there was a bug in my code - the Comparator for CBlockIndex objects compared pb->nSequenceId with itself rather than with pa->nSequenceId. I remember fixing this bug before - perhaps I lost some git commit. Let's see.

@sipa
Copy link
Member Author

sipa commented Dec 25, 2013

Ha...

@TheBlueMatt
Copy link
Contributor

Works for me

@sipa
Copy link
Member Author

sipa commented Dec 27, 2013

Merged two commits, and added some comments.

Also some earlier changes, which got lost because they were commented on by @mikehearn in commits that have since been rebased:

  • Changed to a global sequence number for received blocks instead of a timestamp.
  • Removed interruption points that could result in a non-best-known-chain to be observed externally.

sipa added 2 commits January 27, 2014 21:13
This changes the block processing logic from "try to atomically switch
to a new block" to a continuous "(dis)connect a block, aiming for the
assumed best chain".

This means the smallest atomic operations on the chainstate become
individual block connections or disconnections, instead of entire
reorganizations. It may mean that we try to reorganize to one block,
fail, and rereorganize again to the old block. This is slower, but
doesn't require unbounded RAM.

It also means that a ConnectBlock which fails may be no longer called
from the ProcessBlock which knows which node sent it. To deal with that,
a mapBlockSource is kept, and invalid blocks cause asynchronous "reject"
messages and banning (if necessary).
@BitcoinPullTester
Copy link

Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/75f51f2a63e0ebe34ab290c2b7141dd240b98c3b for binaries and test log.
This test script verifies pulls every time they are updated. It, however, dies sometimes and fails to test properly. If you are waiting on a test, please check timestamps to verify that the test.log is moving at http://jenkins.bluematt.me/pull-tester/current/
Contact BlueMatt on freenode if something looks broken.

@gavinandresen
Copy link
Contributor

ACK.

@gavinandresen
Copy link
Contributor

Merging; got ACKs from me and @jgarzik in #3514

gavinandresen added a commit that referenced this pull request Jan 29, 2014
Prepare block connection logic for headers-first
@gavinandresen gavinandresen merged commit 3581abd into bitcoin:master Jan 29, 2014
Bushstar pushed a commit to Bushstar/omnicore that referenced this pull request Apr 8, 2020
This was always quite spammy and so far never useful in debugging.
maflcko pushed a commit that referenced this pull request Aug 30, 2021
0bd882b refactor: remove RecursiveMutex cs_nBlockSequenceId (Sebastian Falbesoner)

Pull request description:

  The RecursiveMutex `cs_nBlockSequenceId` is only used at one place in `CChainState::ReceivedBlockTransactions()` to atomically read-and-increment the nBlockSequenceId member:

  https://github.com/bitcoin/bitcoin/blob/83daf47898f8a79cb20d20316c64becd564cf54c/src/validation.cpp#L2973-L2976

  ~~For this simple use-case, we can make the member `std::atomic` instead to achieve the same result (see https://en.cppreference.com/w/cpp/atomic/atomic/operator_arith).~~

  ~~This is related to #19303. As suggested in the issue, I first planned to change the `RecursiveMutex` to `Mutex` (still possible if the change doesn't get Concept ACKs), but using a Mutex for this simple operation seems to be overkill. Note that at the time when this mutex was introduced (PR #3370, commit 75f51f2) `std::atomic` were not used in the codebase yet -- according to `git log -S std::atomic` they have first appeared in 2016 (commit 7e908c7), probably also because the compilers didn't support them properly earlier.~~

  At this point, the cs_main lock is set, hence we can use a plain int for the member and mark it as guarded by cs_main.

ACKs for top commit:
  Zero-1729:
    ACK 0bd882b
  promag:
    Code review ACK 0bd882b.
  hebasto:
    ACK 0bd882b

Tree-SHA512: 435271ac8f877074099ddb31436665b500e555f7cab899e5c8414af299b154d1249996be500e8fdeff64e4639bcaf7386e12510b738ec6f20e415e7e35afaea9
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Aug 30, 2021
0bd882b refactor: remove RecursiveMutex cs_nBlockSequenceId (Sebastian Falbesoner)

Pull request description:

  The RecursiveMutex `cs_nBlockSequenceId` is only used at one place in `CChainState::ReceivedBlockTransactions()` to atomically read-and-increment the nBlockSequenceId member:

  https://github.com/bitcoin/bitcoin/blob/83daf47898f8a79cb20d20316c64becd564cf54c/src/validation.cpp#L2973-L2976

  ~~For this simple use-case, we can make the member `std::atomic` instead to achieve the same result (see https://en.cppreference.com/w/cpp/atomic/atomic/operator_arith).~~

  ~~This is related to bitcoin#19303. As suggested in the issue, I first planned to change the `RecursiveMutex` to `Mutex` (still possible if the change doesn't get Concept ACKs), but using a Mutex for this simple operation seems to be overkill. Note that at the time when this mutex was introduced (PR bitcoin#3370, commit 75f51f2) `std::atomic` were not used in the codebase yet -- according to `git log -S std::atomic` they have first appeared in 2016 (commit 7e908c7), probably also because the compilers didn't support them properly earlier.~~

  At this point, the cs_main lock is set, hence we can use a plain int for the member and mark it as guarded by cs_main.

ACKs for top commit:
  Zero-1729:
    ACK 0bd882b
  promag:
    Code review ACK 0bd882b.
  hebasto:
    ACK 0bd882b

Tree-SHA512: 435271ac8f877074099ddb31436665b500e555f7cab899e5c8414af299b154d1249996be500e8fdeff64e4639bcaf7386e12510b738ec6f20e415e7e35afaea9
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants