Skip to content

Conversation

@sipa
Copy link
Member

@sipa sipa commented Mar 16, 2014

There is a small semantic change here, namely that blocks with invalid transaction data (eg duplicate hashes, non-first coinbase, ...) will have their headers accepted into the tree (but never considered for connecting).

That's an inevitable consequence of moving towards headers-first, as we'll do header connectivity checkinng before transaction checking anyway (the transaction data won't be available until later, so we rather do more extensive checking on the headers that are available first).

This has very low DoS risk, as it still requires faking PoW.

@luke-jr
Copy link
Member

luke-jr commented Mar 16, 2014

This doesn't change the actual behaviour of the pre-existing functions, correct? So, eg, unit test assumptions remain valid..

@sipa
Copy link
Member Author

sipa commented Mar 16, 2014

A better summary of the changes:

  • before: standalone headers/tx checks, (orphan store), tree headers/tx checks, store block, connect
  • after: standalone headers checks, tree headers checks, store headers, standalone tx checks, (orphan store), tree tx checks, store block, connect

In a later pull request, the orphan storage will be removed entirely, as blocks can be stored with just headers validated but not transactions (potentially resulting in out-of-order block storagre).

@sipa
Copy link
Member Author

sipa commented Mar 17, 2014

Please take your time, and ask for clarification as necessary.

@sipa
Copy link
Member Author

sipa commented Mar 29, 2014

There is a bug in this code, which prevents syncing. Closing until fixed.

@sipa sipa closed this Mar 29, 2014
@sipa sipa reopened this Apr 7, 2014
@sipa
Copy link
Member Author

sipa commented Apr 8, 2014

\o/ segfault

@sipa
Copy link
Member Author

sipa commented Apr 8, 2014

Bug fixed.

@luke-jr I don't think any assumptions that are used in unit tests change. The only real change is that now CBlockIndex entries in the block tree may exist that do not have actual block data associated with them.

@mikehearn
Copy link
Contributor

I did a review, comments ended up on the origin commit rather than here - oops.

Overall looks good, just would appreciate some more comments and helper methods in a few places to make the logic easier to follow and reduce the chances of typos with all the bit twiddling.

@laanwj laanwj added this to the 0.10.0 milestone Apr 11, 2014
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't these be constants perhaps in CChainParams?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, but let's do that in another PR.

@rdponticelli
Copy link
Contributor

There seems to be some bug on the connection logic in this pull. I got a node stuck after it got shutdown during syncing. After launching it again, all the remaining blocks got marked as orphans, and it never resumed syncing. Shutting down and launching again several times didn't help.

@sipa
Copy link
Member Author

sipa commented Apr 16, 2014

@rdponticelli With a recent version of this code (less than 8 days old?)

@rdponticelli
Copy link
Contributor

@sipa Yes, last version. Culprit seems to be some kind of race condition with the low disk space code. Some logs:

2014-04-16 14:18:49 UpdateTip: new best=0000000000000000937bdca5485b8a02da4cbfea74fe982903a2da1ba8537866 height=295933 log2_work=77.988128 tx=36873027 date=2014-04-15 07:18:51 progress=0.993805
2014-04-16 14:18:49 ProcessBlock: ACCEPTED
2014-04-16 14:18:49 *** Error: Disk space is low!
2014-04-16 14:19:03 ERROR: AcceptBlock() : FindBlockPos failed
2014-04-16 14:19:03 ERROR: ProcessBlock() : AcceptBlock FAILED
2014-04-16 14:19:03 Requesting shutdown
2014-04-16 14:19:03 ProcessBlock: ORPHAN BLOCK 0, prev=0000000000000000a89d601cd42705cfe8a25b6546ed892309c67e5a197e3fdb
2014-04-16 14:19:03 ProcessBlock: ORPHAN BLOCK 1, prev=000000000000000005ea1ca8a944a574826bd856b83a29871f46c8e2a2323deb
2014-04-16 14:19:03 Running Shutdown in thread

After that all the chain is detected as orphan:

2014-04-16 14:58:04 Initialization result: 1
2014-04-16 14:58:14 ProcessBlock: ORPHAN BLOCK 0, prev=0000000000000000a89d601cd42705cfe8a25b6546ed892309c67e5a197e3fdb
2014-04-16 14:58:15 ProcessBlock: ORPHAN BLOCK 1, prev=000000000000000005ea1ca8a944a574826bd856b83a29871f46c8e2a2323deb
2014-04-16 14:58:17 ProcessBlock: ORPHAN BLOCK 2, prev=00000000000000001179ce5490d2ff833c0afb1799b6beb0981096c9800002c8
2014-04-16 14:58:17 ProcessBlock: ORPHAN BLOCK 3, prev=00000000000000001b4fd2aaec801ac068ea2a39a4b9f9eb16569a69c6713d6f
2014-04-16 14:58:18 ProcessBlock: ORPHAN BLOCK 4, prev=000000000000000023524d33b27b7ea7135332b32853a7582cf504e20644ea48
2014-04-16 14:58:19 ProcessBlock: ORPHAN BLOCK 5, prev=0000000000000000a5344e5239e17ae8995d3c7c1417b38d0dc1e047595a3807

@rdponticelli
Copy link
Contributor

Running the code in master in this state I got:

2014-04-16 17:10:31 init message: Done loading
2014-04-16 17:10:31 Initialization result: 1
2014-04-16 17:10:41 Pre-allocating up to position 0x7000000 in blk00132.dat
2014-04-16 17:10:47 ERROR: ReadBlockFromDisk : OpenBlockFile failed
2014-04-16 17:10:47 *** Failed to read block
2014-04-16 17:10:51 ERROR: AcceptBlock() : AddToBlockIndex failed
2014-04-16 17:10:51 ERROR: ProcessBlock() : AcceptBlock FAILED
2014-04-16 17:10:51 ERROR: ReadBlockFromDisk : OpenBlockFile failed
2014-04-16 17:10:51 *** Failed to read block
2014-04-16 17:10:51 ERROR: AcceptBlock() : AddToBlockIndex failed
2014-04-16 17:10:51 ERROR: ProcessBlock() : AcceptBlock FAILED
2014-04-16 17:10:51 Requesting shutdown
2014-04-16 17:10:51 Running Shutdown in thread
2014-04-16 17:10:51 ERROR: ReadBlockFromDisk : OpenBlockFile failed
2014-04-16 17:10:51 *** Failed to read block
2014-04-16 17:10:51 addcon thread interrupt
2014-04-16 17:10:51 net thread interrupt
2014-04-16 17:10:51 dumpaddr thread stop
2014-04-16 17:10:51 opencon thread interrupt
2014-04-16 17:10:53 ERROR: AcceptBlock() : AddToBlockIndex failed
2014-04-16 17:10:53 ERROR: ProcessBlock() : AcceptBlock FAILED
2014-04-16 17:10:53 msghand thread interrupt
2014-04-16 17:10:53 Shutdown : In progress...

And after that the new code again:

2014-04-16 17:45:19 init message: Done loading
2014-04-16 17:45:19 Initialization result: 1
2014-04-16 17:45:28 ProcessBlock: ACCEPTED
2014-04-16 17:45:31 ProcessBlock: ACCEPTED
2014-04-16 17:45:31 CheckForkWarningConditions: Warning: Found invalid chain at least ~6 blocks longer than our best chain.
Chain state database corruption likely.
2014-04-16 17:45:31 ProcessBlock: ACCEPTED
2014-04-16 17:45:31 CheckForkWarningConditions: Warning: Large valid fork found
forking the chain at height 295933 (0000000000000000937bdca5485b8a02da4cbfea74fe982903a2da1ba8537866)
lasting to height 295941 (00000000000000002cda160c4452e7ad5252453a969dc6ff9b6e9e8bb584093f).
Chain state database corruption likely.
2014-04-16 17:45:31 ProcessBlock: ACCEPTED
2014-04-16 17:45:33 CheckForkWarningConditions: Warning: Large valid fork found
forking the chain at height 295933 (0000000000000000937bdca5485b8a02da4cbfea74fe982903a2da1ba8537866)
lasting to height 295942 (0000000000000000adfe890f4c33801b26fac05207747652d8c311244e59a6d0).
Chain state database corruption likely.

And it's still going like this....

@sipa
Copy link
Member Author

sipa commented Apr 17, 2014

@rdponticelli Thanks for testing!

So the problem was that if the block data failed to write, we ended up with a block header in the index, but no corresponding block data in the block file. At next startup, this would mean it would need to download it again, but the block download logic can't deal with the in-index-but-not-on-disk case yet, which will be added later.

I've (hopefully) fixed it by only writing the block header to the index after the block data is present.

@rdponticelli
Copy link
Contributor

Now I'm getting this on another node. It seems that it's breaking when it changes the block file:

2014-04-17 19:02:56 ProcessBlock: ACCEPTED
2014-04-17 19:03:07 UpdateTip: new best=0000000000000000551b394408e467f83cd5d6f36612f8e33643d188f0d4498f height=293947 log2_work=77.722856 tx=36140885 date=2014-04-03 11:35:26 progress=0.931634
2014-04-17 19:03:07 ProcessBlock: ACCEPTED
2014-04-17 19:03:18 Pre-allocating up to position 0x300000 in rev00129.dat
2014-04-17 19:03:19 UpdateTip: new best=000000000000000055c9d3b2a816877d3b1a9ca6048170ed7cee1c8629255e0e height=293948 log2_work=77.72298 tx=36141736 date=2014-04-03 11:56:14 progress=0.931702
2014-04-17 19:03:19 ProcessBlock: ACCEPTED
2014-04-17 19:03:21 Pre-allocating up to position 0x2000000 in blk00129.dat
2014-04-17 19:03:52 UpdateTip: new best=0000000000000000b2a0b9c58aa86994aa85385d0d8a8676945f6d3b642b27a9 height=293949 log2_work=77.723105 tx=36142897 date=2014-04-03 12:08:09 progress=0.931744
2014-04-17 19:03:53 ProcessBlock: ACCEPTED
2014-04-17 19:03:53 ProcessBlock: ORPHAN BLOCK 0, prev=00000000000000006c3f5ca15b21e43e7328c6ea033a25671f08c396cfd027ca
2014-04-17 19:03:59 UpdateTip: new best=00000000000000006e97b96972f7f31f0847fcf35e723193c0949dad4bbc730c height=293950 log2_work=77.723229 tx=36143254 date=2014-04-03 12:14:21 progress=0.931765
2014-04-17 19:03:59 ProcessBlock: ACCEPTED
2014-04-17 19:04:03 UpdateTip: new best=0000000000000000cf179e61ecbe56eb19561396ef72650f2a8b3e7d29d21cde height=293951 log2_work=77.723353 tx=36143569 date=2014-04-03 12:20:27 progress=0.931785
2014-04-17 19:04:03 ProcessBlock: ACCEPTED
2014-04-17 19:04:05 UpdateTip: new best=000000000000000039022b02422dfee754c2dd91cf8dac8289c956b30da37b80 height=293952 log2_work=77.723478 tx=36143600 date=2014-04-03 12:20:35 progress=0.931785
2014-04-17 19:04:05 ProcessBlock: ACCEPTED
2014-04-17 19:04:05 ProcessBlock: ORPHAN BLOCK 1, prev=00000000000000007f49a436045b4190c4164adef4479ebc290ae1379826b40a
2014-04-17 19:04:07 UpdateTip: new best=00000000000000001c9b145536ce2f3ffd529742b0ab58a4bdee1ebd985e4c69 height=293953 log2_work=77.723602 tx=36143765 date=2014-04-03 12:24:37 progress=0.931799
2014-04-17 19:04:07 ProcessBlock: ACCEPTED
2014-04-17 19:04:07 ProcessBlock: ORPHAN BLOCK 2, prev=0000000000000000c0b4bede793045606417892efcbb83c3d5c52b9aa74b9086
2014-04-17 19:04:08 UpdateTip: new best=0000000000000000c9cbe25307641906939f3e767d94dd7d2553b860dad7a835 height=293954 log2_work=77.723726 tx=36143895 date=2014-04-03 12:27:15 progress=0.931808

@sipa
Copy link
Member Author

sipa commented Apr 17, 2014

Just seeing orphans while syncing is perfectly expected (not intentional though, but it will require more to fix that).

@sipa
Copy link
Member Author

sipa commented Apr 18, 2014

Rebased and addressed @mikehearn's comments.

sipa added 2 commits April 25, 2014 00:33
Also modify some connection logic to deal with non-full blocks in the index.
@sipa
Copy link
Member Author

sipa commented Apr 24, 2014

Rebased.

@BitcoinPullTester
Copy link

Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/942b33a19d3aa96326acc044c1834c48beff777c for binaries and test log.
This test script verifies pulls every time they are updated. It, however, dies sometimes and fails to test properly. If you are waiting on a test, please check timestamps to verify that the test.log is moving at http://jenkins.bluematt.me/pull-tester/current/
Contact BlueMatt on freenode if something looks broken.

@mikehearn
Copy link
Contributor

@gavinandresen How does this look to you?

@leofidus
Copy link

leofidus commented May 2, 2014

Tested this on the Dogecoin blockchain. Both initial sync and keeping up with the blockchain worked as normal, both restarting the client and waking up from hibernation worked fine and I didn't spot anything unusual in the logs.

@rebroad
Copy link
Contributor

rebroad commented May 9, 2014

I'm currently testing this. It's still incredibly slow due to the peer selection for the block download. I really think it should take the speed of download into consideration for selecting a peer, and I still don't understand why it selects only one peer - why can't it select several? After all, most download tools these days choose to download from multiple sources to reduce the load on any one source.

@gmaxwell
Copy link
Contributor

gmaxwell commented May 9, 2014

@rebroad This is an incremental reorganization. All that stuff you are talking about was implemented months ago in sipa's headers first branch but it was too risky a change to take at once and wasn't seeing adequate testing. This pull (and others) are breaking up the steps to move to it incrementally in a way which is easier to verify.

@rebroad
Copy link
Contributor

rebroad commented May 9, 2014

@gmaxwell yes, I know. The stuff I am talking about hasn't been implemented yet. This is why I am mentioning it.

@gmaxwell
Copy link
Contributor

gmaxwell commented May 9, 2014

The stuff I am talking about hasn't been implemented yet.

Pulling from multiple peers and taking speed into account were both implemented in the headers first branch. (actually it does something which is usually better than take speed into account it tracks when peers are stalling the process and kicks out the straggler(s)).

@laanwj
Copy link
Member

laanwj commented May 9, 2014

@rebroad That's just rude. Please keep discussion in issues about the goal of the issue. @sipa does not claim that this change solves the problems that you mention, so you should not be disappointed that they are not solved by it.

@sipa
Copy link
Member Author

sipa commented May 9, 2014

@rebroad Selecting multiple peers to download from would just bring duplicate block downloads back, unless it happens in a coordinated way. No such coordination exists, because we don't know in advance which peers have which blocks - we just download what they tell us they have.

Nobody claims we can't download from multiple peers at once - that is one of the goals of headers-first synchronization (which this is incremental work towards, but it will take a few more pull requests before we're actually there).

@laanwj laanwj added the P2P label May 9, 2014
@mikehearn
Copy link
Contributor

How do we decide when there's been enough testing and review? Seems like the path forward here isn't clear.

@laanwj
Copy link
Member

laanwj commented May 9, 2014

@mikehearn What part is unclear? This is tagged '0.10' so will be merged after the branch-off of 0.9.2.

@mikehearn
Copy link
Contributor

This may be a discussion for the mailing list, but I'm not sure it's healthy to keep pull requests that are "done" queued up where they have to be constantly rebased, waiting to enter master, missing out on testing that people might do. If you want to have a branch for stabilisation, why not branch for 0.9.2 now and cherry pick fixes into it until you're confident it's ready?

@laanwj
Copy link
Member

laanwj commented May 9, 2014

I know that - the plan is to split off 0.9.2 this week.

@mikehearn
Copy link
Contributor

Okie dokie.

@rebroad
Copy link
Contributor

rebroad commented May 9, 2014

@gmaxwell ah, so it's not in the master branch? Am I able to submit pull requests to the branch you mention? What is the purpose of this branch and will it as some point be merged into master?

My main reason for mentioning these concerns is that the current master as it is doesn't work well, and gets 300 or so orphan blocks piling up without catching up on the blockchain, often requiring a restart of bitcoind to get it downloading again, at which point the hundreds of orphan blocks already downloaded are then lost (since they are not saved to disk).

It's also frustrating to see this, considering I raised a pull request about 2 years ago which downloaded blocks far faster and more efficiently than it currently does, and which did detect stuck downloads, unlike the current master branch.

@sipa
Copy link
Member Author

sipa commented May 9, 2014

@rebroad Gregory talking about my personal headers-first branch (#2964),
which was never merged because of multiple reasons, and later abandoned.
I've since (slowly) been working on several pull requests with pieces of
that functionality. This pull request is just some refactoring work towards
that.

@laanwj laanwj merged commit 942b33a into bitcoin:master May 9, 2014
laanwj added a commit that referenced this pull request May 9, 2014
942b33a Split AcceptBlockHeader from AcceptBlock. (Pieter Wuille)
f457347 Split up CheckBlock in a block and header version (Pieter Wuille)
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants