Conversation

@sipa
Member

@sipa sipa commented Jan 11, 2014

This pull request brings in several block download related improvements. It's intended as a preparation for headers-first, but it's useful on its own.

Changes:

  • Track blocks being downloaded per-peer, and limit the number in-flight.
  • Never request the same block twice, so no more duplicate downloads during sync.
  • In case a peer seems to be too slow to send blocks we need, it's disconnected and its in-flight blocks can be requested elsewhere (which saves bandwidth for both us and them).
  • Limit the number of orphans held in memory, as those still happen very frequently during sync (the result of accidentally downloading blocks from two peers at once).

This should result in generally better synchronization behaviour from random/multiple peers, but may slow down the first part of the process (due to small blocks) and slow down fetching over high-latency high-bandwidth connections (which is relatively rare).
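
The per-peer tracking and duplicate-request prevention described above can be sketched roughly as follows. This is a simplified illustration, not the actual patch: the function names echo the patch's naming, but `PeerDownloadState`, the map layout, and the in-flight limit of 16 are assumptions.

```cpp
#include <map>
#include <set>
#include <string>

// Illustrative sketch only; names and the limit are hypothetical.
static const int MAX_BLOCKS_IN_FLIGHT = 16;

struct PeerDownloadState {
    std::set<std::string> blocksInFlight; // hashes requested from this peer
};

std::map<int, PeerDownloadState> peerState;  // keyed by peer id
std::map<std::string, int> blockSources;     // hash -> peer it was requested from

// Request a block from a peer unless it is already in flight anywhere
// (never request the same block twice) or the peer is at its limit.
bool MarkBlockAsInFlight(int peer, const std::string& hash) {
    if (blockSources.count(hash)) return false; // already requested somewhere
    PeerDownloadState& st = peerState[peer];
    if ((int)st.blocksInFlight.size() >= MAX_BLOCKS_IN_FLIGHT) return false;
    st.blocksInFlight.insert(hash);
    blockSources[hash] = peer;
    return true;
}

// When a block arrives (or its source disconnects), free the slot so the
// block can be re-requested from another peer.
void MarkBlockAsReceived(const std::string& hash) {
    std::map<std::string, int>::iterator it = blockSources.find(hash);
    if (it == blockSources.end()) return;
    peerState[it->second].blocksInFlight.erase(hash);
    blockSources.erase(it);
}
```

Once a stalled peer is disconnected, calling the receive path for each of its in-flight hashes is what makes those blocks eligible to be fetched elsewhere.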

Test plan:

  • Delete blocks/ and chainstate/
  • Start a download from random peers (no special options, or use -addnode=IP optionally).
    • Verify that no "already have block ..." messages appear during sync.
    • Check that synchronization completes (it may still be slow, as there is no optimization to select particularly good peers to download from, and orphans are expected).
  • Delete blocks/ and chainstate/ again.
  • Start a download from a single fast peer (-connect=IP)
    • Verify that no "already have block ..." messages appear during sync.
    • Check that synchronization completes and progresses quickly (doesn't get stuck; it may be a bit slower than before this patch, particularly in the initial part of the chain; very occasional orphans are possible)

Headers-first will remove a part of the logic in this pull request again, but as it's useful on its own, I prefer getting this reviewed and merged anyway:

  • There won't be a per-peer to-be-downloaded queue of block hashes anymore; those will be decided globally based on good known (and verified) headers chains.
  • The per-peer block-downloads-in-flight will remain, but becomes a queue of CBlockIndex*'es instead of uint256's.
  • There won't be any orphan processing in the block download mechanism anymore (so no need to limit orphans).
  • Stalled download detection won't use as many hardcoded constants anymore (as the question becomes "is this peer too slow compared to the others to keep up" rather than "is this peer slow according to some absolute measurement").

Tested by synchronizing while running in valgrind.

Depends on #3370 (Prepare block connection logic for headers-first).

@laanwj
Member

laanwj commented Jan 13, 2014

Needs rebase after #3516

@sipa
Member Author

sipa commented Jan 13, 2014

Will rebase tonight.

@sipa
Member Author

sipa commented Jan 18, 2014

Did a sync from scratch using this branch (plus #3517) on mainnet, a significant part of which under valgrind. Seems to work fine, though sometimes long chains of accumulated orphans are connected, resulting in cs_main being locked for a long time. I'll try to fix that independently.

@sipa
Member Author

sipa commented Jan 27, 2014

Rebase after logging changes.

@jgarzik
Contributor

jgarzik commented Jan 29, 2014

ACK

@laanwj
Member

laanwj commented Jan 30, 2014

I did some testing with block syncs, no problems found, worked fine for me.

@sipa
Member Author

sipa commented Jan 30, 2014

There is one potential problem here, namely that the many orphans can result in very long chains (I've seen up to ~400) of blocks being connected within a single cs_main lock, essentially making the program look frozen. This can be solved by pushing some locks down, but may need some discussion first.

I could easily submit the limit-orphans commit as a separate pull request though, as I suppose that can be seen as a memory usage bug fix.
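
The orphan limiting mentioned above could be sketched as a simple capped map, in the spirit of the limit-orphans commit: when the cap is reached, evict an existing orphan to make room. The names, the cap value, and the eviction choice here are all hypothetical, not taken from the patch.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical cap; the patch's actual constant may differ.
static const size_t MAX_ORPHAN_BLOCKS = 750;

std::map<std::string, std::string> mapOrphans; // hash -> serialized block

// Store an orphan block, evicting arbitrary existing orphans as needed
// so memory usage stays bounded during sync.
void AddOrphan(const std::string& hash, const std::string& block) {
    while (mapOrphans.size() >= MAX_ORPHAN_BLOCKS) {
        mapOrphans.erase(mapOrphans.begin()); // evict one to stay under cap
    }
    mapOrphans[hash] = block;
}
```

Bounding the map this way is purely a memory-usage fix: an evicted orphan is simply re-downloaded later, so correctness of sync is unaffected.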

Keep track of which block is being requested (and to be requested) from
each peer, and limit the number of blocks in-flight per peer. In addition,
detect stalled downloads, and disconnect if they persist for too long.

This means blocks are never requested twice, and should eliminate duplicate
downloads during synchronization.
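
The stall detection the commit message describes — disconnect when requested blocks stop arriving for too long — might look like this minimal check. The timeout value and names are illustrative assumptions, not the constants used by the patch.

```cpp
#include <cstdint>

// Made-up threshold for illustration; the patch uses its own constants.
static const int64_t BLOCK_STALL_TIMEOUT = 2 * 60; // seconds

// A peer is considered stalled when its oldest outstanding block
// request has gone unanswered for longer than the timeout; the caller
// would then disconnect it and re-request its blocks elsewhere.
bool PeerStalled(int64_t oldestRequestTime, int64_t now) {
    return now - oldestRequestTime > BLOCK_STALL_TIMEOUT;
}
```
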
@BitcoinPullTester

Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/f59d8f0b644d49324cabd19c58cf2262d49e1392 for binaries and test log.
This test script verifies pulls every time they are updated. It, however, dies sometimes and fails to test properly. If you are waiting on a test, please check timestamps to verify that the test.log is moving at http://jenkins.bluematt.me/pull-tester/current/
Contact BlueMatt on freenode if something looks broken.

@rebroad
Contributor

rebroad commented Feb 23, 2014

Regarding slowing down the download of the small blocks - how about factoring in the block size and incrementally changing how many peers are used as the block sizes get larger?
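
A rough sketch of that suggestion — scaling the number of peers used with block size — could look like this. The thresholds and peer counts are made up entirely for illustration; nothing like this exists in the patch.

```cpp
#include <cstddef>

// Hypothetical policy: fetch tiny early-chain blocks from one peer,
// and spread larger blocks across more peers as sizes grow.
int PeersForBlockSize(size_t blockBytes) {
    if (blockBytes < 50 * 1000) return 1;   // small early blocks
    if (blockBytes < 250 * 1000) return 2;  // mid-size blocks
    return 4;                               // near-full blocks
}
```
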

Contributor

Why was this here?

Member Author

Because blocks used to be tracked using the (weak) mapAlreadyAskedFor mechanism, but this patch introduces a better way that retains information about which peer each block was requested from.

@rebroad
Contributor

rebroad commented Feb 26, 2014

I've been testing this a few days now. The stuck block detection doesn't seem to be working:-

2014-02-26 22:24:01 Requesting block 00000000000000014e06412631ff67305cde77c6631dc58a2ce35308558cee05 from 216.99.101.192:8333
2014-02-26 22:28:31 ResendWalletTransactions()
2014-02-26 22:54:16 socket recv error 110
2014-02-26 22:54:17 receive version message: /Satoshi:0.8.6/: version 70001, blocks=287993, us=70.42.240.30:57711, them=207.255.42.202:8333, peer=207.255.42.202:8333
2014-02-26 22:54:17 Added time data, samples 10, offset +0 (+0 minutes)
2014-02-26 22:56:34 socket recv error 110
2014-02-26 22:57:26 connect() to 173.79.229.22:8333 failed after select(): Connection refused
2014-02-26 22:57:28 socket recv error 110
2014-02-26 22:58:03 socket recv error 110

Almost 30 minutes went by with no download happening, and it didn't seem to notice.

Contributor

Magic number... why so high? In the later stages of block download I'd have thought a number close to 1 would be appropriate.

Member Author

That would be a further optimization, yes.

@rebroad
Contributor

rebroad commented Mar 1, 2014

So far upon testing this code, I've not yet seen it downloading blocks from more than one node. There's not even any code to send a getblocks to more than one node from what I can tell. If a node is deemed to be unresponsive it's forcefully disconnected. How come this is ok, and yet my pull request to disconnect nodes sending blocks my nodes already had was considered not ok?

@sipa
Member Author

sipa commented Mar 1, 2014

This does not do parallel block download yet, as that requires headers-first sync. This is a step towards that, though.

And disconnecting unresponsive peers is fine - it's the only way to not waste their bandwidth if we don't need/want requested blocks anymore.

What is not fine is disconnecting peers because they're the second ones to respond to a block that we requested twice. We should simply not request the same block twice in the first place. That is what this pull request is supposed to do. If you still see "already have block" messages with this patch, there is possibly a bug in it that will need to be addressed.

@rebroad
Contributor

rebroad commented Mar 2, 2014

@sipa I deleted my previous comment about "already have" messages - none seen with this patch so far! But... a misbehaving node could still send random blocks if it wanted to, so at some point in the future I suspect we'll want to set that sort of behaviour as misbehaviour - maybe even as part of this pull request...?

@sipa
Member Author

sipa commented Mar 2, 2014

@rebroad That's a good question. The problem is that the current code actually does broadcast blocks sometimes without inv, namely newly mined blocks, which are certain not to exist anywhere else.

@rebroad
Contributor

rebroad commented Mar 2, 2014

Why would it broadcast a block without broadcasting the inv first?

As an aside, I noticed the block download got stuck today and wasn't able to recover - it was still downloading blocks that were being advertised (i.e. the latest block mined), but the initial block download stayed stuck.

@rebroad
Contributor

rebroad commented Mar 2, 2014

@sipa Re misbehaviour, it could just be for nodes sending unsolicited blocks that the node already has. Perhaps +30.

@rebroad
Contributor

rebroad commented Mar 8, 2014

The stuck download detection seems like it might not be working.

laanwj added a commit that referenced this pull request Mar 10, 2014
f59d8f0 Per-peer block download tracking and stalled download detection. (Pieter Wuille)
@laanwj laanwj merged commit f59d8f0 into bitcoin:master Mar 10, 2014
@rebroad
Contributor

rebroad commented Mar 30, 2014

I think this has been pulled prematurely. It's still got problems, as I mentioned above. For one, it doesn't resume the download if the OS is suspended and resumed.

@laanwj
Member

laanwj commented Mar 30, 2014

@rebroad can you post new problems and bug reports in new issues? If you just dump them into existing, closed issues they will probably get lost.

@laanwj
Member

laanwj commented Mar 30, 2014

And IMO, going toward headers-first / parallel block download is the single most important point in continued development of Bitcoin Core right now.

As this pull is an important step in that direction, it was prioritized to be merged as soon as possible after 0.9.0 was released.

If a little temporary inconvenience while the kinks are worked out is too much for you, I suggest staying with 0.9 for now. Thanks for helping test though.

@sipa
Member Author

sipa commented Mar 30, 2014

The stuck download detection here is just a very simple prototype. It only uses some hardcoded constants as thresholds. The purpose is to turn it into a detection for one peer being slower than the others, once actual parallel downloading is there.

@rebroad Is this a regression introduced by this pullreq, or a pre-existing problem that you expected to be fixed by this but wasn't?

@rebroad
Contributor

rebroad commented Mar 31, 2014

@sipa it's a regression.

@laanwj
Member

laanwj commented Apr 4, 2014

I suspended my PC with bitcoin running a few nights and it had no problem recovering connections when re-awakened.

@rebroad Do you perhaps have a dynamic IP that changes if your computer was suspended for a while? If so, Bitcoin indeed doesn't cope with it too well (it will AFAIK still advertise the old address, so you won't get incoming connections). But that's not a regression or new issue.

@rebroad
Contributor

rebroad commented Apr 6, 2014

@laanwj yes, my IP address changes, but I don't accept incoming connections either. The getting stuck after a suspend didn't happen before I applied this pullreq AFAIK.

assert(state != NULL);

// Make sure it's not listed somewhere already.
MarkBlockAsReceived(hash);
Contributor

@rebroad rebroad Oct 28, 2016

Really don't like the use of functions that clearly don't do what their name suggests they do.

What parts of this function are needed? And what would the function be called if it did what it is needing to do here?

@satoshi-n satoshi-n left a comment

Incompetence or malicious bbbb

@bitcoin bitcoin locked as resolved and limited conversation to collaborators Feb 15, 2022