-
Notifications
You must be signed in to change notification settings - Fork 38.7k
Per-peer block tracking, stalled block download detection, orphan pool limiting #3514
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Needs rebase after #3516 |
|
Will rebase tonight. |
|
Did a sync from scratch using this branch (plus #3517) on mainnet, a significant part of which under valgrind. Seems to work fine, though sometimes long chains of accumulated orphans are connected, resulting in cs_main being locked for a long time. I'll try to fix that independently. |
|
Rebase after logging changes. |
|
ACK |
|
I did some testing with block syncs, no problems found, worked fine for me. |
|
There is one potential problem here, namely that the many orphans can result in very long chains (I've seen up to ~400) of blocks being connected within a single cs_main lock, essentially making the program look frozen. This can be solved by pushing some locks down, but may need some discussion first. I could easily submit the limit-orphans commit as a separate pull request though, as I suppose that can be seen as a memory usage bug fix. |
Keep track of which block is being requested (and to be requested) from each peer, and limit the number of blocks in-flight per peer. In addition, detect stalled downloads, and disconnect if they persist for too long. This means blocks are never requested twice, and should eliminate duplicate downloads during synchronization.
|
Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/f59d8f0b644d49324cabd19c58cf2262d49e1392 for binaries and test log. |
|
Regarding slowing down the download of the small blocks - how about factoring in the block size and incrementally changing how many peers are used as the block sizes get larger? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because blocks used to be tracked using the (weak) mapAlreadyAskedFor mechanism, but this patch introduces a better way, that retains information about whom it was asked from.
|
I've been testing this a few days now. The stuck block detection doesn't seem to be working:- 2014-02-26 22:24:01 Requesting block 00000000000000014e06412631ff67305cde77c6631dc58a2ce35308558cee05 from 216.99.101.192:8333 almost an 30 minutes went by with no download happening, and it didn't seem to notice. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Magic number... why so high? In the later stages of block download I'd have thought a number close to 1 would be appropriate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be a further optimization, yes.
|
So far upon testing this code, I've not yet seen it downloading blocks from more than one node. There's not even any code to send a getblocks to more than one node from what I can tell. If a node is deemed to be unresponsive it's forcefully disconnected. How come this is ok, and yet my pull request to disconnect nodes sending blocks my nodes already had was considered not ok? |
|
This does not do parallel block download yet, as that requires And disconnecting unresponsive peers is fine - it's the only way to not What is not fine, is disconecting peers because they're the second ones to |
|
@sipa I deleted my previous comment about "already have" messages - none seen with this patch so far! But... a misbehaving node could still send random blocks if it wanted to, so at some point in the future I suspect we'll want to set that sort of behaviour as misbehaviour - maybe even as part of this pull request...? |
|
@rebroad That's a good question. The problem is that the current code actually does broadcast blocks sometimes without inv, namely newly mined blocks which are certain not be exist anywhere else. |
|
why would it broadcast a block without broadcasting the inv first? As an aside note, I notice the block download got stuck today, and wasn't able to recover - it was still downloading blocks that were being advertised (i.e. the latest block mined), but the initial block download stayed stuck. |
|
@sipa Re misbehaviour, it could just be for nodes sending unsolicited blocks that the node already has. Perhaps +30. |
|
The stuck download detection seems like it might not be working. |
f59d8f0 Per-peer block download tracking and stalled download detection. (Pieter Wuille)
|
I think this has been pulled prematurely. It's still got problems as I mentioned above. For one, it doesn't resume download if the OS is suspended and resumed. |
|
@rebroad can you post new problems and bug reports in new issues? If you just dump them into existing, closed issues they will probably get lost. |
|
And IMO, going toward headers-first / parallel block download is the single most important point in continued development of Bitcoin Core right now. As this is pull is an important step in that direction, this was prioritized to be merged as soon as possible after 0.9.0 was released. If a little temporary inconvenience while the kinks are worked out is too much for you, I suggest staying with 0.9 for now. Thanks for helping test though. |
|
The stuck download detection here is just a very simple prototype. It only uses some hardcoded constants as thresholds. The purpose is turning it into a detection for one peer being slower than the other, once actual parallel downloading is there. @rebroad Is this a regression introduced by this pullreq, or a pre-existing problem that you expected to be fixed by this but wasn't? |
|
@sipa it's a regression. |
|
I suspended my PC with bitcoin running a few nights and it had no problem recovering connections when re-awakened. @rebroad Do you perhaps have a dynamic IP, that changes if you computer was suspended for a while? If so Bitcoin indeed doesn't cope with it too well (it will AFAIK still advertise the old address so you won't get incoming connections). But that's not a regression or new issue. |
|
@laanwj yes, my IP address changes, but I don't accept incoming connections either. The getting stuck after a suspend didn't happen before I applied this pullreq AFAIK. |
| assert(state != NULL); | ||
|
|
||
| // Make sure it's not listed somewhere already. | ||
| MarkBlockAsReceived(hash); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really don't like the use of functions that clearly don't do what their name suggests they do.
What parts of this function are needed? And what would the function be called if it did what it is needing to do here?
satoshi-n
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incompetence or malicious bbbb
This pull request brings in several block downloaded related improvements. It's intended as a preparation for headers-first, but It's useful on itself.
Changes:
This should result in generally better synchronization behaviour from random/multiple peers, but may slow down the first part of the process (due to small blocks) and slow down fetching over high-latency high-bandwidth connections (which is relatively rare).
Test plan:
Headers-first will remove a part of the logic in this pull request again, but as it's useful on itself, I prefer getting this reviewed and merged anyway:
Tested by synchronizing while running in valgrind.
Depends on #3370 (Prepare block connection logic for headers-first).