Skip to content

Conversation

@sdaftuar
Copy link
Member

@sdaftuar sdaftuar commented May 5, 2017

This first commit delays requesting blocks from peers unless their tip has more than nMinimumChainWork (our anti-DoS threshold). This speeds up initial headers sync by delaying block downloads from starting until we're closer to being caught up.

The second commit implements a timeout on the initial headers sync: when we send the initial getheaders to a peer, estimate the number of headers we expect to receive, and set a timeout (15 minutes + 1 ms / header).

This is a pretty naive algorithm, but I think the worst case behavior (eg if we're bottlenecked by our own bandwidth) is that we cycle through peers while downloading headers, at most every 15 minutes -- which I think isn't too awful, since those 15 minute windows should be making progress at least, and this bounds the time a bad peer can interfere with headers sync.

FYI I kicked off a fresh node with this patch, and it took me about 1 minute to have the whole headers chain.

Implements #9068, though this isn't the exact approach @sipa suggested in that issue.

@fanquake fanquake added the P2P label May 5, 2017
@gmaxwell
Copy link
Contributor

gmaxwell commented May 6, 2017

Concept ACK. I suppose this timeout could later be made dynamic like has been proposed block transfer timeouts.

if (nNow > state.nHeadersSyncTimeout && !pto->fWhitelisted && nSyncStarted == 1 && (nPreferredDownload - state.fPreferredDownload >= 1)) {
// Disconnect a (non-whitelisted) peer if it is our only sync peer,
// and we have others we could be using instead.
LogPrintf("Timeout downloading headers from peer=%d, disconnecting\n", pto->id);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

error: ‘const NodeId CNode::id’ is private within this context

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be pto->GetId().

nMinimumChainWork is an anti-DoS threshold; wait until we have a proposed
tip with more work than that before downloading blocks towards that tip.
@sdaftuar sdaftuar force-pushed the 2017-05-timeout-headers-sync branch from 1ac0cb8 to 9bddfc1 Compare May 8, 2017 18:33
@sdaftuar
Copy link
Member Author

sdaftuar commented May 8, 2017

Rebased after #10189.

@sdaftuar sdaftuar force-pushed the 2017-05-timeout-headers-sync branch from 9bddfc1 to ca4aebd Compare May 8, 2017 19:05
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Once we've caught up once

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Improved the comment in 898fc2ac689aa77e74362aa27d12120256c1942e.

@jonasschnelli
Copy link
Contributor

utACK ca4aebd3c50bc424afd25a5611a6ea1e77429413

For the hybrid SPV mode (#9483), a fast headers sync is crucial and also for normal operations, someone could significant delay your initial sync.
If you connect to a malicious or just a very slow peer (It happened to me a couple of times), couldn't you – in theory – spend a couple of hours syncing headers? Even after this PR, I guess a peer could serve you the 2k headers package every 10minutes (while pong accordingly) leading to a header sync of more then 30h.

I'm not sure if this is solvable without to much of effort, maybe the only solution would be parallel headers download... or at least a check among a handful of peers which could serve 3x2000 headers the fastest.

@sdaftuar
Copy link
Member Author

sdaftuar commented May 9, 2017

If you connect to a malicious or just a very slow peer (It happened to me a couple of times), couldn't you – in theory – spend a couple of hours syncing headers? Even after this PR, I guess a peer could serve you the 2k headers package every 10minutes (while pong accordingly) leading to a header sync of more then 30h.

@jonasschnelli After this PR, if you have started syncing headers from your initial sync peer and your pindexBestHeader isn't caught up to within 24 hours of current time, then you'll disconnect them. So this puts a timeout on the entire headers sync (a bit over 20 minutes, currently -- 15 minutes + 1 ms/[expected number of headers]).

Once our headers chain is within 24 hours of current time, we'll request headers from our other peers as well, so after that point as long as one of your peers isn't stalling you, you'll get the headers chain.

@jonasschnelli
Copy link
Contributor

Thanks @sdaftuar for the explanation. Make sense.
I thought the timer resets after each headers message. A 20min delay hurts but is probably unavoidable (the fact that we don't know the bandwidth limitations and potentials).

Copy link
Contributor

@TheBlueMatt TheBlueMatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK 898fc2ac689aa77e74362aa27d12120256c1942e, few notes, but none matter much.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe usually its good practice to print if we would otherwise have disconnected a whitelisted peer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in a27ff02f537a3225372e1e1286f0a93fb9037ba9

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I can take this opportunity to bikeshed your constants - why even bother with a base timeout this high? maybe just a minute (to handle the really-close-to-done-dont-want-a-1ms-timeout case) and then double/tripple the TIMEOUT_PER_HEADER?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just wanted to do something clearly safe (it's easy to reason about booting your sync peer potentially every 15 minutes, but slightly scarier to think there could be a circumstance you'd boot that peer every 1 minute).

Anyway I'd like to see what other opinions are, I'm fine adjusting them however people would like...

@sdaftuar
Copy link
Member Author

Addressed @TheBlueMatt's nit in a fixup commit, let me know if I can squash.

Copy link
Contributor

@jnewbery jnewbery left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK a27ff02f537a3225372e1e1286f0a93fb9037ba9 with one request for a comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nPreferredDownload - state.fPreferredDownload >= 1 was slightly confusing for me. It's saying we should only timeout a peer if we have another peer which is preferred. This doesn't exactly match the logic in SendMessages() where we'll try to start block sync from a non-preferred peer if we don't have any better options.

I think this is fine. It's not obvious that we should time out a peer if we don't have any other options which are preferred, but perhaps a comment here would make this clearer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment in 595d51c

@jnewbery
Copy link
Contributor

jnewbery commented May 22, 2017

re-utACK 595d51c

Copy link
Contributor

@TheBlueMatt TheBlueMatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed making the whole download logic a much clearer state machine, but that can come after this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"downleading"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, it would be nice to unset this peer as fSyncStarted so that we pick another peer to donwnload from, in case the whitelisted sync peer is broken.

Copy link
Member Author

@sdaftuar sdaftuar Jun 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed both of these comments in 65e3008

@sdaftuar sdaftuar force-pushed the 2017-05-timeout-headers-sync branch from b4c8b5e to 65e3008 Compare June 1, 2017 17:49
At startup, we choose one peer to serve us the headers chain, until
our best header is close to caught up.  Disconnect this peer if more
than 15 minutes + 1ms/expected_header passes and our best header
is still more than 1 day away from current time.
@sdaftuar sdaftuar force-pushed the 2017-05-timeout-headers-sync branch from 65e3008 to 76f7481 Compare June 5, 2017 20:33
@sdaftuar
Copy link
Member Author

sdaftuar commented Jun 5, 2017

Squashed 65e300812 -> 76f7481

@laanwj
Copy link
Member

laanwj commented Jun 6, 2017

utACK 76f7481

@laanwj laanwj merged commit 76f7481 into bitcoin:master Jun 6, 2017
laanwj added a commit that referenced this pull request Jun 6, 2017
76f7481 Add timeout for headers sync (Suhas Daftuar)
e265200 Delay parallel block download until chain has sufficient work (Suhas Daftuar)

Tree-SHA512: e7f5468b7defe67d4d2d5c976bc129dba2b32b2ea52d3ff33b9cbff5c3b5b799be867653f1bcd354340d707d76dcadf2da4588abf6d6ec4a06672cdc5e1101eb
laanwj added a commit that referenced this pull request Sep 6, 2017
eac64bb [qa] Test nMinimumChainWork (Suhas Daftuar)
0311836 Allow setting nMinimumChainWork on command line (Suhas Daftuar)

Pull request description:

  As discussed briefly here: https://botbot.me/freenode/bitcoin-core-dev/2017-02-28/?msg=81712308&page=4

  This adds a hidden command line option for setting `nMinimumChainWork`, which allows us to test this parameter in our functional tests, as well as allowing for niche use cases like syncing nodes that are otherwise disconnected from the network.

  See also #10345, which proposes a new use of `nMinimumChainWork`.

Tree-SHA512: fe4d8f4f289697615c98d8760f1cc74c076110310ea0b5b875fcab78c127da9195b4eb84148aebacc7606c246e5773d3f13bd5d9559d0a8bffac20a3a28c62df
codablock pushed a commit to codablock/dash that referenced this pull request Sep 29, 2019
eac64bb [qa] Test nMinimumChainWork (Suhas Daftuar)
0311836 Allow setting nMinimumChainWork on command line (Suhas Daftuar)

Pull request description:

  As discussed briefly here: https://botbot.me/freenode/bitcoin-core-dev/2017-02-28/?msg=81712308&page=4

  This adds a hidden command line option for setting `nMinimumChainWork`, which allows us to test this parameter in our functional tests, as well as allowing for niche use cases like syncing nodes that are otherwise disconnected from the network.

  See also bitcoin#10345, which proposes a new use of `nMinimumChainWork`.

Tree-SHA512: fe4d8f4f289697615c98d8760f1cc74c076110310ea0b5b875fcab78c127da9195b4eb84148aebacc7606c246e5773d3f13bd5d9559d0a8bffac20a3a28c62df
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants