Merged
39 changes: 33 additions & 6 deletions src/net_processing.cpp
@@ -369,6 +369,9 @@ struct Peer {
/** Set of txids to reconsider once their parent transactions have been accepted **/
std::set<uint256> m_orphan_work_set GUARDED_BY(g_cs_orphans);

/** Whether we've sent this peer a getheaders in response to an inv prior to initial-headers-sync completing */
bool m_inv_triggered_getheaders_before_sync{false};

/** Protects m_getdata_requests **/
Mutex m_getdata_requests_mutex;
/** Work queue of items requested by this peer **/
@@ -681,6 +684,9 @@ class PeerManagerImpl final : public PeerManager
/** Number of nodes with fSyncStarted. */
int nSyncStarted GUARDED_BY(cs_main) = 0;

/** Hash of the last block we received via INV */
uint256 m_last_block_inv_triggering_headers_sync{};

/**
* Sources of received blocks, saved to be able to punish them when processing
* happens afterwards.
@@ -3239,8 +3245,9 @@ void PeerManagerImpl::ProcessMessage(CNode& pfrom, const std::string& msg_type,
UpdateBlockAvailability(pfrom.GetId(), inv.hash);
if (!fAlreadyHave && !fImporting && !fReindex && !IsBlockRequested(inv.hash)) {
// Headers-first is the primary method of announcement on
// the network. If a node fell back to sending blocks by inv,
// it's probably for a re-org. The final block hash
// the network. If a node fell back to sending blocks by
// inv, it may be for a re-org, or because we haven't
// completed initial headers sync. The final block hash
// provided should be the highest, so send a getheaders and
// then fetch the blocks we need to catch up.
best_block = &inv.hash;
@@ -3265,10 +3272,30 @@ void PeerManagerImpl::ProcessMessage(CNode& pfrom, const std::string& msg_type,
}

if (best_block != nullptr) {
if (MaybeSendGetHeaders(pfrom, m_chainman.ActiveChain().GetLocator(m_chainman.m_best_header), *peer)) {
LogPrint(BCLog::NET, "getheaders (%d) %s to peer=%d\n",
m_chainman.m_best_header->nHeight, best_block->ToString(),
pfrom.GetId());
// If we haven't started initial headers-sync with this peer, then
// consider sending a getheaders now. On initial startup, there's a
// reliability vs bandwidth tradeoff, where we are only trying to do
// initial headers sync with one peer at a time, with a long
// timeout (at which point, if the sync hasn't completed, we will
// disconnect the peer and then choose another). In the meantime,
// as new blocks are found, we are willing to add one new peer per
// block to sync with as well, to sync quicker in the case where
// our initial peer is unresponsive (but less bandwidth than we'd
// use if we turned on sync with all peers).
Member:

If we assume that the initial headers sync logic is robust, why add the additional complexity here? Wouldn't it be better to just skip the getheaders message? Or if the logic isn't assumed to be robust, lower the timeout or add a new peer on a random timeout?

Member Author (@sdaftuar, Aug 3, 2022):

I'm not sure I completely understand what you're asking, but the topic is more complicated than just an assumption around whether the initial headers sync logic is robust:

  • Ideally, we would only download the full headers chain from a single peer when we are starting up, because it saves bandwidth to do so.
  • It's possible that the peer we pick for initial headers sync could be (a) slow, (b) on a chain that is not the main chain, (c) adversarial, or some other terrible combination of those factors. So we cannot just have our logic rely on the initial peer to serve us the honest chain in a reasonable amount of time.
  • We currently have two behaviors that help protect us from choosing a bad initial peer. The main protection is that when a block INV is received, we send a getheaders to all peers that announce the block, resulting in us getting the main chain with high probability. However, this is bandwidth-wasting if many peers serve us an INV at the same time, which is probably the common case in the scenario where our initial peer is slow.
  • The second protection is that after about 20 minutes, we'll evict our initial headers-sync peer if our tip's timestamp isn't within a day of the current time. This could kick in if we have a bad initial peer and no blocks are found for a while. (A rough sketch of this check follows below.)
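A rough sketch of that second protection, with assumed names and constant values (the actual logic in net_processing.cpp is more involved):

    #include <chrono>

    // Assumed values for illustration; not the exact constants Bitcoin Core uses.
    constexpr auto INITIAL_HEADERS_SYNC_TIMEOUT{std::chrono::minutes{20}};
    constexpr auto RECENT_TIP_WINDOW{std::chrono::hours{24}};

    // Evict our initial headers-sync peer once the timeout has elapsed while our
    // best header's timestamp is still more than a day behind the current time.
    bool ShouldEvictInitialHeadersSyncPeer(std::chrono::seconds now,
                                           std::chrono::seconds sync_started,
                                           std::chrono::seconds best_header_time)
    {
        const bool timed_out{now - sync_started > INITIAL_HEADERS_SYNC_TIMEOUT};
        const bool tip_old{now - best_header_time > RECENT_TIP_WINDOW};
        return timed_out && tip_old;
    }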

I think we could do a variety of things to improve the current situation on master; I think that adding (say) one additional headers sync peer on some kind of timer (maybe every 5 or 10 minutes) could make sense. I think that choosing a random peer among the set of peers announcing a block is probably better peer selection than choosing a random peer (or random outbound peer) on a timer, just because if a peer sends an INV there's more reason to believe that they are responsive and going to be helpful in getting us the chain, but probably some combination of both would be even better.

However, the complexity I ran into when thinking about other strategies for initial sync has to do with the eviction logic. Right now, I think it's mostly good that we evict our (single) initial headers-sync peer if we can't get a chain tip that is recent within 20 minutes. However, triggering that logic on all our peers at the same time seems over the top to me, because there are edge-case scenarios (such as: no blocks have been found on the network for a day, or the honest chain is some kind of billion-block timewarp chain that takes more than 20 minutes to download) where I think such logic could be badly behaved for the network, because we could end up with no peers or we could fall out of consensus.

I think what I'm proposing in this patch is a narrow change that exactly addresses the bandwidth problem, and maximizes the chance we find a good peer quickly, without making our behavior in those edge-case scenarios any worse. Nevertheless, a bigger overhaul of this logic that carefully considers these things could certainly be an improvement and make this whole thing easier to think about.

Contributor:

@sdaftuar can you move this comment into the top of the PR? I think this context helps quite a bit (I was going to ask this in a comment on the PR directly).

CNodeState& state{*Assert(State(pfrom.GetId()))};
if (state.fSyncStarted || (!peer->m_inv_triggered_getheaders_before_sync && *best_block != m_last_block_inv_triggering_headers_sync)) {
Contributor:

If there are two blocks found at the same time/height, this could still trigger a getheaders to most peers, no? (Someone sends block A, then someone else sends block B, then someone else sends A, etc.) Might it be better to set m_last_extra_headers_sync = SteadyClock::now(); and check that it's been at least a minute before adding an extra one?
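
For concreteness, a minimal sketch of the timer-based alternative described above; m_last_extra_headers_sync is the reviewer's name, while the function name and the one-minute interval are assumptions:

    #include <chrono>

    using SteadyClock = std::chrono::steady_clock;

    // Hypothetical member: when we last added an extra headers-sync peer.
    SteadyClock::time_point m_last_extra_headers_sync{};

    // Allow at most one additional headers-sync peer per minute, regardless of
    // which block hashes are being announced.
    bool MayStartExtraHeadersSync()
    {
        constexpr auto EXTRA_SYNC_INTERVAL{std::chrono::minutes{1}};
        const auto now{SteadyClock::now()};
        if (now - m_last_extra_headers_sync < EXTRA_SYNC_INTERVAL) return false;
        m_last_extra_headers_sync = now;
        return true;
    }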

Member Author:

That's true, but I think your suggestion is less robust overall than what I have proposed here: sending an INV to a node is free (there is no proof-of-work attached to a block hash), so an adversary could take advantage of your proposed strategy by continually connecting, sending an INV, and disconnecting, to prevent a node from trying to sync with any of its honest peers that are announcing main-chain blocks.

On the other hand, I just checked one of my long-running, well-connected nodes, and it's seen about 10 stale blocks in the last 50000. So that seems like pretty good odds that a node starting up is unlikely to run into this?
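(For scale: 10/50000 is a stale rate of 0.02%, i.e. roughly one stale block per 5,000; at ten-minute block intervals, that's about one every five weeks.)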

if (MaybeSendGetHeaders(pfrom, m_chainman.ActiveChain().GetLocator(m_chainman.m_best_header), *peer)) {
LogPrint(BCLog::NET, "getheaders (%d) %s to peer=%d\n",
m_chainman.m_best_header->nHeight, best_block->ToString(),
pfrom.GetId());
}
if (!state.fSyncStarted) {
peer->m_inv_triggered_getheaders_before_sync = true;
// Update the last block hash that triggered a new headers
// sync, so that we don't turn on headers sync with more
// than 1 new peer every new block.
m_last_block_inv_triggering_headers_sync = *best_block;
}
}
}

105 changes: 105 additions & 0 deletions test/functional/p2p_initial_headers_sync.py
@@ -0,0 +1,105 @@
#!/usr/bin/env python3
# Copyright (c) 2022 The Bitcoin Core developers
# Distributed under the MIT software license, see the accompanying
# file COPYING or http://www.opensource.org/licenses/mit-license.php.
"""Test initial headers download

Test that we only try to initially sync headers from one peer (until our chain
is close to caught up), and that each block announcement results in only one
additional peer receiving a getheaders message.
"""

from test_framework.test_framework import BitcoinTestFramework
from test_framework.messages import (
CInv,
MSG_BLOCK,
msg_headers,
msg_inv,
)
from test_framework.p2p import (
p2p_lock,
P2PInterface,
)
from test_framework.util import (
assert_equal,
)
import random

class HeadersSyncTest(BitcoinTestFramework):
def set_test_params(self):
self.setup_clean_chain = True
self.num_nodes = 1

def announce_random_block(self, peers):
new_block_announcement = msg_inv(inv=[CInv(MSG_BLOCK, random.randrange(1<<256))])
for p in peers:
p.send_and_ping(new_block_announcement)

def run_test(self):
Contributor:

Nit: this may make the test slightly more readable (and avoids giving the casual reader the impression that more than one node is running).

Suggested change:
-    def run_test(self):
+    def run_test(self):
+        node = self.nodes[0]

Member Author:

Leaving this as-is; I think the self.num_nodes = 1 in set_test_params makes it clear that only one node is involved.

self.log.info("Adding a peer to node0")
peer1 = self.nodes[0].add_p2p_connection(P2PInterface())
Member:

Doesn't need to happen in this PR but in the future this test could also cover our preference for outbound peers for the initial headers sync.


# Wait for peer1 to receive a getheaders
peer1.wait_for_getheaders()
# An empty reply will clear the outstanding getheaders request,
# allowing additional getheaders requests to be sent to this peer in
# the future.
peer1.send_message(msg_headers())

self.log.info("Connecting two more peers to node0")
# Connect 2 more peers; they should not receive a getheaders yet
peer2 = self.nodes[0].add_p2p_connection(P2PInterface())
peer3 = self.nodes[0].add_p2p_connection(P2PInterface())

all_peers = [peer1, peer2, peer3]

self.log.info("Verify that peer2 and peer3 don't receive a getheaders after connecting")
for p in all_peers:
p.sync_with_ping()
with p2p_lock:
assert "getheaders" not in peer2.last_message
assert "getheaders" not in peer3.last_message

with p2p_lock:
peer1.last_message.pop("getheaders", None)

self.log.info("Have all peers announce a new block")
self.announce_random_block(all_peers)

self.log.info("Check that peer1 receives a getheaders in response")
peer1.wait_for_getheaders()
peer1.send_message(msg_headers()) # Send empty response, see above
with p2p_lock:
peer1.last_message.pop("getheaders", None)

self.log.info("Check that exactly 1 of {peer2, peer3} received a getheaders in response")
count = 0
peer_receiving_getheaders = None
for p in [peer2, peer3]:
Contributor:

Suggested change:
-        for p in [peer2, peer3]:
+        for p in [peer2, peer3]:
+            p.sync_with_ping()

Member Author:

Not sure if I'm missing something; I believe this sync_with_ping() is unnecessary because we use send_and_ping() in announce_random_block(), does that sound right?

Contributor:

You're right, ignore my suggestion.

with p2p_lock:
if "getheaders" in p.last_message:
count += 1
peer_receiving_getheaders = p
p.last_message.pop("getheaders", None)
p.send_message(msg_headers()) # Send empty response, see above

assert_equal(count, 1)

self.log.info("Announce another new block, from all peers")
self.announce_random_block(all_peers)

self.log.info("Check that peer1 receives a getheaders in response")
peer1.wait_for_getheaders()

self.log.info("Check that the remaining peer received a getheaders as well")
expected_peer = peer2
if peer2 == peer_receiving_getheaders:
expected_peer = peer3

expected_peer.wait_for_getheaders()
Contributor:

Suggested change:
-        expected_peer.wait_for_getheaders()
+        expected_peer.wait_for_getheaders()
+        peer_receiving_getheaders.sync_with_ping()
+        with p2p_lock:
+            assert "getheaders" not in peer_receiving_getheaders.last_message

Member Author:

So I intentionally did not include this test (that a peer that we've started sync with doesn't receive a subsequent getheaders in response to an INV) because I felt like that was overspecifying what I think of as reasonable behavior. I think the most important property is that we add a new peer with each new block, but whether we continue trying to sync with existing peers is (in my view) up for debate.

For instance, if the test were rewritten a bit so that the peer receiving the initial getheaders in response to an INV had a big long headers chain to serve the node, then you'd expect there to be a getheaders in flight as headers sync proceeds. Granted that is not the situation in this test right now, but I feel like it's helpful for future test maintenance to not overspecify behavior if it doesn't really matter for what we're trying to achieve.


self.log.info("Success!")

if __name__ == '__main__':
HeadersSyncTest().main()

1 change: 1 addition & 0 deletions test/functional/test_runner.py
@@ -246,6 +246,7 @@
'rpc_generate.py',
'wallet_balance.py --legacy-wallet',
'wallet_balance.py --descriptors',
'p2p_initial_headers_sync.py',
'feature_nulldummy.py',
'mempool_accept.py',
'mempool_expiry.py',