Skip to content

Conversation

@mzumsande
Copy link
Contributor

@mzumsande mzumsande commented May 28, 2021

AddrFetch connections (old name: oneshots) are intended to be short-lived connections on which we ask a peer for addresses via getaddr and disconnect after receiving them.

This is done by disconnecting after receiving the first addr. However, it is no longer working as intended, because nowadays, the first addr a typical bitcoin core node sends is its self-announcement.
So we'll disconnect before the peer gets a chance to answer our getaddr.

I checked that this affects both -seednode peers specified manually, and DNS seeds when AddrFetch is used as a fallback if DNS doesn't work for us.

The current behavior of getting peers via AddrFetch when starting with an empty addrman would be to connect to the peer, receive its self-announcement and add it to addrman, disconnect, reconnect to the same peer again as a full outbound (no other addresses in addrman) and then receive more addr. This is silly and not in line with AddrFetch peer being intended to be short-lived peers. 

Fix this by only disconnecting after receiving an addr message of size > 1.

[Edit] As per review discussion, this PR now also adds a timeout after which we disconnect if we haven't received any suitable addr, and a functional test.

Disconnecting an AddrFetch peer only after receiving an addr
message of size >1 prevents dropping them before
they had a chance to answer the getaddr request.
@mzumsande mzumsande force-pushed the 202105_addrfetch_fix branch from 162f229 to b6c5d1e Compare May 28, 2021 16:30
@sdaftuar
Copy link
Member

I agree this makes sense.

While you're improving the addrfetch logic, perhaps we could also add a timeout for disconnection so that if an addrfetch peer hasn't responded to our getaddr for some amount of time, we eventually disconnect regardless? I believe right now we'd just stay connected indefinitely to such a peer.

@DrahtBot DrahtBot added the P2P label May 28, 2021
Copy link
Contributor

@amitiuttarwar amitiuttarwar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch! concept ACK

half baked thought: Wondering if it would make sense to combine this value with the one used here. In my understanding, both are trying to use the number of addrs to discern [response to getaddr] vs [probably a self-announcement]. We could introduce a constant that's used in both places.

I'm also wondering if we can add functional test coverage for AddrFetch connections. I think the groundwork is in place with the outbound-full-relay and block-relay-only connections. We might just be able to add the addr-fetch option to TestNode.add_outbound_p2p_connection, the addconnection RPC, and CConnman::AddConnection. But I only took an initial look & this certainly does not need to happen in this PR.

@sipa
Copy link
Member

sipa commented May 29, 2021

Concept ACK

Copy link
Contributor

@lsilva01 lsilva01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested ACK b6c5d1e.
I ran the node with the args -debug=net -dnsseed=0 -seednode=<IP> -fixedseeds=0 using the master branch and this PR branch (table below).

The way ADDR_FETCH is used today to fetch new address from a seed IP seems not work as expected, since the first message is only self-announcement and the ADDR message with 1000 addresses is received via OUTBOUND_FULL_RELAY from the same peer.

As shown in the second table, after the proposed change in this PR, the first ADDR_FETCH connection actually fetches for addresses.

Master Branch results:

From IP Connection Type vAddr size AddrMan size self-announcement
seed_ip ADDR_FETCH 1 0 yes
seed_ip OUTBOUND_FULL_RELAY 1 1 yes
seed_ip OUTBOUND_FULL_RELAY 1000 1 no
new_ip OUTBOUND_FULL_RELAY 1 870 yes

PR Branch results:

From IP Connection Type vAddr size AddrMan size self-announcement
seed_ip ADDR_FETCH 1 0 yes
seed_ip ADDR_FETCH 1000 1 no
new_ip OUTBOUND_FULL_RELAY 1 870 yes

@bitcoin bitcoin deleted a comment from Gondayt May 29, 2021
@mzumsande
Copy link
Contributor Author

mzumsande commented May 30, 2021

While you're improving the addrfetch logic, perhaps we could also add a timeout for disconnection so that if an addrfetch peer hasn't responded to our getaddr for some amount of time, we eventually disconnect regardless? I believe right now we'd just stay connected indefinitely to such a peer.

Yes, I think that would be useful, and I will add this. I think such an addrfetch peer also wouldn't count toward the max full outbound limit in spite of effectively behaving like one. As for the timeout, I'd suggest a value 4*AVG_ADDRESS_BROADCAST_INTERVAL = 2min after connection.

half baked thought: Wondering if it would make sense to combine this value with the one used here. In my understanding, both are trying to use the number of addrs to discern [response to getaddr] vs [probably a self-announcement]. We could introduce a constant that's used in both places.

Do you suggest just using size > 10 instead of 1 in the check? More seems difficult, because the rest of the criteria work on the level of individual addrs.

I'm also wondering if we can add functional test coverage for AddrFetch connections. I think the groundwork is in place with the outbound-full-relay and block-relay-only connections. We might just be able to add the addr-fetch option to TestNode.add_outbound_p2p_connection, the addconnection RPC, and CConnman::AddConnection
timeout behavior is tested.

Good idea! I think it would make sense to combine this with @sdaftuar 's suggestion of a timeout, so that the timout will be tested.
Adding AddrFetch to the test framework actually does seem to work exactly as you suggested, just need to figure out some details and will add this in the next days (unless reviewers prefer that to be a separate PR).

@bitcoin bitcoin deleted a comment from Gondayt May 31, 2021
@mzumsande mzumsande marked this pull request as draft May 31, 2021 22:18
@mzumsande
Copy link
Contributor Author

I added the timeout and a functional test now. Thanks a lot @amitiuttarwar for help with the functional test!

@mzumsande mzumsande marked this pull request as ready for review June 3, 2021 15:09
@mzumsande mzumsande force-pushed the 202105_addrfetch_fix branch from 630e499 to d728c6c Compare June 3, 2021 15:15
Copy link
Member

@jonatack jonatack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approach ACK d728c6c8717960d4cbc5c2ea10833ce997d7ae1b some optional suggestions below, nice find and thanks for adding the test coverage.

@mzumsande mzumsande force-pushed the 202105_addrfetch_fix branch 2 times, most recently from be25888 to eda1064 Compare June 4, 2021 16:42
@mzumsande
Copy link
Contributor Author

Thanks for the review @jonatack! I took your suggestions and pushed an update.

@mzumsande mzumsande force-pushed the 202105_addrfetch_fix branch 3 times, most recently from 11a0bb7 to 535c990 Compare June 8, 2021 00:45
@sipa
Copy link
Member

sipa commented Jun 8, 2021

utACK 535c990e16c6c191aa1b484b800a16bf5e4d0765

Copy link
Contributor

@jnewbery jnewbery left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK 535c990e16c6c191aa1b484b800a16bf5e4d0765

One suggestion to simplify the test

@mzumsande mzumsande force-pushed the 202105_addrfetch_fix branch from 535c990 to 8dbf2b2 Compare June 10, 2021 20:42
@jnewbery
Copy link
Contributor

Code review ACK 8dbf2b210dbdaddc11295b926fefddca127d7e8a

Copy link
Contributor

@amitiuttarwar amitiuttarwar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code review ACK 8dbf2b210d

@mzumsande
Copy link
Contributor Author

mzumsande commented Jun 16, 2021

Analyzing #22243 gave me the concern that a timeout of 4*AVG_ADDRESS_BROADCAST_INTERVAL = 2min may be too low, given the distribution of PoissonNextSend(), leading to the node disconnecting early before the peer would send an addr too often. I'll do some simulations later today to confirm.

If AddrFetch peers don't send us addresses, disconnect them after
a while.
@mzumsande mzumsande force-pushed the 202105_addrfetch_fix branch from 8dbf2b2 to 96061d3 Compare June 17, 2021 20:09
@mzumsande
Copy link
Contributor Author

I changed the timeout to 10 * AVG_ADDRESS_BROADCAST_INTERVAL = 5min. Simulated this, and while the probability of failure (i.e. disconnecting before the peer has scheduled its getaddr answer) was ~1.8% before, it is now of order 10^-5 which I think is fine.

@amitiuttarwar
Copy link
Contributor

reACK 96061d3410898d4759488e50d1f78211849ede50

@naumenkogs
Copy link
Contributor

Concept ACK

@jnewbery
Copy link
Contributor

Code review ACK 96061d3410898d4759488e50d1f78211849ede50

Copy link
Contributor

@lsilva01 lsilva01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re-ACK 96061d3

luke-jr pushed a commit to bitcoinknots/bitcoin that referenced this pull request Jun 27, 2021
Disconnecting an AddrFetch peer only after receiving an addr
message of size >1 prevents dropping them before
they had a chance to answer the getaddr request.

Github-Pull: bitcoin#22096
Rebased-From: b6c5d1e
luke-jr pushed a commit to bitcoinknots/bitcoin that referenced this pull request Jun 27, 2021
If AddrFetch peers don't send us addresses, disconnect them after
a while.

Github-Pull: bitcoin#22096
Rebased-From: 8ffaa1dcaef9652779ec2f167a40bcf25dc83e3d
luke-jr pushed a commit to bitcoinknots/bitcoin that referenced this pull request Jun 27, 2021
Co-authored-by: Amiti Uttarwar <[email protected]>

Github-Pull: bitcoin#22096
Rebased-From: 1e6a0fc930d197d0c5f4fc6dcf64fc44fd241b83
luke-jr pushed a commit to bitcoinknots/bitcoin that referenced this pull request Jun 27, 2021
Co-authored-by: Amiti Uttarwar <[email protected]>

Github-Pull: bitcoin#22096
Rebased-From: 8dbf2b210dbdaddc11295b926fefddca127d7e8a

const auto current_time = GetTime<std::chrono::microseconds>();

if (pto->IsAddrFetchConn() && current_time - std::chrono::seconds(pto->nTimeConnected) > 10 * AVG_ADDRESS_BROADCAST_INTERVAL) {
Copy link
Contributor

@naumenkogs naumenkogs Jul 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why linking this timing to AVG_ADDRESS_BROADCAST_INTERVAL? It seems irrelevant.
Also, 5 minutes feels like a lot, if we thinking of a bootstrapping node hitting buggy/sybil nodes again and again. Maybe 1 minute?

Copy link
Contributor

@naumenkogs naumenkogs Jul 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah, I see the Poisson justification.

nit: Shouldn't it be linked to AVG_LOCAL_ADDRESS_BROADCAST_INTERVAL then? This is the relevant constant for the first Poisson delay, no?

Copy link
Contributor Author

@mzumsande mzumsande Jul 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so: AVG_LOCAL_ADDRESS_BROADCAST_INTERVAL=24h only deals with our self-announcements, which are then added to m_addrs_to_send. The frequency with which we actually send out the content of m_addrs_to_send is controlled by AVG_ADDRESS_BROADCAST_INTERVAL.

I initially chose a timeout of 2min instead of 5min, but that would have led to a premature disconnect in 1.8% of attempts, which seemed too high.
I think it makes sense to be a bit more conservative here, because we use AddrFetch only when we have specified the peer with -seednode or when we can't do DNS and thus connect manually to DNS seeds. We currently don't make AddrFetch connections to random nodes, and don't just choose another AddrFetch peer if we don't get an answer from the current one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK the second part justification.

Re first part: You are right, my confusion was because of 2 variables we have: m_next_addr_send and m_next_local_addr_send.

vAddrOk.push_back(addr);
}
m_addrman.Add(vAddrOk, pfrom.addr, 2 * 60 * 60);
if (vAddr.size() < 1000) peer->m_getaddr_sent = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work as intended though?

  1. Alice connects to Bob, each other send GETADDR, m_getaddr_sent=true
  2. Alice self-announces, Bob sets m_getaddr_sent=false
  3. Alice responds to GETADDR, m_getaddr_sent is already false, so the check relying on it is not really correct.

The check being incorrect seems to be not critical because vAddr.size() <= 10 is very likely to be false anyway (assuming sane addrman), but what's the point then?

What this check actually does is don't relay the first self-announcement. I don't think this is intended? Or if it is, it's very unclear from the codebase esp with that variable name. We could really benefit from a comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: what i'm asking here is not really related to the change of this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, it doesn't work as intended! There was some earlier discussion about this (the flag was recently renamed from fGetAddr), see #21707 (comment) and you may recall #19794. Maybe it would be good to revisit this topic...

@mzumsande mzumsande force-pushed the 202105_addrfetch_fix branch from 96061d3 to 5730a43 Compare July 12, 2021 00:31

const auto current_time = GetTime<std::chrono::microseconds>();

if (pto->IsAddrFetchConn() && current_time - std::chrono::seconds(pto->nTimeConnected) > 10 * AVG_ADDRESS_BROADCAST_INTERVAL) {
Copy link
Contributor

@naumenkogs naumenkogs Jul 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I'm just pointing out here that this timeout also handles the case when they send us 1-addr messages (in a malicious case, this can be anything, not just self-announcement).

Currently I can't come up with a way why this is bad, but if we want to disable this unintended behaviour, we could disconnect addrfetch peers on the second 1-addr message too.

Ideally, this gets a comment at least :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@naumenkogs do I understand correctly that you're concerned about the case where an AddrFetch connection could repeatedly send addr messages with 1 address in them, until this timeout is hit?

The malicious case seems very unlikely- we only make AddrFetch connections to nodes submitted via -seednode, or to the DNS seeds in certain circumstances.

In a more ordinary case, I could imagine how the gossiping of self-announcements could lead to > 1 addr message with a size of 1. If the AddrFetch connection runs a Bitcoin Core node, it seems possible since this line waits approximately 10 intervals before timing out. But also I believe the seeders often run custom software and it doesn't make sense to me that they should be less effective if they were to participate in network gossip.

Conceptually, AddrFetch connections are trying to get a getaddr to be fulfilled, and a getaddr is a request for many addresses. To me, it makes sense for the requesting node to wait a certain amount of time for an acceptable response, and disconnect if that condition is not met. I'm not seeing why we would want to disconnect even earlier in this circumstance.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The malicious case seems very unlikely- we only make AddrFetch connections to nodes submitted via -seednode, or to the DNS seeds in certain circumstances.

Yes indeed. For some reason I thought we do this to random nodes periodically. Maybe that's a feature I wanted to see, in some form :)
I agree that for current master it's much less of a concern. A comment might help for future when master changes.

If the AddrFetch connection runs a Bitcoin Core node, it seems possible since this line waits approximately 10 intervals before timing out.

I don't understand this part. Yeah it's 10 intervals of 30s, but self-announcement happens once every 24h. Very-very unlikely a second self-announcement appears within 300s.

But also I believe the seeders often run custom software and it doesn't make sense to me that they should be less effective if they were to participate in network gossip.

I also don't understand this part, what does it have to do with them participating in gossip?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was entertaining the idea of when a node might send multiple addr messages of size 1 before fulfilling the getaddr message. Although bitcoin core nodes will only initiate a self announcement once every ~24 hours, they will relay self announcements from other nodes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's also not really more malicious than when they send us nothing at all: The worst thing I could think of that they might do is to painstakingly send us more than 1000 addresses this way.

I'm not sure what kind of comment would help, I'd rather not describe the AddrFetch success-case logic in detail here at the failure case, because if we adjust that logic, the comment will most likely be forgotten.
Maybe something non-specific like "Timeout for AddrFetch peers for when they don't sent us a suitable getaddr response (in which case we'd immediately disconnect)"?

Copy link
Contributor

@jnewbery jnewbery left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 5730a43

@amitiuttarwar
Copy link
Contributor

reACK 5730a43

@naumenkogs
Copy link
Contributor

ACK 5730a43

@fanquake fanquake merged commit e4487fd into bitcoin:master Jul 20, 2021
@laanwj
Copy link
Member

laanwj commented Jul 20, 2021

Posthumous ACK 5730a43

sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Jul 23, 2021
…cements

5730a43 test: Add functional test for AddrFetch connections (Martin Zumsande)
c34ad33 net, rpc: Enable AddrFetch connections for functional testing (Martin Zumsande)
533500d p2p: Add timeout for AddrFetch peers (Martin Zumsande)
b6c5d1e p2p: AddrFetch - don't disconnect on self-announcements (Martin Zumsande)

Pull request description:

  AddrFetch connections (old name: oneshots) are intended to be short-lived connections on which we ask a peer for addresses via `getaddr` and disconnect after receiving them.

  This is done by disconnecting after receiving the first `addr`. However, it is no longer working as intended, because nowadays, the first `addr` a typical bitcoin core node sends is its self-announcement.
  So we'll disconnect before the peer gets a chance to answer our `getaddr`.

  I checked that this affects both `-seednode` peers specified manually, and DNS seeds when AddrFetch is used as a fallback if DNS doesn't work for us.

  The current behavior of getting peers via AddrFetch when starting with an empty addrman would be to connect to the peer, receive its self-announcement and add it to addrman, disconnect, reconnect to the same peer again as a full outbound (no other addresses in addrman) and then receive more `addr`. This is silly and not in line with AddrFetch peer being intended to be short-lived peers. 

  Fix this by only disconnecting after receiving an `addr` message of size > 1.

  [Edit] As per review discussion, this PR now also adds a timeout after which we disconnect if we haven't received any suitable `addr`, and a functional test.

ACKs for top commit:
  amitiuttarwar:
    reACK 5730a43
  naumenkogs:
    ACK 5730a43
  jnewbery:
    ACK 5730a43

Tree-SHA512: 8a81234f37e827705138eb254223f7f3b3bf44a06cb02126fc7990b0d231b9bd8f07d38d185cc30d55bf35548a6fdc286b69602498d875b937e7c58332158bf9

// Max connections of specified type already exist
if (existing_connections >= max_connections) return false;
if (max_connections != std::nullopt && existing_connections >= max_connections) return false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it slightly confusing to check for std::nullopt, but then use the >= operator on std::optional. That operator will again check for std::nullopt. Funnily that operator will evaluate to true if max_connections was std::nullopt, which it can't be here. It might be more clear to just use the >= operator on int.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point - do I understand it correctly that you suggest to use existing_connections >= max_connections.value() in the comparison instead? I could do that as a follow-up.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*max_connections should be enough because an exception can't be thrown

Copy link
Member

@jonatack jonatack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Post-humous ACK 5730a43


self.log.info("Check timeout for addr-fetch peer that does not send addrs")
peer = node.add_outbound_p2p_connection(P2PInterface(), p2p_idx=1, connection_type="addr-fetch")
node.setmocktime(int(time.time()) + 301) # Timeout: 5 minutes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The state just before the timeout could be checked to ensure the disconnection only occurs after the 5 minute threshold.

maflcko pushed a commit to maflcko/bitcoin-core that referenced this pull request Aug 19, 2021
…imeout coverage

f8d8eb5 test: add addr-fetch timeout connection coverage in p2p_addrfetch.py (Jon Atack)
9321086 test: add assert_getpeerinfo method and coverage in p2p_addrfetch.py (Jon Atack)

Pull request description:

  This patch adds additional addr-fetch peer connection state and timeout coverage as a follow-up to bitcoin#22096.

ACKs for top commit:
  Saviour1001:
    Tested ACK <code>[f8d8eb5](https://github.com/bitcoin/bitcoin/pull/22568/commits/f8d8eb5fdaa93b6e5b77fd901b94927dc3a0473e)</code>
  mzumsande:
    Code review ACK f8d8eb5

Tree-SHA512: 9a13a705d1da6b308d6dcbc6930575205e2e88bfe9f2e7cb4e0c4c40d26538430e6b02c6c772d0cee64e534777348291469a139f99afbf9d4f93f31b9e7b0818
Fabcien pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Jan 28, 2022
Summary:
```
AddrFetch connections (old name: oneshots) are intended to be short-lived connections on which we ask a peer for addresses via getaddr and disconnect after receiving them.

This is done by disconnecting after receiving the first addr. However, it is no longer working as intended, because nowadays, the first addr a typical bitcoin core node sends is its self-announcement.
So we'll disconnect before the peer gets a chance to answer our getaddr.

I checked that this affects both -seednode peers specified manually, and DNS seeds when AddrFetch is used as a fallback if DNS doesn't work for us.

The current behavior of getting peers via AddrFetch when starting with an empty addrman would be to connect to the peer, receive its self-announcement and add it to addrman, disconnect, reconnect to the same peer again as a full outbound (no other addresses in addrman) and then receive more addr. This is silly and not in line with AddrFetch peer being intended to be short-lived peers.

Fix this by only disconnecting after receiving an addr message of size > 1.

[Edit] As per review discussion, this PR now also adds a timeout after which we disconnect if we haven't received any suitable addr, and a functional test.
```

Backport of [[bitcoin/bitcoin#22096 | core#22096]].

Ref T1696.

Test Plan:
  ninja all check-all

Reviewers: #bitcoin_abc, PiRK

Reviewed By: #bitcoin_abc, PiRK

Maniphest Tasks: T1696

Differential Revision: https://reviews.bitcoinabc.org/D10925
gwillen pushed a commit to ElementsProject/elements that referenced this pull request Jun 1, 2022
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Aug 18, 2022
@mzumsande mzumsande deleted the 202105_addrfetch_fix branch October 13, 2023 15:21
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.