-
Notifications
You must be signed in to change notification settings - Fork 38.8k
p2p: AddrFetch - don't disconnect on self-announcements #22096
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Disconnecting an AddrFetch peer only after receiving an addr message of size >1 prevents dropping them before they had a chance to answer the getaddr request.
162f229 to
b6c5d1e
Compare
|
I agree this makes sense. While you're improving the addrfetch logic, perhaps we could also add a timeout for disconnection so that if an addrfetch peer hasn't responded to our |
amitiuttarwar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch! concept ACK
half baked thought: Wondering if it would make sense to combine this value with the one used here. In my understanding, both are trying to use the number of addrs to discern [response to getaddr] vs [probably a self-announcement]. We could introduce a constant that's used in both places.
I'm also wondering if we can add functional test coverage for AddrFetch connections. I think the groundwork is in place with the outbound-full-relay and block-relay-only connections. We might just be able to add the addr-fetch option to TestNode.add_outbound_p2p_connection, the addconnection RPC, and CConnman::AddConnection. But I only took an initial look & this certainly does not need to happen in this PR.
|
Concept ACK |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested ACK b6c5d1e.
I ran the node with the args -debug=net -dnsseed=0 -seednode=<IP> -fixedseeds=0 using the master branch and this PR branch (table below).
The way ADDR_FETCH is used today to fetch new address from a seed IP seems not work as expected, since the first message is only self-announcement and the ADDR message with 1000 addresses is received via OUTBOUND_FULL_RELAY from the same peer.
As shown in the second table, after the proposed change in this PR, the first ADDR_FETCH connection actually fetches for addresses.
Master Branch results:
| From IP | Connection Type | vAddr size |
AddrMan size |
self-announcement |
|---|---|---|---|---|
seed_ip |
ADDR_FETCH |
1 | 0 | yes |
seed_ip |
OUTBOUND_FULL_RELAY |
1 | 1 | yes |
seed_ip |
OUTBOUND_FULL_RELAY |
1000 | 1 | no |
new_ip |
OUTBOUND_FULL_RELAY |
1 | 870 | yes |
PR Branch results:
| From IP | Connection Type | vAddr size |
AddrMan size |
self-announcement |
|---|---|---|---|---|
seed_ip |
ADDR_FETCH |
1 | 0 | yes |
seed_ip |
ADDR_FETCH |
1000 | 1 | no |
new_ip |
OUTBOUND_FULL_RELAY |
1 | 870 | yes |
Yes, I think that would be useful, and I will add this. I think such an addrfetch peer also wouldn't count toward the max full outbound limit in spite of effectively behaving like one. As for the timeout, I'd suggest a value
Do you suggest just using
Good idea! I think it would make sense to combine this with @sdaftuar 's suggestion of a timeout, so that the timout will be tested. |
|
I added the timeout and a functional test now. Thanks a lot @amitiuttarwar for help with the functional test! |
630e499 to
d728c6c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approach ACK d728c6c8717960d4cbc5c2ea10833ce997d7ae1b some optional suggestions below, nice find and thanks for adding the test coverage.
be25888 to
eda1064
Compare
|
Thanks for the review @jonatack! I took your suggestions and pushed an update. |
11a0bb7 to
535c990
Compare
|
utACK 535c990e16c6c191aa1b484b800a16bf5e4d0765 |
jnewbery
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK 535c990e16c6c191aa1b484b800a16bf5e4d0765
One suggestion to simplify the test
535c990 to
8dbf2b2
Compare
|
Code review ACK 8dbf2b210dbdaddc11295b926fefddca127d7e8a |
amitiuttarwar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code review ACK 8dbf2b210d
|
Analyzing #22243 gave me the concern that a timeout of |
If AddrFetch peers don't send us addresses, disconnect them after a while.
8dbf2b2 to
96061d3
Compare
|
I changed the timeout to |
|
reACK 96061d3410898d4759488e50d1f78211849ede50 |
|
Concept ACK |
|
Code review ACK 96061d3410898d4759488e50d1f78211849ede50 |
lsilva01
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re-ACK 96061d3
Disconnecting an AddrFetch peer only after receiving an addr message of size >1 prevents dropping them before they had a chance to answer the getaddr request. Github-Pull: bitcoin#22096 Rebased-From: b6c5d1e
If AddrFetch peers don't send us addresses, disconnect them after a while. Github-Pull: bitcoin#22096 Rebased-From: 8ffaa1dcaef9652779ec2f167a40bcf25dc83e3d
Co-authored-by: Amiti Uttarwar <[email protected]> Github-Pull: bitcoin#22096 Rebased-From: 1e6a0fc930d197d0c5f4fc6dcf64fc44fd241b83
Co-authored-by: Amiti Uttarwar <[email protected]> Github-Pull: bitcoin#22096 Rebased-From: 8dbf2b210dbdaddc11295b926fefddca127d7e8a
|
|
||
| const auto current_time = GetTime<std::chrono::microseconds>(); | ||
|
|
||
| if (pto->IsAddrFetchConn() && current_time - std::chrono::seconds(pto->nTimeConnected) > 10 * AVG_ADDRESS_BROADCAST_INTERVAL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why linking this timing to AVG_ADDRESS_BROADCAST_INTERVAL? It seems irrelevant.
Also, 5 minutes feels like a lot, if we thinking of a bootstrapping node hitting buggy/sybil nodes again and again. Maybe 1 minute?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah, I see the Poisson justification.
nit: Shouldn't it be linked to AVG_LOCAL_ADDRESS_BROADCAST_INTERVAL then? This is the relevant constant for the first Poisson delay, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so: AVG_LOCAL_ADDRESS_BROADCAST_INTERVAL=24h only deals with our self-announcements, which are then added to m_addrs_to_send. The frequency with which we actually send out the content of m_addrs_to_send is controlled by AVG_ADDRESS_BROADCAST_INTERVAL.
I initially chose a timeout of 2min instead of 5min, but that would have led to a premature disconnect in 1.8% of attempts, which seemed too high.
I think it makes sense to be a bit more conservative here, because we use AddrFetch only when we have specified the peer with -seednode or when we can't do DNS and thus connect manually to DNS seeds. We currently don't make AddrFetch connections to random nodes, and don't just choose another AddrFetch peer if we don't get an answer from the current one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK the second part justification.
Re first part: You are right, my confusion was because of 2 variables we have: m_next_addr_send and m_next_local_addr_send.
| vAddrOk.push_back(addr); | ||
| } | ||
| m_addrman.Add(vAddrOk, pfrom.addr, 2 * 60 * 60); | ||
| if (vAddr.size() < 1000) peer->m_getaddr_sent = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this work as intended though?
- Alice connects to Bob, each other send GETADDR, m_getaddr_sent=true
- Alice self-announces, Bob sets
m_getaddr_sent=false - Alice responds to GETADDR,
m_getaddr_sentis already false, so the check relying on it is not really correct.
The check being incorrect seems to be not critical because vAddr.size() <= 10 is very likely to be false anyway (assuming sane addrman), but what's the point then?
What this check actually does is don't relay the first self-announcement. I don't think this is intended? Or if it is, it's very unclear from the codebase esp with that variable name. We could really benefit from a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: what i'm asking here is not really related to the change of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, it doesn't work as intended! There was some earlier discussion about this (the flag was recently renamed from fGetAddr), see #21707 (comment) and you may recall #19794. Maybe it would be good to revisit this topic...
Co-authored-by: Amiti Uttarwar <[email protected]>
Co-authored-by: Amiti Uttarwar <[email protected]>
96061d3 to
5730a43
Compare
|
|
||
| const auto current_time = GetTime<std::chrono::microseconds>(); | ||
|
|
||
| if (pto->IsAddrFetchConn() && current_time - std::chrono::seconds(pto->nTimeConnected) > 10 * AVG_ADDRESS_BROADCAST_INTERVAL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I'm just pointing out here that this timeout also handles the case when they send us 1-addr messages (in a malicious case, this can be anything, not just self-announcement).
Currently I can't come up with a way why this is bad, but if we want to disable this unintended behaviour, we could disconnect addrfetch peers on the second 1-addr message too.
Ideally, this gets a comment at least :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@naumenkogs do I understand correctly that you're concerned about the case where an AddrFetch connection could repeatedly send addr messages with 1 address in them, until this timeout is hit?
The malicious case seems very unlikely- we only make AddrFetch connections to nodes submitted via -seednode, or to the DNS seeds in certain circumstances.
In a more ordinary case, I could imagine how the gossiping of self-announcements could lead to > 1 addr message with a size of 1. If the AddrFetch connection runs a Bitcoin Core node, it seems possible since this line waits approximately 10 intervals before timing out. But also I believe the seeders often run custom software and it doesn't make sense to me that they should be less effective if they were to participate in network gossip.
Conceptually, AddrFetch connections are trying to get a getaddr to be fulfilled, and a getaddr is a request for many addresses. To me, it makes sense for the requesting node to wait a certain amount of time for an acceptable response, and disconnect if that condition is not met. I'm not seeing why we would want to disconnect even earlier in this circumstance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The malicious case seems very unlikely- we only make AddrFetch connections to nodes submitted via -seednode, or to the DNS seeds in certain circumstances.
Yes indeed. For some reason I thought we do this to random nodes periodically. Maybe that's a feature I wanted to see, in some form :)
I agree that for current master it's much less of a concern. A comment might help for future when master changes.
If the AddrFetch connection runs a Bitcoin Core node, it seems possible since this line waits approximately 10 intervals before timing out.
I don't understand this part. Yeah it's 10 intervals of 30s, but self-announcement happens once every 24h. Very-very unlikely a second self-announcement appears within 300s.
But also I believe the seeders often run custom software and it doesn't make sense to me that they should be less effective if they were to participate in network gossip.
I also don't understand this part, what does it have to do with them participating in gossip?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was entertaining the idea of when a node might send multiple addr messages of size 1 before fulfilling the getaddr message. Although bitcoin core nodes will only initiate a self announcement once every ~24 hours, they will relay self announcements from other nodes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's also not really more malicious than when they send us nothing at all: The worst thing I could think of that they might do is to painstakingly send us more than 1000 addresses this way.
I'm not sure what kind of comment would help, I'd rather not describe the AddrFetch success-case logic in detail here at the failure case, because if we adjust that logic, the comment will most likely be forgotten.
Maybe something non-specific like "Timeout for AddrFetch peers for when they don't sent us a suitable getaddr response (in which case we'd immediately disconnect)"?
jnewbery
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 5730a43
|
reACK 5730a43 |
|
ACK 5730a43 |
|
Posthumous ACK 5730a43 |
…cements 5730a43 test: Add functional test for AddrFetch connections (Martin Zumsande) c34ad33 net, rpc: Enable AddrFetch connections for functional testing (Martin Zumsande) 533500d p2p: Add timeout for AddrFetch peers (Martin Zumsande) b6c5d1e p2p: AddrFetch - don't disconnect on self-announcements (Martin Zumsande) Pull request description: AddrFetch connections (old name: oneshots) are intended to be short-lived connections on which we ask a peer for addresses via `getaddr` and disconnect after receiving them. This is done by disconnecting after receiving the first `addr`. However, it is no longer working as intended, because nowadays, the first `addr` a typical bitcoin core node sends is its self-announcement. So we'll disconnect before the peer gets a chance to answer our `getaddr`. I checked that this affects both `-seednode` peers specified manually, and DNS seeds when AddrFetch is used as a fallback if DNS doesn't work for us. The current behavior of getting peers via AddrFetch when starting with an empty addrman would be to connect to the peer, receive its self-announcement and add it to addrman, disconnect, reconnect to the same peer again as a full outbound (no other addresses in addrman) and then receive more `addr`. This is silly and not in line with AddrFetch peer being intended to be short-lived peers. Fix this by only disconnecting after receiving an `addr` message of size > 1. [Edit] As per review discussion, this PR now also adds a timeout after which we disconnect if we haven't received any suitable `addr`, and a functional test. ACKs for top commit: amitiuttarwar: reACK 5730a43 naumenkogs: ACK 5730a43 jnewbery: ACK 5730a43 Tree-SHA512: 8a81234f37e827705138eb254223f7f3b3bf44a06cb02126fc7990b0d231b9bd8f07d38d185cc30d55bf35548a6fdc286b69602498d875b937e7c58332158bf9
|
|
||
| // Max connections of specified type already exist | ||
| if (existing_connections >= max_connections) return false; | ||
| if (max_connections != std::nullopt && existing_connections >= max_connections) return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find it slightly confusing to check for std::nullopt, but then use the >= operator on std::optional. That operator will again check for std::nullopt. Funnily that operator will evaluate to true if max_connections was std::nullopt, which it can't be here. It might be more clear to just use the >= operator on int.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point - do I understand it correctly that you suggest to use existing_connections >= max_connections.value() in the comparison instead? I could do that as a follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*max_connections should be enough because an exception can't be thrown
jonatack
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Post-humous ACK 5730a43
|
|
||
| self.log.info("Check timeout for addr-fetch peer that does not send addrs") | ||
| peer = node.add_outbound_p2p_connection(P2PInterface(), p2p_idx=1, connection_type="addr-fetch") | ||
| node.setmocktime(int(time.time()) + 301) # Timeout: 5 minutes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The state just before the timeout could be checked to ensure the disconnection only occurs after the 5 minute threshold.
…imeout coverage f8d8eb5 test: add addr-fetch timeout connection coverage in p2p_addrfetch.py (Jon Atack) 9321086 test: add assert_getpeerinfo method and coverage in p2p_addrfetch.py (Jon Atack) Pull request description: This patch adds additional addr-fetch peer connection state and timeout coverage as a follow-up to bitcoin#22096. ACKs for top commit: Saviour1001: Tested ACK <code>[f8d8eb5](https://github.com/bitcoin/bitcoin/pull/22568/commits/f8d8eb5fdaa93b6e5b77fd901b94927dc3a0473e)</code> mzumsande: Code review ACK f8d8eb5 Tree-SHA512: 9a13a705d1da6b308d6dcbc6930575205e2e88bfe9f2e7cb4e0c4c40d26538430e6b02c6c772d0cee64e534777348291469a139f99afbf9d4f93f31b9e7b0818
Summary: ``` AddrFetch connections (old name: oneshots) are intended to be short-lived connections on which we ask a peer for addresses via getaddr and disconnect after receiving them. This is done by disconnecting after receiving the first addr. However, it is no longer working as intended, because nowadays, the first addr a typical bitcoin core node sends is its self-announcement. So we'll disconnect before the peer gets a chance to answer our getaddr. I checked that this affects both -seednode peers specified manually, and DNS seeds when AddrFetch is used as a fallback if DNS doesn't work for us. The current behavior of getting peers via AddrFetch when starting with an empty addrman would be to connect to the peer, receive its self-announcement and add it to addrman, disconnect, reconnect to the same peer again as a full outbound (no other addresses in addrman) and then receive more addr. This is silly and not in line with AddrFetch peer being intended to be short-lived peers. Fix this by only disconnecting after receiving an addr message of size > 1. [Edit] As per review discussion, this PR now also adds a timeout after which we disconnect if we haven't received any suitable addr, and a functional test. ``` Backport of [[bitcoin/bitcoin#22096 | core#22096]]. Ref T1696. Test Plan: ninja all check-all Reviewers: #bitcoin_abc, PiRK Reviewed By: #bitcoin_abc, PiRK Maniphest Tasks: T1696 Differential Revision: https://reviews.bitcoinabc.org/D10925
AddrFetch connections (old name: oneshots) are intended to be short-lived connections on which we ask a peer for addresses via
getaddrand disconnect after receiving them.This is done by disconnecting after receiving the first
addr. However, it is no longer working as intended, because nowadays, the firstaddra typical bitcoin core node sends is its self-announcement.So we'll disconnect before the peer gets a chance to answer our
getaddr.I checked that this affects both
-seednodepeers specified manually, and DNS seeds when AddrFetch is used as a fallback if DNS doesn't work for us.The current behavior of getting peers via AddrFetch when starting with an empty addrman would be to connect to the peer, receive its self-announcement and add it to addrman, disconnect, reconnect to the same peer again as a full outbound (no other addresses in addrman) and then receive more
addr. This is silly and not in line with AddrFetch peer being intended to be short-lived peers.Fix this by only disconnecting after receiving an
addrmessage of size > 1.[Edit] As per review discussion, this PR now also adds a timeout after which we disconnect if we haven't received any suitable
addr, and a functional test.