-
Notifications
You must be signed in to change notification settings - Fork 38.8k
p2p: skip netgroup diversity of new connections for tor/i2p/cjdns #27374
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsNo conflicts as of last run. |
sdaftuar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK.
I think the other way we could write this would be to leave setConnected unchanged, and change the logic where we look up in setConnected to only do the lookup for non-IPV4/IPV6 peers. It doesn't seem to me like it matters much which way we do this, since I believe netgroups for different networks cannot collide, so this looks fine to me.
src/net.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: If you touch this again, I think it would be worth adding a comment explaining why this code is here (and probably also add a comment to where the lookup in setConnected happens down below, near line 1806, so that if someone is touching the code in just one place they are aware of this issue).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks! done.
mzumsande
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
I think the same logic needs to be applied to the anchors in
Lines 1803 to 1805 in 328087d
| if (!addr.IsValid() || IsLocal(addr) || !IsReachable(addr) || | |
| !HasAllDesirableServiceFlags(addr.nServices) || | |
| setConnected.count(m_netgroupman.GetGroup(addr))) continue; |
Otherwise we could choose e.g. two onion anchors from the same onion netgroup, but after a restart refuse to reconnect to one, which defeats the point of anchors.
I think the other way we could write this would be to leave setConnected unchanged, and change the logic where we look up in setConnected to only do the lookup for non-IPV4/IPV6 peers.
setConnected is used also at
Line 1890 in 328087d
| OpenNetworkConnection(addrConnect, (int)setConnected.size() >= std::min(nMaxConnections - 1, 2), &grant, nullptr, conn_type); |
to determine the
fCountFailure bool, which tells addrman to possibly deprioritize selecting an address after various failed attempts (nAttempts). As far as I can see, this logic should equally be applied to addrs from alt networks, so it might be better to leave setConnected unchanged as sdaftuar suggested?
src/net.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about putting this into a seperate function (something like bool LimitOutboundByNetgroup(Network n)), and using a switch statement there, similar to the existing IsAddrV1Compatible()? That way, the compiler would prevent us from forgetting about this if we ever add another network.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks! done.
|
Does this PR trick the code to behave as if we are not connected to Tor (by not inserting into I feel that we should lookup PS This is very much related to #27213 |
i didn't add tor/i2p/cjdns peer addresses in
@mzumsande, @vasild that's true! summarising the 2 options suggested:
curious to know what everyone thinks about option 2? (otherwise we can go with option 1 for now) |
542ac83 to
8b65760
Compare
|
updated the PR to do option 1:
still curious to know opinions about increasing 4 bits to 5 bits in |
A new circular dependency in the form of "netaddress -> netbase -> netaddress" appears to have been introduced.
^---- failure generated from lint-circular-dependencies.py |
@lightlike Good observation, I missed this. I think that bit of code that we're using in our So I think the right change here is to modify that line to just sum the number of outbound connections (full-relay, block-relay, addr-fetch, manual, and feelers) and use that in place of
@vasild While it's possible that we'd make all 8 of our outbound connections to Tor peers, I think this is an unlikely problem, because we limit the number of Tor connections to be a fraction of our addrman (I can't recall what that fraction is though, can someone remind us?). I believe that an adversary looking to exploit this can really only do so for (eg) new nodes that have a relatively small addrman, by flooding such a node with onion addresses; but that attack is already available today (since we have 16 different onion netgroups, an adversary creating a zillion onion addresses to eclipse a victim will just have the ones that are in different netgroups be the ones that eclipse a peer, instead of ones in the same netgroup). So I don't think that this PR makes that problem any worse, and we can wait until we have something like #27213 to solve this problem more generally.
I think this is unnecessary and would be counter to what we're trying to fix here. If a node is running on both ipv4 and tor, I think we should still not require netgroup diversity across the tor peers, because such diversity is meaningless. And as I explained above I don't think the implicit limiting of tor peers that might arise from this would be needed or desirable.
This would double the number of addrman slots that can be used by Tor peers, right? My off the cuff guess is that this would be undesirable, but I haven't gone back to figure out what fraction of the addrman this would be yet. |
src/netaddress.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I mention in my comment elsewhere, I think we can drop this logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, see below.
src/netaddress.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should generally avoid adding assert() to networking code, unless we're sure that it's really better to crash than to continue, eg because the node is in an inconsistent or corrupt state and is unable to continue running. That doesn't seem to be the case here, so I think we could use Assume(false) instead, so that we only get crashes in debug builds and our CI environment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this assert is hit, it means that somebody has added a new entry in enum Network, has forgotten to update this function, has ignored the compiler warning and has ignored the CI failure. Or a memory corruption has occurred and CNetAddr::m_net has been overwritten with random bytes. I think assert(false) is the appropriate response in both cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really disagree on this point. We have had many examples over the years of asserts that people have added to our networking code for reasons that seem justifiable (along the lines of the reasons you've given), only to discover later that (usually due to an unexpected combination of code changes elsewhere) we ended up opening up a remote crash bug in deployed software that can take down the network -- in situations where we could easily have written code more defensively so that our software wouldn't crash.
We have Assume() exactly to avoid situations where a crash is worse than trying to gracefully recover, and we should err on the side of using that, particularly in networking code that could be triggered by an adversary against the whole network.
src/netaddress.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this method belongs to CConnman, something like:
bool CConnman::AllowOutboundByNetgroup(const std::set<std::vector<unsigned char>>& connected_netgroups,
const CAddress& addr)
{
return connected_netgroups.count(m_netgroupman.GetGroup(addr)) == 0 ||
(gArgs.GetArgs("-onlynet").size() == 1 && IsReachable(addr.GetNetwork()) &&
(addr.IsTor() || addr.IsI2P() || addr.IsCJDNS()));
} |
RE
looks like this code was introduced in #8065, with the primary motivation stated as preventing inaccurate incrementing of the addrman counter when a node is offline. this reinforces what I suspected from reading the code, so the suggestion to update the logic to the sum of outbound connections makes sense however, the OP of that post also states:
which, I don't fully understand. is this referring to a scenario where a node is only able to connect to other nodes within a local network environment, so would have frequent failures when trying to connect to the bitcoin network at large? and if this is the case, how could we distinguish this behavior pattern from a node running only on Tor and having all outbound peers from the same netgroup? RE 4 bits to 5 bits in
The implications of changing from 4 to 5 bits in |
mzumsande
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks to me that trying to solve the problem of "cannot open many connections to Tor" this would create another problem: "too many connections to Tor" - it would allow all 8 outbound to be to Tor and will not try to diversify to IPv4 (or other networks).
@vasild While it's possible that we'd make all 8 of our outbound connections to Tor peers, I think this is an unlikely problem, because we limit the number of Tor connections to be a fraction of our addrman (I can't recall what that fraction is though, can someone remind us?). I believe that an adversary looking to exploit this can really only do so for (eg) new nodes that have a relatively small addrman, by flooding such a node with onion addresses; but that attack is already available today (since we have 16 different onion netgroups, an adversary creating a zillion onion addresses to eclipse a victim will just have the ones that are in different netgroups be the ones that eclipse a peer, instead of ones in the same netgroup).
I agree with sdaftuar. I think that keeping the netgroup-based limiting in the case where we can connect to more than 1 network is unnecessary, because the current AddrMan logic puts the alternative networks already at a disadvantage compared to clearnet, and the current netgroup just amplifies that by sometimes not picking an alt-net peer and then picking another clearnet peer instead.
So while having too many alt-net peers is not typically an issue in non-adverse scenarios, the netgroup restriction doesn't even really help that much in an adverse scenario of an attacker trying to eclipse us, because with 10 outbound peers and 16 netgroups, having all alt-net peers would still be possible (unless we also have manual connections!)
(I can't recall what that fraction is though, can someone remind us?)
From a single fixed source, we can fill 16 buckets (1024 addrs) with onion addresses - that is, for example the case for inbound onion peers, because we see them as 127.0.0.1. We can fill 64 buckets with IPv4 or IPv6 addresses from a single source.
Counting multiple sources from the same alternative network, they can fill up a theoretical maximum of 256 buckets (~16k addresses) or 1/4 of the new tables I think (but due to collisions from the the modulo operations in GetNewBucket and GetBucketPosition that depend on nKey, it is typically more like ~200 buckets in reality).
On the other hand, multiple IPv4 and IPv6 sources can fill up all 1024 buckets in the new table.
which, I don't fully understand. is this referring to a scenario where a node is only able to connect to other nodes within a local network environment, so would have frequent failures when trying to connect to the bitcoin network at large? and if this is the case, how could we distinguish this behavior pattern from a node running only on Tor and having all outbound peers from the same netgroup?
Yes, I think that's the scenario - a temporary network failure that wouldn't apply to peers you are locally connected to. I don't think it's a problem if you are running -onlynet=onion because even if the peer you are connected to is within your network, you would fail to connect to it because you wouldn't be connecting to them directly but through onion routing (which would fail if the internet is down).
src/netaddress.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, see below.
@amitiuttarwar Thanks for finding this, I hadn't thought about the issue of wanting to distinguish between local and non-local IPV4 networks for the purposes of detecting whether a node's network connectivity is down. This means my suggestion of just summing outbound peers, rather than looking at netgroup diversity of peers, isn't great. I now think we could sum the IPV4/IPV6 entries in setConnected and add that to the total number of outbound alt-network peers (basically treating every alt-network peer as a distinct entry in setConnected), and use that for this check? EDIT: for the avoidance of doubt, I wrote a patch to implement my suggested approach here. |
|
Maybe also add a short explanation before the |
8b65760 to
f85e365
Compare
…works Co-authored-by: Suhas Daftuar <[email protected]>
f85e365 to
b5585ba
Compare
|
Thanks a lot @sdaftuar! I've used the simplified approach in your patch!
tried a similar calculation for tried tables - which depends on netgroup of tor address(16 group possibilities) and in an ideal theoretical ipv4 + tor scenario, the chance of all 8 outbound connections turning out to be tor would be = (50% new + 50 % tried)^8
i guess these are the upper bound probability of this scenario happening. it is anyways covered in the comments above related to how netgroup restrictions for Tor/I2P/CJDNS don't make much difference in adverse situations.
thanks for explaining! makes sense, noticed how this would result in tor/i2p/cjdns addresses filling the entire addrman.
I liked this approach!
since we include manual connections in
|
mzumsande
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK b5585ba
I reviewed the code and verified that we now make outgoing connections to peers with lexicographically adjacent onion addresses.
vasild
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK b5585ba
I think this fixes the problem described in #27264 (comment) and is ok to be merged in its current form.
Could it create another problem of connecting too much to Tor and not try to diversify to IPv4? Yes. Is this unlikely problem? Yes, with a common addrman that is filled mostly with IPv4 addresses. With a non-common addrman? I do not know.
| // Record addrman failure attempts when node has at least 2 persistent outbound connections to peers with | ||
| // different netgroups in ipv4/ipv6 networks + all peers in Tor/I2P/CJDNS networks. | ||
| // Don't record addrman failure attempts when node is offline. This can be identified since all local | ||
| // network connections(if any) belong in the same netgroup and size of setConnected would only be 1. | ||
| OpenNetworkConnection(addrConnect, (int)setConnected.size() + outbound_privacy_network_peers >= std::min(nMaxConnections - 1, 2), &grant, nullptr, conn_type); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This (int)setConnected.size() >= std::min(nMaxConnections - 1, 2) was introduced in c769c4a.
I think that for the purposes of checking whether we are offline, being connected to just one Tor peer means that we are not. This is because the Tor network is global (same for I2P or CJDNS). There is no such thing as "local area network" of Tor nodes (e.g. 10.0.0.0/24) or localhost Tor address (like 127.0.0.1 in IPv4). If we are connected to one Tor node, it means we are not offline and can possibly connect to other Tor nodes as well.
Maybe out of the scope of this PR, but would it not be better to change that logic to: "if we are connected to at least one CNetAddr::IsRoutable() IPv4/6 address or at least one Tor address or at least one I2P or at least one CJDNS address, then we are not offline"?
cc @gmaxwell
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In #8065, @gmaxwell pointed out:
This is still somewhat incomplete protection because our outbound peers could be down but not timed out
We never really know if our peers are connected or not, because when your network goes down it usually takes some amount of time for each connection to time out. Requiring two connections to be up (versus just 1 connection to e.g. Tor, or some other non-local ipv4/ipv6 network) is a tradeoff around how long we spend in a state where we're punishing addrman entries while we might be offline and just haven't noticed it yet, vs not punishing entries for being down when our network is up.
My guess is that most nodes on the network don't have local network peers, and so the effect of this logic for most users since it was deployed in #8065 -- requiring 2 non-local outbound connections before we start punishing addrman entries for failed attempts -- will be the same after this PR is merged.
If someone can figure out that it's more optimal for this to be 1 or 3 or some other value, I think we can pick that up in a new PR; however I'd suspect it doesn't matter too much, and that 2 is likely better than 1...
|
Concept ACK, will review. |
|
ACK b5585ba |
jonatack
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Post-merge ACK.
I am not completely sure the additional complexity involved is worth it versus reverting 72e8ffd (#27264), but the change itself here seems fine, modulo a couple of minor loose ends.
if we run a node only on tor/i2p/cjdns
Via interactions with users, my understanding is that it may possibly be fairly often the case that node operators are running on non-clearnet networks only. (See also these polls for fun: users https://twitter.com/jonatack/status/1629173539067527171 / bitcoin devs https://twitter.com/jonatack/status/1631717328416014336).
Perhaps increasingly so, based on the increase in the number of Tor v3 and I2P peers reported by -addrinfo over the past year or so, now up to 15k onion and 1300 I2P peers, i.e. recently active / non-IsTerrible ones. (I2P peers in particular increased significantly recently after 2 node-in-a-box packages added support in December 2022.)
| int nOutboundBlockRelay = 0; | ||
| std::set<std::vector<unsigned char> > setConnected; | ||
| int outbound_privacy_network_peers = 0; | ||
| std::set<std::vector<unsigned char>> setConnected; // netgroups of our ipv4/ipv6 outbound peers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code doc at https://github.com/bitcoin/bitcoin/pull/27374/files#diff-00021eed586a482abdb09d6cdada1d90115abe988a91421851960e26658bed02R1858 is now out of date.
The naming of setConnected should probably be updated to its new semantics.
Addressed in #27467.
11bb31c p2p: "skip netgroup diversity of new connections for tor/i2p/cjdns" follow-up (Jon Atack) Pull request description: In #27374 the role of the `setConnected` data structure in `CConnman::ThreadOpenConnections` changed from the set of outbound peer netgroups to those of outbound IPv4/6 peers only. In accordance with the changed semantics, this pull fixes a code comment regarding feeler connections and updates the naming of `setConnected` to `outbound_ipv46_peer_netgroups`. Addresses #27374 (comment). ACKs for top commit: mzumsande: Code Review ACK 11bb31c vasild: ACK 11bb31c ryanofsky: Code review ACK 11bb31c Tree-SHA512: df9151a6cce53c279e549683a9f30fdc23d513dc664cfee1cf0eb8ec80b2848d32c80a92cc0a9f47d967f305864975ffb339fe0eaa80bc3bef1b28406419eb96
Follow up for #27264.
In order to make sure that our persistent outbound slots belong to different netgroups, distinct net groups of our peers are added to
setConnected. We’d only open a persistent outbound connection to peers which have a different netgroup compared to those netgroups present insetConnected.Current
GetGroup()logic assumes route-based diversification behaviour for tor/i2p/cjdns addresses (addresses are public key based and not route-based). Distinct netgroups possible (according to the currentGetGroup()logic) for:setConnectedis used inThreadOpenConnections()before making outbound and anchor connections to new peers so that they belong to distinct netgroups.behaviour on master
MAX_OUTBOUND_FULL_RELAY_CONNECTIONSto 17 withonlynet=onionand observed how node wouldn't make more than 16 outbound connections.behaviour on PR
setConnectedandGetGroupdoesn't get called on tor/i2p/cjdns(see Fix net grouping of non-IP networks #27369)