p2p: Mitigate GETADDR fingerprinting by setting address timestamps to a fixed value #33498
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
Code Coverage & Benchmarks: For details see https://corecheck.dev/bitcoin/bitcoin/pulls/33498.
Reviews: See the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. |
|
Concept ACK on working on this. Good to see proposals like this and #33464. |
mzumsande
left a comment
Concept ACK
The suggested solution reduces the precision of timestamps for the receiving party, but makes fingerprinting harder - seems like an acceptable trade-off to me.
If there is an active node behind the address it won't be Terrible for another ~20 days - until that happens, we could update it with a more accurate timestamp when we connect to it or receive the addr via gossip relay.
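The "~20 days" figure follows from simple arithmetic, sketched here under the assumption that the only relevant rule is the 30-day age limit discussed in this PR and that relayed addresses are reported as 10.5 ± 2.5 days old. The function name and constant are illustrative, not Bitcoin Core identifiers, and the other conditions that can make an address Terrible are ignored.

```python
# Assumption: an address is treated as Terrible once its timestamp is
# 30 days old (the horizon mentioned in this thread); other rules are ignored.
HORIZON_DAYS = 30.0


def days_until_terrible(reported_age_days: float,
                        horizon_days: float = HORIZON_DAYS) -> float:
    """Days left before an address with the given reported age hits the horizon."""
    return max(0.0, horizon_days - reported_age_days)


# Reported ages at the low end, centre and high end of 10.5 +/- 2.5 days.
for age in (8.0, 10.5, 13.0):
    print(f"reported age {age:>4} days -> {days_until_terrible(age)} days left")
```

With these assumptions a receiver has roughly 17 to 22 days to refresh the timestamp through a connection or gossip relay.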
|
Concept ACK |
Co-authored-by: Daniela Brozzoni <[email protected]>
3a8b014 to
28fe8ec
I wonder if this could lead to addrs of nodes that have left the network never getting removed:
The scenario is a well-connected node that is present in most other nodes' addrmans, but that suddenly leaves the network or changes its IP.
Currently what happens is that 1) it won't self-advertise anymore and 2) after 30 days, the addr becomes Terrible and won't be part of GetAddr interactions between other nodes. So it won't be relayed by anyone after that. It will stay a while in most nodes' addrmans and eventually be replaced due to addrman collisions.
With this PR, if an address older than 10 days (but not older than 30 days) is part of a GetAddr answer, the receiving peer will postdate its timestamp, giving it more time until it becomes Terrible. There is a chaining effect, because the receiving peer will also relay it longer when it answers GetAddr requests itself, and so on. If this happened frequently enough, the addr of the node that has left the network would always stay in the 10-30 day range and never cease to be relayed, and we'd be stuck with it forever. That would be problematic, because with enough time and node turnover, everyone's addrman would get "full" and there would be no more room for actual new nodes.
The crucial question is if GetAddr interactions happen frequently enough for this to be an important effect.
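To make the chaining effect concrete, here is a deliberately simplified toy model, not a simulation of the real network. It assumes a single address whose node has left, a stored age that grows by one day per day, a 30-day horizon, and a GetAddr relay every `relay_interval_days` that resets any age above 10 days back to 10. All names and the day-granularity are assumptions made for illustration.

```python
from typing import Optional

HORIZON_DAYS = 30      # age at which the address becomes Terrible (from the thread)
FIXED_AGE_DAYS = 10    # approximate age assigned by the PR's fixed timestamp


def day_address_expires(relay_interval_days: int,
                        rewrite_timestamps: bool,
                        max_days: int = 365) -> Optional[int]:
    """Return the day the address becomes Terrible, or None if it never does
    within max_days. One relay event happens every relay_interval_days."""
    age = 0
    for day in range(1, max_days + 1):
        age += 1
        if age >= HORIZON_DAYS:
            return day  # too old: no longer included in GetAddr answers
        relayed_today = day % relay_interval_days == 0
        if relayed_today and rewrite_timestamps and age > FIXED_AGE_DAYS:
            age = FIXED_AGE_DAYS  # receiver stores the postdated timestamp
    return None


print(day_address_expires(7, rewrite_timestamps=False))
print(day_address_expires(7, rewrite_timestamps=True))
print(day_address_expires(25, rewrite_timestamps=True))
```

In this model the address only ages out if the gap between relays is at least 20 days, which is why the frequency of GetAddr interactions is the deciding factor.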
Thanks, this is insightful and likely to happen. I was initially thinking the other checks( But it’s unlikely that there would be enough attempted connections before the address is sent again.
I don't have data on this yet - will grep and update soon |
|
Thank you Martin, I agree that the current solution has this issue. Our other approach would have run into the same problem. We had considered setting all timestamps to zero, but we didn’t implement it because we’re not sure if it’s compatible with btcd, as mentioned in the PR description. This approach has the same problem as the current one:
So it ends up having the same problem: the address looks newer than it actually is, and it sticks around longer than it should, potentially forever.
I’ve been thinking about an alternative solution, though I’m not sure if it makes sense. Perhaps an approach similar to the current one, but range-based, could work. Instead of setting every timestamp to 10 ± 2 days ago, we could adjust it based on how old the original timestamp is. As a rough idea (numbers are just placeholders for now), we could generate three random reference timestamps once: one around 25 ± 2 days ago, one around 15 ± 2 days ago, and one around 5 ± 2 days ago; then assign each address to one of these depending on how old its original timestamp is:
The exact numbers would need more thought. The fingerprinting issue would still exist: if two identities correspond to the same node, they would share the same initial timestamp and end up in the same range. Still, if we choose reasonable values, a large number of nodes might end up sending approximately the same timestamps. |
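A rough sketch of the range-based idea follows. The three reference ages (5, 15 and 25 days, each ± 2) come from the comment above. The bucket boundaries of 10 and 20 days are purely hypothetical and are not taken from this thread, as are the function names and the uniform distribution.

```python
import random
from typing import List, Sequence


def make_reference_ages(rng: random.Random,
                        centers: Sequence[float] = (5.0, 15.0, 25.0),
                        jitter: float = 2.0) -> List[float]:
    """Draw the reference ages (in days) once, e.g. when a cache is built."""
    return [rng.uniform(c - jitter, c + jitter) for c in centers]


def bucketed_age(original_age_days: float,
                 reference_ages: Sequence[float],
                 boundaries: Sequence[float] = (10.0, 20.0)) -> float:
    """Map an address's real age to the reference age of its bucket.
    The boundaries are hypothetical placeholders, not values from the PR."""
    for bound, reference in zip(boundaries, reference_ages):
        if original_age_days < bound:
            return reference
    return reference_ages[-1]  # oldest bucket


refs = make_reference_ages(random.Random())
for real_age in (2.0, 12.0, 29.0):
    print(real_age, "->", round(bucketed_age(real_age, refs), 2))
```

Compared with a single fixed value, this keeps a coarse ordering between fresh and stale addresses, at the cost of leaking which bucket each address fell into.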
|
Moved back to draft, working on addressing -> #33498 (review) |
When a node is connected to multiple networks (e.g., clearnet and Tor), it keeps separate ADDR response caches for each network. These are refreshed about once per day (randomized between 21 and 27 hours). See → https://github.com/bitcoin/bitcoin/blob/master/src/net.cpp#L3519
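The caching behaviour described above can be sketched as follows. This is a simplified illustration rather than Bitcoin Core's code: the class names, the plain string network key and the `fetch_addresses` callback are assumptions, and only the 21 to 27 hour lifetime comes from the description.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

HOUR = 3600.0


@dataclass
class CachedAddrResponse:
    addresses: List[str] = field(default_factory=list)
    expires_at: float = 0.0  # seconds, same clock as `now`


class AddrResponseCaches:
    """One cached GETADDR answer per network, each with its own random expiry."""

    def __init__(self, fetch_addresses: Callable[[str], List[str]],
                 rng: Optional[random.Random] = None) -> None:
        self._fetch = fetch_addresses  # hypothetical: samples addrman for a network
        self._rng = rng if rng is not None else random.Random()
        self._caches: Dict[str, CachedAddrResponse] = {}

    def get(self, network: str, now: float) -> List[str]:
        entry = self._caches.get(network)
        if entry is None or entry.expires_at < now:
            # Rebuild the answer and keep it for 21h plus a random 0-6h.
            lifetime = 21 * HOUR + self._rng.uniform(0.0, 6 * HOUR)
            entry = CachedAddrResponse(list(self._fetch(network)), now + lifetime)
            self._caches[network] = entry
        return entry.addresses
```

Every peer asking over the same network within the cache lifetime receives the same answer, so repeated requests do not reveal more of the address table.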
This is an example of a dual-homed node’s addrman response, with separate caches:
IPv4 response:
Tor response:
Currently, there's a fingerprinting attack that exploits responses to GETADDR messages. An attacker can collect responses from supposedly different nodes and compare the timestamps. By looking at overlaps in responses, they can correlate Tor and clearnet identities — effectively linking them back to the same node.
More details on this attack here: https://delvingbitcoin.org/t/fingerprinting-nodes-via-addr-requests/1786
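The comparison step can be illustrated with a small sketch. It assumes each response has been reduced to a mapping from address to timestamp and uses a hypothetical scoring function; the addresses and timestamps are made up, and the linked write-up may use a different metric.

```python
from typing import Dict

AddrResponse = Dict[str, int]  # address -> advertised timestamp (Unix seconds)


def matching_timestamp_fraction(a: AddrResponse, b: AddrResponse) -> float:
    """Among addresses present in both responses, the fraction whose
    timestamps are identical. Hypothetical metric for illustration."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    same = sum(1 for addr in shared if a[addr] == b[addr])
    return same / len(shared)


# Made-up responses: the first two come from one node's two identities,
# the third from an unrelated node that knows the same peers.
node_x_ipv4 = {"198.51.100.7:8333": 1_700_000_100, "203.0.113.9:8333": 1_700_050_000}
node_x_tor = {"198.51.100.7:8333": 1_700_000_100, "203.0.113.9:8333": 1_700_050_000,
              "192.0.2.44:8333": 1_699_000_000}
node_y_ipv4 = {"198.51.100.7:8333": 1_699_990_000, "203.0.113.9:8333": 1_700_060_123}

print(matching_timestamp_fraction(node_x_ipv4, node_x_tor))
print(matching_timestamp_fraction(node_x_ipv4, node_y_ipv4))
```

Two identities backed by the same addrman share exact timestamps for the addresses they have in common, while unrelated nodes usually do not, which is what makes the timestamps a usable fingerprint.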
This PR mitigates the attack by setting every timestamp within a cache to the same fixed value in the past (10.5 ± 2.5 days ago), with a different value for each cache, preventing correlation through timestamps.
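A minimal sketch of the mitigation, assuming the fixed value is drawn once per cache from a uniform distribution over 8 to 13 days ago. The distribution, the helper names and the dict representation are assumptions for illustration, not the PR's actual code.

```python
import random
from typing import Dict, Iterable

DAY = 24 * 3600


def fixed_cache_timestamp(now: int, rng: random.Random) -> int:
    """Pick one timestamp 10.5 +/- 2.5 days in the past (uniform: an assumption)."""
    age_days = rng.uniform(8.0, 13.0)
    return int(now - age_days * DAY)


def build_cached_response(addresses: Iterable[str], now: int,
                          rng: random.Random) -> Dict[str, int]:
    """Give every address in this cache the same freshly drawn timestamp."""
    timestamp = fixed_cache_timestamp(now, rng)
    return {addr: timestamp for addr in addresses}


rng = random.Random()
now = 1_700_000_000
ipv4_cache = build_cached_response(["198.51.100.7:8333", "203.0.113.9:8333"], now, rng)
tor_cache = build_cached_response(["198.51.100.7:8333", "192.0.2.44:8333"], now, rng)
print(ipv4_cache)
print(tor_cache)
```

Because each cache draws its own value, an address that appears in both responses no longer carries a shared timestamp that could link the two identities.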
After the change, timestamps in each cache are now uniform, but differ between caches:
IPv4 response:
Tor response:
We initially considered setting the timestamps to 0, since they would eventually be updated and saved. However, this isn’t compatible with btcd (see details here → btcsuite/btcd#2411 ).
This is still a work in progress — we’re continuing to test and are open to trying other solutions as well.
This is joint work with @danielabrozzoni