Conversation

@naiyoma (Contributor) commented Sep 29, 2025

When a node is connected to multiple networks (e.g., clearnet and Tor), it keeps separate ADDR response caches for each network. These are refreshed about once per day (randomized between 21 and 27 hours). See → https://github.com/bitcoin/bitcoin/blob/master/src/net.cpp#L3519
This is an example of a dual-homed node’s addrman response, with separate caches:

IPv4 response:

{102.130.242.11:8333 : 1741100851}
{119.17.151.161:8333 : 1739825443}
{ubeirqalc4vj54baszpatvctpohn62hzj7vfnbncy6upa6vywvgewrad.onion:8333 : 1739825443}

Tor response:

{102.130.242.11:8333 : 1741100851}
{94.23.167.176:8333 : 1740895525}
{ubeirqalc4vj54baszpatvctpohn62hzj7vfnbncy6upa6vywvgewrad.onion:8333 : 1739825443}
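
For illustration, here is a minimal, self-contained sketch of this per-network caching behavior; it is not the actual net.cpp code, and all names and types below are invented. Each reachable network keeps its own snapshot of addresses, which is rebuilt once its randomized (21-27 hour) lifetime expires:

```cpp
#include <chrono>
#include <map>
#include <random>
#include <string>
#include <vector>

using Clock = std::chrono::system_clock;

// Hypothetical cache entry: a snapshot of addresses plus the time at which
// the snapshot expires and gets rebuilt.
struct CachedAddrResponse {
    std::vector<std::string> m_addrs; // "addr:port" strings, for brevity
    Clock::time_point m_expiration;
};

// One cache per network the node is reachable on (e.g. "ipv4", "onion").
std::map<std::string, CachedAddrResponse> g_addr_caches;

std::vector<std::string> GetCachedAddrs(const std::string& network,
                                        const std::vector<std::string>& fresh_addrs)
{
    const auto now = Clock::now();
    auto it = g_addr_caches.find(network);
    if (it == g_addr_caches.end() || now > it->second.m_expiration) {
        // Rebuild the snapshot and pick a new lifetime of roughly one day:
        // 21 hours plus a random offset of up to 6 hours (i.e. 21-27 hours).
        static std::mt19937_64 rng{std::random_device{}()};
        std::uniform_int_distribution<int> extra_minutes(0, 6 * 60);
        CachedAddrResponse entry;
        entry.m_addrs = fresh_addrs;
        entry.m_expiration = now + std::chrono::hours(21)
                                 + std::chrono::minutes(extra_minutes(rng));
        it = g_addr_caches.insert_or_assign(network, std::move(entry)).first;
    }
    return it->second.m_addrs;
}
```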

Currently, there's a fingerprinting attack that exploits responses to GETADDR messages. An attacker can collect responses from supposedly different nodes and compare the timestamps. By looking at overlaps in responses, they can correlate Tor and clearnet identities — effectively linking them back to the same node.
More details on this attack here: https://delvingbitcoin.org/t/fingerprinting-nodes-via-addr-requests/1786
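
To make the correlation step concrete, here is an illustrative sketch (not code from the linked write-up): if two GETADDR responses obtained through different network identities share many (address, timestamp) pairs with identical timestamps, they very likely came from the same node's caches.

```cpp
#include <cstdint>
#include <map>
#include <string>

// (address -> timestamp) as seen in a GETADDR response.
using AddrResponse = std::map<std::string, int64_t>;

// Count entries present in both responses with exactly the same timestamp.
// A high count suggests both responses were served from the same node.
int SharedEntries(const AddrResponse& a, const AddrResponse& b)
{
    int shared = 0;
    for (const auto& [addr, ts] : a) {
        const auto it = b.find(addr);
        if (it != b.end() && it->second == ts) ++shared;
    }
    return shared;
}
```

In the example above, 102.130.242.11:8333 and the .onion address appear in both the IPv4 and Tor responses with identical timestamps, which is exactly the overlap an attacker looks for.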

This PR mitigates the attack by setting all timestamps within a given cache to the same fixed value in the past (10.5 ± 2.5 days ago), so responses can no longer be correlated through timestamps.
After the change, timestamps within each cache are uniform but differ between caches:

IPv4 response:

{102.130.242.11:8333 : 1757964675}
{119.17.151.161:8333 : 1757964675}
{ubeirqalc4vj54baszpatvctpohn62hzj7vfnbncy6upa6vywvgewrad.onion:8333 : 1757964675}

Tor response:

{102.130.242.11:8333 : 1757878277}
{119.17.151.161:8333 : 1757878277}
{ubeirqalc4vj54baszpatvctpohn62hzj7vfnbncy6upa6vywvgewrad.onion:8333 : 1757878277}
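
A minimal sketch of this mitigation, assuming the flattening happens when a cache snapshot is built (the helper below is hypothetical and only mirrors Bitcoin Core naming; the actual change is in the PR's diff):

```cpp
#include <chrono>
#include <cstdint>
#include <random>
#include <vector>

// Standalone stand-in for the real CAddress; only the advertised time matters here.
struct CAddress {
    int64_t nTime{0}; // UNIX timestamp advertised for this address
};

// Pick one timestamp between 8 and 13 days ago (10.5 ± 2.5 days) and apply it
// to every address in this cache, so entries within a cache are uniform while
// caches built at different moments still differ from each other.
void FlattenCacheTimestamps(std::vector<CAddress>& cache_addrs)
{
    static std::mt19937_64 rng{std::random_device{}()};
    const int64_t now = std::chrono::duration_cast<std::chrono::seconds>(
        std::chrono::system_clock::now().time_since_epoch()).count();
    std::uniform_int_distribution<int64_t> offset_secs(8 * 24 * 3600, 13 * 24 * 3600);
    const int64_t uniform_time = now - offset_secs(rng);
    for (auto& addr : cache_addrs) addr.nTime = uniform_time;
}
```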

We initially considered setting the timestamps to 0, since they would eventually be updated and saved. However, this isn't compatible with btcd (see details here → btcsuite/btcd#2411).
This is still a work in progress — we’re continuing to test and are open to trying other solutions as well.

This is joint work with @danielabrozzoni.

@DrahtBot added the P2P label Sep 29, 2025

@DrahtBot (Contributor) commented Sep 29, 2025

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/33498.

Reviews

See the guideline for information on the review process.

Concept ACK: jonatack, mzumsande, sipa


@jonatack (Member) commented

Concept ACK on working on this. Good to see proposals like this and #33464.

@mzumsande (Contributor) left a comment

Concept ACK

The suggested solution reduces the precision of timestamps for the receiving party, but makes fingerprinting harder - seems like an acceptable trade-off to me.

If there is an active node behind the address, it won't become Terrible for another ~20 days, and before that happens we could update it with a more accurate timestamp when we connect to it or receive the addr via gossip relay.

@sipa (Member) commented Oct 1, 2025

Concept ACK

@naiyoma force-pushed the 2025_3/getaddr_timestamp_changes branch from 3a8b014 to 28fe8ec on October 3, 2025 11:48
@naiyoma marked this pull request as ready for review on October 8, 2025 12:27

@mzumsande (Contributor) left a comment

I wonder if this could lead to addrs of nodes that have left the network never getting removed:

The scenario is a well-connected node that is present in most other nodes' addrmans, but that suddenly leaves the network or changes its IP.

Currently what happens is that 1) it won't self-advertise anymore and 2) after 30 days, the addr gets Terrible and won't be part of GetAddr interactions between other nodes. So it won't be relayed by anyone after that - it will stay a while in most nodes' addrmans and eventually be replaced due to addrman collisions.

With this PR, if an address older than 10 days (but not older than 30 days) is part of a GetAddr answer, the receiving peer will postdate its timestamp, giving it more time until it gets Terrible. There is a chaining effect, because the receiving peer will also relay it longer when it answers GetAddr requests itself, and so on. If this would happen frequently enough, the addr of the node that has left the network would always stay in the 10-30 day range and never cease to be relayed, and we'd be stuck with it forever - which would be problematic, because with enough time and node turnover, everyone's addrman would get "full" and there would be no more room for actual new nodes.
The crucial question is if GetAddr interactions happen frequently enough for this to be an important effect.
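
To see why this matters, here is a simplified sketch of the age-based part of addrman's "Terrible" logic (paraphrased; the real AddrInfo::IsTerrible has additional conditions, and the thresholds here are approximate):

```cpp
#include <cstdint>

constexpr int64_t HORIZON_DAYS = 30;           // addrs older than this stop being relayed
constexpr int64_t SECONDS_PER_DAY = 24 * 3600;

// Simplified: an address whose advertised timestamp is more than ~30 days old
// is considered terrible and excluded from GETADDR responses.
bool IsTooOld(int64_t addr_time, int64_t now)
{
    return now - addr_time > HORIZON_DAYS * SECONDS_PER_DAY;
}

// The chaining concern: a 29-day-old address gets re-stamped to "~10 days ago"
// by the sender's cache, so the receiver stores it with ~20 extra days of
// lifetime, re-stamps it again when answering its own GETADDR requests, and so on.
```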

@naiyoma (Contributor, Author) commented Oct 16, 2025

> With this PR, if an address older than 10 days (but not older than 30 days) is part of a GetAddr answer, the receiving peer will postdate its timestamp, giving it more time until it gets Terrible. There is a chaining effect, because the receiving peer will also relay it longer when it answers GetAddr requests itself, and so on. If this would happen frequently enough, the addr of the node that has left the network would always stay in the 10-30 day range and never cease to be relayed, and we'd be stuck with it forever - which would be problematic, because with enough time and node turnover, everyone's addrman would get "full" and there would be no more room for actual new nodes.

Thanks, this is insightful and likely to happen. I was initially thinking the other checks (m_last_success) might still be useful to mark zombie addresses as terrible.

But it’s unlikely that there would be enough attempted connections before the address is sent again.

> The crucial question is if GetAddr interactions happen frequently enough for this to be an important effect.

I don't have data on this yet - will grep and update soon

@danielabrozzoni (Member) commented

Thank you Martin, I agree that the current solution has this issue.

Our other approach would have run into the same problem. We had considered setting all timestamps to zero, but didn't implement it because we're not sure it's compatible with btcd, as mentioned in the PR description. That approach would hit the same issue:

  • Suppose we have an address in our addrman with a timestamp of 29 days ago, and the corresponding node has already left the network
  • We send the address in a getaddr response with the timestamp set to 0
  • The receiving node interprets that as “5 days ago”

So it ends up having the same problem: the address looks newer than it actually is, and it sticks around longer than it should, potentially forever.
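
For context on the "5 days ago" step, a simplified sketch of that timestamp sanitization as it has historically worked in Bitcoin Core's addr handling (approximate, not the exact net_processing code):

```cpp
#include <cstdint>

// Simplified: a received addr timestamp that is clearly bogus (essentially
// zero/ancient, below the historical cutoff of ~1973) or too far in the future
// is replaced with "about 5 days ago" before being stored.
int64_t SanitizeAddrTime(int64_t advertised_time, int64_t now)
{
    const int64_t five_days = 5 * 24 * 3600;
    const int64_t ten_minutes = 10 * 60;
    if (advertised_time <= 100000000 || advertised_time > now + ten_minutes) {
        return now - five_days;
    }
    return advertised_time;
}
```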

I've been thinking about an alternative solution, though I'm not sure if it makes sense. Perhaps an approach similar to the current one, but range-based, could work. Instead of setting every timestamp to 10 ± 2 days ago, we could adjust it based on how old the original timestamp is.

As a rough idea (numbers are just placeholders for now), we could generate three random reference timestamps once: one around 25 ± 2 days ago, one around 15 ± 2 days ago, and one around 5 ± 2 days ago; then assign each address to one of these depending on how old its original timestamp is:

  • If older than 25 days → use the 25-day reference
  • If older than 15 days → use the 15-day reference
  • Otherwise → use the 5-day reference

The exact numbers would need more thought. The fingerprinting issue would still exist: if two identities correspond to the same node, they would share the same initial timestamp and end up in the same range. Still, if we choose reasonable values, a large number of nodes might end up sending approximately the same timestamps.
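
A rough, self-contained sketch of this range-based idea, using the placeholder numbers above (all names here are hypothetical):

```cpp
#include <cstdint>
#include <random>

// Three reference timestamps, drawn once per cache: roughly 25, 15 and 5 days
// ago, each with up to ±2 days of jitter.
struct ReferenceTimes {
    int64_t old_ref; // ~25 days ago
    int64_t mid_ref; // ~15 days ago
    int64_t new_ref; // ~5 days ago
};

ReferenceTimes DrawReferences(int64_t now)
{
    static std::mt19937_64 rng{std::random_device{}()};
    const int64_t day = 24 * 3600;
    std::uniform_int_distribution<int64_t> jitter(-2 * day, 2 * day);
    return {now - 25 * day + jitter(rng),
            now - 15 * day + jitter(rng),
            now - 5 * day + jitter(rng)};
}

// Map an address's real timestamp onto one of the shared references, so the
// advertised value only leaks a coarse age bucket rather than the exact time.
int64_t BucketedTime(int64_t real_time, int64_t now, const ReferenceTimes& refs)
{
    const int64_t day = 24 * 3600;
    const int64_t age = now - real_time;
    if (age > 25 * day) return refs.old_ref;
    if (age > 15 * day) return refs.mid_ref;
    return refs.new_ref;
}
```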

@naiyoma marked this pull request as draft on October 28, 2025 11:54

@naiyoma (Contributor, Author) commented Oct 28, 2025

Moved back to draft, working on addressing -> #33498 (review)
