
Conversation

@sr-gi
Member

@sr-gi sr-gi commented May 8, 2023

Currently, non-announced transactions can be served after a BIP35 MEMPOOL message has been received from a peer. By following this pattern, we are potentially allowing peers to fingerprint transactions that may have originated in our node and to perform timing analysis on the propagation of transactions that enter our mempool.

This simply removes the relay bypass so only announced transactions are served to peers.

@DrahtBot
Contributor

DrahtBot commented May 8, 2023

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

Type Reviewers
Concept ACK glozow, willcl-ark
Approach NACK vasild

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #27675 (net_processing: Drop m_recently_announced_invs bloom filter by ajtowns)
  • #27625 (p2p: Stop relaying non-mempool txs by MarcoFalke)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@DrahtBot DrahtBot added the P2P label May 8, 2023
@sr-gi
Member Author

sr-gi commented May 8, 2023

@glozow glozow changed the title from "net: avoid serving non-announced txs as a result of a MEMPOOL message" to "net processing: avoid serving non-announced txs as a result of a MEMPOOL message" May 8, 2023
Contributor


Doesn't this just change the fingerprinting vector from GETDATA txid to FILTERCLEAR; FILTERADD whatever; MEMPOOL ? In which case we're just as easily fingerprinted, but we spend more cpu while being fingerprinted?

Member Author


Do you mean keep adding what you're interested in to the filter so it triggers an INV from the node and, therefore, we can request it (or, similarly, keep resetting the filter and just add the one single transaction that we want to probe)? If so, isn't that equivalent to having no filter?

I think in both of the aforementioned cases (which boil down to the same logic, if I'm not mistaken) the patch prevents the attacker from requesting unannounced transactions straight away; they will have to wait for m_next_inv_send_time to go off.

Contributor


Ah, I see what you mean. That makes sense.

Next point on those lines though: adding many things to m_recently_announced_invs can overflow the bloom filter, either causing false positives (leading to privacy bugs), or since it's a rolling bloom filter, causing overflow leading to false negatives (leading to functionality bugs where you can't get the txs from the MEMPOOL response). Wouldn't it be better to make the insertion conditional on txinfo.m_time <= now - UNCONDITIONAL_RELAY_DELAY ? Even then this approach seems like it can trigger false positives fairly easily, since the rolling bloom filter capacity is only 3500 txs?
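The overflow concern above can be illustrated with a toy filter. This is a minimal, standalone Bloom filter written for demonstration only; it is not Core's CRollingBloomFilter, and the sizes and names are made up for the example:

```python
import hashlib

class TinyBloom:
    """Minimal fixed-size Bloom filter, for illustration only
    (not Bitcoin Core's CRollingBloomFilter)."""
    def __init__(self, bits=1024, hashes=4):
        self.bits, self.hashes = bits, hashes
        self.bitset = 0

    def _positions(self, item):
        # Derive `hashes` bit positions from salted SHA256 digests
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:4], "little") % self.bits

    def insert(self, item):
        for p in self._positions(item):
            self.bitset |= 1 << p

    def contains(self, item):
        return all(self.bitset >> p & 1 for p in self._positions(item))

def false_positive_rate(n_inserted, n_probes=1000):
    bf = TinyBloom()
    for i in range(n_inserted):
        bf.insert(f"tx{i}")
    # Probe with items that were never inserted
    return sum(bf.contains(f"other{i}") for i in range(n_probes)) / n_probes

# Once insertions far exceed the filter's design capacity, almost every
# lookup becomes a false positive (which, for m_recently_announced_invs,
# would mean serving transactions that were never actually announced).
assert false_positive_rate(100) < 0.1
assert false_positive_rate(5000) > 0.9
```

The same saturation effect is what the rolling variant avoids by cycling generations, at the cost of the false negatives mentioned above: old entries are forgotten once the filter rolls over.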

Member Author

@sr-gi sr-gi May 10, 2023


Wouldn't it be better to make the insertion conditional on txinfo.m_time <= now - UNCONDITIONAL_RELAY_DELAY ?

Yeah, I think that makes perfect sense, given there is no point in adding something to the filter if we are going to unconditionally serve it anyway.

Even then this approach seems like it can trigger false positives fairly easily, since the rolling bloom filter capacity is only 3500 txs?

It may indeed. I haven't got my head around how to properly compute the odds of this happening though, given it feels like it depends on how often MEMPOOL messages are sent to us, which may be unbounded (how many MEMPOOL requests can we receive in an UNCONDITIONAL_RELAY_DELAY period, and how full may our mempool be?). In any case, this feels detrimental to the peer implementing this behavior rather than to us (?)

Something I'm really wondering is why this is actually bounded by INVENTORY_MAX_RECENT_RELAY (3500) instead of by MAX_PEER_TX_ANNOUNCEMENTS (5000). It feels quite inconsistent that we are able to accept up to 5000 transaction announcements from a peer but only able to remember the last 3500 that we have announced to someone else.

I think what may make even more sense would be to reply to mempool messages only with transactions that fall into the txinfo.m_time <= now - UNCONDITIONAL_RELAY_DELAY check, that way we don't really have to add transactions to m_recently_announced_invs here, and we are also discouraging peers bloating us with MEMPOOL messages. As long as they have a filter set, any transaction that matches their filter will be announced to them anyway (h/t @sdaftuar). I guess this could be done as a followup if it makes sense.
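That follow-up idea can be sketched roughly as follows. This is illustrative Python, not Core's API: `build_mempool_reply` and its arguments are hypothetical names, while UNCONDITIONAL_RELAY_DELAY matches the 2-minute constant in net_processing.cpp:

```python
# Sketch of the follow-up idea: only include transactions in the MEMPOOL
# reply once they are old enough to be served unconditionally, so nothing
# needs to be inserted into m_recently_announced_invs at all.
UNCONDITIONAL_RELAY_DELAY = 2 * 60  # seconds, as in net_processing.cpp

def build_mempool_reply(entries, now):
    """entries: iterable of (txid, m_time) pairs (hypothetical shape)."""
    return [txid for txid, m_time in entries
            if m_time <= now - UNCONDITIONAL_RELAY_DELAY]

now = 10_000
entries = [("tx_old", now - 300), ("tx_fresh", now - 10)]
# Only the sufficiently old transaction makes it into the reply; the fresh
# one would be announced later through the normal INV trickle anyway.
assert build_mempool_reply(entries, now) == ["tx_old"]
```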

Member Author


Wouldn't it be better to make the insertion conditional on txinfo.m_time <= now - UNCONDITIONAL_RELAY_DELAY ?

Also wouldn't that also apply here? https://github.com/bitcoin/bitcoin/blob/27c40e3dcf280a137bcfaa787674c26781d911b8/src/net_processing.cpp#L5688

Member


Whoops. I was incorrectly assuming that a reply to a mempool message was rate-limited and size-limited. (Which, on second thought, probably doesn't make sense to do.) So a single reply can hold up to 50'000 inv entries, which will certainly overflow. Sorry for the distraction.

Member

@maflcko maflcko May 15, 2023


Side note: The 50k max inv size also seems to overflow, which is probably another source for false negatives, depending on the use case.

Image stolen from comment #27426 (comment) by 0xB10C.

Member Author


Side note: The 50k max inv size also seems to overflow, which is probably another source for false negatives, depending on the use case.

You lost me there. Do you mean this may be a possible source of false negatives for the current path, or for both?

In this patch, I can see that potentially being an issue (that's where AJ's original concern comes from), but I don't see how it is an issue for master, given nothing is added to m_recently_announced_invs when building these huge INVs as a result of a MEMPOOL message.

Member


You lost me there.

Apologies, I typed something incorrect, again.

Member Author


All good, I was just wondering if I may have been missing something.

Member

@glozow glozow left a comment


Concept ACK

The issue on master that I believe this addresses is being able to learn exactly when a transaction enters your mempool. You send mempool and get the response, then put out the tx, then getdata getdata getdata getdata... until the node tells you the tx instead of notfound. Also in general I think it's quite sensible to only serve transactions you announced.

@sr-gi sr-gi force-pushed the mempool-not-inved branch from c04f145 to 27c40e3 on May 9, 2023 19:32
@sr-gi
Member Author

sr-gi commented May 9, 2023

Addressed comments by @MarcoFalke and fixed a couple of logs from the tests

@willcl-ark
Member

Concept ACK

@darosior
Member

I'm only a casual lurker on the net_processing side but i was under the impression we were thinking of the BIP35 as permissioned anyways. If we are already trusting the peer that sends us a MEMPOOL message, why not just include everything we've got?
A practical counter-argument to this could be - yeah but at this time around 20% to 30% of reachable nodes set peerbloomfilters so it's still worth closing an obvious fingerprinting vector.

Another thing is whether it's breaking any usecase. Since the message is from a trusted peer a client may be relying on getting the whole mempool. For instance BTCPay/NBXplorer needs whitelist=mempool iirc. What's the proportion of announced vs non-announced transactions? How much would be left out with this patch?

@willcl-ark
Member

i was under the impression we were thinking of the BIP35 as permissioned anyways.
...
For instance BTCPay/NBXplorer needs whitelist=mempool iirc.

Just on these points, currently the MEMPOOL message will be responded to either if you have the whitelist permission mempool on the peer or if you have enabled NODE_BLOOM (with -peerbloomfilters). In the latter case I don't think we can accurately describe these as "trusted peers". See the changes in 4581a68 (#27559) which clarify this.

@vasild
Contributor

vasild commented May 11, 2023

The title "avoid serving non-announced txs as a result of a MEMPOOL message" is confusing - it gives the impression that this PR will change the response to MEMPOOL messages, but it does not do that. Before and after the PR we would send the full mempool in a reply to mempool (correct me if I got it wrong).

@glozow
Member

glozow commented May 11, 2023

What's the proportion of announced vs non-announced transactions? How much would be left out with this patch?

The PR (as it is right now) still announces everything in response to mempool. This is saying "I haven't even announced this to you, why are you requesting it from me?" A normal client, who presumably does not already know what it wants to request until it receives an announcement (i.e. using mempool to learn about transactions), shouldn't be affected at all.

@vasild
Contributor

vasild commented May 11, 2023

Is it the case that this PR will change the behavior in the following two cases (from the PR title and description I got the impression that it is wider scope):

Case 1

  • the node receives MEMPOOL request and replies to it
  • in the same second a new transaction enters the mempool (after the MEMPOOL message)
  • the node receives GETDATA for that transaction

In master it would send the transaction even though it never advertised it with INV due to the txinfo.m_time <= mempool_req check (both values would be equal).

In this PR it would claim "not found".

This can be resolved by increasing the precision of TxRelay::m_last_mempool_req and TxMempoolInfo::m_time or by changing the condition to <.
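Case 1 hinges on second-resolution timestamps; a minimal illustration (illustrative Python, with arbitrary example timestamps):

```python
# With 1-second timestamp resolution, a transaction that enters the
# mempool *after* the MEMPOOL request but within the same second compares
# as if it had been part of the reply.
mempool_req = 1683800000   # second in which the MEMPOOL request was processed
tx_time     = 1683800000   # tx arrived afterwards, but in the same second

assert tx_time <= mempool_req        # master's check: the tx gets served
assert not (tx_time < mempool_req)   # a strict '<' would correctly reject it
```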

Case 2

  • the node receives MEMPOOL request and replies to it, but some transaction is omitted from the reply because it is not in TxRelay::m_bloom_filter.
  • the node receives GETDATA for that transaction

In master it would send the transaction even though it never advertised it with INV because PeerManagerImpl::FindTxForGetData() does not check the bloom filter.

In this PR it would claim "not found".

Is this a problem at all? If yes, then it can be resolved by checking the bloom filter from PeerManagerImpl::FindTxForGetData().

@sr-gi
Member Author

sr-gi commented May 11, 2023

I'm only a casual lurker on the net_processing side but i was under the impression we were thinking of the BIP35 as permissioned anyways. If we are already trusting the peer that sends us a MEMPOOL message, why not just include everything we've got? A practical counter-argument to this could be - yeah but at this time around 20% to 30% of reachable nodes set peerbloomfilters so it's still worth closing an obvious fingerprinting vector.

We still do. This is not modifying the behavior of the announcements that result from a mempool message, but the behavior of the node to the getdata messages that may follow. Before the patch, a node was able to send us a mempool message and, from that point on, request any transaction from our mempool (even if we haven't announced that to them at all).

Another thing is whether it's breaking any usecase. Since the message is from a trusted peer a client may be relying on getting the whole mempool. For instance BTCPay/NBXplorer needs whitelist=mempool iirc. What's the proportion of announced vs non-announced transactions? How much would be left out with this patch?

As of 27c40e3 we still do, even though I would argue that, by doing this, we are encouraging peers to spam us with mempool messages instead of waiting for us to announce the transactions that match their filters in due time.

@sr-gi
Member Author

sr-gi commented May 11, 2023

The title "avoid serving non-announced txs as a result of a MEMPOOL message" is confusing - it gives the impression that this PR will change the response to MEMPOOL messages, but it does not do that. Before and after the PR we would send the full mempool in a reply to mempool (correct me if I got it wrong).

Maybe it should read "after receiving a MEMPOOL message" instead of "as a result of a MEMPOOL message".

What this PR is trying to achieve is to prevent us from responding to getdata requests about unannounced transactions after receiving a mempool message.

@sr-gi
Member Author

sr-gi commented May 11, 2023

Is it the case that this PR will change the behavior in the following two cases (from the PR title and description I got the impression that it is wider scope):

Case 1

  • the node receives MEMPOOL request and replies to it
  • in the same second a new transaction enters the mempool (after the MEMPOOL message)
  • the node receives GETDATA for that transaction

In master it would send the transaction even though it never advertised it with INV due to the txinfo.m_time <= mempool_req check (both values would be equal).

In this PR it would claim "not found".

This can be resolved by increasing the precision of TxRelay::m_last_mempool_req and TxMempoolInfo::m_time or by changing the condition to <.

This is exactly what we are trying to prevent, but I don't think your approach fixes the issue. This condition is checking that our inv (txinfo) is more recent than the last mempool_req, meaning that once a mempool request has been processed the peer can request any transaction from our mempool. Why is this bad? It makes transactions that enter our mempool easily probeable. The node can create a transaction (tx), send it through any other link, and spam us with getdata messages until we reply with it.

Case 2

  • the node receives MEMPOOL request and replies to it, but some transaction is omitted from the reply because it is not in TxRelay::m_bloom_filter.
  • the node receives GETDATA for that transaction

In master it would send the transaction even though it never advertised it with INV because PeerManagerImpl::FindTxForGetData() does not check the bloom filter.

In this PR it would claim "not found".

Is this a problem at all? If yes, then it can be resolved by checking the bloom filter from PeerManagerImpl::FindTxForGetData().

Arguably this is the expected behavior, master's is not.

The response to the mempool message didn't include an INV for that specific transaction (neither in the patch nor in master) because it didn't match the filter set by the peer; therefore, the peer should not be interested in it. As for why we claim not found after the peer requests the transaction (which is what makes this different from master), I think we can default to the same reasoning as in Case 1.

@vasild
Contributor

vasild commented May 11, 2023

txinfo.m_time <= mempool_req

This condition is checking that our inv (txinfo) is more recent than the last mempool_req, meaning that once a mempool request has been processed the peer can request any transaction from our mempool.

I don't get it. Is it not that the condition is checking whether txinfo is older than the last mempool_req? In that case the transaction has been part of the MEMPOOL reply, so we have sent an INV about it.

@glozow
Member

glozow commented May 11, 2023

txinfo.m_time <= mempool_req

This condition is checking that our inv (txinfo) is more recent than the last mempool_req, meaning that once a mempool request has been processed the peer can request any transaction from our mempool.

I don't get it. Is it not that the condition is checking whether txinfo is older than the last mempool_req? In that case the transaction has been part of the MEMPOOL reply, so we have sent an INV about it.

Ah, I think you're right! So this changes Case 2, not Case 1.

@sr-gi
Member Author

sr-gi commented May 11, 2023

txinfo.m_time <= mempool_req

This condition is checking that our inv (txinfo) is more recent than the last mempool_req, meaning that once a mempool request has been processed the peer can request any transaction from our mempool.

I don't get it. Is it not that the condition is checking whether txinfo is older than the last mempool_req? In that case the transaction has been part of the MEMPOOL reply, so we have sent an INV about it.

Ah, I think you're right! So this changes Case 2, not Case 1.

He is indeed. The test for Case 1 was passing because both requests were processed within the same second 😅

@vasild
Contributor

vasild commented May 12, 2023

No, wait! This PR changes both Case1 and Case2, but Case1 is only about things happening in the same second.

What I tried yesterday:

  • Run the newly added tests, they were passing, of course. Cool!
  • Revert the net_processing.cpp changes; both tests ("Request the irrelevant transaction even though it has not being offered to us" and "Create a third transaction (that was therefore not part of the mempool message) and try to...") failed. Good! To reach the second test I disabled the assert from the first.

But I couldn't explain why the second test would fail (without net_processing.cpp changes aka on master). Then I suspected things happen in the same second in the test and added time.sleep(1) before creating the third transaction and then the test passed.

Concept ACK. I agree that we should not respond to GETDATA asking for a transaction which we did not INV to that peer and this PR does that.

However, I agree with @ajtowns's concern raised above about blowing up m_recently_announced_invs. That bloom filter has size 3500 and if we put 300k elements in it, then it is probably not going to work as expected.
Approach NACK due to that.

So, to achieve the goal of not replying to GETDATA without a prior INV we have to remember all INVs sent to the peer, but that is kind of "hard" considering a MEMPOOL reply sends 300k INVs...

@vasild
Contributor

vasild commented May 12, 2023

Looking at the comment of CRollingBloomFilter if m_recently_announced_invs is changed from 3500 to 300000 then its size would increase from 37KB to 3MB. I guess that is too much and maybe was the reason why MEMPOOL replies don't use m_recently_announced_invs and instead use the m_last_mempool_req timing mechanism.
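For reference, those size figures are consistent with a rule of thumb of roughly 1.8 bytes per element per factor of 10 of false-positive rate (m_recently_announced_invs uses a 0.000001 rate, i.e. six factors). A quick back-of-the-envelope check, with the rule-of-thumb constant as the only assumption:

```python
def rolling_bloom_bytes(n_elements, fp_rate_factors=6):
    """Rough size estimate: ~1.8 bytes per element per factor of 0.1 of
    false-positive rate. fp_rate_factors=6 corresponds to the 0.000001
    rate used by m_recently_announced_invs."""
    return 1.8 * fp_rate_factors * n_elements

# Matches the figures quoted above
assert round(rolling_bloom_bytes(3500) / 1024) == 37        # ~37 KB
assert round(rolling_bloom_bytes(300_000) / 1024**2) == 3   # ~3 MB
```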

@sr-gi
Member Author

sr-gi commented May 12, 2023

No, wait! This PR changes both Case1 and Case2, but Case1 is only about things happening in the same second.

I meant regarding the time checks in master: the checks were indeed protecting against querying something that was in the mempool before the mempool message was received. Meaning that, while this patch fixes the issues around requesting data after a mempool message is received, those issues are not as severe as initially assumed.

However, I agree with @ajtowns's concern raised above about blowing up m_recently_announced_invs. That bloom filter has size 3500 and if we put 300k elements in it, then it is probably not going to work as expected. Approach NACK due to that.

I think at this point there are two easy solutions to the problem (with their corresponding tradeoffs, of course):

  1. Do things as stated in the patch but guarding addition to the m_recently_announced_invs rolling filter by checking that the data we are adding is not older than two minutes, as @ajtowns originally suggested. This will prevent us from adding data that will be served unconditionally anyway, reducing the odds of overflowing the filter. However, I'm not sure how AJ computed the odds of the filter overflowing due to high traffic, so I'm not able to convince myself this is enough.

  2. Keep the m_recently_announced_invs bypass as it is in master, but do check the bloom filter when deciding whether to relay something that may have been announced to a peer as a response to a mempool message. This has two main drawbacks:

@sr-gi sr-gi force-pushed the mempool-not-inved branch from 27c40e3 to 8d640ad on May 12, 2023 19:53
@sr-gi
Member Author

sr-gi commented May 12, 2023

Addresses @willcl-ark's comments and fixes the tests to make sure the MEMPOOL message and the second GETDATA message are not processed within the same second, as per @vasild's comment.

@sr-gi
Member Author

sr-gi commented May 22, 2023

Adds invs to m_recently_announced_invs conditional on them not being unconditionally relayable, and rebases on master.

@DrahtBot
Contributor

🐙 This pull request conflicts with the target branch and needs rebase.

@luke-jr
Member

luke-jr commented Jun 23, 2023

I wonder if there should be a noprivacy permission flag...

@maflcko
Member

maflcko commented Aug 17, 2023

Are you still working on this?

@sr-gi
Member Author

sr-gi commented Aug 17, 2023

Are you still working on this?

I wonder if this is still meaningful now that #27675 has landed

@maflcko
Member

maflcko commented Aug 17, 2023

Maybe the test? Though, I haven't looked at it recently.

@sr-gi
Member Author

sr-gi commented Aug 17, 2023

Maybe the test? Though, I haven't looked at it recently.

That may make sense. I'll take a look and rebase

@sr-gi
Member Author

sr-gi commented Sep 14, 2023

Maybe the test? Though, I haven't looked at it recently.

The test can be adapted to fit the logic after #27675, but the patch itself does not seem relevant anymore. Would it be worth reworking this PR or just closing it and opening one for the test?

@maflcko
Member

maflcko commented Sep 14, 2023

Thanks for taking a look. Seems fine to open a new test-only pull

@sr-gi
Member Author

sr-gi commented Sep 14, 2023

I'm closing this one then

@sr-gi sr-gi closed this Sep 14, 2023
@bitcoin bitcoin locked and limited conversation to collaborators Sep 13, 2024

9 participants