Skip to content

Conversation

@mzumsande
Copy link
Contributor

@mzumsande mzumsande commented Sep 12, 2023

This is joint work with amitiuttarwar.

See issue #28462 for a broader discussion on increasing the number of block-relay-only connections independent of this particular implementation proposal.

We suggest to increase the number of inbound slots allocated to block-relay-only peers by increasing the default maximum connections from 125 to 200, with 50% of inbound slots accessible for tx-relaying peers.
This is a prerequisite for being able to increase the default number of outgoing block-relay-only peers later, because the current inbound capacity of the network is not sufficient.
In order to account for incoming tx-relaying peers separately from incoming block-relay peers, changes to the inbound eviction logic are necessary.

See the next post in this thread for a more detailed explanation and motivation of the changes.

@DrahtBot
Copy link
Contributor

DrahtBot commented Sep 12, 2023

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/28463.

Reviews

See the guideline for information on the review process.

Type Reviewers
Concept ACK naumenkogs, naiyoma

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #32394 (net: make m_nodes_mutex non-recursive by vasild)
  • #29415 (Broadcast own transactions only via short-lived Tor or I2P connections by vasild)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@DrahtBot DrahtBot added the P2P label Sep 12, 2023
@mzumsande mzumsande marked this pull request as draft September 12, 2023 20:36
@mzumsande
Copy link
Contributor Author

mzumsande commented Sep 12, 2023

Status Quo

We have a default maximum of 125 connections (not counting manual ones). Out of these,
11 (8 full outbounds, 2 block-relay-only, 1 feeler) are reserved for outbounds, leaving us with
114 inbound slots. There are currently no limits on how many of the inbounds can support transaction relay.

Problem

We want to increase the number of inbound slots specifically offered to block-relay-only connections, but don't want to significantly change the number of tx-relaying inbound peers, because we don’t want to radically change the memory and traffic requirements that come with running a full node with the default configuration.
This is complicated by a timing issue: When a new peer connects to us, we don't know yet whether it is block-relay-only before we have processed their version message.
Inbound peer eviction, however, happens right after a peer connects to us and before processing its version msg. This means that we don't yet know during inbound eviction, if the new peer wants tx relay.

Approach

We propose to introduce a separate maximum of tx-relaying inbound connections, in addition to the existing global maximum.
The new maximum for tx-relaying inbounds gets chosen such that 50% of the total inbound slots can be utilized by tx-relaying peers. We choose a ratio here instead of a fixed number here so that it scales with users changing their -maxconnections value from the default. The ratio of 50% is also made adjustable in the form of a new startup parameter.

Changes to the eviction logic

We don't change the eviction logic for total connections, which happens in CreateNodeFromAcceptedSocket().
We add a tx-relay-specific eviction check to the version message processing which runs if we don't have capacity for another tx-relaying peer: This check uses the same logic from AttemptToEvictConnection, but specifically evicts only tx-relaying peers (by protecting the rest of the peers). Only if we cannot find another peer to evict (e.g. all peers are protected, which should only happen if a low -maxconnections is used) we disconnect the new peer. This approach achieves that the number of tx-relaying peers can never exceed the limit.

Numbers

Specifically, we propose to increase the total number of connections from 125 to 200, with a suggested ratio of 50% tx-relay inbounds. With 11 connection slots reserved for outbound peers,
this results in (200 - 11) * 0.5 = 94.5 slots for tx-relaying inbound peers, which closely aligns with the typical tx-relaying inbounds seen in today's world, where typically ~20% of inbound slots are already taken by block-relay-only connections.
These numbers were chosen to be sufficient to raise the number of block-relay-only connections from 2 to 8 once this patch is widely deployed (see Issue #28462 for more details on this).
[Note: In a previous iteration, a ratio of 60% was suggested, see this post for more discussion.]

Handling bloom filter peers

The logic gets a bit complicated if a node supports BIP37 (bloom filters), because then an inbound peer can be switched to tx-relay after sending a FILTERLOAD or FILTERCLEAR message (but never back!), which could result in an excess of tx-relaying inbounds. We propose to solve this by triggering the eviction logic for a tx-relaying peer from net_processing whenever we switch a peer to tx-relay due to a BIP37-related message.

@naumenkogs
Copy link
Contributor

Concept ACK, specifically on changes to the eviction logic, and handling bloom filters.
Numbers also seem fine I guess.

I was curious whether there is (or will be) any other good reason to delay eviction after the version (apart from making this logic cleaner), but I haven't identified any.

@mzumsande
Copy link
Contributor Author

mzumsande commented Sep 28, 2023

I was curious whether there is (or will be) any other good reason to delay eviction after the version (apart from making this logic cleaner), but I haven't identified any.

Discussed this with @amitiuttarwar and we think that it would be better to move the tx-related eviction into the version processing logic.
Pros:

  • It's conceptually simpler to only evict a tx-relaying peer when we are sure that it is one
  • We wouldn't have to evict anyone in the situation where all tx inbound slots are full and a new block-relay inbound connects
  • We need the logic for this (EvictTxPeerIfFull()) anyway when dealing with bloom-filter peers.

Cons:

  • The eviction now happens in two different places, after accepting a connection and during version processing.

I will update the PR with this soon!

@mzumsande mzumsande force-pushed the 202308_increase_block_relay branch from f78a0ed to 039ac99 Compare October 3, 2023 19:08
@mzumsande mzumsande force-pushed the 202308_increase_block_relay branch from 039ac99 to 82ed8a5 Compare October 13, 2023 18:35
@mzumsande
Copy link
Contributor Author

Changed the approach to do the eviction of tx-relaying peers within the version message processing, when we can make use of the fRelay flag they sent us - this has the advantage that we don't have to evict new peers before being sure if they want tx relay.

As a result of the later eviction, it became necessary to protect the new peer from being evicted right by AttemptToEvictConnection() right after connecting (so that the new peer only gets evicted if no other peer can be found). This isn't done for the existing unconditional inobound eviction in net (CreateNodeFromAcceptedSocket), because there we would evict a peer before adding the new peer to m_nodes.

@mzumsande mzumsande force-pushed the 202308_increase_block_relay branch from 82ed8a5 to 8247eaa Compare October 13, 2023 18:46
@amitiuttarwar
Copy link
Contributor

overall, I like this approach significantly more. although we are introducing another touchpoint for inbound eviction logic, it is more straightforward to reason about. but the biggest improvement is not unnecessarily disconnecting a new or existing connection when tx relay isn't being requested.

reviewed this in-depth. the fundamental structure & logic looks good to me. left a lot of comments along the way 😛

some additional thoughts:

  • commit message for ab5ad61f67f3bcab503ac3eb012c4f35d733b946 says "When we receive an inbound connection and don't have space for another full-relay peer, we now attempt to evict specifically a full-relay inbound." I think this should mention timing of processing the version message for clarity.
  • the order of events when processing the VERSION message makes sense, because we want to instantiate tx_relay for the before before triggering the new eviction logic (which uses that in the count).
  • this code strengthens the distinction between inbound-tx-relay connections from inbound-block-relay connections. ideally, this would be captured in separate connection types, but that would require more invasive changes since we don't know the distinction at the time of accepting the connection. I don't think the benefit would be worth the cost, but is something worth keeping in mind if we continue to build stronger distinctions between the two types of inbounds.

src/net.h Outdated
* a peer to evict (all eligible peers are protected)
* so that the caller can deal with this.
*/
bool EvictTxPeerIfFull(std::optional<NodeId> protect_peer);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it could be clearer to call this function something like TxInboundCapacityAvailable. imo that seems more useful from the POV of the net processing caller, but either is ok with me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default param here makes the FILTERLOAD/FILTERCLEAR callers slightly cleaner

Suggested change
bool EvictTxPeerIfFull(std::optional<NodeId> protect_peer);
bool EvictTxPeerIfFull(std::optional<NodeId> protect_peer) = std::nullopt;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to use default values.

Not sure about the rename suggestion, I think I prefer EvictTxPeerIfFull, because I think for naming describing what the function does is more important than describing the return value. Otherwise other spots might call this function in the future with the intention to just perform a check, and then might be surprised about the side effects.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and then might be surprised about the side effects.

yeah totally. I was trying to make the return values more self-evident, but on further thought, I agree with you that emphasizing eviction is more important - people can always read the doc string to understand the return value. can't think of anything to concisely express HasOrMakeInboundTxCapacity, so leaving as is sounds good :)

@mzumsande mzumsande force-pushed the 202308_increase_block_relay branch from 8247eaa to 9555a33 Compare October 19, 2023 21:42
@mzumsande
Copy link
Contributor Author

Thanks for the review @amitiuttarwar! i pushed an update, addressing all feedback.

@mzumsande mzumsande force-pushed the 202308_increase_block_relay branch from 7b91366 to 32caafb Compare March 25, 2025 17:39
@mzumsande mzumsande force-pushed the 202308_increase_block_relay branch from 32caafb to 5723a73 Compare March 25, 2025 20:57
@DrahtBot
Copy link
Contributor

🚧 At least one of the CI tasks failed.
Debug: https://github.com/bitcoin/bitcoin/runs/39401550363

Hints

Try to run the tests locally, according to the documentation. However, a CI failure may still
happen due to a number of reasons, for example:

  • Possibly due to a silent merge conflict (the changes in this pull request being
    incompatible with the current code in the target branch). If so, make sure to rebase on the latest
    commit of the target branch.

  • A sanitizer issue, which can only be found by compiling with the sanitizer and running the
    affected test.

  • An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

@mzumsande mzumsande force-pushed the 202308_increase_block_relay branch from 5723a73 to df1c9c2 Compare March 25, 2025 21:04
DashCoreAutoGuix pushed a commit to DashCoreAutoGuix/dash that referenced this pull request Jul 29, 2025
df69b22 doc: improve documentation around connection limit maximums (Amiti Uttarwar)
adc171e scripted-diff: Rename connection limit variables (Amiti Uttarwar)
e9fd9c0 net: add m_max_inbound to connman (Amiti Uttarwar)
c25e0e0 net, refactor: move calculations for connection type limits into connman (Amiti Uttarwar)

Pull request description:

  This is joint work with amitiuttarwar.

  This has the first few commits of bitcoin#28463. It is not strictly a prerequisite for that, but has changes that in our opinion make sense on their own.
  It improves the handling of maximum numbers for different connection types (that are set during init and don’t change after) by:
  * moving all calculations into one place, `CConnMan::Init()`. Before, they were dispersed between `Init`, `CConnman::Init` and other parts of `CConnman`, resulting in some duplicated test code.
  * removing the possibility of having a negative maximum of inbound connections, which is hard to argue about
  * renaming of variables and doc improvements

ACKs for top commit:
  amitiuttarwar:
    co-author review ACK df69b22
  naumenkogs:
    ACK df69b22
  achow101:
    ACK df69b22

Tree-SHA512: 913d56136bc1df739978de50db67302f88bac2a9d34748ae96763288d97093e998fc0f94f9b6eff12867712d7e86225af6128f4170bf2b5b8ab76f024870a22c
DashCoreAutoGuix pushed a commit to DashCoreAutoGuix/dash that referenced this pull request Aug 6, 2025
df69b22 doc: improve documentation around connection limit maximums (Amiti Uttarwar)
adc171e scripted-diff: Rename connection limit variables (Amiti Uttarwar)
e9fd9c0 net: add m_max_inbound to connman (Amiti Uttarwar)
c25e0e0 net, refactor: move calculations for connection type limits into connman (Amiti Uttarwar)

Pull request description:

  This is joint work with amitiuttarwar.

  This has the first few commits of bitcoin#28463. It is not strictly a prerequisite for that, but has changes that in our opinion make sense on their own.
  It improves the handling of maximum numbers for different connection types (that are set during init and don’t change after) by:
  * moving all calculations into one place, `CConnMan::Init()`. Before, they were dispersed between `Init`, `CConnman::Init` and other parts of `CConnman`, resulting in some duplicated test code.
  * removing the possibility of having a negative maximum of inbound connections, which is hard to argue about
  * renaming of variables and doc improvements

ACKs for top commit:
  amitiuttarwar:
    co-author review ACK df69b22
  naumenkogs:
    ACK df69b22
  achow101:
    ACK df69b22

Tree-SHA512: 913d56136bc1df739978de50db67302f88bac2a9d34748ae96763288d97093e998fc0f94f9b6eff12867712d7e86225af6128f4170bf2b5b8ab76f024870a22c
@maflcko maflcko removed the CI failed label Sep 26, 2025
mzumsande and others added 6 commits October 22, 2025 10:45
If the added bool evict_tx_relay_peers set, non-tx-relay
peers will be exempt from disconnection.
Also allow to specify a peer that exempt from eviction.
Both options will be used in later commits.

Co-authored-by: Amiti Uttarwar <[email protected]>
..and adjust the eviction logic.
The new default max connection number is 200, the default maximum of tx-relaying
inbounds is limited to 50% of all inbound connections.
With 11 outbound connections, that is (200 - 11) * 0.5 = 94.5.
As a result, the tx-related maximum traffic should not change
drastically.

When we receive an inbound connection and don't have space for another
full-relay peer, we now attempt to evict specifically a full-relay inbound
after receiving the version message of the new peer.

Once this commit is widely deployed, the added inbound capacity will
allow us to increase the number of outgoing block-relay-only connections.

Co-authored-by: Amiti Uttarwar <[email protected]>
Permit users to change the amount of inbounds that are permitted to relay
transactions. This is particularly relevant to ensure that superusers that are
not concerned with resource usage are not artificially restricted from offering
many transaction relay slots to the network.

Co-authored-by: Martin Zumsande <[email protected]>
@mzumsande mzumsande force-pushed the 202308_increase_block_relay branch from df1c9c2 to 9806bc5 Compare October 22, 2025 11:13
@mzumsande
Copy link
Contributor Author

df1c9c2 to 9806bc5: rebased (there was a silent conflict for p2p_opportunistic_1p1c.py, together with the -maxconnections=94 default limit used in the functional tests)

@DrahtBot
Copy link
Contributor

🐙 This pull request conflicts with the target branch and needs rebase.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants