[no merge, meta] refactor: net/net processing split #28252
Conversation
We temporarily introduce a method to get a specific NodeEvictionCandidate from the eviction manager for use in `CConnman::AttemptToEvictConnection()`. `EvictionManager::GetCandidate` is removed once we are done moving the remaining eviction state from `CNode` to the eviction manager.
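For context, a minimal sketch of what such a temporary accessor could look like (the member names, locking, and return type here are assumptions for illustration, not taken from the actual commit):

```cpp
// Hypothetical sketch: return a copy of the eviction candidate for a given
// node id, if the eviction manager is tracking one. Intended to be removed
// once all eviction state has moved out of CNode.
std::optional<NodeEvictionCandidate> EvictionManager::GetCandidate(NodeId id) const
{
    LOCK(m_candidates_mutex);
    const auto it = m_candidates.find(id); // assumes an id-keyed candidate map
    if (it == m_candidates.end()) return std::nullopt;
    return it->second;
}
```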
Now that the eviction manager is aware of the keyed net group and whether a peer is preferred for eviction, we can get rid of the corresponding CNode members, since they are not used elsewhere.
…vict into two functions
SelectNodeToEvict only suggests inbound nodes to evict.
-BEGIN VERIFY SCRIPT-
ren() { sed -i "s:\<$1\>:$2:g" $(git grep -l "\<$1\>" ./src ./test); }
ren SelectNodeToEvict SelectInboundNodeToEvict
-END VERIFY SCRIPT-
Most of the stats in CNodeStateStats are now taken from the Peer object.
Rename the struct to PeerStats.
-BEGIN VERIFY SCRIPT-
ren() { sed -i "s:\<$1\>:$2:g" $(git grep -l "\<$1\>" ./src ./test); }
ren CNodeStateStats PeerStats
ren GetNodeStateStats GetPeerStats
ren nodeStateStats m_peer_stats
ren statestats peer_stats
ren fNodeStateStatsAvailable m_peer_stats_available
ren fStateStats peer_stats_available
-END VERIFY SCRIPT-
Also rename to m_clean_subversion. Since the subversion is now contained in the Peer object, it may not be available in `getpeerinfo` while the peer connection is being torn down. Update the RPC help text to mark the field as optional and update the tests to tolerate the field being absent.
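As an illustration only (the exact RPC code is not part of this excerpt), the optional field could be handled along these lines, assuming the renamed `GetPeerStats(NodeId, PeerStats&)` helper from the scripted diff above:

```cpp
// Hypothetical sketch: only report the subversion when peer stats could be
// fetched, since the Peer object may already be gone while the connection
// is being torn down.
PeerStats peer_stats;
const bool peer_stats_available{peerman.GetPeerStats(node_id, peer_stats)};
if (peer_stats_available) {
    obj.pushKV("subver", peer_stats.m_clean_subversion);
}
```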
Handle version nonces entirely within net_processing.
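A rough sketch of what handling the nonce inside net_processing could look like (names such as `m_local_nonce` and `IsSelfConnection` are illustrative assumptions, not the commit's actual code):

```cpp
// Hypothetical sketch: the nonce sent in our VERSION message lives on the
// Peer object, so self-connection detection no longer needs any nonce state
// in CNode/CConnman.
struct Peer {
    // ...
    uint64_t m_local_nonce{0}; // nonce included in our outgoing VERSION
};

bool PeerManagerImpl::IsSelfConnection(uint64_t remote_nonce) const
{
    LOCK(m_peer_mutex);
    for (const auto& [id, peer] : m_peer_map) {
        // If a peer echoes back a nonce we generated ourselves, we have
        // connected to ourselves and should disconnect.
        if (remote_nonce != 0 && peer->m_local_nonce == remote_nonce) return true;
    }
    return false;
}
```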
🐙 This pull request conflicts with the target branch and needs rebase.
That seems undesirable to me? Where possible, acting directly on the node via a pointer/reference is much better than taking a global lock, doing a lookup, and then acting.
I think it's a mistake trying to have many different owners of "per-node" state -- that increases the lookups, locks and coordination required when adding/removing nodes. I think a simpler approach would be to add a …
@ajtowns Thanks for the quick feedback! I think we'll mostly disagree here but I appreciate your opinion.
While it is obviously true that directly accessing the CNode pointers is more performant (no locking or lookups), accessing the internals of other components like this leads to spaghetti code, and a performance win of microseconds or nanoseconds is just not worth it.
This sounds like doing a complete 180 on the work that happened over multiple years to decouple net and net processing. I'm not re-inventing the wheel in this PR; this general direction is what other contributors have had in mind and worked on for years. I'd prefer finishing up previous work and making things consistent instead of going in an entirely new direction.
If we want to have multiple threads processing different peers, then avoiding (or at least minimising) having to go through shared data structures will be necessary. Keeping our data structures in sync is already challenging; making it more complicated and having more things to keep coordinated is a bad thing.
If you're going in the wrong direction, doing a 180 is a good thing, and going even further in the same direction is a bad thing.
I could agree on s/necessary/more performant/, but with the shared data structure design it would really only cause contention for the lookups, which is really not that bad. i.e.:

```cpp
// in CConnman::DoX(NodeId)
{
    LOCK(m_nodes_mutex);
    // do lookup
}
// do work on the CNode
```
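Spelled out a bit more (purely a sketch; the id-keyed map and the ref-counting detail are assumptions about how the node's lifetime would stay valid after the lock is released):

```cpp
// Hypothetical expansion of the pattern above: the global mutex is only held
// for the lookup, the actual work happens without it.
void CConnman::DoX(NodeId id)
{
    CNode* node{nullptr};
    {
        LOCK(m_nodes_mutex);
        const auto it = m_nodes.find(id); // assumes an id -> CNode* map
        if (it == m_nodes.end()) return;
        node = it->second;
        node->AddRef(); // keep the node alive after the lock is released
    }
    // ... do work on *node without holding m_nodes_mutex ...
    node->Release();
}
```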
It's challenging right now because we are passing internal pointers around.
It's really not that complicated if all you have to do is call …
I think we're just gonna disagree on what the right direction is :) It mostly seems like a discussion of performance vs. sane interfaces, and I don't think it is worth optimizing for performance in this specific case (at the cost of bad interfaces) as it doesn't represent a bottleneck either way.
We've debated the CNode* passing/abstracting thing ever since CConnman was merged. This has come up over and over, so it's definitely not new. And the big problem is: everyone has their own take. The question of "should we use CNode or CConnman as the interface for network stuff" could be debated forever. I think the only correct answer is "not both arbitrarily", which is what we do now.

I commend @dergoegge for working on this. It's tough and requires picking an approach and sticking to it. Personally, I don't love any solution, but NodeId as an access handle has always seemed like one of the most reasonable approaches to me. It's not the most sophisticated, but it's straightforward. And from my experiments years ago anyway, there were fewer footguns, since there are no external lifetime concerns. So Concept ACK from me on that part.

What matters to me is that CConnman gets a clean separation and CNode's internals are no longer exposed. TBH I'm less concerned about the approach taken to get there. I understand that there may be a performance trade-off, but imo it's worth it for simplicity. I fully expect people to disagree, but if performance is the argument, imo it needs to be demonstrated. @ajtowns's take is also totally reasonable imo, just a matter of different priorities I think.
Isn't the point of opening this to discuss that and at least try to come to an agreement? Being immediately dismissed out of hand feels pretty frustrating.
I don't really think that's at all true? Passing pointers to objects is a completely normal thing; and whether something is "internal" or not is just a matter of how you define the API -- if other things are accessing it, it's not "internal" by definition, though perhaps it should be. I'd contend that the challenging thing is mostly that we're letting nodes interact with other nodes' state directly.
It also means either wrapping …
I don't think this approach is a "sane" interface -- interacting with many peers is fundamentally something we should be able to do in parallel, and adding many synchronisation points where we prevent ourselves from operating in parallel is a fundamentally broken approach. It's broken both from the perspective of "it would be nice if we could do better in the future", in that you're making those improvements harder, but it's also broken from the perspective of "it's easier to avoid bugs when you can just focus on a single thing, and not have to worry about many interactions". Ignoring the performance aspect, and just focussing on the "net_processing deals with a single peer and doesn't interfere with the state of other peers" would, I think, look like:
Having …

(Actually parallelising the msghand thread probably depends on whether we can at least do parallel lookups into the mempool, perhaps via a reader/writer lock approach, but something like this would be a necessary step too.)
I apologize, that wasn't my intention. I want to make it clear though that this PR is about continuing previous decoupling work that has seemingly dropped off people's radar as previous contributors switched focus or left the project. I do not think the previous work went in the wrong direction, nor that it prevents potentially multi-threading the message processing code in the future. If you think this design is somehow fundamentally broken w.r.t. parallelizing, please elaborate why, because I don't see it.

It seemed well understood to me that we think of our p2p stack as consisting of two layers/modules: net (lower level networking, connection management) and message processing (net processing), each with their own responsibilities. Therefore I would classify pointers into their respective data structures as internals.

Perhaps a competing PR/design doc of what you are thinking about would help? I've thought about the design in this PR so much that I might be a little stuck in my ways now. Judging by the rest of your comment, it seems like parallelizing message processing is of great interest to you. I agree that that is cool, but I also think we should have better testing (ideally implementation agnostic) before switching to that. In any case, maybe it makes more sense to discuss that in another issue?

Using …
I think I might be coming across as disagreeing more than I actually do? I'm all for modularisation and better testing, and getting things tightly focussed so that you don't need to keep huge amounts of context in your head; but I think the way we've been doing that in …
I think it would be better to move towards something more like:
ie, I guess I don't think …

(There's also a bunch of "low-level" things in …)

Maybe think of it as …

That would imply a few implementation changes compared to master:
I think an approach like that would have a few benefits:
If you want good performance then you keep things simple -- few locks, access things directly. Often that approach is also the easiest to understand. Doing things in a complicated, roundabout way isn't "fundamentally broken" -- you can make it work, and you can always throw enough hardware at it to make it work fast enough. But I think it should be pretty obvious that:

```cpp
bool CNode::IsDisconnected() const { return fDisconnect; }
void CNode::Disconnect() { fDisconnect = true; }
```

is simpler and more performant and better for parallelisation than:

```cpp
bool CConnman::IsDisconnected(NodeId id) const
{
    LOCK(m_nodes_mutex);
    auto it = m_nodes.find(id);
    return it == m_nodes.end() ||
           it->second->fDisconnect;
}

bool CConnman::DisconnectNode(NodeId id)
{
    LOCK(m_nodes_mutex);
    auto it = m_nodes.find(id);
    if (it == m_nodes.end()) return false;
    it->second->fDisconnect = true;
    return true;
}
```

The more complicated code here isn't doing anything clever to justify the complexity: it's still just reading/setting the same atomic bool. Both …

Now, sure: a quick map lookup while holding a global lock isn't the end of the world, but refactoring should be making things simpler or more efficient or both, and this is doing the opposite. That's why I think it's the wrong direction. If the design is forcing your code to be complicated, ugly and slow, then the design's bad.

For parallelisation in particular, you get the best performance with the fewest locks (so you don't have to stop, and potentially have another process claim the CPU) and the least shared data (so you can just process it in cache without having to synchronise with other tasks). So adding locks and lookups over shared data for things that were previously just a single atomic operation is completely backwards.

Here's a concrete example: https://github.com/ajtowns/bitcoin/commits/202308-netnode . It first splits out …

For example, if you look at …
I guess two other things might be worth thinking about:
Closing since there doesn't seem to be sufficient support/interest for this |
This PR is supposed to provide context for some of the refactoring PRs I've been working on (#25268, #26621, etc).
The aim is to completely decouple CConnman and its internals from PeerManager to allow for isolated testing of our message processing code (isolated from net only, as isolating from validation is another can of worms). To get there, this work refactors CConnman's API to use `NodeId` as the primary handle for managing connections and defines a new interface `ConnectionsInterface` that describes the interface between PeerManager and CConnman. The result is that CNode is no longer part of this interface and access to it is no longer needed within net processing.

Most of the API simply shifts from `CConnman::DoX(CNode*)` to `CConnman::DoX(NodeId)`, and where possible the PR minimizes the need for new methods on `CConnman` to avoid the slight overhead of (internally) locking `m_nodes_mutex` to look up the right `CNode` by its id.

Some steps this PR is taking:

- … `ConnectionContext`, which is meant to hold all this constant connection data; for each connection a copy is stored on the corresponding `Peer` instance. This reduces the need for accessing CNode in net processing and avoids some `For{Each}Node` calls (which are a horrible layer violation).
- … setting `CNode::fDisconnect` to true. This PR wraps access to that flag on `CConnman`, i.e. `CConnman::IsDisconnected` & `CConnman::DisconnectNode(NodeId)`.

Finally, the PR also introduces initial unit tests for PeerManager that make use of a mocked `ConnectionsInterface` instead of the actual CConnman.

Looking for conceptual & approach review.
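As a rough illustration of the shape such an interface could take (only `IsDisconnected`/`DisconnectNode` are named above; the other method is an assumption):

```cpp
// Hypothetical sketch of the NodeId-based interface between PeerManager and
// CConnman; the real ConnectionsInterface in the branch may differ in detail.
class ConnectionsInterface
{
public:
    virtual ~ConnectionsInterface() = default;

    // Queue a serialized message for the given connection.
    virtual void PushMessage(NodeId id, CSerializedNetMsg&& msg) = 0;

    // Wrapped access to CNode::fDisconnect, as described above.
    virtual bool IsDisconnected(NodeId id) const = 0;
    virtual bool DisconnectNode(NodeId id) = 0;
};

// CConnman would then be one implementation of this interface.
class CConnman : public ConnectionsInterface { /* ... */ };
```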
This is not super polished so I would expect the CI to fail but I will break this up and polish if there is enough conceptual support. I will not keep this rebased as the general architectural direction this is taking should be clear in any case.
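Along the same lines, the mocked `ConnectionsInterface` used in the unit tests could be a simple hand-rolled fake that records what PeerManager asks the connection layer to do (again a sketch against the assumed interface above, not the PR's actual test code):

```cpp
// Hypothetical mock: no sockets, just bookkeeping that tests can assert on.
class MockConnections final : public ConnectionsInterface
{
public:
    std::vector<std::pair<NodeId, CSerializedNetMsg>> m_sent;
    std::set<NodeId> m_disconnected;

    void PushMessage(NodeId id, CSerializedNetMsg&& msg) override
    {
        m_sent.emplace_back(id, std::move(msg));
    }
    bool IsDisconnected(NodeId id) const override
    {
        return m_disconnected.count(id) > 0;
    }
    bool DisconnectNode(NodeId id) override
    {
        return m_disconnected.insert(id).second;
    }
};
```

A PeerManager unit test could then be constructed against such a mock, fed messages for a given NodeId, and checked purely via `m_sent` and `m_disconnected`, without any real networking involved.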