p2p: Track orphans by who provided them #26551
Conversation
Force-pushed 5d92928 to 1624699
glozow left a comment:
Concept ACK
Force-pushed 1624699 to 0a32d4f
Some previous discussion at #21527 (comment).
src/net_processing.cpp (outdated)
- I think it should say "Creates more work if", as it's less ambiguous (I actually spent time figuring the difference out...).
- So now `bool fMoreNodeWork = m_msgproc->ProcessMessages(pnode, flagInterruptMsgProc);` should be renamed to `may_have_more_work`, right?
I don't think this PR changes whether fMoreNodeWork should be renamed -- the bool should have the same properties both before and after this PR.
But... I think its current behaviour is technically not quite right? If we process a TX message in `ProcessMessage` now, we'll set `fMoreWork = true` only when `m_getdata_requests` has entries, but I think we should also be doing it if the peer's orphanage work queue isn't empty, as `ProcessMessage` may have added to it. I think this is technically a bug introduced in #15644 -- there should have been an `if (!pfrom->orphan_work_set.empty()) fMoreWork = true;` added after the `ProcessMessage()` call, as well as after the `// this maintains the order of responses` comment.
The only result of the bug is that when a parent comes in, processing its orphaned children will be delayed by 100ms rather than occurring essentially immediately.
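For concreteness, a minimal sketch of the first of those two spots (abbreviated; `fMoreWork` and `orphan_work_set` are the names used in the code at the time per the comment above, the `ProcessMessage()` arguments are condensed, and locking is elided):

```cpp
// Inside ProcessMessages(), after dispatching the received message.
// The same check is also wanted after the getdata handling that follows
// the "this maintains the order of responses" comment.
ProcessMessage(*pfrom, msg_type, vRecv, msg.m_time, interruptMsgProc); // (args abbreviated)

// If ProcessMessage() just accepted a parent and queued its orphans for
// reconsideration, report more work so ThreadMessageHandler() loops again
// instead of sleeping for 100ms.
if (!pfrom->orphan_work_set.empty()) fMoreWork = true;
```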
Maybe it would be better to write:
```cpp
while (true) {
    bool all_nodes_finished_processing = true;
    for (pnode : nodes) {
        if (m_msgproc->ProcessMessages(pnode)) all_nodes_finished_processing = false;
        m_msgproc->SendMessages(pnode);
    }
    if (all_nodes_finished_processing) wait_for(100ms);
}
```

The idea (as I understand it) is just that `ProcessMessages()` returns true if it does some substantial work and exits early, without checking everything. If that's the case for any peer, it doesn't wait before looping through all peers again. If `ProcessMessages()` completed for all peers, whether it did work or not, then it is okay to sleep briefly before looping over everything again.
I think you are right. Let's try applying these changes and see whether everything still makes sense.
Part of my confusion is that it's unclear whether `has_more_work` applies to a given peer or to all peers... That variable rename could help (`peer_has_more_work` vs `node_has_more_work`, or something).
Concept ACK
Force-pushed 0a32d4f to 8f9d9ab
src/net_processing.cpp (outdated)
This doesn't work after the other changes in this PR -- you might be creating more work for some other peer, so should be checking if any peers have txs to reconsider, and only doing this once after looping through all the peers.
So, I guess this raises the question: should we move orphan processing outside of ProcessMessages and SendMessages entirely, and just do it separately, independent of both who sent the orphan and who sent the parent?
Unless someone has a good reason not to, I think I might draft that up as an alternative PR so the approaches can be compared more directly.
> should we move orphan processing outside of `ProcessMessages` and `SendMessages` entirely, and just do it separately, independent of both who sent the orphan and who sent the parent?

Had a similar thought; mainly, I don't see why we should try to process all orphans before moving on to `ProcessMessage()`... afaict we never send a message to a peer as a result of processing an orphan (other than potentially disconnecting), so maintaining the order of responses is not a justification like it is for finishing the `ProcessGetData()` queue.
fwiw doing orphan processing independently seems reasonable to me, though I don't have a very strong grasp of all the possible implications.
I thought of it more along the lines of "if a node gives us a lot of orphans to process, and we're now processing them, we shouldn't do more work for that node until all the orphan work is done; also we shouldn't delay work from other nodes while we're processing that node's orphans".
> So, I guess this raises the question: should we move orphan processing outside of `ProcessMessages` and `SendMessages` entirely,
Where do you intend to move it? Into a dedicated thread for orphan processing?
> This doesn't work after the other changes in this PR -- you might be creating more work for some other peer, so should be checking if any peers have txs to reconsider, and only doing this once after looping through all the peers.
Is this situation (that the peer who sent us an orphan is different from the peer that sent us the parent) typical? Otherwise, might it be acceptable to just have an unnecessary 100ms wait if it occurs very rarely?
> Where do you intend to move it? Into a dedicated thread for orphan processing?
No, I was thinking a separate ProcessOrphans method like ProcessMessages and SendMessages.
> Is this situation (that the peer who sent us an orphan is different from the peer that sent us the parent) typical? Otherwise, might it be acceptable to just have an unnecessary 100ms wait if it occurs very rarely?
On a long-running node, I'm seeing maybe a few hundred orphans being accepted during startup (i.e., children whose parent txs were announced while my node was offline), but then perhaps only 20-60 accepted orphans per day. I don't have much of an idea how many of those would have a different node sending the parent compared to the child, but it doesn't seem likely to be a real problem.
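For illustration, a rough sketch of what such a standalone step might look like if wired into `CConnman::ThreadMessageHandler()` alongside `ProcessMessages()`/`SendMessages()` (`ProcessOrphans` is hypothetical and not part of this PR; the `TxOrphanage` calls mirror the API discussed in this thread, and locking is elided):

```cpp
// Hypothetical ProcessOrphans() step, called per peer from the message
// handler loop; not this PR's implementation.
bool PeerManagerImpl::ProcessOrphans(CNode* pnode)
{
    // Retry mempool acceptance for at most one orphan from this peer's
    // work set; any children the accepted tx unblocks get queued to the
    // work sets of whichever peers originally provided them.
    if (CTransactionRef tx = m_orphanage.GetTxToReconsider(pnode->GetId())) {
        // ... re-run mempool acceptance for tx ...
    }
    // True if this peer still has queued orphans, so the caller can loop
    // again without the 100ms wait.
    return m_orphanage.HaveTxToReconsider(pnode->GetId());
}
```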
> should we move orphan processing outside of `ProcessMessages` and `SendMessages` entirely, and just do it separately, independent of both who sent the orphan and who sent the parent?
>
> Unless someone has a good reason not to, I think I might draft that up as an alternative PR so the approaches can be compared more directly.
See #19364
mzumsande left a comment:
Concept ACK
Force-pushed 8f9d9ab to 6c0a068
Updated. I left the orphan processing in
utACK 6c0a068ea3670bb7a1c955ed6ecde5d050c1a812
Previously, when we processed a new tx we would attempt to ATMP any orphans that considered the new tx a parent immediately, but would only accept at most one such tx, leaving any others to be considered on a future run of ProcessMessages(). With this patch, we don't attempt any orphan processing immediately after receiving a tx, instead deferring all of them until the next call to ProcessMessages().
If we made progress on orphans, consider that enough work for this peer for this round of ProcessMessages. This also allows cleaning up the API for TxOrphanage::GetTxToReconsider().
When PR #15644 made orphan processing interruptible, it also introduced a potential 100ms delay between processing of the first and second newly reconsiderable orphan, because it didn't check whether the orphan work set was non-empty after invoking ProcessMessage(). This adds that check, so that ProcessMessages() will return true if there are orphans to process, usually avoiding the 100ms delay in CConnman::ThreadMessageHandler().
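Taken together, these commit messages suggest roughly the following shape for `ProcessMessages()` (an illustrative sketch, not the exact patch; locking and error handling are elided, and the `TxOrphanage` calls follow the API named in this PR):

```cpp
// Illustrative sketch of the new ProcessMessages() ordering described above.
bool PeerManagerImpl::ProcessMessages(CNode* pfrom, std::atomic<bool>& interruptMsgProc)
{
    bool fMoreWork = false;

    // ... first, finish any queued getdata responses (maintains response order) ...

    // Before reading a new message, reconsider at most one of this peer's
    // queued orphans. Making progress here counts as this round's work,
    // deferring further orphans and new messages to the next call.
    if (CTransactionRef tx = m_orphanage.GetTxToReconsider(pfrom->GetId())) {
        // ... re-run mempool acceptance for tx; any children it unblocks are
        // queued to the work sets of the peers that provided them ...
        return true;
    }

    // ... otherwise pop one message from the receive queue and ProcessMessage() it ...

    // If that message was a parent that made orphans reconsiderable, report
    // more work so CConnman::ThreadMessageHandler() skips its 100ms wait.
    if (m_orphanage.HaveTxToReconsider(pfrom->GetId())) fMoreWork = true;
    return fMoreWork;
}
```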
Force-pushed 6c0a068 to c58c249
utACK c58c249
glozow left a comment:
ACK c58c249
```cpp
    LOCK(peer->m_getdata_requests_mutex);
    if (!peer->m_getdata_requests.empty()) fMoreWork = true;
}
// Does this peer has an orphan ready to reconsider?
```
c58c249 nit: s/has/have
mzumsande left a comment:
Code Review ACK c58c249
nits can be ignored.
```cpp
 * reconsidered.
 * @return True if there are still orphans in this peer's work set.
 * @peer[in] peer The peer whose orphan transactions we will reconsider. Generally only
 *               one orphan will be reconsidered on each call of this function. If an
```
nit: I think "Generally only one orphan will be reconsidered" is misleading. Calling ProcessTransaction for a tx once more, getting TX_MISSING_INPUTS again and therefore keeping the orphan would qualify as being "reconsidered" for me, whereas what is meant is that the tx is being removed from the orphanage.
But no need to address here, since this PR didn't change this.
```cpp
void AddChildrenToWorkSet(const CTransaction& tx) EXCLUSIVE_LOCKS_REQUIRED(!m_mutex);;

/** Does this peer have any work to do? */
bool HaveTxToReconsider(NodeId peer) EXCLUSIVE_LOCKS_REQUIRED(!m_mutex);;
```
nit: double ;
Merged commits:
- c58c249 net_processing: indicate more work to do when orphans are ready to reconsider (Anthony Towns)
- ecb0a3e net_processing: Don't process tx after processing orphans (Anthony Towns)
- c583775 net_processing: only process orphans before messages (Anthony Towns)
- be23046 txorphange: Drop redundant originator arg from GetTxToReconsider (Anthony Towns)
- a4fe099 txorphanage: index workset by originating peer (Anthony Towns)

Pull request description:

We currently process orphans by assigning them to the peer that provided a missing parent; instead, assign them to the peer that provided the orphan in the first place. This prevents a peer from being able to marginally delay another peer's transactions and also simplifies the internal API slightly.

Because we're now associating orphan processing with the peer that provided the orphan originally, we no longer process orphans immediately after receiving the parent, but defer until a future call to `ProcessMessage`.

Based on bitcoin#26295

ACKs for top commit:
- naumenkogs: utACK c58c249
- glozow: ACK c58c249
- mzumsande: Code Review ACK c58c249

Tree-SHA512: 3186c346f21e60440266a2a80a9d23d7b96071414e14b2b3bfe50457c04c18b1eab109c3d8c2a7726a6b10a2eda1f0512510a52c102da112820a26f5d96f12de
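To illustrate the central change ("index workset by originating peer"), here is a minimal sketch of the peer-keyed work set inside `TxOrphanage`. The method signatures follow the diff hunks quoted above; the member name and map type are assumptions for illustration:

```cpp
// Sketch of the peer-indexed work set (member name/type are illustrative
// assumptions; method signatures match the diff quoted earlier).
class TxOrphanage {
    // Orphan txids each peer should reconsider, keyed by the peer that
    // originally provided the orphan -- not the peer that provided the
    // parent, as before this PR.
    std::map<NodeId, std::set<uint256>> m_peer_work_set GUARDED_BY(m_mutex);

public:
    // When tx is accepted, queue each stored orphan that spends one of its
    // outputs into the work set of the peer that gave us that orphan.
    void AddChildrenToWorkSet(const CTransaction& tx) EXCLUSIVE_LOCKS_REQUIRED(!m_mutex);

    // Pop one orphan from this peer's work set for revalidation, or return
    // nullptr if the work set is empty. (No "originator" out-param anymore,
    // since the caller is the originator.)
    CTransactionRef GetTxToReconsider(NodeId peer) EXCLUSIVE_LOCKS_REQUIRED(!m_mutex);

    // Does this peer have any work to do?
    bool HaveTxToReconsider(NodeId peer) EXCLUSIVE_LOCKS_REQUIRED(!m_mutex);
};
```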
Backported to Bitcoin ABC (depends on D16235): https://reviews.bitcoinabc.org/D16236