Conversation

@ajtowns ajtowns commented Nov 21, 2022

We currently process orphans by assigning them to the peer that provided a missing parent; instead assign them to the peer that provided the orphan in the first place. This prevents a peer from being able to marginally delay another peer's transactions and also simplifies the internal API slightly. Because we're now associating orphan processing with the peer that provided the orphan originally, we no longer process orphans immediately after receiving the parent, but defer until a future call to ProcessMessage.

Based on #26295
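The change described above can be sketched as a small standalone model. The `OrphanWorkSets` class and the simplified `NodeId`/`Txid` aliases below are hypothetical stand-ins for illustration, not the real `TxOrphanage` API:

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical simplified aliases; the real code uses NodeId and tx hashes.
using NodeId = int;
using Txid = std::string;

// Minimal model of per-peer orphan work sets. Before this PR, a newly
// reconsiderable orphan was queued under the peer that provided the missing
// parent; with this PR it is queued under the peer that originally provided
// the orphan.
class OrphanWorkSets {
public:
    // Remember which peer announced this orphan.
    void AddOrphan(const Txid& orphan, NodeId announcer) {
        m_announcer[orphan] = announcer;
    }

    // A parent arrived: queue the orphan for reconsideration under its
    // original announcer, regardless of who sent the parent.
    void AddChildToWorkSet(const Txid& orphan) {
        auto it = m_announcer.find(orphan);
        if (it != m_announcer.end()) m_work_set[it->second].insert(orphan);
    }

    // Checked from ProcessMessages() for the peer currently being serviced.
    bool HaveTxToReconsider(NodeId peer) const {
        auto it = m_work_set.find(peer);
        return it != m_work_set.end() && !it->second.empty();
    }

private:
    std::map<Txid, NodeId> m_announcer;          // orphan -> announcing peer
    std::map<NodeId, std::set<Txid>> m_work_set; // peer -> orphans to retry
};
```

Because the work set is keyed by the announcing peer, a peer relaying the parent cannot delay whose orphans get reconsidered; it also means the orphan is only retried on a later `ProcessMessages` call for the announcer, which is the deferral mentioned above.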

DrahtBot commented Nov 21, 2022

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK naumenkogs, glozow, mzumsande

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #26247 (refactor: Make m_mempool optional in PeerManager by MarcoFalke)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@glozow glozow left a comment

Concept ACK

@ajtowns ajtowns force-pushed the 202211-orphanguardians branch from 1624699 to 0a32d4f Compare November 28, 2022 23:06
@ajtowns ajtowns marked this pull request as ready for review November 29, 2022 01:03
@glozow glozow added the P2P label Nov 29, 2022
@maflcko maflcko changed the title net_processing: Track orphans by who provided them p2p: Track orphans by who provided them Nov 29, 2022
@maflcko maflcko removed the P2P label Nov 29, 2022
@DrahtBot DrahtBot added the P2P label Nov 29, 2022

ajtowns commented Dec 8, 2022

Some previous discussion at #21527 (comment)

Contributor

  1. I think it should say "Creates more work if", as it's less ambiguous (I actually spent time figuring the difference out...).
  2. So now bool fMoreNodeWork = m_msgproc->ProcessMessages(pnode, flagInterruptMsgProc); should be renamed to may_have_more_work right?

Contributor Author

I don't think this PR changes whether fMoreNodeWork should be renamed -- the bool should have the same properties both before and after this PR.

But... I think its current behaviour is technically not quite right? If we process a TX message in ProcessMessage now, we'll set fMoreWork = true only when m_getdata_requests has entries, but I think we should also be doing it if the peer's orphanage work queue isn't empty, as ProcessMessage may have added to it. I think this is technically a bug introduced in #15644 -- there should have been an if (!pfrom->orphan_work_set.empty()) fMoreWork = true; added after the ProcessMessage() call, as well as after the // this maintains the order of responses comment.

The only result of the bug is that when a parent comes in, processing its orphaned children will be delayed by 100ms rather than occurring essentially immediately.
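The missing check can be illustrated with a standalone sketch. The `Peer` struct and function name below are hypothetical simplifications of the real `m_getdata_requests` queue and orphan work set, not the actual net_processing code:

```cpp
#include <cassert>
#include <deque>
#include <set>
#include <string>

// Hypothetical simplified per-peer state.
struct Peer {
    std::deque<std::string> getdata_requests; // unfinished getdata responses
    std::set<std::string> orphan_work_set;    // orphans ready to reconsider
};

// Sketch of computing fMoreWork after a ProcessMessage() call. The second
// clause is the check argued to be missing since #15644: without it, orphans
// queued by the message just processed wait for the next ~100ms poll instead
// of being reconsidered on the next loop iteration.
bool MoreWorkAfterProcessMessage(const Peer& peer) {
    bool more_work = !peer.getdata_requests.empty();
    if (!peer.orphan_work_set.empty()) more_work = true;
    return more_work;
}
```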

Contributor Author

Maybe it would be better to write:

while (true) {
    bool all_nodes_finished_processing = true;
    for (CNode* pnode : nodes) {
        if (m_msgproc->ProcessMessages(pnode)) all_nodes_finished_processing = false;
        m_msgproc->SendMessages(pnode);
    }
    if (all_nodes_finished_processing) wait_for(100ms);
}

The idea (as I understand it) is just that ProcessMessages() returns true if it does some substantial work and exits early, without checking everything. If that's the case for any peer, then it doesn't wait before looping through all peers again. If ProcessMessages() completed for all peers, whether it did work or not, then it is okay to sleep briefly before looping over everything again.

Contributor

I think you are right. Let's try applying these changes and see whether everything makes sense once again.

Part of my confusion is that it's unclear whether has_more_work applies to a given peer, or to all peers... That variable rename could help. (peer_has_more_work vs node_has_more_work or something.)

@naumenkogs

Concept ACK

@ajtowns ajtowns force-pushed the 202211-orphanguardians branch from 0a32d4f to 8f9d9ab Compare December 22, 2022 14:39
Contributor Author

This doesn't work after the other changes in this PR -- you might be creating more work for some other peer, so should be checking if any peers have txs to reconsider, and only doing this once after looping through all the peers.

Contributor Author

So, I guess this raises the question: should we move orphan processing outside of ProcessMessages and SendMessages entirely, and just do it separately, independent of both who sent the orphan and who sent the parent?

Unless someone has a good reason not to, I think I might draft that up as an alternative PR so the approaches can be compared more directly.

@glozow glozow Dec 26, 2022

should we move orphan processing outside of ProcessMessages and SendMessages entirely, and just do it separately, independent of both who sent the orphan and who sent the parent?

Had a similar thought, mainly I don't see why we should try to process all orphans before moving on to ProcessMessage()... afaict we never send a message to a peer as a result of processing an orphan (other than potentially disconnecting) so maintaining the order of responses is not justification like it is for finishing ProcessGetData() queue.

fwiw doing orphan processing independently seems reasonable to me, though I don't have a very strong grasp of all the possible implications.

Contributor Author

I thought of it more along the lines of "if a node gives us a lot of orphans to process, and we're now processing them, we shouldn't do more work for that node until all the orphan work is done; also we shouldn't delay work from other nodes while we're processing that node's orphans".

Contributor

So, I guess this raises the question: should we move orphan processing outside of ProcessMessages and SendMessages entirely,

Where do you intend to move it? Into a dedicated thread for orphan processing?

This doesn't work after the other changes in this PR -- you might be creating more work for some other peer, so should be checking if any peers have txs to reconsider, and only doing this once after looping through all the peers.

Is this situation (that the peer who sent us an orphan is different from the peer that sent us the parent) typical? Otherwise might it be acceptable to just have an unnecessary 100ms wait if it occurs very rarely?

Contributor Author

Where do you intend to move it? Into a dedicated thread for orphan processing?

No, I was thinking a separate ProcessOrphans method like ProcessMessages and SendMessages.

Is this situation (that the peer who sent us an orphan is different from the peer that sent us the parent) typical? Otherwise might it be acceptable to just have a unnecessary 100ms wait if it occurs very rarely?

On a long running node, I'm seeing maybe a few hundred orphans being accepted during startup (ie, children whose parent txs were announced while my node was offline) but then perhaps only 20-60 accepted orphans per day. I don't have much of an idea how many of those would have a different node sending the parent compared to the child, but it doesn't seem likely to be a real problem.

Contributor

should we move orphan processing outside of ProcessMessages and SendMessages entirely, and just do it separately, independent of both who sent the orphan and who sent the parent?

Unless someone has a good reason not to, I think I might draft that up as an alternative PR so the approaches can be compared more directly.

See #19364

@mzumsande mzumsande left a comment

Concept ACK


ajtowns commented Jan 25, 2023

Updated. I left the orphan processing in ProcessMessage based on the following thoughts:

  • an attacker could probably create arbitrary amounts of work by sending orphan txs that then get added to the todo list as soon as the parent becomes available. If they do this, it should only slow down communication with that peer, not any of our other peers.
  • avoiding a pretty rare 100ms delay isn't worth the hassle of extending NetEventsInterface with new virtual methods for handling orphan txs
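A rough standalone sketch of the resulting per-peer ordering (all names are hypothetical simplifications, and the single `erase()` stands in for one ATMP attempt on an orphan):

```cpp
#include <cassert>
#include <deque>
#include <set>
#include <string>

// Hypothetical simplified per-peer state.
struct Peer {
    std::set<std::string> orphan_work_set;     // orphans queued for this peer
    std::deque<std::string> incoming_messages; // unprocessed wire messages
};

// Sketch of the post-PR ProcessMessages() ordering: reconsider one orphan
// for this peer first; if that made progress, count it as this round's work
// and return "more work" without touching the message queue. A peer's orphan
// backlog therefore only slows down that peer, not other peers.
bool ProcessMessagesSketch(Peer& peer) {
    if (!peer.orphan_work_set.empty()) {
        peer.orphan_work_set.erase(peer.orphan_work_set.begin()); // one ATMP attempt
        return true; // more work may remain (orphans and/or messages)
    }
    if (!peer.incoming_messages.empty()) peer.incoming_messages.pop_front();
    return !peer.incoming_messages.empty() || !peer.orphan_work_set.empty();
}
```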

@naumenkogs

utACK 6c0a068ea3670bb7a1c955ed6ecde5d050c1a812

Previously, when we processed a new tx we would attempt to ATMP any
orphans that considered the new tx a parent immediately, but would only
accept at most one such tx, leaving any others to be considered on a
future run of ProcessMessages(). With this patch, we don't attempt any
orphan processing immediately after receiving a tx, instead deferring
all of them until the next call to ProcessMessages().
If we made progress on orphans, consider that enough work for this peer
for this round of ProcessMessages. This also allows cleaning up the API
for TxOrphanage::GetTxToReconsider().
net_processing: indicate more work to do when orphans are ready to reconsider

When PR#15644 made orphan processing interruptible, it also introduced a
potential 100ms delay between processing of the first and second newly
reconsiderable orphan, because it didn't check if the orphan work set
was non-empty after invoking ProcessMessage(). This adds that check, so
that ProcessMessages() will return true if there are orphans to process,
usually avoiding the 100ms delay in CConnman::ThreadMessageHandler().
@ajtowns ajtowns force-pushed the 202211-orphanguardians branch from 6c0a068 to c58c249 Compare January 25, 2023 08:16
@naumenkogs

utACK c58c249

@glozow glozow left a comment

ACK c58c249

LOCK(peer->m_getdata_requests_mutex);
if (!peer->m_getdata_requests.empty()) fMoreWork = true;
}
// Does this peer has an orphan ready to reconsider?
Member

c58c249 nit: s/has/have

@glozow glozow requested a review from mzumsande January 25, 2023 14:49
@mzumsande mzumsande left a comment

Code Review ACK c58c249

nits can be ignored.

* reconsidered.
* @return True if there are still orphans in this peer's work set.
* @peer[in] peer The peer whose orphan transactions we will reconsider. Generally only
* one orphan will be reconsidered on each call of this function. If an
Contributor

nit: I think "Generally only one orphan will be reconsidered" is misleading. Calling ProcessTransaction for a tx once more, getting TX_MISSING_INPUTS again and therefore keeping the orphan would qualify as being "reconsidered" for me, whereas what is meant is that the tx is being removed from the orphanage.
But no need to address here, since this PR didn't change this.

void AddChildrenToWorkSet(const CTransaction& tx) EXCLUSIVE_LOCKS_REQUIRED(!m_mutex);;

/** Does this peer have any work to do? */
bool HaveTxToReconsider(NodeId peer) EXCLUSIVE_LOCKS_REQUIRED(!m_mutex);;
Contributor

nit: double ;

@glozow glozow merged commit 77a3603 into bitcoin:master Jan 26, 2023
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Jan 26, 2023
c58c249 net_processing: indicate more work to do when orphans are ready to reconsider (Anthony Towns)
ecb0a3e net_processing: Don't process tx after processing orphans (Anthony Towns)
c583775 net_processing: only process orphans before messages (Anthony Towns)
be23046 txorphanage: Drop redundant originator arg from GetTxToReconsider (Anthony Towns)
a4fe099 txorphanage: index workset by originating peer (Anthony Towns)

Pull request description:

  We currently process orphans by assigning them to the peer that provided a missing parent; instead assign them to the peer that provided the orphan in the first place. This prevents a peer from being able to marginally delay another peer's transactions and also simplifies the internal API slightly. Because we're now associating orphan processing with the peer that provided the orphan originally, we no longer process orphans immediately after receiving the parent, but defer until a future call to `ProcessMessage`.

  Based on bitcoin#26295

ACKs for top commit:
  naumenkogs:
    utACK c58c249
  glozow:
    ACK c58c249
  mzumsande:
    Code Review ACK c58c249

Tree-SHA512: 3186c346f21e60440266a2a80a9d23d7b96071414e14b2b3bfe50457c04c18b1eab109c3d8c2a7726a6b10a2eda1f0512510a52c102da112820a26f5d96f12de

Fabcien pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request May 29, 2024
Summary:
```
We currently process orphans by assigning them to the peer that provided a missing parent; instead assign them to the peer that provided the orphan in the first place. This prevents a peer from being able to marginally delay another peer's transactions and also simplifies the internal API slightly. Because we're now associating orphan processing with the peer that provided the orphan originally, we no longer process orphans immediately after receiving the parent, but defer until a future call to ProcessMessage.
```

Backport of [[bitcoin/bitcoin#26551 | core#26551]].

Depends on D16235.

Test Plan:
  ninja all check-all

Reviewers: #bitcoin_abc, PiRK

Reviewed By: #bitcoin_abc, PiRK

Subscribers: PiRK

Differential Revision: https://reviews.bitcoinabc.org/D16236
@bitcoin bitcoin locked and limited conversation to collaborators Sep 26, 2024