
Conversation

@pstratem
Contributor

Currently there is a 100ms sleep in the message processing thread that adds delay to processing messages.

This patch skips the sleep when there is additional work to do.

Additionally ProcessMessages & ProcessGetData are modified such that their maximum run time is reduced; this improves the average latency peers see when there is high load. (This has a significant impact on high latency networks during initial block synchronization).
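The intent described above can be sketched as follows. This is a simplified illustration, not the actual net.cpp code: the `Peer` struct, the message counters, and the loop scaffolding are invented stand-ins; only the "process one item per peer, skip the sleep when work remains" shape mirrors the patch.

```cpp
#include <chrono>
#include <thread>
#include <vector>

// Hypothetical stand-in for a peer with queued messages.
struct Peer {
    int pendingMessages = 0;
};

// Process at most one message per peer; return true if any peer
// still has work queued, so the caller can skip its sleep.
bool ProcessOnePass(std::vector<Peer>& peers) {
    bool moreWork = false;
    for (Peer& p : peers) {
        if (p.pendingMessages > 0) {
            --p.pendingMessages;                      // handle one message
            if (p.pendingMessages > 0) moreWork = true;
        }
    }
    return moreWork;
}

// The handler thread sleeps 100ms only when a full pass over all
// peers left nothing queued.
void MessageHandlerLoop(std::vector<Peer>& peers, int passes) {
    for (int i = 0; i < passes; ++i) {
        if (!ProcessOnePass(peers)) {
            std::this_thread::sleep_for(std::chrono::milliseconds(100));
        }
    }
}
```

Capping the work done per peer per pass is what bounds the latency any single peer can impose on the others.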

@gmaxwell
Contributor

I'd really rather never be sleeping (except when waiting for IO, then event driven). Otherwise the sleep will still delay block propagation.

@pstratem
Contributor Author

The problem is this isn't entirely about IO.

SendMessages is actually largely a polling function with timers and other logic not dependent on IO.

@cjdelisle

I understand this is important and it should be validated and pulled. I took a look at the code and as I understand it, it does the following:

  1. If a single getdata contains multiple requests, only the requests up to and including the first request for a block will be serviced.
  2. ProcessMessages will only process one message at a time, so one message will be parsed from each node with outstanding messages rather than processing all outstanding messages from each node before moving on to the next.
  3. Don't sleep if there are still outstanding messages to be processed; this alone should improve performance.

Although I'm not particularly well versed in the bitcoin codebase this commit has my ACK on a code basis.

I have built and tested it on my laptop and a VM that I -connect= to, and it synced the chain just fine.
My laptop is showing surprisingly high processor load, but it's also serving a lot of block downloads, on the order of 100kB-1500kB/s upload speed. Does anyone know if 100% processor load on one core is normal for bitcoin under these circumstances? I want to be sure there is no perf regression in the ProcessMessages codepath, since it drops out of the function for each message when there are multiple messages stacked.

src/net.cpp Outdated


I suspect that this is being set erroneously, given that I'm getting 100% CPU pretty reliably with the patch and not when it is removed.
https://github.com/pstratem/bitcoin/blob/3c08b7a23fe3afc8025658316b99f750c9a90fbf/src/main.cpp#L3980
leaves vRecvMsg non-empty if the send buffer is full, which will cause an infinite-loop condition when a buffer fills up.

@cjdelisle

My assumption 1. turns out to have been wrong. This does not actually change the behavior of the node WRT how it handles requests with multiple getdata(block) in them. It simply drops out and leaves the balance of the requests till later, which will be at most 100ms later, when ProcessMessages() is next called.

In https://github.com/pstratem/bitcoin/blob/3c08b7a23fe3afc8025658316b99f750c9a90fbf/src/main.cpp#L3974

    if (!pfrom->vRecvGetData.empty())
        ProcessGetData(pfrom);

You'll want to add another if (!pfrom->vRecvGetData.empty()) return fOk; so that it doesn't try to continue parsing until the previous getdata is completely parsed.
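The control flow being suggested can be sketched like this. The `Node` struct and deque-of-strings queues are invented stand-ins for the real CNode state; only the order of the checks, including the extra early return, reflects the suggestion:

```cpp
#include <deque>
#include <string>

// Hypothetical minimal stand-in for the node state under discussion.
struct Node {
    std::deque<std::string> vRecvGetData; // outstanding getdata items
    std::deque<std::string> vRecvMsg;     // messages awaiting parsing
};

// Service at most one getdata item per call, as in the patch.
void ProcessGetData(Node* pfrom) {
    if (!pfrom->vRecvGetData.empty()) pfrom->vRecvGetData.pop_front();
}

// Sketch of the suggested guard: if getdata items remain after one
// round of servicing, return before parsing further messages.
bool ProcessMessages(Node* pfrom) {
    bool fOk = true;
    if (!pfrom->vRecvGetData.empty())
        ProcessGetData(pfrom);
    if (!pfrom->vRecvGetData.empty())
        return fOk;                   // the suggested early return
    if (!pfrom->vRecvMsg.empty())
        pfrom->vRecvMsg.pop_front();  // parse one message
    return fOk;
}
```

With the guard in place, a peer with a backlog of getdata items never advances to ordinary message parsing until that backlog is drained.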

@pstratem
Contributor Author

The goal here is for the loop to process 1 item from vRecvGetData and/or 1 item from vRecvMsg per peer.

Probably ProcessGetData should be moved outside of ProcessMessages entirely, but that is outside the scope of this patch (I think).

@pstratem
Contributor Author

I see what you're getting at, I have made the suggested change.

@gavinandresen
Contributor

ACK on the general idea of round-robin-processing requests. I haven't had time to code review or test.

@sipa
Member

sipa commented Oct 30, 2013

ACK on approach and implementation. Haven't tested.

Some commits can be squashed together. In particular the 3rd, which is a bugfix for the second.

@cjdelisle

Tested the patched version, still getting occasional 100% CPU. I think perhaps we should return a different value from ProcessMessages() if we do not want the calling routine to sleep, that allows us to be more explicit than checking that the queues are not empty. It's just difficult to reliably prove that there is no other way out of ProcessMessages() which leaves non-empty queues.

@pstratem
Contributor Author

There are only 2 paths where ProcessMessages can run without consuming a message in vRecvMsg.

Either the send buffer is full or the next message in the queue is not complete.

I agree that a more general solution should be applied, but probably later as part of a larger reorganization of the networking code.

@sipa
Member

sipa commented Nov 2, 2013

Rebase please?

@BitcoinPullTester

Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/75ef87dd936cd9c84d9a9fd3afce6198409636c4 for binaries and test log.
This test script verifies pulls every time they are updated. It, however, dies sometimes and fails to test properly. If you are waiting on a test, please check timestamps to verify that the test.log is moving at http://jenkins.bluematt.me/pull-tester/current/
Contact BlueMatt on freenode if something looks broken.

@gavinandresen
Contributor

Tested on OSX, behaves properly.

Message-handling should eventually be reworked to use boost::asio (never sleep, be I/O event driven as @gmaxwell suggests) and to use some sort of active queue management to mitigate denial-of-service attacks.

But this code is better than what we've got, so I am going to merge.

gavinandresen added a commit that referenced this pull request Nov 4, 2013
Reduce latency in network processing
@gavinandresen gavinandresen merged commit 97f844d into bitcoin:master Nov 4, 2013
@darkhosis

Ah, I already had a similar change in my bitcoind for a year or so (though in my case, I simply reduced the 100ms to 10ms). I guess that's why blockchain.info always picked up so much stuff. I thought it might be because of the 20ms latency.

It would explain the four blocks picked up by blockchain.info from 198.12.127.2, which I'm running w/ similar settings (just sending out far fewer transactions).. didn't think it'd have such an impact

@rebroad
Contributor

rebroad commented Feb 17, 2014

#2125 was raised ages ago to address this. Glad to see it finally implemented!

LogPrintf("ProcessMessage(%s, %u bytes) FAILED\n", strCommand.c_str(), nMessageSize);

break;
}

We don't really need this while loop any longer, since we break out after the first iteration of it.

theuni added a commit to theuni/bitcoin that referenced this pull request Jan 6, 2017
In order to sleep accurately, the message handler needs to know if _any_ node
has more processing that it should do before the entire thread sleeps.

Rather than returning a value that represents whether ProcessMessages
encountered a message that should trigger a disconnect, interpret the return
value as whether or not that node has more work to do.

Also, use a global fProcessWake value that can be set by other threads,
which takes precedence (for one cycle) over the messagehandler's decision.

Note that the previous behavior was to only process one message per loop
(except in the case of a bad checksum or invalid header). That was changed in
PR bitcoin#3180.

The only change here in that regard is that the current node now falls to the
back of the processing queue for the bad checksum/invalid header cases.
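The wake mechanism this commit message describes can be sketched with a condition variable. The `fProcessWake` name is taken from the message above; the mutex, condition variable, and helper functions here are simplified scaffolding invented for illustration, not the actual code:

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>

std::mutex mutexMsgProc;
std::condition_variable condMsgProc;
bool fMoreWork = false;                // some node still has work queued
std::atomic<bool> fProcessWake{false}; // other threads set this to force a wake

// Called by other threads (e.g. the socket handler) to wake the
// message handler immediately.
void WakeMessageHandler() {
    {
        std::lock_guard<std::mutex> lock(mutexMsgProc);
        fProcessWake = true;
    }
    condMsgProc.notify_one();
}

// The handler waits up to 100ms, but returns at once if either flag
// says there is work to do; the wake flag's precedence lasts one cycle.
void WaitForWork() {
    std::unique_lock<std::mutex> lock(mutexMsgProc);
    if (!fMoreWork && !fProcessWake) {
        condMsgProc.wait_for(lock, std::chrono::milliseconds(100),
                             [] { return fProcessWake.load(); });
    }
    fProcessWake = false;
}
```

This replaces the unconditional 100ms sleep with an interruptible wait, so a newly arrived message no longer has to ride out the remainder of the sleep.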
theuni added a commit to theuni/bitcoin that referenced this pull request Jan 13, 2017
UdjinM6 referenced this pull request in dashpay/dash Aug 23, 2017
…1586)

* net: fix typo causing the wrong receive buffer size

Surprisingly this hasn't been causing me any issues while testing, probably
because it requires lots of large blocks to be flying around.

Send/Recv corks need tests!

* net: make vRecvMsg a list so that we can use splice()

* net: make GetReceiveFloodSize public

This will be needed so that the message processor can cork incoming messages

* net: only disconnect if fDisconnect has been set

These conditions are problematic to check without locking, and we shouldn't be
relying on the refcount to disconnect.

* net: wait until the node is destroyed to delete its recv buffer

when vRecvMsg becomes a private buffer, it won't make sense to allow other
threads to mess with it anymore.

* net: set message deserialization version when it's actually time to deserialize

We'll soon no longer have access to vRecvMsg, and this is more intuitive anyway.

* net: handle message accounting in ReceiveMsgBytes

This allows locking to be pushed down to only where it's needed

Also reuse the current time rather than checking multiple times.

* net: record bytes written before notifying the message processor

* net: Add a simple function for waking the message handler

This may be used publicly in the future

* net: remove useless comments

* net: remove redundant max sendbuffer size check

This is left-over from before there was proper accounting. Hitting 2x the
sendbuffer size should not be possible.

* net: rework the way that the messagehandler sleeps

In order to sleep accurately, the message handler needs to know if _any_ node
has more processing that it should do before the entire thread sleeps.

Rather than returning a value that represents whether ProcessMessages
encountered a message that should trigger a disconnect, interpret the return
value as whether or not that node has more work to do.

Also, use a global fProcessWake value that can be set by other threads,
which takes precedence (for one cycle) over the messagehandler's decision.

Note that the previous behavior was to only process one message per loop
(except in the case of a bad checksum or invalid header). That was changed in
PR #3180.

The only change here in that regard is that the current node now falls to the
back of the processing queue for the bad checksum/invalid header cases.

* net: add a new message queue for the message processor

This separates the storage of messages from the net and queued messages for
processing, allowing the locks to be split.

* net: add a flag to indicate when a node's process queue is full

Messages are dumped very quickly from the socket handler to the processor, so
it's the depth of the processing queue that's interesting.

The socket handler checks the process queue's size during the brief message
hand-off and pauses if necessary, and the processor possibly unpauses each time
a message is popped off of its queue.

* net: add a flag to indicate when a node's send buffer is full

Similar to the recv flag, but this one indicates whether or not the net's send
buffer is full.

The socket handler checks the send queue when a new message is added and pauses
if necessary, and possibly unpauses after each message is drained from its buffer.

* net: remove cs_vRecvMsg

vRecvMsg is now only touched by the socket handler thread.

The accounting vars (nRecvBytes/nLastRecv/mapRecvBytesPerMsgCmd) are also
only used by the socket handler thread, with the exception of queries from
rpc/gui. These accesses are not threadsafe, but they never were. This needs to
be addressed separately.

Also, update comment describing data flow
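The pause/unpause flag scheme described in the squashed commits above can be sketched as follows. The flag name `fPauseRecv` follows the receive-side flag the commits describe; the queue type, the depth limit, and the helper names are invented for this example, and real code would need locking around the shared queue:

```cpp
#include <atomic>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical depth limit for the processing queue.
constexpr std::size_t kMaxProcessQueue = 3;

struct NodeQueues {
    std::deque<std::string> processQueue;
    std::atomic<bool> fPauseRecv{false};

    // Socket handler side: hand a message off during the brief
    // hand-off, pausing receives when the queue is deep.
    void PushMessage(std::string msg) {
        processQueue.push_back(std::move(msg));
        fPauseRecv = processQueue.size() >= kMaxProcessQueue;
    }

    // Processor side: pop one message and possibly unpause receives.
    bool PopMessage(std::string& out) {
        if (processQueue.empty()) return false;
        out = std::move(processQueue.front());
        processQueue.pop_front();
        fPauseRecv = processQueue.size() >= kMaxProcessQueue;
        return true;
    }
};
```

Keying the flag to the processing queue's depth, rather than raw socket buffers, is what the commits mean by "messages are dumped very quickly from the socket handler to the processor."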
OlegGirko referenced this pull request in OlegGirko/dash Apr 17, 2018
lateminer pushed a commit to lateminer/bitcoin that referenced this pull request Jan 4, 2019
KolbyML pushed a commit to KolbyML/bitcoin that referenced this pull request Dec 5, 2020
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021