-
Notifications
You must be signed in to change notification settings - Fork 38.7k
Parallel ThreadMessageHandler #9488
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parallel ThreadMessageHandler #9488
Conversation
Surprisingly this hasn't been causing me any issues while testing, probably because it requires lots of large blocks to be flying around. Send/Recv corks need tests!
This will be needed so that the message processor can cork incoming messages
These conditions are problematic to check without locking, and we shouldn't be relying on the refcount to disconnect.
when vRecvMsg becomes a private buffer, it won't make sense to allow other threads to mess with it anymore.
This is left-over from before there was proper accounting. Hitting 2x the sendbuffer size should not be possible.
…eserialize We'll soon no longer have access to vRecvMsg, and this is more intuitive anyway.
This allows locking to be pushed down to only where it's needed Also reuse the current time rather than checking multiple times.
This may be used publicly in the future
In order to sleep accurately, the message handler needs to know if _any_ node has more processing that it should do before the entire thread sleeps. Rather than returning a value that represents whether ProcessMessages encountered a message that should trigger a disconnnect, interpret the return value as whether or not that node has more work to do. Also, use a global fProcessSleep value that can be set by other threads, which takes precedence (for one cycle) over the messagehandler's decision. Note that the previous behavior was to only process one message per loop (except in the case of a bad checksum or invalid header). That was added in The only change here in that regard is that the current node now falls to the back of the processing queue for the bad checksum/invalid header cases.
This separates the storage of messages from the net and queued messages for processing, allowing the locks to be split.
Messages are dumped very quickly from the socket handler to the processor, so it's the depth of the processing queue that's interesting. The socket handler checks the process queue's size during the brief message hand-off and pauses if necessary, and the processor possibly unpauses each time a message is popped off of its queue.
Similar to the recv flag, but this one indicates whether or not the net's send buffer is full. The socket handler checks the send queue when a new message is added and pauses if necessary, and possibly unpauses after each message is drained from its buffer.
This is representative of how messages will be sent out once processing is abstracted out. Makes it clear that the processor _must_ accept all messages. It also pushes the use of cs_vProcessMsg to a function with narrow scope.
It's now only used for message size/time accounting
src/net_processing.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cs_main is taken by ProcessGetData at like 2456 above. (github won't let me comment there).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
|
I am somewhat dubious that it's feasible to determine and maintain that various bits of code do not take cs_main anywhere in their call graph. I think code like this might need a new debug tool where you can poison_lock(cs_main) which increments a thread local counter while it is in scope. And if the counter is positive when cs_main is taken it causes an assertion/debug print. |
e8351a2 to
73e696c
Compare
|
Yea, I thought about adding something like that and never got around to it...I'll try to add it to this in the coming days. |
This uses the new return value from ProcessMessages which indicates whether the next message might not need cs_main, trying to process any messages which do not need cs_main, even while another thread is running which takes cs_main for messages.
73e696c to
ac0a3ad
Compare
|
Added a DEBUG_LOCKORDER check to prevent cs_main. |
|
Closing for now. Will work towards a more whole solution for 0.15. |
Based on a (now outdated) #9441.
This runs multiple ThreadMessageHandlers, but only allows them to do (relatively limited) work - it has a whitelisted list of commands which are expected to not take cs_main, and only runs those in a secondary thread, but running anything in the "main thread" (a concept based on randomly acquiring an atomic_bool at the top of the processing loop).
Additionally, it will never be in the ProcessMessages/SendMessages part of the loop for one node in both threads.
With this, #9419, and #9375 (plus changing the whitelisted list of messages) we can respond to getblocktxn requests while another ProcessMessages is busy connecting the block.