Fix nullptr clientModel access during shutdown #801
Conversation
```cpp
void BitcoinGUI::updateNetworkState()
{
    if (!clientModel) return;
```
Not sure, this could still race, since there is no lock, so clientModel may still be deleted after this statement?
> Not sure, this could still race, since there is no lock, so clientModel may still be deleted after this statement?
Check it now.
BitcoinApplication::requestShutdown() sets the main window's (BitcoinGUI) clientModel to nullptr without clearing the subscribed signals (this is the window->setClientModel(nullptr) line), which leaves a window of time where the signal handlers are still active but the main window's client model is nullptr. So, by unsubscribing the signals prior to the setClientModel(nullptr) call, we avoid any concurrent event landing between the nulling of the main window's client model and the final clientModel destruction at the end of BitcoinApplication::requestShutdown() (which was what used to tear down the event handlers).
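A minimal sketch of the reordering described above (the body of requestShutdown() is abbreviated, and the surrounding lines are assumptions based on this discussion, not the exact project code):

```cpp
void BitcoinApplication::requestShutdown()
{
    // ...
    // Disconnect the boost core-signal handlers *before* nulling the model,
    // so no handler can run while the window holds a nullptr clientModel.
    window->unsubscribeFromCoreSignals();
    window->setClientModel(nullptr);
    // ...
    // Final destruction at the end of shutdown; previously this was also
    // what tore down the still-subscribed handlers.
    delete clientModel;
    clientModel = nullptr;
    // ...
}
```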
What about events that are already running? Does the destructor ensure they are flushed/discarded, or will it just disconnect?
> What about events that are already running? Does the destructor ensure they are flushed/discarded, or will it just disconnect?
I haven't looked in detail, but it seems like in principle, if any boost signal events are currently running, and they send asynchronous Q_EMIT Qt signals that run in the GUI event loop thread, there should not be a problem. That is assuming this method always runs in the GUI event loop thread, and the code deleting clientModel also runs in the GUI event loop thread.
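For context, the bridge in question looks roughly like this (a sketch; the handler and slot names are illustrative and may not match the exact project signatures). A boost handler invoked on a backend thread enqueues the real work onto the GUI thread:

```cpp
// Boost core-signal handler, potentially invoked on a backend thread.
// Qt::QueuedConnection posts the call to the receiver's (GUI) event loop,
// so the slot itself runs asynchronously on the GUI thread.
static void NotifyNetworkActiveChanged(BitcoinGUI* gui, bool network_active)
{
    QMetaObject::invokeMethod(gui, "setNetworkActive", Qt::QueuedConnection,
                              Q_ARG(bool, network_active));
}
```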
Ok, but then it means this check in this line could be removed?
> Ok, but then it means this check in this line could be removed?
I think not if the events are queued, but again I'm not too sure.
A bit late but here again. Sorry, life called.
> it means this check in this line could be removed?
Not really. While requestShutdown() is being executed, no other task can be processed on the main thread, but new tasks can be enqueued to be executed by the main thread afterwards. So, any event coming from the backend while requestShutdown() is running (prior to the unsubscribeFromCoreSignals call) will be executed right after it finishes.
So.. this means we also need to guard the RPCConsole::updateNetworkState() function against a nullptr clientModel. The same segfault could arise there too.
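The guard there would mirror the BitcoinGUI one quoted at the top of this thread (a sketch, with the rest of the function body elided):

```cpp
void RPCConsole::updateNetworkState()
{
    // Same early return as in BitcoinGUI::updateNetworkState(): the model
    // may already have been cleared by requestShutdown().
    if (!clientModel) return;
    // ...
}
```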
> I think not if the events are queued, but again I'm not too sure.
Yes. Both events run on the main thread. The difference is who enqueues them: the quit action is appended by the main thread (when we press the quit button), and the "network active change" is enqueued by the backend thread, when the RPC changes the value.
That being said, it is all theoretical. Let me do some tests.
> I think not if the events are queued, but again I'm not too sure.
>
> Yes. Both events run on the main thread. The difference is who enqueues them: the quit action is appended by the main thread (when we press the quit button), and the "network active change" is enqueued by the backend thread, when the RPC changes the value.
Thanks for clarifying. Indeed, if the events all run on a single thread (haven't checked this), this fix should be correct.
I guess I was confusing myself with the second commit.
> if the events all run on a single thread (haven't checked this), this fix should be correct.
Yeah. As far as I have tested, they run on the main thread. But this doesn't mean the code will always act the same way: we are using the default Qt connect() call, which uses Qt::ConnectionType = Qt::AutoConnection; per https://doc.qt.io/qt-6/qt.html#ConnectionType-enum, that may resolve to Qt::DirectConnection, which executes the slot on the signalling thread.
To solve this, we could force main-thread execution by changing the slot connection side but, as I haven't been able to trigger the segfault after the fix, I'll leave this investigation for a follow-up.
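For the record, forcing main-thread execution would amount to making the connection type explicit on the slot side, something like the following sketch (the signal and slot names are assumptions for illustration):

```cpp
// Request an explicitly queued connection instead of Qt::AutoConnection,
// so the slot always runs on the receiver's (GUI) thread even if the
// signal is ever emitted from a backend thread.
connect(clientModel, &ClientModel::networkActiveChanged,
        this, &BitcoinGUI::setNetworkActive, Qt::QueuedConnection);
```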
force-pushed from cc337f8 to 4b47c78
During shutdown, already queued events dispatched from the backend, such as 'numConnectionsChanged' and 'networkActiveChanged', could try to access the clientModel object, which might not exist because we manually delete it inside 'BitcoinApplication::requestShutdown()'.
force-pushed from 4b47c78 to f357578
The second commit is a refactor? If yes, it would be good to clarify.
Concept ACK. I did my own research and I agree with @furszy's analysis of the problem.
hebasto left a comment
ACK f357578, I have reviewed the code and it looks OK. Tested on Ubuntu 23.10.
nit: The second commit message might be amended according to #801 (comment).
Preventing dangling signals.
force-pushed from f357578 to b7aa717
hebasto left a comment
re-ACK b7aa717
Summary:
```
During shutdown, already queued events dispatched from the backend, such as
'numConnectionsChanged' and 'networkActiveChanged', could try to access the
clientModel object, which might not exist because we manually delete it inside
'BitcoinApplication::requestShutdown()'.

This happens because boost does not clear the queued events when they arise
concurrently with the signal disconnection (see
https://www.boost.org/doc/libs/1_55_0/doc/html/signals2/thread-safety.html).

From the docs:

"Note that since we unlock the connection's mutex before executing its
associated slot, it is possible a slot will still be executing after it has
been disconnected by a connection::disconnect(), if the disconnect was called
concurrently with signal invocation."

"The fact that concurrent signal invocations use the same combiner object
means you need to insure any custom combiner you write is thread-safe"

So, we need to guard clientModel before accessing it at the handler side.
```

Backport of [[bitcoin-core/gui#801 | core-gui#801]].

Test Plan:
ninja all check-all

On OSX, run the Qt client with Chronik enabled, let it sync a bit, then kill it. Check that it doesn't segfault upon shutdown.

Reviewers: #bitcoin_abc, PiRK

Reviewed By: #bitcoin_abc, PiRK

Differential Revision: https://reviews.bitcoinabc.org/D17682
Fixing #800.
During shutdown, already queued events dispatched from the backend, such as
'numConnectionsChanged' and 'networkActiveChanged', could try to access the
clientModel object, which might not exist because we manually delete it inside
'BitcoinApplication::requestShutdown()'.
This happens because boost does not clear the queued events when they arise
concurrently with the signal disconnection (see https://www.boost.org/doc/libs/1_55_0/doc/html/signals2/thread-safety.html).
From the docs:

> "Note that since we unlock the connection's mutex before executing its associated slot, it is possible a slot will still be executing after it has been disconnected by a connection::disconnect(), if the disconnect was called concurrently with signal invocation."

So, we need to guard clientModel before accessing it at the handler side.
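A standalone sketch (not project code) of why the handler-side guard is needed: boost::signals2 disconnection only prevents future invocations, so a handler must re-check any shared state it dereferences.

```cpp
#include <boost/signals2.hpp>
#include <atomic>
#include <iostream>

struct Model { int value = 42; };

int main()
{
    boost::signals2::signal<void()> sig;
    std::atomic<Model*> model{new Model()};

    boost::signals2::connection conn = sig.connect([&] {
        // Guard: shutdown code may clear the pointer between signal
        // invocation and slot execution.
        if (Model* m = model.load()) std::cout << m->value << '\n';
    });

    sig();                          // model alive: prints 42
    delete model.exchange(nullptr); // "shutdown" clears the model
    sig();                          // guard skips the dead model
    conn.disconnect();              // only blocks *future* invocations
}
```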