-
Notifications
You must be signed in to change notification settings - Fork 38.8k
refactor: Replace boost::mutex,condition_var,chrono with std equivalents in scheduler #18234
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
cc @theuni. I know you have been working on similar refactors recently. |
ff3821b to
42a5326
Compare
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
src/sync.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pending comments on my post here: #18088 (comment)
Maybe replace it with PASTE2(revlock,PASTE2(__FILE__,__LINE__))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already use __COUNTER__ for LOCK() in sync.h, so reusing it here seems fine. Using __FILE__ seems unlikely to generate a usable symbol name too :)
|
Concept ACK Thanks for working on de-boosting! :) |
jonatack
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 42a5326 I reviewed the changes and surrounding code in detail, built/ran tests for each commit, ran bitcoind for 2 days (debian), and shut down/restarted bitcoind several times. This looks good.
src/test/txindex_tests.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In 69460c9 could these be also removed in src/test/transaction_tests.cpp L::457-458 if they are handled with TestingSetup::~TestingSetup(); the test compiles and runs without them.
--- a/src/test/transaction_tests.cpp
+++ b/src/test/transaction_tests.cpp
@@ -453,9 +453,6 @@ BOOST_AUTO_TEST_CASE(test_big_witness_transaction)
bool controlCheck = control.Wait();
assert(controlCheck);
-
- threadGroup.interrupt_all();
- threadGroup.join_all();
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why this should be safe to remove. The scheduler needs to be stopped before this test ends, so the interrupt_all should be replaced by a scheduler.stop(). I think that otherwise you reintroduce bug #15410
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I was just now looking at src/test/checkqueue_tests.cpp which has similar code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jonatack In transaction_tests, that's a locally declared threadGroup, and it's a BasicTestingSetup too, so the TestingSetup destructor wouldn't execute, and even if it did, wouldn't clean up that threadGroup.
@MarcoFalke Ah, you're quite right. Have undeleted the code and added a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ajtowns right, makes sense. Thanks for the explanation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in commit 306f71b:
After adding the scheduler.stop(), they are again redundant and confusing. I'd suggest to remove the threadGroup calls
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling stop() doesn't actually cause the thread to stop, you need to wait for it to complete. Could do that by looping and waiting for AreThreadsServicingQueue() to be false, but that still leaves the possibility you continue on and destruct the object in between --nThreadsServicingQueue; and newTaskScheduled.notify_one(); at the end of serviceQueue. Could drop the interrupt_all but if some other thread in the group needed an interrupt before being joined, that'd cause a hang too... So until we change something else, this doesn't seem redundant to me?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ups, correct.
|
Ok, I see that in the description. Re-reviewing. |
|
I would've expected this to require more work, but it looks like the scheduler was already surprisingly interruptible. Concept ACK. @TheBlueMatt might be aware of more scheduling minefields. |
src/init.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these guaranteed to fully execute in-order? Can an interruption-point hit while the scheduler is still tearing down?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what you're asking; node.scheduler->stop() just sets a boolean to say "exit the loop" and notifies everything waiting on the condition var to prevent it staying asleep, all the tear down happens in the thread. The interruption could certainly happen while it's tearing down, stop(); interrupt_all() happen in one thread, and only afterwards does the other thread even wake up. I don't think boost interrupt will have an effect anymore though, because serviceQueue doesn't use any boost primitives and boost interrupt only sets a flag which is checked by other boost things.
42a5326 to
9426759
Compare
|
Should add to OP? |
|
@ajtowns Can you rebase on master now that #16117 is merged. Also, feel free to include a commit here to kill the last |
Calling interrupt_all() will immediately stop the scheduler, so it's safe to invoke stop() beforehand, and this removes the reliance on boost to interrupt serviceQueue().
Changes from boost::chrono to std::chrono, boost::condition_var to std::condition_var, boost::mutex to sync.h Mutex, and reverselock.h to sync.h REVERSE_LOCK. Also adds threadsafety annotations to CScheduler members.
9426759 to
70a6b52
Compare
|
Great work! I'm really glad this is happening, thanks for working on this. |
maflcko
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 70a6b52, just style-nits 🌩
Show signature and timestamp
Signature:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512
ACK 70a6b529f306ff72ea1badf25e970a92b2b17ab3, just style-nits 🌩
-----BEGIN PGP SIGNATURE-----
iQGzBAEBCgAdFiEE+rVPoUahrI9sLGYTzit1aX5ppUgFAlwqrYAACgkQzit1aX5p
pUjbywwAhqXP0sWwiqNLLsI0h9fBTK3DQ7qRxX2cxV9RmUoiok2nKutVPGef2v4L
brJk/asu9hCJ5nb9/SMf6NRqnyhJ4lbTbt961Tu1au0LNfloCUfvfb8JxY42yDXD
zxzKYI/vQeUnaNBMINmZ9rOLkLMYqnxXb7fjejNL9l8txmE4jmRp53lvZj34IfDr
0Mhu3rq+FwLFgvwMWPEbPR1jG/AN2XaW0sHMMe1Xpw/intRxJLh3BSpPHZ9+AjIF
kWF6Dqp4Onf2dO4jmfaB5ipQI9IYFFZh3+jD3wePBD9kZuL2iqsfXvfblm+rXGZ4
XMOSXjJoNreRp/hH14gcGuKQrOOiraemT15rH4XiC/DUI6lLMa1FWMNGn1Pob5A0
EwjIUwnHVNbCqpJIUcGXLEHTZ0pE/1cQ6nBKEFoPcTEgk7D1gH5mX4wP3UBdPw57
Vka9bjoTAOLqdy7B2FE8qg+dNiVqaDm9eRxixS1uqDY+lWHfNkeubg9SaE6LkC//
5qT991xc
=GGMb
-----END PGP SIGNATURE-----
Timestamp of file with hash 8b7a37edb1d87523e3df70995e0c51ee4dd4dafc50f78c0b12195a66e49c4ce2 -
src/test/txindex_tests.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in commit 306f71b:
After adding the scheduler.stop(), they are again redundant and confusing. I'd suggest to remove the threadGroup calls
| push_lock(cs, CLockLocation(pszName, pszFile, nLine, fTry, util::ThreadGetInternalName())); | ||
| } | ||
|
|
||
| void CheckLastCritical(void* cs, std::string& lockname, const char* guardname, const char* file, int line) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in commit b9c4260:
Why is this not called GetLastLockName, which better describes what this function does, imo.
|
|
||
| UniqueLock& lock; | ||
| UniqueLock templock; | ||
| std::string lockname; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in commit b9c4260:
Should be m_lockname. Also for other members?
| reverse_lock& operator=(reverse_lock const&); | ||
|
|
||
| UniqueLock& lock; | ||
| UniqueLock templock; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in commit b9c4260:
I think it would help readers of the code to briefly explain why this is needed. Also, m_templock? ;)
| // | ||
| // NOTE: | ||
| // boost::thread / boost::chrono should be ported to std::thread / std::chrono | ||
| // boost::thread should be ported to std::thread |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in commit d0ebd93:
Actually boost::thread is no longer needed an can be replaced with std::thread in this file and the scheduler tests. I know that the thread group is still boost, but the scheduler doesn't care about that.
Summary: Calling interrupt_all() will immediately stop the scheduler, so it's safe to invoke stop() beforehand, and this removes the reliance on boost to interrupt serviceQueue(). This is a partial abckport of Core [[bitcoin/bitcoin#18234 | PR18234]] : bitcoin/bitcoin@306f71b Test Plan: ninja all check-all Reviewers: #bitcoin_abc, Fabien Reviewed By: #bitcoin_abc, Fabien Differential Revision: https://reviews.bitcoinabc.org/D6531
Summary: This is a partial backport of Core [[bitcoin/bitcoin#18234 | PR18234]] : bitcoin/bitcoin@b9c4260 Depends on D6531 Test Plan: ninja all check-all Reviewers: #bitcoin_abc, Fabien Reviewed By: #bitcoin_abc, Fabien Differential Revision: https://reviews.bitcoinabc.org/D6532
Summary: Changes from boost::chrono to std::chrono, boost::condition_var to std::condition_var, boost::mutex to sync.h Mutex, and reverselock.h to sync.h REVERSE_LOCK. Also adds threadsafety annotations to CScheduler members. This is a partial backport of Core [[bitcoin/bitcoin#18234 | PR18234]] : bitcoin/bitcoin@d0ebd93 Depends on D6532 Test Plan: ninja all check-all Reviewers: #bitcoin_abc, Fabien Reviewed By: #bitcoin_abc, Fabien Differential Revision: https://reviews.bitcoinabc.org/D6533
Summary: This is a partial backport of Core [[bitcoin/bitcoin#18234 | PR18234]] : bitcoin/bitcoin@cea19f6 Depends on D6533 Test Plan: ninja all check-all Reviewers: #bitcoin_abc, Fabien Reviewed By: #bitcoin_abc, Fabien Differential Revision: https://reviews.bitcoinabc.org/D6535
…t's wait_until ed0223e scheduler: Workaround negative nsecs bug in boost's wait_until (Luke Dashjr) Pull request description: Some boost versions have a bug that can cause a time prior to system boot (or wake from sleep) to throw an exception instead of return timeout See boostorg/thread#308 NOTE: This was addressed in master with a refactor (#18234), so this isn't a strict backport and needs full review. Fixes #18227 Cleanly merges to 0.14+ ACKs for top commit: laanwj: ACK ed0223e gruve-p: ACK ed0223e Tree-SHA512: 57edd0a22d7cf8f04b427e23d1ba10746a492638021d4438781b9d313dd0459418f64f0489be72d8e2286bbc8e8762d77e673868c25eb3bf4f0423a8fe8cdffa
Summary: > After [[bitcoin/bitcoin#18234 | core#18234]] the scheduler itself no longer cares if the > serviceQueue is run in a std::thread or boost::thread. Change the > documentation to std::thread because we switched to C++11. This is a backport of [[bitcoin/bitcoin#19090 | core#19090]] [3/6] bitcoin/bitcoin@fab2950 Test Plan: Proofreading Reviewers: #bitcoin_abc, majcosta Reviewed By: #bitcoin_abc, majcosta Differential Revision: https://reviews.bitcoinabc.org/D9957
… std equivalents in scheduler b2ac70f [Cleanup] Drop unused reverselock.h (random-zebra) 55ac6de scheduler: switch from boost to std (Anthony Towns) 15f292b sync.h: add REVERSE_LOCK (Anthony Towns) a8705a1 scheduler: don't rely on boost interrupt on shutdown (random-zebra) 004c064 util: Add UnintrruptibleSleep (MarcoFalke) Pull request description: Backports bitcoin#18234 [Anthony Towns] > Motivated by 18227, but should stand alone. Changing from boost::condition_var to std::condition_var means threadGroup.interrupt_all isn't enough to interrupt serviceQueue anymore, so that means calling stop() before join_all() is needed. And the existing reverselock.h code doesn't work with sync.h's DebugLock code (because the reversed lock won't be removed from g_lockstack which then leads to incorrect potential deadlock warnings), so I've replaced that with a dedicated class and macro that's aware of our debug lock behaviour. Fixes 16027, Fixes 14200, Fixes 18227 First commit coming from bitcoin#16117 This should fix #2515 ACKs for top commit: furszy: code review ACK b2ac70f Fuzzbawls: utACK b2ac70f Tree-SHA512: 5a39f8047948d02f75bb3cce99b8a9ce8d31e30c2e12a8ebfe2a871ed8e0fbe552c6e9deaad5d3e9c0830e7286f2cb627f0605d5887c4407f91cc2f9d667e8b1
…_var,chrono with std equivalents in scheduler This backport does not include changes that depend on bitcoin pr 18037 70a6b52 lint-cppcheck: Remove -DHAVE_WORKING_BOOST_SLEEP_FOR (Anthony Towns) 294937b scheduler_tests: re-enable mockforward test (Anthony Towns) cea19f6 Drop unused reverselock.h (Anthony Towns) d0ebd93 scheduler: switch from boost to std (Anthony Towns) b9c4260 sync.h: add REVERSE_LOCK (Anthony Towns) 306f71b scheduler: don't rely on boost interrupt on shutdown (Anthony Towns) Pull request description: Replacing boost functionality with C++11 stuff. Motivated by bitcoin#18227, but should stand alone. Changing from `boost::condition_var` to `std::condition_var` means `threadGroup.interrupt_all` isn't enough to interrupt `serviceQueue` anymore, so that means calling `stop()` before `join_all()` is needed. And the existing reverselock.h code doesn't work with sync.h's DebugLock code (because the reversed lock won't be removed from `g_lockstack` which then leads to incorrect potential deadlock warnings), so I've replaced that with a dedicated class and macro that's aware of our debug lock behaviour. Fixes bitcoin#16027, Fixes bitcoin#14200, Fixes bitcoin#18227 ACKs for top commit: laanwj: ACK 70a6b52 Tree-SHA512: d1da13adeabcf9186d114e2dad9a4fdbe2e440f7afbccde0c13dfbaf464efcd850b69d3371c5bf8b179d7ceb9d81f4af3cc22960b90834e41eaaf6d52ef7d331 # Conflicts: # src/reverselock.h # src/rpc/misc.cpp # src/scheduler.cpp # src/scheduler.h # src/sync.cpp # src/sync.h # src/test/reverselock_tests.cpp # src/test/scheduler_tests.cpp # src/test/test_dash.cpp # test/lint/extended-lint-cppcheck.sh
Replacing boost functionality with C++11 stuff.
Motivated by #18227, but should stand alone. Changing from
boost::condition_vartostd::condition_varmeansthreadGroup.interrupt_allisn't enough to interruptserviceQueueanymore, so that means callingstop()beforejoin_all()is needed. And the existing reverselock.h code doesn't work with sync.h's DebugLock code (because the reversed lock won't be removed fromg_lockstackwhich then leads to incorrect potential deadlock warnings), so I've replaced that with a dedicated class and macro that's aware of our debug lock behaviour.Fixes #16027, Fixes #14200, Fixes #18227