Fix data races in bls_worker and use ctpl_stl queue. #4240
Conversation
Force-pushed from f237a37 to 42c99be.
PastaPastaPasta left a comment:
Thanks for this! I have a couple of thoughts/questions and one change.
src/bls/bls_worker.cpp (outdated diff)
Why is this vec (and below) no longer passed by reference?
I guess it is a reference to a shared_ptr, so likely makes sense to just pass by value. Is that the intention?
You are right: only the lambda (line 257) has to capture vec by value, since the lambda is passed to another thread and we want to share ownership of vec. I will change the others back to shared_ptr& to avoid useless increments of the reference counter. I will also mark the shared_ptr& parameters as const.
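(For context, a minimal sketch of the pattern being discussed, with hypothetical names rather than the actual bls_worker code: the function takes the shared_ptr by const reference, and only the lambda that escapes to a worker thread captures it by value.)

```cpp
#include <memory>
#include <vector>

// Hypothetical illustration, not the bls_worker code. The parameter is a
// const shared_ptr&, so the call itself does not touch the refcount; the
// lambda captures `vec` by value, which bumps the refcount exactly once
// and lets the worker thread share ownership for the task's lifetime.
template <typename Pool>
void ProcessAsync(Pool& workerPool, const std::shared_ptr<std::vector<int>>& vec)
{
    workerPool.push([vec](int threadId) {
        // Safe even if the caller's shared_ptr is already gone:
        // the by-value capture keeps *vec alive.
        for (int x : *vec) {
            (void)x; // ... do the actual work here ...
        }
    });
}
```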
src/bls/bls_worker.cpp (outdated diff)
Can you explain why you decided to start them in this loop as opposed to the way it was done before?
Once an Aggregator is created it can be started; there is no reason to delay it. Before, however, a memory management issue prevented starting the aggregator earlier: this (i.e. the VectorAggregator) was passed to the Aggregator as a raw pointer together with a doneCallback which deletes this. So the following scenario could happen: we couldn't directly start the aggregator in the loop, because it might finish so fast that it deletes "this" while we are still inside the loop (mentioned in a comment, line 380). Now that the memory of this is managed by a shared_ptr (VectorAggregator is created using make_shared), the object stays alive as long as someone holds a reference, and we can safely start each aggregator as soon as it is created.
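(A simplified, hypothetical contrast of the two lifetime strategies described above; this is a sketch of the idea, not the actual Aggregator/VectorAggregator code, and the enable_shared_from_this usage is illustrative.)

```cpp
#include <functional>
#include <memory>

// Old pattern (sketch): the object deletes itself via its done callback.
// Starting work right after construction is unsafe, because the callback
// may already run `delete this` while the creator still uses the pointer.
struct OldVectorAggregator {
    std::function<void()> doneCallback{[this] { delete this; }};
    void Start() { /* kick off async work; doneCallback fires when done */ }
};

// New pattern (sketch): lifetime is managed by shared_ptr. Any callback
// that needs the object captures a shared_ptr copy, so Start() is safe
// immediately after construction.
struct VectorAggregator : std::enable_shared_from_this<VectorAggregator> {
    void Start()
    {
        auto self = shared_from_this();
        // Hand `self` to the async work; the object lives until the last
        // shared_ptr (the creator's or a callback's) is released.
        (void)self;
    }
};

inline std::shared_ptr<VectorAggregator> CreateAndStart()
{
    auto agg = std::make_shared<VectorAggregator>();
    agg->Start(); // safe: `agg` keeps the object alive during this call
    return agg;
}
```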
src/ctpl_stl.h (outdated diff)
#endif // __ctpl_stl_thread_pool_H__
Force-pushed from 42c99be to 15f3e63.
Change ctpl implementation to use STL queue & mutex.
Use ctpl synchronized queue instead of boost lockfree queue in bls worker aggregator.
Use smart pointers for memory management of Aggregator and VectorAggregator. With 'delete this;' the objects are prone to data race on the delete operator.
Use smart pointers for memory management of ContributionVerifier.
Pass shared_ptr by value to other threads via worker pool.
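(For readers unfamiliar with ctpl_stl: the synchronized queue in question is essentially a std::queue guarded by a mutex. A minimal sketch of that idea, simplified rather than the verbatim ctpl_stl.h header:)

```cpp
#include <mutex>
#include <queue>

// Minimal sketch of a mutex-protected STL queue in the spirit of
// ctpl_stl.h's internal queue (simplified, not the verbatim header).
template <typename T>
class Queue {
public:
    bool push(T const& value)
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_queue.push(value);
        return true;
    }

    // Pops into `out`; returns false if the queue is empty. The element is
    // copied out under the lock, so no other thread can observe or delete
    // it concurrently (unlike the pop-then-access race described below).
    bool pop(T& out)
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        if (m_queue.empty()) return false;
        out = m_queue.front();
        m_queue.pop();
        return true;
    }

private:
    std::queue<T> m_queue;
    std::mutex m_mutex;
};
```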
Force-pushed from 15f3e63 to 4498e4f.
utACK
👍
slightly tested ACK
PastaPastaPasta left a comment:
utACK
c76e7b6 [Core] Fix data races in bls_worker and use ctpl_stl queue (Fuzzbawls)
55babf4 [Build] Improve std::atomic test macro (Fuzzbawls)

Pull request description:

Coming from dashpay#4240, bitcoin#20938, and bitcoin#21920, this resolves numerous compile-time inconsistencies/failures on certain combos of gcc and cpu arch that were observed on launchpad.net over the 5.4.0 release cycle.

ACKs for top commit:
furszy: ACK c76e7b6
random-zebra: ACK c76e7b6

Tree-SHA512: d931142f978d0d22668b9ff2e16d48c148b7d163b35aeef149ab08a13174536c035f2581a8ec5bc9782226fcb27ed33b08aec475d1522d8e8532d00a12c5df72
Thread sanitizer triggers data races in the bls worker. Steps to reproduce:

This pull request aims to fix all bls_worker related data races. The above testcase also triggers a data race in wallet/wallet.cpp, function KeepKey, which I am going to tackle in a different pull request, as suggested by @PastaPastaPasta.

Fixes:
- Change ctpl implementation to use STL queue & mutex.
- Use ctpl synchronized queue instead of boost lockfree queue in the bls worker aggregator.
- Use smart pointers for memory management of Aggregator and VectorAggregator; with 'delete this;' the objects are prone to data race on the delete operator.
- Use smart pointers for memory management of ContributionVerifier.
- Pass shared_ptr by value to other threads via the worker pool.

Benchmark for develop: https://pastebin.com/5qPBERhr
Benchmark for ctpl STL queue: https://pastebin.com/hx1UiqdZ
Benchmark for concurrent queue: https://pastebin.com/VXCXG5ak
There's no real difference between them (the delta is very small) when used in the bls_worker, but I was curious to find out what causes the performance impact, so I also did a profiling with gprof on the BLS bench. Here you can find the results: https://pastebin.com/AS14eSbr

My opinion is that changing the queue is a micro-improvement when it comes to performance, but it is essential for the project to drop the boost::lockfree::queue, as it has a data race where one can pop & access an already deleted element. My suggestion is to use the ctpl STL queue, as it is correct, small (easy to maintain), easy to understand, and has a tiny impact on performance (see benchmarks). I do not have a strong opinion on which queue we should use, as long as it has little to no performance impact.
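(To make the resulting pattern concrete, a small hedged usage sketch of a ctpl-style pool: `push` invoking the task with a thread id and returning a std::future follows ctpl's convention, while the summing task itself is hypothetical.)

```cpp
#include <iostream>
#include <memory>
#include <vector>

#include "ctpl_stl.h" // the STL-queue variant adopted in this PR

int main()
{
    ctpl::thread_pool pool(4);

    // Shared ownership: the vector stays alive as long as any queued task
    // still holds a copy of the shared_ptr, even after `data` goes away.
    auto data = std::make_shared<std::vector<int>>(1000, 1);

    // ctpl passes the worker thread id as the first argument and returns a
    // std::future for the task's result.
    auto fut = pool.push([data](int threadId) {
        long long sum = 0;
        for (int v : *data) sum += v;
        return sum;
    });

    std::cout << "sum = " << fut.get() << std::endl;
    return 0;
}
```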