
Conversation

@gabriel-bjg gabriel-bjg commented Jul 10, 2021

Thread sanitizer reports data races in the BLS worker. Steps to reproduce:

./configure --enable-debug --prefix=`pwd`/depends/x86_64-pc-linux-gnu/ --with-sanitizers=thread
make clean
make -j4
python3 test/functional/feature_llmq_is_cl_conflicts.py

This pull request aims to fix all bls_worker related data races. The above test case also triggers a data race in wallet/wallet.cpp's KeepKey function, which I am going to tackle in a different pull request, as suggested by @PastaPastaPasta.

Fixes:

  • Change the ctpl implementation to use an STL queue & mutex, as suggested in Use std based ctpl header only library #4126 (comment).
  • Use the ctpl synchronized queue instead of the boost lockfree queue in the bls worker aggregator (a sketch of this kind of queue follows this list).
  • Use smart pointers for memory management of Aggregator and VectorAggregator. With delete this; the objects are prone to a data race on the delete operator.
  • Use smart pointers for memory management of ContributionVerifier, for the same reason as Aggregator and VectorAggregator.
  • Pass shared_ptr by value to other threads via the worker pool.
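
For illustration, here is a minimal sketch of the kind of mutex-guarded queue an STL-based ctpl header can provide; the names are illustrative and this is not the actual ctpl_stl.h code. Because every push and pop takes the lock, there is no window in which a popped element can be freed while another thread is still reading it.

#include <mutex>
#include <queue>

template <typename T>
class SynchronizedQueue {
public:
    void push(T const& value) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_queue.push(value);
    }
    // Non-blocking pop: returns false when the queue is empty.
    bool pop(T& out) {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_queue.empty()) return false;
        out = m_queue.front();
        m_queue.pop();
        return true;
    }
private:
    std::queue<T> m_queue;
    std::mutex m_mutex;
};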

Benchmark for develop: https://pastebin.com/5qPBERhr
Benchmark for ctpl STL queue: https://pastebin.com/hx1UiqdZ
Benchmark for concurrent queue: https://pastebin.com/VXCXG5ak

There's no real difference between them (it's very small) when used in the bls_worker, but I was curious what causes the performance impact, so I also did a profiling with gprof on the BLS bench. You can find the results here: https://pastebin.com/AS14eSbr My opinion is that changing the queue is a micro-improvement performance-wise, but it is essential for the project to drop boost::lockfree::queue, as it has a data race where one can pop and access an already deleted element. My suggestion is to use the ctpl STL queue, as it is correct, small (easy to maintain), easy to understand, and has a tiny performance impact (see benchmarks). I do not have a strong opinion on which queue we should use as long as it has little to no impact.

@gabriel-bjg gabriel-bjg marked this pull request as ready for review July 10, 2021 17:44
@gabriel-bjg gabriel-bjg force-pushed the thread-sanitizer-fixes branch from f237a37 to 42c99be on July 10, 2021 18:52
Member

@PastaPastaPasta PastaPastaPasta left a comment

Thanks for this! I have a couple of thoughts / questions and one change

Member

Why is this vec (and below) no longer passed by reference?

I guess it is a reference to a shared_ptr, so it likely makes sense to just pass by value. Is that the intention?

Author

You are right; only the lambda (line 257) has to capture vec by value, since the lambda is passed to another thread and we want to share ownership of vec. I will change the other parameters back to shared_ptr& to avoid needless increments of the refcount, and I will also mark the shared_ptr& parameters as const.
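
For context, a small sketch of the pattern described above; the names (ProcessAsync, the plain std::thread) are hypothetical and not the actual bls_worker code. The function takes the shared_ptr as const shared_ptr&, so the call itself does no refcount traffic, and only the lambda handed to the other thread captures it by value so that thread co-owns the vector.

#include <memory>
#include <thread>
#include <vector>

using VecPtr = std::shared_ptr<std::vector<int>>;

// Takes the shared_ptr by const reference: no refcount bump on the call.
void ProcessAsync(const VecPtr& vec, std::thread& worker) {
    // Capturing vec by value bumps the refcount exactly once; the worker
    // thread now shares ownership, so the data outlives the caller's copy.
    worker = std::thread([vec]() {
        long sum = 0;
        for (int x : *vec) sum += x;
        (void)sum;
    });
}

int main() {
    VecPtr vec = std::make_shared<std::vector<int>>(std::vector<int>{1, 2, 3});
    std::thread worker;
    ProcessAsync(vec, worker);
    vec.reset();    // caller drops its reference...
    worker.join();  // ...the worker's captured copy keeps the vector alive
}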

Member

Can you explain why you decided to start them in this loop as opposed to the way it was done before?

Author

@gabriel-bjg gabriel-bjg Jul 11, 2021

Once an Aggregator is created, it can be started; there is no reason to delay it. Previously, however, a memory management issue prevented starting the aggregator earlier: this (i.e. the VectorAggregator) was passed to the Aggregator as a raw pointer together with a doneCallback which deletes this. So the following scenario could happen: we couldn't start the aggregator directly here, because it might be so fast that it deletes "this" while we are still inside this loop (as mentioned in the comment on line 380). Now that the memory of this is managed with a shared_ptr (VectorAggregator is created using make_shared), the object stays alive as long as someone holds a reference to it, and we can safely start the aggregator as soon as it is created.
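
A rough sketch of that lifetime difference (names are hypothetical, not the actual bls_worker code): the done-callback captures a shared_ptr to the owner instead of calling delete this, so the owner cannot be destroyed while the loop that starts the aggregators is still running.

#include <functional>
#include <memory>

struct AggregatorLike {
    std::function<void()> doneCallback;
    void Start() { /* may finish and invoke doneCallback at any point */ }
};

struct VectorAggregatorLike : std::enable_shared_from_this<VectorAggregatorLike> {
    void StartChildren() {
        for (int i = 0; i < 3; ++i) {
            auto self = shared_from_this();  // valid because the object is owned by a shared_ptr (make_shared)
            auto agg = std::make_shared<AggregatorLike>();
            // The callback keeps self (and agg) alive; no `delete this` is
            // involved, so Start() is safe to call immediately in the loop.
            agg->doneCallback = [self, agg]() { self->HandleDone(); };
            agg->Start();
        }
    }
    void HandleDone() { /* aggregate the child's result */ }
};

int main() {
    auto va = std::make_shared<VectorAggregatorLike>();
    va->StartChildren();
}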

src/ctpl_stl.h Outdated
Member

Suggested change
-#endif // __ctpl_stl_thread_pool_H__
+#endif // __ctpl_stl_thread_pool_H__

@gabriel-bjg gabriel-bjg force-pushed the thread-sanitizer-fixes branch from 42c99be to 15f3e63 on July 11, 2021 08:11
Change ctpl implementation to use STL queue & mutex.

Use ctpl synchronized queue instead of boost lockfree queue in bls worker aggregator.

Use smart pointers for memory management of Aggregator and VectorAggregator. With 'delete this;' the objects are prone to data race on the delete operator.

Use smart pointers for memory management of ContributionVerifier.

Pass shared_ptr by value to other threads via worker pool.
@gabriel-bjg gabriel-bjg force-pushed the thread-sanitizer-fixes branch from 15f3e63 to 4498e4f on July 11, 2021 09:17
@UdjinM6 UdjinM6 added this to the 18 milestone Jul 11, 2021

@UdjinM6 UdjinM6 left a comment

utACK

👍
slightly tested ACK

Member

@PastaPastaPasta PastaPastaPasta left a comment

utACK

@PastaPastaPasta PastaPastaPasta merged commit ee6d2f5 into dashpay:develop Jul 12, 2021
furszy added a commit to PIVX-Project/PIVX that referenced this pull request Jan 9, 2022
c76e7b6 [Core] Fix data races in bls_worker and use ctpl_stl queue (Fuzzbawls)
55babf4 [Build] Improve std::atomic test macro (Fuzzbawls)

Pull request description:

  Coming from dashpay#4240, bitcoin#20938, and bitcoin#21920, this resolves numerous compile-time inconsistencies/failures on certain combos of gcc and cpu arch that were observed on launchpad.net over the 5.4.0 release cycle.

ACKs for top commit:
  furszy:
    ACK c76e7b6
  random-zebra:
    ACK c76e7b6

Tree-SHA512: d931142f978d0d22668b9ff2e16d48c148b7d163b35aeef149ab08a13174536c035f2581a8ec5bc9782226fcb27ed33b08aec475d1522d8e8532d00a12c5df72
gades pushed a commit to cosanta/cosanta-core that referenced this pull request May 4, 2022