
Conversation

@gabriel-bjg gabriel-bjg commented Jul 10, 2021

Thread sanitizer reports data races in the BLS worker. Steps to reproduce:

./configure --enable-debug --prefix=`pwd`/depends/x86_64-pc-linux-gnu/ --with-sanitizers=thread
make clean
make -j4
python3 test/functional/feature_llmq_is_cl_conflicts.py

This pull request aims to fix all bls_worker related data races. The above test case also triggers a data race in wallet/wallet.cpp's KeepKey function, which I am going to tackle in a different pull request, as suggested by @PastaPastaPasta.

Fixes:

  • Change the ctpl implementation to use an STL queue & mutex, as suggested in Use std based ctpl header only library #4126 (comment).
  • Use the ctpl synchronized queue instead of the boost lockfree queue in the bls worker aggregator (a sketch of this kind of queue follows this list).
  • Use smart pointers for memory management of Aggregator and VectorAggregator. With delete this; the objects are prone to a data race on the delete operator.
  • Use smart pointers for memory management of ContributionVerifier, for the same reason as Aggregator and VectorAggregator.
  • Pass shared_ptr by value to other threads via the worker pool.
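
For illustration, here is a minimal sketch of the kind of mutex-guarded queue an STL-based ctpl header can provide; the names are illustrative and this is not the actual ctpl_stl.h code. Because every push and pop takes the lock, there is no window in which a popped element can be freed while another thread is still reading it.

#include <mutex>
#include <queue>

template <typename T>
class SynchronizedQueue {
public:
    void push(T const& value) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_queue.push(value);
    }
    // Non-blocking pop: returns false when the queue is empty.
    bool pop(T& out) {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_queue.empty()) return false;
        out = m_queue.front();
        m_queue.pop();
        return true;
    }
private:
    std::queue<T> m_queue;
    std::mutex m_mutex;
};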

Benchmark for develop: https://pastebin.com/5qPBERhr
Benchmark for ctpl STL queue: https://pastebin.com/hx1UiqdZ
Benchmark for concurrent queue: https://pastebin.com/VXCXG5ak

There's no real difference between them (it's very small) when used in the bls_worker, but I was curious what causes the performance impact, so I also did a profiling with gprof on the BLS bench. You can find the results here: https://pastebin.com/AS14eSbr My opinion is that changing the queue is a micro-improvement performance-wise, but it is essential for the project to drop boost::lockfree::queue, as it has a data race where one can pop and access an already deleted element. My suggestion is to use the ctpl STL queue, as it is correct, small (easy to maintain), easy to understand, and has a tiny performance impact (see benchmarks). I do not have a strong opinion on which queue we should use as long as it has little to no impact.

@gabriel-bjg gabriel-bjg marked this pull request as ready for review July 10, 2021 17:44
@gabriel-bjg gabriel-bjg force-pushed the thread-sanitizer-fixes branch from f237a37 to 42c99be on July 10, 2021 18:52
Member

@PastaPastaPasta PastaPastaPasta left a comment

Thanks for this! I have a couple of thoughts / questions and one change

Member

Why is this vec (and below) no longer passed by reference?

I guess it is a reference to a shared_ptr, so it likely makes sense to just pass by value. Is that the intention?

Author

You are right; only the lambda (line 257) has to capture vec by value, since the lambda is passed to another thread and we want to share ownership of vec. I will change the other parameters back to shared_ptr& to avoid needless increments of the refcount, and I will also mark the shared_ptr& parameters as const.
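
For context, a small sketch of the pattern described above; the names (ProcessAsync, the plain std::thread) are hypothetical and not the actual bls_worker code. The function takes the shared_ptr as const shared_ptr&, so the call itself does no refcount traffic, and only the lambda handed to the other thread captures it by value so that thread co-owns the vector.

#include <memory>
#include <thread>
#include <vector>

using VecPtr = std::shared_ptr<std::vector<int>>;

// Takes the shared_ptr by const reference: no refcount bump on the call.
void ProcessAsync(const VecPtr& vec, std::thread& worker) {
    // Capturing vec by value bumps the refcount exactly once; the worker
    // thread now shares ownership, so the data outlives the caller's copy.
    worker = std::thread([vec]() {
        long sum = 0;
        for (int x : *vec) sum += x;
        (void)sum;
    });
}

int main() {
    VecPtr vec = std::make_shared<std::vector<int>>(std::vector<int>{1, 2, 3});
    std::thread worker;
    ProcessAsync(vec, worker);
    vec.reset();    // caller drops its reference...
    worker.join();  // ...the worker's captured copy keeps the vector alive
}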

Member

Can you explain why you decided to start them in this loop as opposed to the way it was done before?

Author

@gabriel-bjg gabriel-bjg Jul 11, 2021

Once an Aggregator is created, it can be started; there is no reason to delay it. Previously, however, a memory management issue prevented starting the aggregator earlier: this (i.e. the VectorAggregator) was passed to the Aggregator as a raw pointer together with a doneCallback which deletes this. So the following scenario could happen: we couldn't start the aggregator directly here, because it might be so fast that it deletes "this" while we are still inside this loop (as mentioned in the comment on line 380). Now that the memory of this is managed with a shared_ptr (VectorAggregator is created using make_shared), the object stays alive as long as someone holds a reference to it, and we can safely start the aggregator as soon as it is created.
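
A rough sketch of that lifetime difference (names are hypothetical, not the actual bls_worker code): the done-callback captures a shared_ptr to the owner instead of calling delete this, so the owner cannot be destroyed while the loop that starts the aggregators is still running.

#include <functional>
#include <memory>

struct AggregatorLike {
    std::function<void()> doneCallback;
    void Start() { /* may finish and invoke doneCallback at any point */ }
};

struct VectorAggregatorLike : std::enable_shared_from_this<VectorAggregatorLike> {
    void StartChildren() {
        for (int i = 0; i < 3; ++i) {
            auto self = shared_from_this();  // valid because the object is owned by a shared_ptr (make_shared)
            auto agg = std::make_shared<AggregatorLike>();
            // The callback keeps self (and agg) alive; no `delete this` is
            // involved, so Start() is safe to call immediately in the loop.
            agg->doneCallback = [self, agg]() { self->HandleDone(); };
            agg->Start();
        }
    }
    void HandleDone() { /* aggregate the child's result */ }
};

int main() {
    auto va = std::make_shared<VectorAggregatorLike>();
    va->StartChildren();
}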

src/ctpl_stl.h Outdated
Member

Suggested change
-#endif // __ctpl_stl_thread_pool_H__
+#endif // __ctpl_stl_thread_pool_H__

@gabriel-bjg gabriel-bjg force-pushed the thread-sanitizer-fixes branch from 42c99be to 15f3e63 on July 11, 2021 08:11
Change ctpl implementation to use STL queue & mutex.

Use ctpl synchronized queue instead of boost lockfree queue in bls worker aggregator.

Use smart pointers for memory management of Aggregator and VectorAggregator. With 'delete this;' the objects are prone to data race on the delete operator.

Use smart pointers for memory management of ContributionVerifier.

Pass shared_ptr by value to other threads via worker pool.
@gabriel-bjg gabriel-bjg force-pushed the thread-sanitizer-fixes branch from 15f3e63 to 4498e4f on July 11, 2021 09:17
@UdjinM6 UdjinM6 added this to the 18 milestone Jul 11, 2021

@UdjinM6 UdjinM6 left a comment

utACK

👍
slightly tested ACK

Member

@PastaPastaPasta PastaPastaPasta left a comment

utACK

@PastaPastaPasta PastaPastaPasta merged commit ee6d2f5 into dashpay:develop Jul 12, 2021
furszy added a commit to PIVX-Project/PIVX that referenced this pull request Jan 9, 2022
c76e7b6 [Core] Fix data races in bls_worker and use ctpl_stl queue (Fuzzbawls)
55babf4 [Build] Improve std::atomic test macro (Fuzzbawls)

Pull request description:

  Coming from dashpay#4240, bitcoin#20938, and bitcoin#21920, this resolves numerous compile-time inconsistencies/failures on certain combos of gcc and cpu arch that were observed on launchpad.net over the 5.4.0 release cycle.

ACKs for top commit:
  furszy:
    ACK c76e7b6
  random-zebra:
    ACK c76e7b6

Tree-SHA512: d931142f978d0d22668b9ff2e16d48c148b7d163b35aeef149ab08a13174536c035f2581a8ec5bc9782226fcb27ed33b08aec475d1522d8e8532d00a12c5df72
gades pushed a commit to cosanta/cosanta-core that referenced this pull request May 4, 2022