Skip to content

Conversation

@hebasto
Copy link
Member

@hebasto hebasto commented Oct 30, 2021

This PR significantly reduces lock contention in the CCheckQueue class by releasing a mutex before calling std::condition_variable::notify_one and std::condition_variable::notify_all.

From C++ docs:

The notifying thread does not need to hold the lock on the same mutex as the one held by the waiting thread(s); in fact doing so is a pessimization, since the notified thread would immediately block again, waiting for the notifying thread to release the lock.

Related to:

@hebasto hebasto changed the title 211030 contention Avoid excessive lock contention in CCheckQueue::Add Oct 30, 2021
@hebasto
Copy link
Member Author

hebasto commented Oct 30, 2021

cc @ajtowns @theuni @martinus @jonatack

Copy link
Contributor

@lsilva01 lsilva01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tACK 5bcfd70 on Ubuntu 20.04

This PR moves LOCK(m_mutex) to a smaller scope, so when code reaches notify_one()or notify_all(), the mutex is already released.
It also changes CCheckQueue ::Add to exit earlier if vChecks is empty.

@jamesob
Copy link
Contributor

jamesob commented Nov 1, 2021

@hebasto do you expect there are significant performance implications for this change? Should I benchmark?

@hebasto
Copy link
Member Author

hebasto commented Nov 1, 2021

@jamesob

@hebasto do you expect there are significant performance implications for this change?

Not sure about "significant" :)

Should I benchmark?

I'll appreciate it! Also I have a plan to improve CCheckQueue performance further (not here, of course).

@sipa
Copy link
Member

sipa commented Nov 1, 2021

Concept ACK, but left some nits.

Copy link
Contributor

@martinus martinus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK 5bcfd70. I doubt that this makes any measurable difference, but I've often been wrong with benchmark guess and shorter locks are always better 👍

src/checkqueue.h Outdated
queue.push_back(T());
check.swap(queue.back());
}
num_of_new_checks = vChecks.size();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think you could also write const auto num_of_new_checks = vChecks.size(); above, saves one line of code in the lock and doesn't keep an uninitialized variable around for a few lines.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually none of that is necessary. vChecks doesn't change size during the adding loop. The old vChecks.size() based checks can be kept.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vChecks doesn't change size during the adding loop.

Correct. But vChecks is passed in CCheckQueue::Add by a non-const reference, and that makes possible its modification by a concurrent thread between leaving a critical section and checking num_of_new_checks == 1.

Although, the only caller is located in CChainState::ConnectBlock which is guarded by the cs_main mutex. Therefore, you are right.

The old vChecks.size() based checks can be kept.

Yes, it will work. But while looking into CCheckQueue::Add code, it is not obvious that vChecks.size() is constant in multi-threading environment. I think that adding a local variable makes code more readable and robust considering any possible changes at the caller site(s) in the future.

OTOH, it seems all could look clearer if we pass vChecks as an rvalue reference using move semantics. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would never think that a reference argument could be changed by another thread while the function is executing. I hope we don't do that anywhere in the code. Because how would the function avoid races? Which mutex protects the argument? That would require something like (it does not compile, of course):

void func(T& arg GUARDED_BY(some_mutex), ...)
{
    ...
    LOCK(some_mutex);
    access arg;
    ...

So, I think it is safe to assume that vChecks will not change while Add() is executing. Indeed the only caller passes in a local variable that is not visible by other threads (and then throws it away, so the "swap" is actually a "move").

... it seems all could look clearer if we pass vChecks as an rvalue reference using move semantics ...

Yes :) and std::move() each element from vChecks into queue. It is unnecessary to first create a default-constructed object only to swap it with another and throw it away. This is what std::move() is for.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hebasto That's not how const works in C++. The lack thereof means you (the called function in this case) are allowed to modify the argument; it says nothing at all about whether another thread might be modifying the same variable. Even if the argument was const, nothing prevents another thread from holding a mutable reference to the same object.

That said, passing an argument (const or not) to a function while it is being modified by other threads would be completely non-idiomatic, dangerous code. It is possible, and sometimes necessary of course, but at least you'd expect the passed object to have internal mutexes or other synchronization mechanisms to protect its state in that case; not something the callee should deal with. Virtually every STL function will exhibit undefined behavior if you pass it references to arguments that are being modified concurrently. Outside of very exceptional cases that should raise big red flags, you can very much assume that arguments passed to a function are under exclusive control of that function for the duration of its execution.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sipa Thanks for your detailed explanation.

Going to address all of the comments.

@JeremyRubin
Copy link
Contributor

would love to see some thorough testing on this. I could be misremembering, but check cross platform that there's no reason to continue to hold the lock. IIRC there was some sort of edge condition (maybe in window?) around a notify being able to be missed or something, but the details are like 5-6 years out of my cache :)

@hebasto
Copy link
Member Author

hebasto commented Nov 3, 2021

@sipa @martinus @vasild

Thank you for your reviews and your suggestions!

Updated 5bcfd70 -> 459e208 (pr23397.01 -> pr23397.02, diff):

  • addressed all of the comments

Copy link
Contributor

@vasild vasild left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 459e208

The CI failure looks unrelated.

@jamesob
Copy link
Contributor

jamesob commented Nov 3, 2021

ACKish - I haven't reviewed the code, but it works for the benchmarks and so I guess in some sense I've tested it. As expected, no big diff in performance.

#23397 vs. $mergebase (absolute)

bench name x #23397 $mergebase
ibd.local.range.500000.540000.total_secs 2 4826.3510 (± 0.4951) 4831.4970 (± 6.8399)
ibd.local.range.500000.540000.peak_rss_KiB 2 6410470.0000 (± 1750.0000) 6406678.0000 (± 2714.0000)
ibd.local.range.500000.540000.cpu_kernel_secs 2 345.3800 (± 3.1200) 348.2850 (± 0.8050)
ibd.local.range.500000.540000.cpu_user_secs 2 30102.6800 (± 5.2600) 30086.9150 (± 24.1150)

#23397 vs. $mergebase (relative)

bench name x #23397 $mergebase
ibd.local.range.500000.540000.total_secs 2 1.000 1.001
ibd.local.range.500000.540000.peak_rss_KiB 2 1.001 1.000
ibd.local.range.500000.540000.cpu_kernel_secs 2 1.000 1.008
ibd.local.range.500000.540000.cpu_user_secs 2 1.001 1.000

@martinus
Copy link
Contributor

martinus commented Nov 3, 2021

ACK 459e208, codereview and tested. I first thought this introduced a segfault in psbt_wallet_tests/psbt_updater_test because that test failed for me, but thats a different issue fixed in #23403.

Copy link
Contributor

@theStack theStack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code-review ACK 459e208

Maybe I'm missing something, but IMHO there is currently no possible way that CCheckQueue:Add gets called with an empty vector as argument, as the size always equals the number of inputs of a transaction (see last out parameter of CheckInputScripts). It'd be only empty for coinbase transactions, but for those control.Add(...) doesn't get called. That said, it doesn't hurt if the empty check in Add is still there.

@maflcko maflcko merged commit b4f647f into bitcoin:master Nov 29, 2021
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Nov 29, 2021
459e208 Exit early for an empty vChecks in CCheckQueue::Add (Hennadii Stepanov)
c43aa62 Avoid excessive lock contention in CCheckQueue::Add (Hennadii Stepanov)

Pull request description:

  This PR significantly reduces lock contention in the `CCheckQueue` class by releasing a mutex before calling `std::condition_variable::notify_one` and `std::condition_variable::notify_all`.

  From C++ [docs](https://en.cppreference.com/w/cpp/thread/condition_variable/notify_one):
  > The notifying thread does not need to hold the lock on the same mutex as the one held by the waiting thread(s); in fact doing so is a pessimization, since the notified thread would immediately block again, waiting for the notifying thread to release the lock.

  Related to:
  - bitcoin#23167
  - bitcoin#23223

ACKs for top commit:
  martinus:
    ACK 459e208, codereview and tested. I first thought this introduced a segfault in `psbt_wallet_tests/psbt_updater_test` because that test failed for me, but thats a different issue fixed in bitcoin#23403.
  vasild:
    ACK 459e208
  theStack:
    Code-review ACK 459e208

Tree-SHA512: c197858656392ba3ebcd638d713cf93c9fb48b7b3bad193209490d2828f9c7e3ae4dee6f84674f2f34dceed894139562e29579ee7299e06756c8c990caddc5ed
@hebasto hebasto deleted the 211030-contention branch November 29, 2021 20:48
RandyMcMillan pushed a commit to RandyMcMillan/mempool-tab that referenced this pull request Dec 23, 2021
…kQueue::Add

93bb704 Exit early for an empty vChecks in CCheckQueue::Add (Hennadii Stepanov)
6917a86 Avoid excessive lock contention in CCheckQueue::Add (Hennadii Stepanov)

Pull request description:

  This PR significantly reduces lock contention in the `CCheckQueue` class by releasing a mutex before calling `std::condition_variable::notify_one` and `std::condition_variable::notify_all`.

  From C++ [docs](https://en.cppreference.com/w/cpp/thread/condition_variable/notify_one):
  > The notifying thread does not need to hold the lock on the same mutex as the one held by the waiting thread(s); in fact doing so is a pessimization, since the notified thread would immediately block again, waiting for the notifying thread to release the lock.

  Related to:
  - #23167
  - #23223

ACKs for top commit:
  martinus:
    ACK 93bb704, codereview and tested. I first thought this introduced a segfault in `psbt_wallet_tests/psbt_updater_test` because that test failed for me, but thats a different issue fixed in #23403.
  vasild:
    ACK 93bb704
  theStack:
    Code-review ACK 93bb704

Tree-SHA512: c197858656392ba3ebcd638d713cf93c9fb48b7b3bad193209490d2828f9c7e3ae4dee6f84674f2f34dceed894139562e29579ee7299e06756c8c990caddc5ed
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Apr 11, 2022
459e208 Exit early for an empty vChecks in CCheckQueue::Add (Hennadii Stepanov)
c43aa62 Avoid excessive lock contention in CCheckQueue::Add (Hennadii Stepanov)

Pull request description:

  This PR significantly reduces lock contention in the `CCheckQueue` class by releasing a mutex before calling `std::condition_variable::notify_one` and `std::condition_variable::notify_all`.

  From C++ [docs](https://en.cppreference.com/w/cpp/thread/condition_variable/notify_one):
  > The notifying thread does not need to hold the lock on the same mutex as the one held by the waiting thread(s); in fact doing so is a pessimization, since the notified thread would immediately block again, waiting for the notifying thread to release the lock.

  Related to:
  - bitcoin#23167
  - bitcoin#23223

ACKs for top commit:
  martinus:
    ACK 459e208, codereview and tested. I first thought this introduced a segfault in `psbt_wallet_tests/psbt_updater_test` because that test failed for me, but thats a different issue fixed in bitcoin#23403.
  vasild:
    ACK 459e208
  theStack:
    Code-review ACK 459e208

Tree-SHA512: c197858656392ba3ebcd638d713cf93c9fb48b7b3bad193209490d2828f9c7e3ae4dee6f84674f2f34dceed894139562e29579ee7299e06756c8c990caddc5ed
@bitcoin bitcoin locked and limited conversation to collaborators Nov 29, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants