Conversation

@vasild
Contributor

@vasild vasild commented Jun 1, 2020

Always lock the mutexes `cs_main` and `m_cached_tip_mutex` in
the same order: `cs_main`, `m_cached_tip_mutex`. Otherwise we may end up
in a deadlock.

`ClientModel::m_cached_tip_blocks` is protected by
`ClientModel::m_cached_tip_mutex`. There are two access paths that
lock the two mutexes in opposite order:

```
validation.cpp:2868 CChainState::ActivateBestChain(): lock cs_main
validation.cpp:2916 CChainState::ActivateBestChain(): call uiInterface.NotifyBlockTip()
ui_interface.cpp:52 CClientUIInterface::NotifyBlockTip(): go deep in boost
...
qt/clientmodel.cpp:255 BlockTipChanged(): lock m_cached_tip_mutex
```

and

```
qt/clientmodel.cpp:119 ClientModel::getBestBlockHash(): lock m_cached_tip_mutex
qt/clientmodel.cpp:121 ClientModel::getBestBlockHash(): call m_node.getBestBlockHash()
interfaces/node.cpp:200 NodeImpl::getBestBlockHash(): lock cs_main
```

From `debug.log`:

```
POTENTIAL DEADLOCK DETECTED
Previous lock order was:
 m_cs_chainstate validation.cpp:2851
 (1) cs_main validation.cpp:2868
 ::mempool.cs validation.cpp:2868
 (2) clientmodel->m_cached_tip_mutex qt/clientmodel.cpp:255
Current lock order is:
 (2) m_cached_tip_mutex qt/clientmodel.cpp:119
 (1) ::cs_main interfaces/node.cpp:200
```

The possible deadlock was introduced in #17993

@vasild
Contributor Author

vasild commented Jun 1, 2020

Notice: this is just one possible fix. Other possible fixes are:

  1. Reverse the other access path, so the order is always `m_cached_tip_mutex`, `cs_main`. I think this would require bigger changes.

  2. Ditch `m_cached_tip_mutex` and protect `ClientModel::m_cached_tip_blocks` with `cs_main`. Adding more state under the protection of `cs_main` would in general reduce concurrency. Maybe not in this particular case, if we are always holding `cs_main` anyway when we access `ClientModel::m_cached_tip_blocks`. Do we?

cc @furszy

Member

If the requirement to check `IsNull` twice is dropped, then I think this whole function can be written in just four lines:

```cpp
uint256 tip{WITH_LOCK(m_cached_tip_mutex, return m_cached_tip_blocks)};
if (!tip.IsNull()) return tip;

// Will lock `cs_main` (and release it). This must be done before taking
// m_cached_tip_mutex, to not violate the implicit lock order cs_main -> m_cached_tip_mutex.
tip = m_node.getBestBlockHash();

return WITH_LOCK(m_cached_tip_mutex, return m_cached_tip_blocks = tip);
```

Contributor Author

The above snippet can end up overwriting `m_cached_tip_blocks` even when it is no longer null. I.e. if it is null during the `if`, and some other thread then sets it to a non-null value while we are calling `m_node.getBestBlockHash()`, we end up overwriting what the other thread set.

Maybe that is ok in this case, or maybe it cannot happen due to some constraint outside of this function. This code is new to me, so I played it safe :)

Btw, `if (A) return B;` has two disadvantages compared to putting the return on a separate line:

  1. It is not possible to set a breakpoint on `return B;` alone. For example, if the condition is evaluated 1000 times but `A` is true only a few times, one may want to break only when `return B;` executes.
  2. It skews line-based code coverage.

@vasild vasild force-pushed the lock_order_m_cached_tip_mutex branch from 367c8a6 to f46b678 on June 1, 2020 15:15
@furszy
Member

furszy commented Jun 1, 2020

Hmm, good catch.

I would actually think about it the other way around. The problem shouldn't be on the slot side (the slot can be inside the same process, or be another process -- GUI division -- or whatever). Better to review why `cs_main` is being locked on the signal-trigger side in the first place.

Why not do the same as in the `NotifyHeaderTip` function: trigger the signal without `cs_main` locked?

@maflcko
Member

maflcko commented Jun 1, 2020

See commit d86edd3

@jonasschnelli
Contributor

Tested ACK f46b678

@jonasschnelli jonasschnelli removed their request for review June 5, 2020 07:23
@jonasschnelli jonasschnelli merged commit f4f2220 into bitcoin:master Jun 5, 2020
@vasild vasild deleted the lock_order_m_cached_tip_mutex branch June 5, 2020 07:27
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Jun 5, 2020
f46b678 qt: lock cs_main, m_cached_tip_mutex in that order (Vasil Dimov)

Fabcien pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Feb 23, 2021
Summary:
PR description
> Rationale:
>
> The height based polling in the GUI is an issue on chain reorgs. Any new tip signal with the same height as the one cached in the model was not triggering the GUI update (interpreting it as the same block when it could have received a different one), ending up with bad information presented in the GUI.
>
> This PR essentially changes the last cached height to be a last cached block hash.

bitcoin/bitcoin@2f86720
> [ClientModel] best header/block hash cached.

bitcoin/bitcoin@f46b678
> qt: lock cs_main, m_cached_tip_mutex in that order

This is a backport of Core [[bitcoin/bitcoin#17993 | PR17993]] [1/2] and [[bitcoin/bitcoin#19132 | PR19132]]

Test Plan:
`ninja && src/qt/bitcoin-qt`
Send a transaction, wait for a few block and check that the tx status is correctly updated in the interface.

Reviewers: #bitcoin_abc, majcosta

Reviewed By: #bitcoin_abc, majcosta

Subscribers: majcosta

Differential Revision: https://reviews.bitcoinabc.org/D9253
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Feb 15, 2022