Skip to content

Conversation

@sipa
Copy link
Member

@sipa sipa commented Feb 14, 2019

This PR makes a number of improvements to the InvalidateBlock (invalidateblock RPC) and RewindBlockIndex functions, primarily around breaking up their long-term cs_main holding. In addition:

  • They're made safely interruptible (bitcoind can be shutdown, and no progress in either will be lost, though if incomplete, invalidateblock won't continue after restart and will need to be called again)
  • The validation queue is prevented from overflowing (meaning invalidateblock on a very old block will not drive bitcoind OOM) (see Unbounded growth of scheduler queue #14289).
  • invalidateblock won't bother to move transactions back into the mempool after 10 blocks (optimization).

This is not an optimal solution, as we're relying on the scheduler call sites to make sure the scheduler doesn't overflow. Ideally, the scheduler would guarantee this directly, but that needs a few further changes (moving the signal emissions out of cs_main) to prevent deadlocks.

I have manually tested the invalidateblock changes (including interrupting, and running with -checkblockindex and -checkmempool), but haven't tried the rewinding (which is probably becoming increasingly unnecessary, as very few pre-0.13.1 nodes remain that would care to upgrade).

@sipa sipa force-pushed the 201902_limitrewindinvalidate branch 3 times, most recently from d6ba031 to 01b0066 Compare February 14, 2019 02:08
@DrahtBot
Copy link
Contributor

DrahtBot commented Feb 14, 2019

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #15494 (rpc: Return whether the block was invalidated on invalidateblock by promag)
  • #15305 ([validation] Crash if disconnecting a block fails by sdaftuar)
  • #14193 (validation: Add missing mempool locks by MarcoFalke)
  • #13582 (Extract AppInitLoadBlockIndex from AppInitMain by Empact)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@laanwj laanwj added this to the 0.18.0 milestone Feb 14, 2019
@sipa sipa force-pushed the 201902_limitrewindinvalidate branch 2 times, most recently from 3ef4214 to 3af503b Compare February 14, 2019 23:37
maflcko pushed a commit to maflcko/bitcoin-core that referenced this pull request Feb 16, 2019
…ing destructor

fab6b07 test: txindex: interrupt threadGroup before calling destructor (MarcoFalke)

Pull request description:

  Fixes the data races with the tread sanitizer such as

  * https://travis-ci.org/MarcoFalke/bitcoin/jobs/492330554
  * bitcoin#15402 (comment)
  * ...

Tree-SHA512: 40608c70d92a1dd68efc1d41eecc8e2fb7738508e21f91f0ad353adcceed60fa624f15bf72a5b69a9444157b261183abbe9fc4cc5dd8aebc1c49506b239e8e88
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit "Release cs_main during RewindBlockIndex operation" (8994e394c0b4e76616c10acdd309d0225eabad9e)

Could you add some rationale in this comment saying why this is done here? Previous behavior of erasing in one place instead of two places seems simpler. Is this an optimization? Is it needed for correctness for some reason now that cs_main is released?

Copy link
Member Author

@sipa sipa Feb 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've moved to to a separate step instead of lumping it together with the nHeight calculation loop.

There's no need for it to be done while holding the lock. The reason it's separate is because the disconnect/erase of active blocks should be done simultaneously (so that things get erased even when interrupted, and erasing can't be done before disconnecting), but for non-mainchain blocks we need a separate loop still.

I've added some comments to elaborate, but I'll leave it to you to comment further or marking it as resolved.

Copy link
Member

@Sjors Sjors left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested 3af503b on macOS 10.14.3 by calling invalidateblock 0000000000000000001c8018d9cb3b742ef25114f27563e3fc4a1902167f9893 (SegWit activation at height 481,824) from inside the QT console, without setting -dbcache.

On master memory usage exceeds 10 GB by block 558,000. It also required a force quit to abort.

With this PR memory stays well below 1 GB and I'm able to gracefully exit during this. It then marks the most recent block as invalid, so when you restart it doesn't suddenly start syncing to the tip again. For a user upgrading from pre-segwit, would they have to manually restart the process by calling reconsiderblock [...]?

Rewinding to SegWit activation took 7 hours on a one year old iMac. At this point it's probably faster to just wipe the chain and sync from scratch, but that's not for this PR. QT sync window won't update during this, but I can live with that (something to solve for the next soft fork...).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This note seems out of place now; it used to be above if (IsWitnessEnabled(pindexIter->pprev, params.GetConsensus()) && !(pindexIter->nStatus & BLOCK_OPT_WITNESS) && !chainActive.Contains(pindexIter))

sipa added 2 commits February 24, 2019 12:38
Note that the former 'else' branch in RewindBlockIndex is now
dealt with more naturally inside the EraseBlockData call (by
checking whether the parent needs to be re-added as candidate
after deleting a child).
This lets us simplify the iteration to just walking back in the chain,
rather than looping over all of mapBlockIndex.
@sipa sipa force-pushed the 201902_limitrewindinvalidate branch from 3af503b to 7bcac88 Compare February 24, 2019 21:31
@sipa sipa force-pushed the 201902_limitrewindinvalidate branch from 7bcac88 to d4ad413 Compare February 25, 2019 02:43
@sipa sipa force-pushed the 201902_limitrewindinvalidate branch from d4ad413 to ac64328 Compare February 25, 2019 02:55
Copy link
Member

@Sjors Sjors left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tACK ac64328

Note that QT doesn't update the block count (etc) as this unwinding happens, although it does update the mempool count. I can live with that for now.
schermafbeelding 2019-02-25 om 13 35 52

Copy link
Member

@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK ac64328b96fbe7475d2b12a9a0c2ba4787267204

I left some feedback in earlier commits, but then it solved itself after reading the later commits. Review felt like reading a story that tells itself. Would review again 10/10

@promag
Copy link
Contributor

promag commented Feb 27, 2019

Tested:

  • concurrent invalidateblock calls
  • shutdown during invalidateblock

After these tests I felt that invalidateblock could return whether the invalidation finished or not, or other meaningful data (other PR).

However when testing concurrent invalidateblock and reconsiderblock, got the following assertion on start:

...
2019-02-27T11:53:54Z Checking all blk files are present...
2019-02-27T11:53:54Z Opening LevelDB in /Users/joao/Library/Application Support/Bitcoin/regtest/chainstate
2019-02-27T11:53:54Z Opened LevelDB successfully
2019-02-27T11:53:54Z Using obfuscation key for /Users/joao/Library/Application Support/Bitcoin/regtest/chainstate: 326eeb3c93ca8734
Assertion failed: (!setBlockIndexCandidates.empty()), function PruneBlockIndexCandidates, file validation.cpp, line 2534.

To reproduce apply the patch

--- a/src/validation.cpp
+++ b/src/validation.cpp
@@ -2840,6 +2840,8 @@ bool CChainState::InvalidateBlock(CValidationState& state, const CChainParams& c
         to_mark_failed = invalid_walk_tip;
     }

+    MilliSleep(10000);
+
     {
         // Mark pindex itself as invalid, regardless of whether it was in the main chain or not.
         LOCK(cs_main);

Then:

  • invalidateblock <old block hash>
  • during the above sleep call reconsiderblock <previous tip>
  • restart bitcoind.

@promag
Copy link
Contributor

promag commented Feb 27, 2019

Conceptually, should InvalidateBlock interrupt/error if it disconnects the same tip twice? Or should invalidateblock and reconsiderblock be exclusive? Something to think and probably tackle in a different PR.

@promag
Copy link
Contributor

promag commented Feb 28, 2019

@sipa I did that, sleeps all over the place, and that case sounded like scratching the chain. I'll repeat the tests.

@sipa
Copy link
Member Author

sipa commented Feb 28, 2019

@promag An alternative is only sleeping when "disconnected==1" (which would be in the middle for any invalidate that's longer than 1 deep, but not sleep all the time).

It's possible that the resulting effect is a reconsider and invalidate that start fighting with eachother and disconnect/connect the same blocks over and over again. It's not too hard to detect this scenario inside invalidateblock and bail out, but I'd rather not overload the change here further with such edge cases.

Copy link
Member

@Sjors Sjors left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK 8d22041. Several improvements in 9bb32eb Release cs_main during InvalidateBlock iterations since my previous tACK. It's more readable, e.g thanks to the additional variable to_mark_failed.

In the event this PR needs further changes (hopefully not) consider splitting that commit; moving LOCK(cs_main); into the loop and adding if (ShutdownRequested()) break; vs. everything else.

I would also prefer dealing with edge cases with concurrent RPC calls in a later PR. We can point out in the release note that it's currently not 100% safe to do that.

setBlockIndexCandidates.erase(pindex);
m_failed_blocks.insert(pindex);
{
// Mark pindex (or the last disconnected block) as invalid, regardless of whether it was in the main chain or not.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, move this comment below the interference check, replace "something is interfering" with "some other thread may have interfered before we reestablished the cs_main lock".

@gmaxwell
Copy link
Contributor

gmaxwell commented Mar 2, 2019

ACK

LOCK(cs_main);
if (chainActive.Contains(to_mark_failed)) {
// If the to-be-marked invalid block is in the active chain, something is interfering and we can't proceed.
return false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noting that return value is not used.

@sipa
Copy link
Member Author

sipa commented Mar 3, 2019

I added an extra commit that solves the situation where if you used invalidateblock, had a cache flush before it completed, then had your node crash, and then restarted with -checkblockindex would assert with a state corruption.

I've now tested a few more invalidateblock scenarios (with -checkblockindex and -checkmempool enabled), including interrupting it in the middle, killing/crashing it, and running reconsiderblock in parallel. They all seem to work fine. The RewindBlockIndex changes are untested still.

@gmaxwell
Copy link
Contributor

gmaxwell commented Mar 4, 2019

re-ACK

@sipa
Copy link
Member Author

sipa commented Mar 6, 2019

I think this is ready.

@laanwj laanwj merged commit 519b0bc into bitcoin:master Mar 7, 2019
laanwj added a commit that referenced this pull request Mar 7, 2019
519b0bc Make last disconnected block BLOCK_FAILED_VALID, even when aborted (Pieter Wuille)
8d22041 Optimization: don't add txn back to mempool after 10 invalidates (Pieter Wuille)
9ce9c37 Prevent callback overruns in InvalidateBlock and RewindBlockIndex (Pieter Wuille)
9bb32eb Release cs_main during InvalidateBlock iterations (Pieter Wuille)
9b1ff5c Call InvalidateBlock without cs_main held (Pieter Wuille)
241b2c7 Make RewindBlockIndex interruptible (Pieter Wuille)
880ce7d Call RewindBlockIndex without cs_main held (Pieter Wuille)
436f7d7 Release cs_main during RewindBlockIndex operation (Pieter Wuille)
1d34287 Merge the disconnection and erasing loops in RewindBlockIndex (Pieter Wuille)
32b2696 Move erasure of non-active blocks to a separate loop in RewindBlockIndex (Pieter Wuille)
9d6dcc5 Abstract EraseBlockData out of RewindBlockIndex (Pieter Wuille)

Pull request description:

  Same repo and branch like  "Granular invalidateblock and RewindBlockIndex #15402 ", can be merged as is.

  This saves us all the cherry-picks and review of the backport cherry-picks.

Tree-SHA512: 20c27c5f807c3d85e0072f9e2cdefad4ad7d329d6b26658a00844d5fcf0ed729059daf765e04e6382db2b5915117b15949cd4989d864917ab105c92e2e5e9986
laanwj added a commit that referenced this pull request Mar 7, 2019
519b0bc Make last disconnected block BLOCK_FAILED_VALID, even when aborted (Pieter Wuille)
8d22041 Optimization: don't add txn back to mempool after 10 invalidates (Pieter Wuille)
9ce9c37 Prevent callback overruns in InvalidateBlock and RewindBlockIndex (Pieter Wuille)
9bb32eb Release cs_main during InvalidateBlock iterations (Pieter Wuille)
9b1ff5c Call InvalidateBlock without cs_main held (Pieter Wuille)
241b2c7 Make RewindBlockIndex interruptible (Pieter Wuille)
880ce7d Call RewindBlockIndex without cs_main held (Pieter Wuille)
436f7d7 Release cs_main during RewindBlockIndex operation (Pieter Wuille)
1d34287 Merge the disconnection and erasing loops in RewindBlockIndex (Pieter Wuille)
32b2696 Move erasure of non-active blocks to a separate loop in RewindBlockIndex (Pieter Wuille)
9d6dcc5 Abstract EraseBlockData out of RewindBlockIndex (Pieter Wuille)

Pull request description:

  This PR makes a number of improvements to the InvalidateBlock (`invalidateblock` RPC) and RewindBlockIndex functions, primarily around breaking up their long-term cs_main holding. In addition:
  * They're made safely interruptible (`bitcoind` can be shutdown, and no progress in either will be lost, though if incomplete, `invalidateblock` won't continue after restart and will need to be called again)
  * The validation queue is prevented from overflowing (meaning `invalidateblock` on a very old block will not drive bitcoind OOM) (see #14289).
  * `invalidateblock` won't bother to move transactions back into the mempool after 10 blocks (optimization).

  This is not an optimal solution, as we're relying on the scheduler call sites to make sure the scheduler doesn't overflow. Ideally, the scheduler would guarantee this directly, but that needs a few further changes (moving the signal emissions out of cs_main) to prevent deadlocks.

  I have manually tested the `invalidateblock` changes (including interrupting, and running with -checkblockindex and -checkmempool), but haven't tried the rewinding (which is probably becoming increasingly unnecessary, as very few pre-0.13.1 nodes remain that would care to upgrade).

Tree-SHA512: 692e42758bd3d3efc2eb701984a8cb5db25fbeee32e7575df0183a00d0c2c30fdf72ce64c7625c32ad8c8bdc56313da72a7471658faeb0d39eefe39c4b8b8474
jasonbcox pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Aug 21, 2020
…indBlockIndex

Summary:
bitcoin/bitcoin@9ce9c37

---

This thing has been merged (D4803) and reverted (D4959) before. Seems to be working correctly now, with the lock to m_cs_chainstate in ActivateBestChain having a corresponding one in UnwindBlock.

Concludes backport of Core [[bitcoin/bitcoin#15402 | PR15402]]

Test Plan:
  cmake -GNinja -DCMAKE_BUILD_TYPE=Debug
  ninja check check-functional

also

build without this patch, invalidate block #6 or similarly deep, see that bitcoind memory usage just keeps going up, even past 30% on my 16GB machine. build with this patch and do the same, memory usage stays steady at ~12% no matter how far you disconnect blocks.

Reviewers: #bitcoin_abc, jasonbcox, deadalnix

Reviewed By: #bitcoin_abc, jasonbcox, deadalnix

Subscribers: jasonbcox

Differential Revision: https://reviews.bitcoinabc.org/D7213
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Jul 1, 2021
…ing destructor

fab6b07 test: txindex: interrupt threadGroup before calling destructor (MarcoFalke)

Pull request description:

  Fixes the data races with the tread sanitizer such as

  * https://travis-ci.org/MarcoFalke/bitcoin/jobs/492330554
  * bitcoin#15402 (comment)
  * ...

Tree-SHA512: 40608c70d92a1dd68efc1d41eecc8e2fb7738508e21f91f0ad353adcceed60fa624f15bf72a5b69a9444157b261183abbe9fc4cc5dd8aebc1c49506b239e8e88
vijaydasmp pushed a commit to vijaydasmp/dash that referenced this pull request Sep 6, 2021
…ing destructor

fab6b07 test: txindex: interrupt threadGroup before calling destructor (MarcoFalke)

Pull request description:

  Fixes the data races with the tread sanitizer such as

  * https://travis-ci.org/MarcoFalke/bitcoin/jobs/492330554
  * bitcoin#15402 (comment)
  * ...

Tree-SHA512: 40608c70d92a1dd68efc1d41eecc8e2fb7738508e21f91f0ad353adcceed60fa624f15bf72a5b69a9444157b261183abbe9fc4cc5dd8aebc1c49506b239e8e88
UdjinM6 pushed a commit to UdjinM6/dash that referenced this pull request Oct 29, 2021
519b0bc Make last disconnected block BLOCK_FAILED_VALID, even when aborted (Pieter Wuille)
8d22041 Optimization: don't add txn back to mempool after 10 invalidates (Pieter Wuille)
9ce9c37 Prevent callback overruns in InvalidateBlock and RewindBlockIndex (Pieter Wuille)
9bb32eb Release cs_main during InvalidateBlock iterations (Pieter Wuille)
9b1ff5c Call InvalidateBlock without cs_main held (Pieter Wuille)
241b2c7 Make RewindBlockIndex interruptible (Pieter Wuille)
880ce7d Call RewindBlockIndex without cs_main held (Pieter Wuille)
436f7d7 Release cs_main during RewindBlockIndex operation (Pieter Wuille)
1d34287 Merge the disconnection and erasing loops in RewindBlockIndex (Pieter Wuille)
32b2696 Move erasure of non-active blocks to a separate loop in RewindBlockIndex (Pieter Wuille)
9d6dcc5 Abstract EraseBlockData out of RewindBlockIndex (Pieter Wuille)

Pull request description:

  This PR makes a number of improvements to the InvalidateBlock (`invalidateblock` RPC) and RewindBlockIndex functions, primarily around breaking up their long-term cs_main holding. In addition:
  * They're made safely interruptible (`bitcoind` can be shutdown, and no progress in either will be lost, though if incomplete, `invalidateblock` won't continue after restart and will need to be called again)
  * The validation queue is prevented from overflowing (meaning `invalidateblock` on a very old block will not drive bitcoind OOM) (see bitcoin#14289).
  * `invalidateblock` won't bother to move transactions back into the mempool after 10 blocks (optimization).

  This is not an optimal solution, as we're relying on the scheduler call sites to make sure the scheduler doesn't overflow. Ideally, the scheduler would guarantee this directly, but that needs a few further changes (moving the signal emissions out of cs_main) to prevent deadlocks.

  I have manually tested the `invalidateblock` changes (including interrupting, and running with -checkblockindex and -checkmempool), but haven't tried the rewinding (which is probably becoming increasingly unnecessary, as very few pre-0.13.1 nodes remain that would care to upgrade).

Tree-SHA512: 692e42758bd3d3efc2eb701984a8cb5db25fbeee32e7575df0183a00d0c2c30fdf72ce64c7625c32ad8c8bdc56313da72a7471658faeb0d39eefe39c4b8b8474
pravblockc pushed a commit to pravblockc/dash that referenced this pull request Nov 18, 2021
519b0bc Make last disconnected block BLOCK_FAILED_VALID, even when aborted (Pieter Wuille)
8d22041 Optimization: don't add txn back to mempool after 10 invalidates (Pieter Wuille)
9ce9c37 Prevent callback overruns in InvalidateBlock and RewindBlockIndex (Pieter Wuille)
9bb32eb Release cs_main during InvalidateBlock iterations (Pieter Wuille)
9b1ff5c Call InvalidateBlock without cs_main held (Pieter Wuille)
241b2c7 Make RewindBlockIndex interruptible (Pieter Wuille)
880ce7d Call RewindBlockIndex without cs_main held (Pieter Wuille)
436f7d7 Release cs_main during RewindBlockIndex operation (Pieter Wuille)
1d34287 Merge the disconnection and erasing loops in RewindBlockIndex (Pieter Wuille)
32b2696 Move erasure of non-active blocks to a separate loop in RewindBlockIndex (Pieter Wuille)
9d6dcc5 Abstract EraseBlockData out of RewindBlockIndex (Pieter Wuille)

Pull request description:

  This PR makes a number of improvements to the InvalidateBlock (`invalidateblock` RPC) and RewindBlockIndex functions, primarily around breaking up their long-term cs_main holding. In addition:
  * They're made safely interruptible (`bitcoind` can be shutdown, and no progress in either will be lost, though if incomplete, `invalidateblock` won't continue after restart and will need to be called again)
  * The validation queue is prevented from overflowing (meaning `invalidateblock` on a very old block will not drive bitcoind OOM) (see bitcoin#14289).
  * `invalidateblock` won't bother to move transactions back into the mempool after 10 blocks (optimization).

  This is not an optimal solution, as we're relying on the scheduler call sites to make sure the scheduler doesn't overflow. Ideally, the scheduler would guarantee this directly, but that needs a few further changes (moving the signal emissions out of cs_main) to prevent deadlocks.

  I have manually tested the `invalidateblock` changes (including interrupting, and running with -checkblockindex and -checkmempool), but haven't tried the rewinding (which is probably becoming increasingly unnecessary, as very few pre-0.13.1 nodes remain that would care to upgrade).

Tree-SHA512: 692e42758bd3d3efc2eb701984a8cb5db25fbeee32e7575df0183a00d0c2c30fdf72ce64c7625c32ad8c8bdc56313da72a7471658faeb0d39eefe39c4b8b8474
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Dec 16, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants