-
Notifications
You must be signed in to change notification settings - Fork 38.6k
Fix some concurrency issues in ActivateBestChain() #13023
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
jamesob
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK, just a few nits. Nice test!
src/Makefile.test.include
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like there's an alphabetic ordering here that may be good to preserve.
src/test/scheduler_tests.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be required here
src/test/validation_block_tests.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be worth parameterizing Block with a bool make_invalid option to avoid the duplication here, but that's your call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extracted into FinalizeBlock - its probably worth refactoring BlockAssembler into a builder style class - would make unit tests much easier to write ... maybe one day
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, definitely agree.
src/test/validation_block_tests.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know we don't have columnar limits in the styleguide, but this line's pretty long...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i agree with you - i originally had this on two lines - and then the linter put it all back on one
src/test/validation_block_tests.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Braces needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-format would put those in I believe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it doesn't! but i put them in now
src/test/validation_block_tests.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling this ignored might be more straightforward
980f3ec to
8c09051
Compare
|
Updated to address reviewer comments and fix a bug that this test seems to have uncovered |
ryanofsky
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK 6094ce7. Fun tests!
src/validation.cpp
Outdated
| if (pindexMostWork == nullptr) { | ||
| pindexMostWork = FindMostWorkChain(); | ||
| } | ||
| pindexMostWork = FindMostWorkChain(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be good to add a comment here saying why it's important to recompute pindexMostWork each loop iteration (how sync could get stuck otherwise).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might cause substantial slow-down during IBD or reindex, because currently FindMostWorkChain walks from the candidate tip to the fork point on the current chain to ensure none of the blocks are invalid. We may be able to optimize that, but we also might want to consider another solution.
src/test/validation_block_tests.cpp
Outdated
| pubKey << i++ << OP_TRUE; | ||
|
|
||
| auto ptemplate = BlockAssembler(Params()).CreateNewBlock(pubKey, false); | ||
| CBlock* pblock = new CBlock(ptemplate->block); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd think this could be written more simply as:
auto block = make_shared<CBlock>(ptemplate->block);
block->hashPrevBlock = ...
return block;
It looks like pblock is leaked currently, or is this not the case?
|
Thanks for opening this PR. I thought it would be helpful for other reviewers to clarify the (IMO significant) bug here, so that we can properly evaluate fixes. If ActivateBestChain is invoked simultaneously in separate threads, then we can end up at a lower-work tip, and remain stuck until the next block comes in.
Some travis failures have been observed due to this bug, as seen for instance here: https://travis-ci.org/bitcoin/bitcoin/jobs/370848272. The test that sometimes fails, |
sdaftuar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work finding this issue and good start on the new test. I haven't finished my review yet, but here's my initial feedback.
src/validation.cpp
Outdated
| if (pindexMostWork == nullptr) { | ||
| pindexMostWork = FindMostWorkChain(); | ||
| } | ||
| pindexMostWork = FindMostWorkChain(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might cause substantial slow-down during IBD or reindex, because currently FindMostWorkChain walks from the candidate tip to the fork point on the current chain to ensure none of the blocks are invalid. We may be able to optimize that, but we also might want to consider another solution.
src/test/validation_block_tests.cpp
Outdated
| auto pblock = Block(prev_hash); | ||
|
|
||
| // a second coinbase will make this block invalid | ||
| pblock->vtx.push_back(pblock->vtx[0]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method of constructing a bad block is not ideal, because this kind of invalidity is detected well before ConnectBlock. In order to get better code coverage, I think it would be better to generate different kinds of invalid blocks, to test failure at different points in the validation process.
(As it is, I believe this method of making an invalid block is detectable in CheckBlock (merkle-tree fails validation due to duplicate transactions). It would also fail in CheckBlock for having more than one coinbase, and it would fail in ContextualCheckBlock for having an invalid witness commitment.)
|
I have a branch where I changed the test you wrote to produce invalid blocks that would be detected in ConnectBlock, and where I have an alternate fix for the bug you found here, as well as a fix for the bug I mentioned in #13092: https://github.com/sdaftuar/bitcoin/commits/2018-04-alternate-abc-fix. For fixing the bug here, rather than invoke I compared this approach vs master for doing a reindex on my workstation, and saw < 3% slowdown (some slowdown was expected due to checking setBlockIndexCandidates each time). |
|
Nice - thanks for this. I wonder if it might be a bit more readable to explore feasibility of caching results internally in findmostworkchain instead of doing it in the activatebestchain loop - the control flow there is getting harder and harder to follow. I'm happy to explore that. |
|
@TheBlueMatt and I discussed that idea offline (of rewriting |
|
@sdaftuar - pulled in your commits. Its still hard for me to convince myself that the set of conditions you've defined are the only cases in which Just wanted to throw out one alternate way to do determine when to refresh most work tip that's (imo) a bit easier to convince myself behaves correctly. Instead of caching |
|
Copy of travis output: |
|
From some quick web searching, it seems that BOOST_CHECK may not be thread safe, so invoking it in each thread is likely the problem here. Perhaps we should just |
|
I do occasionally see this failure when I run validation_block_tests in a tight loop (after fixing the BOOST_CHECK issue above): Edit: perhaps this is also a threading issue though? |
|
Hmmm thats the genesis block which connects through a different path I believe |
|
@sdaftuar - thats the genesis block getting activated. Feels like it may be more of a test artefact (wasn't there a recent change that randomizes test case execution order) than a threading issue? I think solution for that is to just call PNB(genesis) at the very beginning. |
|
Backported in #13317 |
Technically, some internal datastructures may be in an inconsistent state if we do this, though there are no known bugs there. Still, for future safety, its much better to only unlock cs_main if we've made progress (not just tried a reorg which may make progress). Github-Pull: bitcoin#13023 Rebased-From: ecc3c4a
If multiple threads are invoking ActivateBestChain, it was possible to have them working towards different tips, and we could arrive at a less work tip than we should. Fix this by introducing a ChainState lock which must be held for the entire duration of ActivateBestChain to enforce exclusion in ABC. Github-Pull: bitcoin#13023 Rebased-From: a3ae8e6
After a recent bug discovered in callback ordering in MainSignals, this test checks invariants in ordering of BlockConnected / BlockDisconnected / UpdatedChainTip signals Github-Pull: bitcoin#13023 Rebased-From: dd435ad
…nt the memory model cbeaa91 Update ValidationInterface() documentation to explicitly specify threading and memory model (Jesse Cohen) b296b42 Update documentation for SingleThreadedSchedulerClient() to specify the memory model (Jesse Cohen) 9994d01 Add Unit Test for SingleThreadedSchedulerClient (Jesse Cohen) Pull request description: As discussed in #13023 I've split this test out into a separate pr This test (and documentation update) makes explicit the guarantee (previously undefined, but implied by the 'SingleThreaded' in `SingleThreadedSchedulerClient()`) - that callbacks pushed to the `SingleThreadedSchedulerClient()` obey the single threaded model for memory and execution - specifically, the callbacks are executed fully and in order, and even in cases where a subsequent callback is executed by a different thread, sequential consistency of memory for all threads executing these callbacks is maintained. Maintaining memory consistency should make the api more developer friendly - especially for users of the validationinterface. To the extent that there are performance implications from this decision, these are not currently present in practice because all use of this scheduler happens on a single thread currently, furthermore the lock should guarantee consistency across callback executions even when callbacks are executed by multiple threads (as the test does). Tree-SHA512: 5d95a7682c402e5ad76b05bc9dfbca99ca64105f62ab9e78f6fc0f6ea8c5277aa399fbb94298e35cc677b0c2181ff17259584bb7ae230e38aa68b85ecbc22856
… document the memory model cbeaa91 Update ValidationInterface() documentation to explicitly specify threading and memory model (Jesse Cohen) b296b42 Update documentation for SingleThreadedSchedulerClient() to specify the memory model (Jesse Cohen) 9994d01 Add Unit Test for SingleThreadedSchedulerClient (Jesse Cohen) Pull request description: As discussed in bitcoin#13023 I've split this test out into a separate pr This test (and documentation update) makes explicit the guarantee (previously undefined, but implied by the 'SingleThreaded' in `SingleThreadedSchedulerClient()`) - that callbacks pushed to the `SingleThreadedSchedulerClient()` obey the single threaded model for memory and execution - specifically, the callbacks are executed fully and in order, and even in cases where a subsequent callback is executed by a different thread, sequential consistency of memory for all threads executing these callbacks is maintained. Maintaining memory consistency should make the api more developer friendly - especially for users of the validationinterface. To the extent that there are performance implications from this decision, these are not currently present in practice because all use of this scheduler happens on a single thread currently, furthermore the lock should guarantee consistency across callback executions even when callbacks are executed by multiple threads (as the test does). Tree-SHA512: 5d95a7682c402e5ad76b05bc9dfbca99ca64105f62ab9e78f6fc0f6ea8c5277aa399fbb94298e35cc677b0c2181ff17259584bb7ae230e38aa68b85ecbc22856
…tChain + optimization 50dbec5 Add unit tests for signals generated by ProcessNewBlock() (furszy) f68251d Validation: rename one of the two instances using "bad-prevblk" to its correct description of "prevblk-not-found" (furszy) a51a755 Fix concurrency-related bugs in ActivateBestChain (Jesse Cohen) 198f435 Do not unlock cs_main in ABC unless we've actually made progress. (Matt Corallo) 8d15cf5 Optimize ActivateBestChain for long chains (Pieter Wuille) 8640be1 Fix fast-shutdown crash if genesis block was not loaded (Matt Corallo) ef24337 Hold cs_main while calling UpdatedBlockTip() and ui.NotifyBlockTip (Jesse Cohen) f9d2ab3 Update ValidationInterface() documentation to explicitly specify threading and memory model (Jesse Cohen) cb9bb25 Update documentation for SingleThreadedSchedulerClient() to specify the memory model (Jesse Cohen) 4ea2048 Add Unit Test for SingleThreadedSchedulerClient (Jesse Cohen) Pull request description: Made some back ports and adaptations to validate further the work introduced in #2203 and #2209. Now that the scheduler thread is processing most of the chain events, the callbacks execution order must be verified to dispatch the events serially, ensuring a consistent memory state in each event processing invocation. There are some concurrency issues solved, as well a an optimization for larger chains for the ABC workflow. Each commit details them well. List: bitcoin#7917 (only fb8fad1) bitcoin#12988 bitcoin#13023 bitcoin#13247 bitcoin#13835 This is built on top of #2284 (and the reason that made me notice that problem). So, 2284 comes first, then this one. ACKs for top commit: Fuzzbawls: ACK 50dbec5 random-zebra: ACK 50dbec5 and merging... Tree-SHA512: 573144cc96d2caa49ed2f0887dc8c53b5aca0efd54b6a25626063ccb780da426f56b3c6b7491648e7abefc1960c82b1354a4ff2c51425bfe99adaa4394562608
Technically, some internal datastructures may be in an inconsistent state if we do this, though there are no known bugs there. Still, for future safety, its much better to only unlock cs_main if we've made progress (not just tried a reorg which may make progress). zcash: cherry picked from commit ecc3c4a zcash: bitcoin/bitcoin#13023
If multiple threads are invoking ActivateBestChain, it was possible to have them working towards different tips, and we could arrive at a less work tip than we should. Fix this by introducing a ChainState lock which must be held for the entire duration of ActivateBestChain to enforce exclusion in ABC. zcash: cherry picked from commit a3ae8e6 zcash: bitcoin/bitcoin#13023
Technically, some internal datastructures may be in an inconsistent state if we do this, though there are no known bugs there. Still, for future safety, its much better to only unlock cs_main if we've made progress (not just tried a reorg which may make progress). zcash: cherry picked from commit ecc3c4a zcash: bitcoin/bitcoin#13023
If multiple threads are invoking ActivateBestChain, it was possible to have them working towards different tips, and we could arrive at a less work tip than we should. Fix this by introducing a ChainState lock which must be held for the entire duration of ActivateBestChain to enforce exclusion in ABC. zcash: cherry picked from commit a3ae8e6 zcash: bitcoin/bitcoin#13023
dd435ad Add unit tests for signals generated by ProcessNewBlock() (Jesse Cohen) a3ae8e6 Fix concurrency-related bugs in ActivateBestChain (Jesse Cohen) ecc3c4a Do not unlock cs_main in ABC unless we've actually made progress. (Matt Corallo) Pull request description: Originally this PR was just to add tests around concurrency in block validation - those tests seem to have uncovered another bug in ActivateBestChain - this now fixes that bug and adds tests. ActivateBestChain (invoked after a new block is validated) proceeds in steps - acquiring and releasing cs_main while incrementally disconnecting and connecting blocks to sync to the most work chain known (FindMostWorkChain()). Every time cs_main is released the result of FindMostWorkChain() can change - but currently that value is cached across acquisitions of cs_main and only refreshed when an invalid chain is explored. It needs to be refreshed every time cs_main is reacquired. The test added in bitcoin@6094ce7 will occasionally fail without the commit fixing this issue bitcoin@26bfdba Original description below -- After a bug discovered where UpdatedBlockTip() notifications could be triggered out of order (bitcoin#12978), these unit tests check certain invariants about these signals. The scheduler test asserts that a SingleThreadedSchedulerClient processes callbacks fully and sequentially. The block validation test generates a random chain and calls ProcessNewBlock from multiple threads at random and in parallel. ValidationInterface callbacks verify that the ordering of BlockConnected BlockDisconnected and UpdatedBlockTip events occur as expected. Tree-SHA512: 4102423a03d2ea28580c7a70add8a6bdb22ef9e33b107c3aadef80d5af02644cdfaae516c44933924717599c81701e0b96fbf9cf38696e9e41372401a5ee1f3c
Originally this PR was just to add tests around concurrency in block validation - those tests seem to have uncovered another bug in ActivateBestChain - this now fixes that bug and adds tests.
ActivateBestChain (invoked after a new block is validated) proceeds in steps - acquiring and releasing cs_main while incrementally disconnecting and connecting blocks to sync to the most work chain known (FindMostWorkChain()). Every time cs_main is released the result of FindMostWorkChain() can change - but currently that value is cached across acquisitions of cs_main and only refreshed when an invalid chain is explored. It needs to be refreshed every time cs_main is reacquired. The test added in 6094ce7 will occasionally fail without the commit fixing this issue 26bfdba
Original description below
After a bug discovered where UpdatedBlockTip() notifications could be triggered out of order (#12978), these unit tests check certain invariants about these signals.
The scheduler test asserts that a SingleThreadedSchedulerClient processes callbacks fully and sequentially.
The block validation test generates a random chain and calls ProcessNewBlock from multiple threads at random and in parallel. ValidationInterface callbacks verify that the ordering of BlockConnected BlockDisconnected and UpdatedBlockTip events occur as expected.