kernel: Flush in ChainstateManager destructor #31382
base: master
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks
For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31382.

Reviews
See the guideline for information on the review process. If your review is incorrectly listed, please copy-paste

Conflicts
Reviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
🚧 At least one of the CI tasks failed.

Hints
Try to run the tests locally, according to the documentation. However, a CI failure may still
Leave a comment here, if you need help tracking down a confusing failure.
l0rinc
left a comment
Concept ACK
Closed #31362 in favor of this
Force-pushed 6d3ec7a to 90f5a20
Seems like I am running into this here: #25073 (comment), maybe time to introduce a replace mempool method like Carl suggested there?
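A rough sketch of what such a replace-mempool helper could look like (hypothetical: the ReplaceMempool name, its placement on ChainstateManager, and the SetMempool setter are illustrative assumptions, not code from this PR or from the linked suggestion):

    // Hypothetical sketch only: swap the mempool pointer held by every
    // chainstate under cs_main, so resetting the old mempool cannot leave
    // a dangling Chainstate::m_mempool behind.
    void ChainstateManager::ReplaceMempool(CTxMemPool* mempool)
    {
        LOCK(::cs_main);
        for (Chainstate* chainstate : GetAll()) {
            chainstate->SetMempool(mempool); // assumed setter; currently only the test-only DummyChainState exposes one
        }
    }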
Force-pushed 9425bf5 to 6c3bc47
Did this now, ready for review.
stickies-v
left a comment
Concept ACK
Simplifies the shutdown logic and makes ChainstateManager more self-contained. I'll need to think a bit more about how to test or verify the safety of changes such as in 7a37f20.
src/test/util/setup_common.cpp
Outdated
    m_node.mempool = std::make_unique<CTxMemPool>(MemPoolOptionsForTest(m_node), error);
    auto& dummy_chainstate{static_cast<DummyChainState&>(m_node.chainman->ActiveChainstate())};
    dummy_chainstate.SetMempool(m_node.mempool.get());
I wonder if it would make sense to (probably in a separate PR, I suspect it'll be quite a big change) remove the Chainstate::m_mempool member. It feels like a footgunny shortcut (but I need to investigate more), it conceptually doesn't make a lot of sense to me, and not having it would avoid e.g. the DummyChainState gymnastics. Is this something you've explored already?
I could not come up with a satisfactory alternative yet, suggestions welcome!
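For context, the "gymnastics" refers to the test-only subclass used to reach the protected Chainstate::m_mempool member, roughly (paraphrased from the test utilities; may not match the tree exactly):

    // Chainstate::m_mempool is protected, so the test setup derives a dummy
    // subclass purely to gain a setter, then downcasts the active chainstate.
    struct DummyChainState final : public Chainstate {
        void SetMempool(CTxMemPool* mempool) { m_mempool = mempool; }
    };

    // ... after recreating the mempool:
    auto& dummy_chainstate{static_cast<DummyChainState&>(m_node.chainman->ActiveChainstate())};
    dummy_chainstate.SetMempool(m_node.mempool.get());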
    m_node.chainman.reset();
    // Process all callbacks referring to the old manager before creating a
    // new one.
    m_node.validation_signals->SyncWithValidationInterfaceQueue();
Should we do this before calling m_node.chainman.reset();? I'm not sure if any of the callbacks actually depend on the chainman (it doesn't look like it, but I'll need to investigate more), but I think it's a bit more logical at least?
The chainman reset may trigger validation interface notifications, so I think it makes sense to sync with the queue afterwards? Unlike the shutdown procedure, we keep the validation interface around beyond the restarts there.
The chainman reset may trigger validation interface notifications
Ah, right, ChainStateFlushed would be called. Yeah, I think with that knowledge doing it afterwards makes more sense. Also, callbacks are processed regardless of SyncWithValidationInterfaceQueue() being called, so I suppose this doesn't really change anything wrt callbacks assuming m_node.chainman exists.
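In other words, the ordering settled on here is (a simplified sketch of the test-setup teardown, not the verbatim patch):

    // Destroying the manager can itself emit validation interface
    // notifications (e.g. ChainStateFlushed from the destructor flush),
    // so the queue is drained only after the reset.
    m_node.chainman.reset();
    m_node.validation_signals->SyncWithValidationInterfaceQueue();
    // ... safe to create the new ChainstateManager now ...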
    for (Chainstate* cs : chainman.GetAll()) {
        LOCK(::cs_main);
        cs->ForceFlushStateToDisk();
    }
    // Process all callbacks referring to the old manager before wiping it.
    m_node.validation_signals->SyncWithValidationInterfaceQueue();
    LOCK(::cs_main);
    chainman.ResetChainstates();
nit: the docstring is now a bit out of date:
// Simulate a restart of the node by flushing all state to disk, clearing the
// existing ChainstateManager, and unloading the block index.
This seems fine as is? I'd prefer not to document these deeper behaviours, that the caller can expect to "just work".
Sorry, I meant to remove the flushing reference, since that's now automatically handled by clearing chainman, so I agree with your comment.
Simulate a restart of the node by clearing the existing ChainstateManager, and unloading the block index.
I'll do that next time I have to push.
Force-pushed 6c3bc47 to 8e86e77
Thank you for the review @stickies-v. Updated 6c3bc47 -> 8e86e77 (chainman_flush_destructor_0 -> chainman_flush_destructor_1, compare)
mzumsande
left a comment
Concept ACK
    LOCK(::cs_main);
    for (Chainstate* chainstate : GetAll()) {
        if (chainstate->CanFlushToDisk()) {
            chainstate->ForceFlushStateToDisk();
Not too familiar with the shutdown sequence - I don't fully understand why we need to flush the chainstates twice.
Do you know any examples of events that result in the second flush on destruction doing any actual work in bitcoind?
AFAIU the second flush is the original one, while the first flush was introduced in 3192975 to help cleanly process all remaining callbacks. Maybe the second flush could have been removed in that commit?
That said, I think it is good practice to be defensive here and keep both.
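Putting the two flushes side by side, the resulting shape is roughly the following (a sketch paraphrasing the diff above, not the exact patch):

    // First flush: Shutdown() force-flushes so remaining validation
    // interface callbacks can be processed while the node is still alive.
    // Second flush (below): the destructor flushes again defensively,
    // covering state dirtied after Shutdown() and callers that never ran it.
    ChainstateManager::~ChainstateManager()
    {
        LOCK(::cs_main);
        for (Chainstate* chainstate : GetAll()) {
            if (chainstate->CanFlushToDisk()) {
                chainstate->ForceFlushStateToDisk();
            }
        }
        // ... remaining teardown ...
    }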
🚧 At least one of the CI tasks failed.

Hints
Try to run the tests locally, according to the documentation. However, a CI failure may still
Leave a comment here, if you need help tracking down a confusing failure.
Force-pushed 8e86e77 to fe7d09b
Rebased 6137cec -> 0adf8761c1f8860eb89f6162d7eec47f15a4c01b (chainman_flush_destructor_7 -> chainman_flush_destructor_8, compare)
Force-pushed 53abfae to 01a845e
Force-pushed acc96a4 to d9cc26d
Force-pushed d9cc26d to ba29aaf
Rebased d9cc26d -> ba29aaf (chainman_flush_destructor_8 -> chainman_flush_destructor_9, compare)

I don't think this is accurate?

Thanks, updated.
Next to being more consistent, this also enables the changes in the next commits: tearing down the mempool first leaves a dangling pointer in the chainman, and flushing calls GetCoinsCacheSizeState, which would otherwise access that dangling pointer.
This is done in preparation for the next commit, which also changes the mempool pointer within the chainstate once the old mempool is reset. Without this, the old, dangling pointer would be reused when destructing the chainstate manager.
Moving the second flush out of the Shutdown function into the ChainstateManager destructor should be safe, since there should be no new events created after the first flush. This simplifies the shutdown code a bit and allows for getting rid of the `goto` in bitcoin-chainstate.
Also remove the reference to the destructed chainman, which could cause UB once the chainman is reset.
Now that its only callsite has been moved to the destructor, let the smart pointer do its thing.
There is no need to flush the background callbacks, since the immediate task runner is used for the validation signals, which does not have any pending callbacks. This allows for removal of the code in the destructor.
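To illustrate the `goto` removal mentioned above (an illustrative before/after; the epilogue label name and the load_ok condition are guesses, not the actual bitcoin-chainstate code):

    // Before: every early-exit path had to jump to a shared flush block.
    //     if (!load_ok) goto epilogue;
    //     ...
    // epilogue:
    //     for (Chainstate* cs : chainman.GetAll()) cs->ForceFlushStateToDisk();
    //
    // After: ~ChainstateManager flushes, so early exits can simply return
    // and let the owning std::unique_ptr run the destructor.
    //     if (!load_ok) return;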
Force-pushed ba29aaf to 8ad5e19
Rebased ba29aaf -> 8ad5e19 (chainman_flush_destructor_9 -> chainman_flush_destructor_10, compare)
This simplifies the shutdown code a bit and reduces the amount of extra logic required in kernel/bitcoinkernel.cpp to tear down the chainman.
Moving the second flush out of the Shutdown function into the ChainstateManager destructor should be safe, since there should be no new events created after the first flush. Tweak the chainman tests to actually exercise this new tear-down behaviour.
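A minimal sketch of what exercising the new tear-down behaviour could look like in a test (hypothetical shape, reusing the test-framework members seen earlier; not the actual patch):

    // Rely on ~ChainstateManager to flush, instead of calling
    // ForceFlushStateToDisk() explicitly before the reset.
    m_node.chainman.reset();
    m_node.validation_signals->SyncWithValidationInterfaceQueue();
    // ... recreate the chainman and verify the flushed state is intact ...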
This PR is part of the libbitcoinkernel project.