
Conversation

@sedited (Contributor) commented Nov 27, 2024

This simplifies the shutdown code a bit and reduces the amount of extra logic required in kernel/bitcoinkernel.cpp to tear down the chainman.

Moving the second flush out of the Shutdown function into the ChainstateManager destructor should be safe, since there should be no new events created after the first flush. Tweak the chainman tests to actually exercise this new teardown behaviour.


This PR is part of the libbitcoinkernel project
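The teardown change described above can be sketched with a minimal, self-contained example (hypothetical Mini* types, not the real Bitcoin Core classes): the final flush moves from an explicit shutdown step into a destructor, so tearing down the object is enough to persist remaining state.

```cpp
#include <cassert>

// Hypothetical minimal types, not the real Bitcoin Core classes.
struct MiniChainstate {
    int* flush_counter;
    void ForceFlushStateToDisk() { ++*flush_counter; }
};

struct MiniChainman {
    MiniChainstate cs;
    // The final flush now lives in the destructor instead of an explicit
    // Shutdown() step, so destruction alone persists remaining state.
    ~MiniChainman() { cs.ForceFlushStateToDisk(); }
};

// Returns the number of flushes observed across a simulated shutdown.
inline int SimulateShutdown() {
    int flushes = 0;
    {
        MiniChainman chainman{MiniChainstate{&flushes}};
        chainman.cs.ForceFlushStateToDisk(); // first flush, during shutdown
    } // destructor performs the second, final flush here
    return flushes;
}
```

The point of the sketch is that callers no longer need a second explicit flush call: the destructor guarantees it.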

@DrahtBot (Contributor) commented Nov 27, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31382.

Reviews

See the guideline for information on the review process.

Type Reviewers
Concept ACK l0rinc, stickies-v, mzumsande

If your review is incorrectly listed, please copy-paste <!--meta-tag:bot-skip--> into the comment that the bot should ignore.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #34132 (refactor: inline CCoinsViewErrorCatcher into CCoinsViewDB by l0rinc)
  • #34075 (fees: Introduce Mempool Based Fee Estimation to reduce overestimation by ismaelsadeeq)
  • #33421 (node: add BlockTemplateCache by ismaelsadeeq)
  • #32427 ((RFC) kernel: Replace leveldb-based BlockTreeDB with flat-file based store by sedited)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@sedited sedited marked this pull request as draft November 27, 2024 13:34
@DrahtBot (Contributor)

🚧 At least one of the CI tasks failed.
Debug: https://github.com/bitcoin/bitcoin/runs/33601554977

Hints

Try to run the tests locally, according to the documentation. However, a CI failure may still
happen due to a number of reasons, for example:

  • Possibly due to a silent merge conflict (the changes in this pull request being
    incompatible with the current code in the target branch). If so, make sure to rebase on the latest
    commit of the target branch.

  • A sanitizer issue, which can only be found by compiling with the sanitizer and running the
    affected test.

  • An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

@l0rinc (Contributor) left a comment

Concept ACK

Closed #31362 in favor of this

@sedited sedited force-pushed the chainman_flush_destructor branch 2 times, most recently from 6d3ec7a to 90f5a20 Compare November 27, 2024 14:21
@sedited (Contributor, Author) commented Nov 27, 2024

Seems like I am running into this here: #25073 (comment). Maybe it's time to introduce a replace-mempool method like Carl suggested there?

@sedited sedited force-pushed the chainman_flush_destructor branch 3 times, most recently from 9425bf5 to 6c3bc47 Compare November 27, 2024 21:10
@sedited (Contributor, Author) commented Nov 28, 2024

maybe time to introduce a replace mempool method like Carl suggested there?

Did this now, ready for review.

@stickies-v (Contributor) left a comment

Concept ACK

Simplifies the shutdown logic, and makes ChainstateManager more self-contained. I'll need to think a bit more about how to test or verify the safety of changes such as in 7a37f20.

Comment on lines 313 to 342
    m_node.mempool = std::make_unique<CTxMemPool>(MemPoolOptionsForTest(m_node), error);
    auto& dummy_chainstate{static_cast<DummyChainState&>(m_node.chainman->ActiveChainstate())};
    dummy_chainstate.SetMempool(m_node.mempool.get());
Contributor:

I wonder if it would make sense to (probably in a separate PR, I suspect it'll be quite a big change) remove the Chainstate::m_mempool member. It feels like a footgunny shortcut (but I need to investigate more), conceptually doesn't make a lot of sense to me, and not having it would avoid having to do e.g. the DummyChainState gymnastics. Is this something you've explored already?

Contributor Author:

I could not come up with a satisfactory alternative yet, suggestions welcome!

    m_node.chainman.reset();
    // Process all callbacks referring to the old manager before creating a
    // new one.
    m_node.validation_signals->SyncWithValidationInterfaceQueue();
Contributor:

Should we do this before calling m_node.chainman.reset();? I'm not sure if any of the callbacks actually depend on the chainman (it doesn't look like it, but I'll need to investigate more), but I think it's a bit more logical at least?

Contributor Author:

The chainman reset may trigger validation interface notifications, so I think it makes sense to sync with the queue afterwards? Unlike in the shutdown procedure, we keep the validation interface around across the restarts here.

Contributor:

The chainman reset may trigger validation interface notifications

Ah, right, ChainStateFlushed would be called. Yeah, I think with that knowledge doing it afterwards makes more sense. Also, callbacks are processed regardless of SyncWithValidationInterfaceQueue() being called, so I suppose this doesn't really change anything wrt callbacks assuming m_node.chainman exists.
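The ordering discussed above can be sketched with hypothetical Mini* types standing in for ChainstateManager and the validation queue: since the destructor may enqueue a ChainStateFlushed-style callback, draining the queue only makes sense after the reset.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <queue>

// Hypothetical minimal types, not the real Bitcoin Core API.
struct MiniSignals {
    std::queue<std::function<void()>> pending;
    void SyncWithQueue() {
        while (!pending.empty()) {
            pending.front()();
            pending.pop();
        }
    }
};

struct MiniChainman {
    MiniSignals* signals;
    int* flushed_events;
    MiniChainman(MiniSignals* s, int* f) : signals(s), flushed_events(f) {}
    ~MiniChainman() {
        // Destruction flushes state, which enqueues a notification.
        int* counter = flushed_events;
        signals->pending.push([counter] { ++*counter; });
    }
};

inline int ResetThenSync() {
    MiniSignals signals;
    int flushed = 0;
    auto chainman = std::make_unique<MiniChainman>(&signals, &flushed);
    chainman.reset();         // may enqueue the flush notification
    signals.SyncWithQueue();  // drain it afterwards, as in the fixture
    return flushed;
}
```

Syncing before the reset would leave the destructor's own notification sitting in the queue.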

Comment on lines 376 to 383
    for (Chainstate* cs : chainman.GetAll()) {
        LOCK(::cs_main);
        cs->ForceFlushStateToDisk();
    }
    // Process all callbacks referring to the old manager before wiping it.
    m_node.validation_signals->SyncWithValidationInterfaceQueue();
    LOCK(::cs_main);
    chainman.ResetChainstates();
Contributor:

nit: the docstring is now a bit out of date:

    // Simulate a restart of the node by flushing all state to disk, clearing the
    // existing ChainstateManager, and unloading the block index.

Contributor Author:

This seems fine as is? I'd prefer not to document these deeper behaviours, which the caller can expect to "just work".

Contributor:

Sorry, I meant to remove the flushing reference, since that's now automatically handled by clearing the chainman, so I agree with your comment.

    // Simulate a restart of the node by clearing the existing ChainstateManager, and unloading the block index.

Contributor Author:

I'll do that next time I have to push.

@sedited sedited force-pushed the chainman_flush_destructor branch from 6c3bc47 to 8e86e77 Compare January 3, 2025 20:54
@sedited (Contributor, Author) commented Jan 3, 2025

Thank you for the review @stickies-v,

Updated 6c3bc47 -> 8e86e77 (chainman_flush_destructor_0 -> chainman_flush_destructor_1, compare)

  • Addressed @stickies-v's comment, moved Assert checking that there is no mempool construction error to right after constructing the mempool.
  • Addressed @stickies-v's comment, consolidated teardown notification comment into a single block.

@mzumsande (Contributor) left a comment

Concept ACK

    LOCK(::cs_main);
    for (Chainstate* chainstate : GetAll()) {
        if (chainstate->CanFlushToDisk()) {
            chainstate->ForceFlushStateToDisk();
Contributor:

Not too familiar with the shutdown sequence - I don't fully understand why we need to flush the chainstates twice.
Do you know any examples of events that result in the second flush on destruction doing any actual work in bitcoind?

@sedited (Contributor, Author) commented Feb 14, 2025

AFAIU the second flush is the original one, while the first flush was introduced in 3192975 to help cleanly process all remaining callbacks. Maybe the second flush could have been removed in that commit?

That said I think it is good practice to be defensive here and keep both.
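The guarded destructor flush quoted above can be sketched with hypothetical Mini* types (not the real API): on teardown, only chainstates that report CanFlushToDisk() are flushed, so the defensive second flush skips anything that cannot or need not be written.

```cpp
#include <cassert>
#include <vector>

// Hypothetical minimal type, not the real Bitcoin Core class.
struct MiniChainstate {
    bool can_flush;
    int flushes{0};
    bool CanFlushToDisk() const { return can_flush; }
    void ForceFlushStateToDisk() { ++flushes; }
};

// Mirrors the guarded loop quoted above, over a stand-in for GetAll().
inline int FlushAll(std::vector<MiniChainstate>& all) {
    int flushed = 0;
    for (MiniChainstate& cs : all) {
        if (cs.CanFlushToDisk()) {
            cs.ForceFlushStateToDisk();
            ++flushed;
        }
    }
    return flushed;
}

// Demonstration helper: one flushable chainstate, one that is not.
inline int DemoFlushAll() {
    std::vector<MiniChainstate> all{{true}, {false}};
    return FlushAll(all);
}
```

When the first flush already wrote everything, running this loop again on destruction is cheap, which is the defensive-practice argument made above.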

@DrahtBot (Contributor)

🚧 At least one of the CI tasks failed.
Debug: https://github.com/bitcoin/bitcoin/runs/35129562635


@sedited sedited force-pushed the chainman_flush_destructor branch from 8e86e77 to fe7d09b Compare March 4, 2025 20:42
@sedited (Contributor, Author) commented Nov 5, 2025

Rebased 6137cec -> 0adf8761c1f8860eb89f6162d7eec47f15a4c01b (chainman_flush_destructor_7 -> chainman_flush_destructor_8, compare)

@sedited (Contributor, Author) commented Nov 25, 2025

@maflcko (Member) commented Nov 25, 2025

allows for getting rid of the goto in bitcoin-chainstate

I don't think this is accurate?

@sedited (Contributor, Author) commented Nov 25, 2025

Re #31382 (comment)

I don't think this is accurate?

Thanks, updated.

The commit messages on the branch:

  • Next to being more consistent (tearing down the mempool first leaves a
    dangling pointer in the chainman), this also enables the changes in the
    next commits, since flushing calls GetCoinsCacheSizeState, which would
    otherwise access this dangling pointer.

  • This is done in preparation for the next commit, which also changes the
    mempool pointer within the chainstate once the old mempool is reset.
    Without this, the old, dangling pointer would be reused when destructing
    the chainstate manager.

  • Moving the second flush out of the Shutdown function into the
    ChainstateManager destructor should be safe, since there should be no
    new events created after the first flush.

  • This simplifies the shutdown code a bit and allows for getting rid of
    the `goto` in bitcoin-chainstate. Also remove the reference to the
    destructed chainman, which could cause UB once the chainman is reset.

  • Now that its only callsite has been moved to the destructor, let the
    smart pointer do its thing.

  • There is no need to flush the background callbacks, since the immediate
    task runner is used for the validation signals and does not have any
    pending callbacks. This allows for removal of the code in the destructor.
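The dangling-pointer hazard described in these commit messages can be sketched with hypothetical Mini* types: replacing the mempool without also updating the chainstate's raw pointer would leave it pointing at freed memory.

```cpp
#include <cassert>
#include <memory>

// Hypothetical minimal types, not the real Bitcoin Core classes.
struct MiniMempool {
    int tx_count{0};
};

struct MiniChainstate {
    MiniMempool* mempool{nullptr};
    void SetMempool(MiniMempool* m) { mempool = m; }
};

inline bool ReplaceMempoolSafely() {
    auto mempool = std::make_unique<MiniMempool>();
    MiniChainstate cs;
    cs.SetMempool(mempool.get());
    // Replace the mempool: reassigning the unique_ptr frees the old object...
    mempool = std::make_unique<MiniMempool>();
    // ...so the chainstate must be pointed at the new one as well, or any
    // later flush-time access through cs.mempool would be use-after-free.
    cs.SetMempool(mempool.get());
    return cs.mempool == mempool.get();
}
```

This mirrors the ordering argument above: resetting the mempool before the chainman is only safe if the chainstate's pointer is refreshed first.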
@sedited sedited force-pushed the chainman_flush_destructor branch from ba29aaf to 8ad5e19 Compare December 16, 2025 16:18
@sedited (Contributor, Author) commented Dec 16, 2025
