@pstratem
Contributor

@pstratem pstratem commented Jul 6, 2025

As IsInitialBlockDownload latches to false only once the Tip is sufficiently
advanced, there is no need to check the Tip every time IsIBD is called.

By caching this in advance, we can avoid extra work and, more importantly, a lock.
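
A minimal sketch of the idea described above, not the code from this PR: the writer, which already holds the lock when the tip changes, evaluates the conditions and latches an atomic flag, while readers only load that flag. `IbdLatch`, `OnTipUpdated`, the simplified `Tip` struct, and passing `now` as a parameter are all assumptions made for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical, simplified stand-in for a chain tip.
struct Tip {
    std::uint64_t chain_work;
    std::int64_t time; // seconds since epoch
};

class IbdLatch
{
    std::mutex m_mutex; // stands in for cs_main in this sketch
    std::atomic<bool> m_finished_ibd{false};
    const std::uint64_t m_min_work;
    const std::int64_t m_max_tip_age;

public:
    IbdLatch(std::uint64_t min_work, std::int64_t max_tip_age)
        : m_min_work{min_work}, m_max_tip_age{max_tip_age}
    {
        assert(max_tip_age >= 0);
    }

    // Writer side: called whenever the tip changes.
    void OnTipUpdated(const Tip& tip, std::int64_t now)
    {
        std::lock_guard<std::mutex> lock{m_mutex};
        if (m_finished_ibd.load(std::memory_order_relaxed)) return; // already latched
        if (tip.chain_work < m_min_work) return;                    // not enough work yet
        if (tip.time < now - m_max_tip_age) return;                 // tip is too old
        m_finished_ibd.store(true, std::memory_order_relaxed);
    }

    // Reader side: a single atomic load, no lock taken.
    bool IsInitialBlockDownload() const
    {
        return !m_finished_ibd.load(std::memory_order_relaxed);
    }
};
```

In this sketch the cost of the checks moves to the (comparatively rare) tip update, and every reader call becomes a plain atomic load.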

@DrahtBot
Contributor

DrahtBot commented Jul 6, 2025

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32885.

Reviews

See the guideline for information on the review process.

Type Reviewers
Concept ACK luke-jr, l0rinc

If your review is incorrectly listed, please copy-paste <!--meta-tag:bot-skip--> into the comment that the bot should ignore.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #30342 (kernel, logging: Pass Logger instances to kernel objects by ryanofsky)
  • #29640 (Fix tiebreak when loading blocks from disk (and add tests for comparing chain ties) by sr-gi)
  • #24230 (indexes: Stop using node internal types and locking cs_main, improve sync logic by ryanofsky)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@pstratem pstratem force-pushed the 2025-07-05-lockless-isibd branch 4 times, most recently from c5084ac to 5e85698 Compare July 6, 2025 23:52
@DrahtBot DrahtBot removed the CI failed label Jul 7, 2025
@pstratem pstratem force-pushed the 2025-07-05-lockless-isibd branch 2 times, most recently from 0709758 to f3ee281 Compare July 7, 2025 01:58
@DrahtBot
Contributor

DrahtBot commented Jul 7, 2025

🚧 At least one of the CI tasks failed.
Task tidy: https://github.com/bitcoin/bitcoin/runs/45440913265
LLM reason (✨ experimental): The CI failure is caused by compilation errors due to missing mutex lock assertions in validation.cpp.

Hints

Try to run the tests locally, according to the documentation. However, a CI failure may still
happen due to a number of reasons, for example:

  • Possibly due to a silent merge conflict (the changes in this pull request being
    incompatible with the current code in the target branch). If so, make sure to rebase on the latest
    commit of the target branch.

  • A sanitizer issue, which can only be found by compiling with the sanitizer and running the
    affected test.

  • An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

@pstratem pstratem force-pushed the 2025-07-05-lockless-isibd branch from f3ee281 to 2e3fefb Compare July 7, 2025 02:18
@pstratem pstratem changed the title from "Cache m_cached_finished_ibd in ActivateBestChain." to "Cache m_cached_finished_ibd where SetTip is called." Jul 7, 2025
@DrahtBot DrahtBot changed the title Cache m_cached_finished_ibd where SetTip is called. Cache m_cached_finished_ibd where SetTip is called. Jul 7, 2025
@DrahtBot DrahtBot removed the CI failed label Jul 7, 2025
Contributor

@stickies-v stickies-v left a comment

Conceptually not a bad idea to cache and lock less, but imo this makes the code more brittle (and harder to understand), e.g. if any tip updates happen without the cache being updated separately.

Do you have any data as to the actual performance improvements from this PR?

Member

@furszy furszy left a comment

I think something like this, if properly implemented (I haven't thought much about the code yet), would reduce the GUI freezes during IBD in a noticeable manner.

@pstratem
Contributor Author

pstratem commented Jul 8, 2025

> Conceptually not a bad idea to cache and lock less, but imo this makes the code more brittle (and harder to understand), e.g. if any tip updates happen without the cache being updated separately.
>
> Do you have any data as to the actual performance improvements from this PR?

I'm (very) open to suggestions on how to make the caching call more robust. (Indeed I expected some.)

There's no performance improvement from this PR by itself; it's the first in a series of proposed changes I'll be making to remove locking where it's not necessary, with the end goal of making some form of concurrency possible in message processing.

@pstratem
Contributor Author

pstratem commented Jul 8, 2025

> I think something like this, if properly implemented (I haven't thought much about the code yet), would reduce the GUI freezes during IBD in a noticeable manner.

I hadn't even considered that, but certainly that's a possible direct improvement.

@l0rinc
Contributor

l0rinc commented Jul 8, 2025

@pstratem, not sure if you saw this, but could be helpful: #25081

@maflcko
Member

maflcko commented Jul 8, 2025

Tight polling of is_ibd seems like a mistake in the first place, so I am not sure if this is something to optimize for.

Looking at the remaining call sites of the ibd check, most have cs_main already, so they won't be affected by this? The remaining ones (I only found MaybeSendFeefilter), if they are relevant, could either re-order their code to call it less often, or cache the bool themselves? For the gui, see also #17145
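
As a rough illustration of the "cache the bool themselves" option, a call site could wrap the check and stop calling it once it has reported that IBD is over. This is only valid because the IBD state latches. `CallerSideIbdCache` and its callback are hypothetical names, not existing code.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical caller-side cache around a possibly lock-taking IBD check.
class CallerSideIbdCache
{
    std::function<bool()> m_is_ibd;            // the underlying (expensive) check
    std::atomic<bool> m_seen_finished{false};  // latched once IBD was seen to be over

public:
    explicit CallerSideIbdCache(std::function<bool()> is_ibd)
        : m_is_ibd{std::move(is_ibd)}
    {
        assert(m_is_ibd != nullptr);
    }

    bool IsIbd()
    {
        // Fast path: once IBD has finished it never restarts, so skip the check.
        if (m_seen_finished.load(std::memory_order_relaxed)) return false;
        const bool ibd = m_is_ibd();
        if (!ibd) m_seen_finished.store(true, std::memory_order_relaxed);
        return ibd;
    }
};
```

The underlying check still runs on every call while IBD is in progress, so this only helps call sites that keep polling after IBD has ended.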

Member

@luke-jr luke-jr left a comment

Concept ACK, but I'm not convinced this implementation is safe as-is. If we want to maintain the current behaviour, it's not sufficient to update only when the tip changes. We also need to re-check when importing/reindexing completes, and schedule an update timer if max_tip_age is the final cause of not exiting IBD.

@pstratem
Contributor Author

> Concept ACK, but I'm not convinced this implementation is safe as-is. If we want to maintain the current behaviour, it's not sufficient to update only when the tip changes. We also need to re-check when importing/reindexing completes, and schedule an update timer if max_tip_age is the final cause of not exiting IBD.

This made me revisit the function and consider what we're trying to achieve.

The function is only interesting when it can latch to the IBD finished state.

That's only possible when all four conditions are met, which can only happen when the tip is updated.

The final time based condition can only change when the tip changes as it gets further away with time, not closer.
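
The time-based argument can be sketched as follows, assuming a clock that only moves forward: for a fixed tip, the recency check can flip from true to false as time passes, but never back. `TipIsRecent` and `RecencyNeverRecovers` are hypothetical helpers that mirror only the shape of the condition, and this sketch says nothing about the importing/reindexing condition.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the tip counts as recent while it is at most
// max_tip_age seconds older than the current time.
bool TipIsRecent(std::int64_t tip_time, std::int64_t now, std::int64_t max_tip_age)
{
    return tip_time >= now - max_tip_age;
}

// For a fixed tip, sample the check at increasing times and report whether it
// ever flipped from "not recent" back to "recent".
bool RecencyNeverRecovers(std::int64_t tip_time, std::int64_t max_tip_age,
                          std::int64_t start, std::int64_t end, std::int64_t step)
{
    assert(step > 0);
    bool was_recent = true;
    for (std::int64_t now = start; now <= end; now += step) {
        const bool recent = TipIsRecent(tip_time, now, max_tip_age);
        if (recent && !was_recent) return false; // would contradict the argument
        was_recent = recent;
    }
    return true;
}
```

With a fixed `tip_time`, the right-hand side `now - max_tip_age` only grows, which is why only a new tip can make the check pass again.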

@pstratem
Contributor Author

> @pstratem, not sure if you saw this, but could be helpful: #25081

That seems like it would be useful, but for now I'm just parsing the lock contention messages out of debug.log

Contributor

@mzumsande mzumsande left a comment

> The function is only interesting when it can latch to the IBD finished state.
>
> That's only possible when all four conditions are met, which can only happen when the tip is updated.
>
> The final time based condition can only change when the tip changes as it gets further away with time, not closer.

I think that @luke-jr is right. If we reindex, we set m_importing to true in ImportBlocks, so any blocks we connect there can never result in getting out of IBD due to the m_blockman.LoadingBlocks() early return.
Therefore we need a call to CacheIsInitialBlockDownload() after ImportingNow goes out of scope in ImportBlocks().
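
A simplified sketch of this scenario, under assumptions: `ImportSketchManager`, `ImportingGuard`, and `ImportBlocksSketch` are hypothetical stand-ins, and the real conditions are reduced to a single `tip_recent` flag. Updates made while the importing flag is set hit the early return, so the cache only latches if it is re-evaluated after the guard has gone out of scope.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical, heavily simplified manager state.
struct ImportSketchManager {
    std::atomic<bool> importing{false};
    std::atomic<bool> finished_ibd{false};
    bool tip_recent{false};

    void UpdateIbdCache()
    {
        if (finished_ibd.load()) return; // already latched
        if (importing.load()) return;    // mirrors the loading-blocks early return
        if (!tip_recent) return;
        finished_ibd.store(true);
    }
};

// RAII guard in the spirit of ImportingNow: sets the flag for its lifetime.
class ImportingGuard
{
    std::atomic<bool>& m_flag;

public:
    explicit ImportingGuard(std::atomic<bool>& flag) : m_flag(flag)
    {
        assert(!m_flag.load());
        m_flag.store(true);
    }
    ~ImportingGuard() { m_flag.store(false); }
    ImportingGuard(const ImportingGuard&) = delete;
    ImportingGuard& operator=(const ImportingGuard&) = delete;
};

void ImportBlocksSketch(ImportSketchManager& man, bool recheck_after_import)
{
    {
        ImportingGuard guard{man.importing};
        man.tip_recent = true; // blocks connected during the import
        man.UpdateIbdCache();  // no effect: the importing flag is still set
    }
    // The guard is gone here; without this call nothing re-evaluates the cache
    // until the next tip update.
    if (recheck_after_import) man.UpdateIbdCache();
}
```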

Contributor

@mzumsande mzumsande Jul 24, 2025

At first I thought this was not necessary because disconnecting a block shouldn't usually get you out of IBD, but I guess there are edge cases (starting up, with the old tip having a lower timestamp than its parent block) where this could get us out of IBD?!

Contributor Author

It's technically possible for disconnecting a block to get us out of IBD, though I really don't think that particular edge case is super important.

I was just trying to be thorough.

Contributor

Since the function is annotated with EXCLUSIVE_LOCKS_REQUIRED(cs_main) anyway, why not put it at the beginning of the function, as is done in most other places?

@pstratem pstratem force-pushed the 2025-07-05-lockless-isibd branch from 2e3fefb to 116bdc7 Compare July 26, 2025 20:01
@pstratem
Contributor Author

OK, I thought about it and it just wasn't obviously correct enough.

So I've rewritten it into three commits to be simpler.

pstratem added 2 commits July 26, 2025 16:05
…ntipRecent.

On systems with sane clocks the chain tip checks can only change when the tip
changes. The gap between the chain tip and the current time only grows.
@DrahtBot
Contributor

🐙 This pull request conflicts with the target branch and needs rebase.

Contributor

@l0rinc l0rinc left a comment

Concept ACK, makes sense to push the burden to the writer instead of the reader.
But we need to restructure it slightly so that it tells a story of what we're extracting, delegating and caching exactly.

I have implemented an example in l0rinc#60 (prototype, may not pass all tests yet).

void TestChainstateManager::ResetIbd()
{
    m_cached_finished_ibd = false;
    m_cached_chaintip_recent = false;
Contributor

@l0rinc l0rinc Dec 2, 2025

I find the existence of this whole method very hacky: we're testing something that cannot happen in reality, so whether the test passes or fails after this, it won't increase my confidence in the product.
But if you insist on updating it (which we likely have to), we should update JumpOutOfIbd as well for symmetry.

return false;
}

void ChainstateManager::UpdateCachedChaintipRecent()
Contributor

@l0rinc l0rinc Dec 2, 2025

We're introducing dead code in the first commit without context about where these values are coming from.
What if we instead extract the internal checks from IsInitialBlockDownload and slowly migrate that behavior away from there?

Note also that ActiveTip() already returns the tip we need.
I'm also not exactly sure why we're calling the current state "cached".
And we're already in ChainstateManager, simply referring to "tip" is already unambiguous.

The first commit could lay the groundwork by extracting-and-reusing the recency check only, the second commit could route active chain SetTip through ChainstateManager to make sure each state change updates this as well, the third commit could cache the locked recency calculations, and the last one could finally eliminate the lock from the reader side.


/** Check whether we are doing an initial block download (synchronizing from disk or network) */
bool IsInitialBlockDownload() const;
void UpdateCachedChaintipRecent() EXCLUSIVE_LOCKS_REQUIRED(cs_main);
Contributor

@l0rinc l0rinc Dec 2, 2025

This could be a const getter instead, and it could use a comment. (I'd specialize it to just return the value instead of mutating the state; we can do the mutation in the SetTip method instead.)

Suggested change
- void UpdateCachedChaintipRecent() EXCLUSIVE_LOCKS_REQUIRED(cs_main);
+ /** Check whether the active chain tip exists, has enough work, and is recent. */
+ bool IsTipRecent() const EXCLUSIVE_LOCKS_REQUIRED(cs_main);

}
}

m_chain.SetTip(*pindexDelete->pprev);
Contributor

@l0rinc l0rinc Dec 2, 2025

What's the reason for separating this work from SetTip, if it's related to it? We could move it to the manager, which would call both methods: the tip update, followed by the IBD state update.
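
One way this could look, as a sketch under assumptions rather than the actual Bitcoin Core classes: the manager exposes a single `SetTip` entry point that first updates the chain and then refreshes the cached flag through a const `IsTipRecent` check. `ChainSketch`, `ManagerSketch`, `BlockIndexSketch`, `CachedTipRecent`, and the explicit `now` parameter are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <optional>

struct BlockIndexSketch {
    std::uint64_t chain_work;
    std::int64_t time; // seconds since epoch
};

class ChainSketch
{
    std::optional<BlockIndexSketch> m_tip;

public:
    void SetTip(const BlockIndexSketch& block) { m_tip = block; }
    const std::optional<BlockIndexSketch>& Tip() const { return m_tip; }
};

class ManagerSketch
{
    ChainSketch m_chain;
    std::atomic_bool m_tip_recent{false};
    const std::uint64_t m_min_work;
    const std::int64_t m_max_tip_age;

public:
    ManagerSketch(std::uint64_t min_work, std::int64_t max_tip_age)
        : m_min_work{min_work}, m_max_tip_age{max_tip_age}
    {
        assert(max_tip_age >= 0);
    }

    // Pure check: the tip exists, has enough work, and is recent. Mutates nothing.
    bool IsTipRecent(std::int64_t now) const
    {
        const auto& tip = m_chain.Tip();
        if (!tip) return false;
        if (tip->chain_work < m_min_work) return false;
        return tip->time >= now - m_max_tip_age;
    }

    // Single entry point: the tip update is always followed by the cache update.
    void SetTip(const BlockIndexSketch& block, std::int64_t now)
    {
        m_chain.SetTip(block);
        if (!m_tip_recent.load(std::memory_order_relaxed) && IsTipRecent(now)) {
            m_tip_recent.store(true, std::memory_order_relaxed);
        }
    }

    // Lock-free read of the latched value.
    bool CachedTipRecent() const { return m_tip_recent.load(std::memory_order_relaxed); }
};
```

Because callers cannot reach the chain's `SetTip` directly in this sketch, a tip update that forgets to refresh the cache is not possible by construction.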

    if (chain.Tip()->nChainWork < MinimumChainWork()) return;
    if (chain.Tip()->Time() < Now<NodeSeconds>() - m_options.max_tip_age) return;

    m_cached_chaintip_recent = true;
Contributor

Shouldn't we guard this method so that it only runs while this is still false?

* const, which latches this for caching purposes.
*/
mutable std::atomic<bool> m_cached_finished_ibd{false};
mutable std::atomic<bool> m_cached_chaintip_recent{false};
Contributor

No need to mention chain here; we can use std::atomic_bool instead, and we should add a description to it.
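
For reference, a small sketch showing that `std::atomic_bool` is the same type as `std::atomic<bool>` on common standard library implementations, so the switch is purely cosmetic. The member name `m_tip_recent`, its doc comment, and the `LatchOnce` helper are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <type_traits>

// std::atomic_bool is an alias for std::atomic<bool>; only the spelling differs.
static_assert(std::is_same_v<std::atomic_bool, std::atomic<bool>>,
              "std::atomic_bool and std::atomic<bool> name the same type");

struct FlagsSketch {
    /** Latches to true once the tip has been seen as recent. */
    std::atomic_bool m_tip_recent{false};
};

// Returns true only for the call that actually performed the latch.
bool LatchOnce(std::atomic_bool& flag)
{
    const bool was_set = flag.exchange(true);
    assert(flag.load()); // the flag is always set after the exchange
    return !was_set;
}
```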

@l0rinc
Contributor

l0rinc commented Dec 29, 2025

@pstratem, are you still working on this or would you like me to take over?

@sedited
Contributor

sedited commented Jan 11, 2026

There hasn't been progress here in many months. Maybe time to re-open it?

@l0rinc
Contributor

l0rinc commented Jan 11, 2026

I will open an alternative PR for this today
Edit: pushed #34253
