Skip to content

Conversation

@Raimo33
Copy link
Contributor

@Raimo33 Raimo33 commented Sep 7, 2025

Summary

This PR optimizes CBlockIndexWorkComparator::operator() by removing 4 redundant branches.

The previous implementation used multiple separate comparisons with explicit branches for greater-than and less-than cases, resulting in unnecessary code paths.

The new implementation consolidates comparisons into single inequality checks and reduces complexity while preserving its original behavior. This change is particularly beneficial for loading blocks from files and reindexing.

Benchmarks

taskset -c 1 ./bin/bench_bitcoin --filter="(CheckBlockIndex|LoadExternalBlockFile)" -output-csv=bench_old.csv --min-time=30000

Before:

ns/op op/s err% total benchmark
129,988.82 7,692.97 0.0% 33.02 CheckBlockIndex
21,661,396.96 46.17 0.5% 32.37 LoadExternalBlockFile

After:

ns/op op/s err% total benchmark
115,346.65 8,669.52 0.0% 33.00 CheckBlockIndex
20,389,679.85 49.04 0.4% 31.76 LoadExternalBlockFile

Compared to master:

  • CheckBlockIndex +12.7% faster
  • LoadExternalBlockFile +6.2% faster

@DrahtBot
Copy link
Contributor

DrahtBot commented Sep 7, 2025

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/33334.

Reviews

See the guideline for information on the review process.

Type Reviewers
Concept ACK l0rinc

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #33637 (refactor: optimize block index comparisons (1.4-7.7x faster) by l0rinc)
  • #29640 (Fix tiebreak when loading blocks from disk (and add tests for comparing chain ties) by sr-gi)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

Refactor the comparator logic in CBlockIndexWorkComparator::operator() to reduce the amounts of branches and improve readability without changing semantics.

The previous implementation used multiple separate comparisons with explicit branches for greater-than and less-than cases, resulting in
unnecessary code paths.

The new implementation consolidates comparisons into single inequality checks and reduces complexity while preserving its original behavior.
This change is particularly beneficial for loading blocks from files and reindexing.

taskset -c 1 ./bin/bench_bitcoin --filter="(CheckBlockIndex|LoadExternalBlockFile|BlockToJsonVerboseWrite)" -output-csv=bench_old.csv --min-time=30000

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|       26,557,419.20 |               37.65 |    0.1% |     32.85 | `BlockToJsonVerboseWrite`
|          129,988.82 |            7,692.97 |    0.0% |     33.02 | `CheckBlockIndex`
|       21,661,396.96 |               46.17 |    0.5% |     32.37 | `LoadExternalBlockFile`

taskset -c 1 ./bin/bench_bitcoin --filter="(CheckBlockIndex|LoadExternalBlockFile|BlockToJsonVerboseWrite|WalletIsMineDescriptors)" -output-csv=bench_new.csv --min-time=30000

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|       27,930,130.95 |               35.80 |    0.1% |     32.96 | `BlockToJsonVerboseWrite`
|          115,346.65 |            8,669.52 |    0.0% |     33.00 | `CheckBlockIndex`
|       20,389,679.85 |               49.04 |    0.4% |     31.76 | `LoadExternalBlockFile`
@Raimo33 Raimo33 force-pushed the index-work-comparator-branchless branch from 5a9e710 to 80ac046 Compare September 9, 2025 16:30
Copy link
Contributor

@l0rinc l0rinc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK - please consider my suggestion

Comment on lines 158 to 173
bool CBlockIndexWorkComparator::operator()(const CBlockIndex* pa, const CBlockIndex* pb) const
{
// First sort by most total work, ...
if (pa->nChainWork > pb->nChainWork) return false;
if (pa->nChainWork < pb->nChainWork) return true;
if (pa->nChainWork != pb->nChainWork) {
return pa->nChainWork < pb->nChainWork;
}

// ... then by earliest time received, ...
if (pa->nSequenceId < pb->nSequenceId) return false;
if (pa->nSequenceId > pb->nSequenceId) return true;
if (pa->nSequenceId != pb->nSequenceId) {
return pa->nSequenceId > pb->nSequenceId;
}

// Use pointer address as tie breaker (should only happen with blocks
// loaded from disk, as those all have id 0).
if (pa < pb) return false;
if (pa > pb) return true;

// Identical blocks.
return false;
return pa > pb;
}
Copy link
Contributor

@l0rinc l0rinc Sep 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All this does is compare by 3 parameters - wouldn't it be simpler to use std::tie for lexicographical comparison instead?

Suggested change
bool CBlockIndexWorkComparator::operator()(const CBlockIndex* pa, const CBlockIndex* pb) const
{
// First sort by most total work, ...
if (pa->nChainWork > pb->nChainWork) return false;
if (pa->nChainWork < pb->nChainWork) return true;
if (pa->nChainWork != pb->nChainWork) {
return pa->nChainWork < pb->nChainWork;
}
// ... then by earliest time received, ...
if (pa->nSequenceId < pb->nSequenceId) return false;
if (pa->nSequenceId > pb->nSequenceId) return true;
if (pa->nSequenceId != pb->nSequenceId) {
return pa->nSequenceId > pb->nSequenceId;
}
// Use pointer address as tie breaker (should only happen with blocks
// loaded from disk, as those all have id 0).
if (pa < pb) return false;
if (pa > pb) return true;
// Identical blocks.
return false;
return pa > pb;
}
bool CBlockIndexWorkComparator::operator()(const CBlockIndex* pa, const CBlockIndex* pb) const
{
// First sort by most total work (descending), then by earliest activatable time (ascending), then by pointer value (ascending)
return std::tie(pa->nChainWork, pb->nSequenceId, pb) < std::tie(pb->nChainWork, pa->nSequenceId, pa);
}

Note that pa & pb usages are mixed to follow the original algorithm.

Since https://corecheck.dev/bitcoin/bitcoin/pulls/33334 hasn't finished yet, I ran the benchmarks on my Mac with min-time of 100 seconds for stability several times, and took the fastest of each execution (since noise can only slow down execution), which reveals that:

  • BlockToJsonVerboseWrite speed is completely unaffected in all cases;
  • LoadExternalBlockFile is the same in the current implementation but would be ~7% faster with the suggested solution;
  • CheckBlockIndex is indeed 17% faster for the current solution, but would be 24% faster with the proposed one - all while making the code a lot simpler;

Before:

ns/op op/s err% total benchmark
25,322,734.53 39.49 0.2% 109.93 BlockToJsonVerboseWrite
39,049.20 25,608.72 0.2% 110.15 CheckBlockIndex
5,955,124.54 167.92 0.1% 110.26 LoadExternalBlockFile

After:

ns/op op/s err% total benchmark
25,333,028.32 39.47 0.2% 109.61 BlockToJsonVerboseWrite
33,173.57 30,144.48 0.3% 109.82 CheckBlockIndex
6,169,392.66 162.09 0.4% 110.79 LoadExternalBlockFile

Suggested std::tie:

ns/op op/s err% total benchmark
25,360,146.33 39.43 0.3% 109.90 BlockToJsonVerboseWrite
31,343.25 31,904.80 0.7% 104.88 CheckBlockIndex
5,535,101.51 180.67 0.7% 967.85 LoadExternalBlockFile

If you decide to take the suggestion, please add me as a coauthor.

Copy link
Contributor Author

@Raimo33 Raimo33 Sep 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I experimented with std::tie before. whether it is more readable is debatable. I'd say they're the same in terms of readability (one might not be familiar with std::tie). I ran the benchmarks on my x86 with gcc laptop (not the same machine as my original benchmarks) and got different results from yours.

taskset -c 0 ./bin/bench_bitcoin --filter="(CheckBlockIndex|LoadExternalBlockFile)" --min-time=10000

Master:

ns/op op/s err% total benchmark
200,279.21 4,993.03 0.1% 10.95 CheckBlockIndex
29,798,546.03 33.56 0.3% 10.82 LoadExternalBlockFile

This PR:

ns/op op/s err% total benchmark
166,134.37 6,019.22 0.1% 10.91 CheckBlockIndex
29,547,138.28 33.84 0.1% 10.59 LoadExternalBlockFile

CheckBlockIndex: +17% faster than master
LoadExternalBlockFile: same as master

@l0rinc suggestion:

ns/op op/s err% total benchmark
193,085.31 5,179.06 0.0% 11.00 CheckBlockIndex
30,456,267.03 32.83 0.3% 10.96 LoadExternalBlockFile

CheckBlockIndex: +3.6% faster than master
LoadExternalBlockFile: -2.2% slower than master

Overall very different results on different machines/compilers. probably std::tie would be the most standard implementation, giving room for smarter future compilers to optimize. but I genuinely don't know how this could be less than 2 branches.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but I genuinely don't know how this could be less than 2 branches

let's find out - can you create a godbold reproducer with latest compilers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://godbolt.org/z/6TjP137z1

Looks like the current version wins both on x86 and arm, but I'm no assembly expert.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think the std::tie version is better, pushed it as an alternative as part of a bigger related optimization I had. Added you as coauthor there: #33637

luke-jr pushed a commit to bitcoinknots/bitcoin that referenced this pull request Sep 19, 2025
Refactor the comparator logic in CBlockIndexWorkComparator::operator() to reduce the amounts of branches and improve readability without changing semantics.

The previous implementation used multiple separate comparisons with explicit branches for greater-than and less-than cases, resulting in
unnecessary code paths.

The new implementation consolidates comparisons into single inequality checks and reduces complexity while preserving its original behavior.
This change is particularly beneficial for loading blocks from files and reindexing.

taskset -c 1 ./bin/bench_bitcoin --filter="(CheckBlockIndex|LoadExternalBlockFile|BlockToJsonVerboseWrite)" -output-csv=bench_old.csv --min-time=30000

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|       26,557,419.20 |               37.65 |    0.1% |     32.85 | `BlockToJsonVerboseWrite`
|          129,988.82 |            7,692.97 |    0.0% |     33.02 | `CheckBlockIndex`
|       21,661,396.96 |               46.17 |    0.5% |     32.37 | `LoadExternalBlockFile`

taskset -c 1 ./bin/bench_bitcoin --filter="(CheckBlockIndex|LoadExternalBlockFile|BlockToJsonVerboseWrite|WalletIsMineDescriptors)" -output-csv=bench_new.csv --min-time=30000

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|       27,930,130.95 |               35.80 |    0.1% |     32.96 | `BlockToJsonVerboseWrite`
|          115,346.65 |            8,669.52 |    0.0% |     33.00 | `CheckBlockIndex`
|       20,389,679.85 |               49.04 |    0.4% |     31.76 | `LoadExternalBlockFile`

Github-Pull: bitcoin#33334
Rebased-From: 80ac046
@Raimo33
Copy link
Contributor Author

Raimo33 commented Oct 17, 2025

Follow up in #33637

@Raimo33 Raimo33 closed this Oct 17, 2025
@Raimo33 Raimo33 deleted the index-work-comparator-branchless branch October 24, 2025 09:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants