node: optimize CBlockIndexWorkComparator #33334

Raimo33 · 2025-09-07T22:00:42Z

Summary

This PR optimizes CBlockIndexWorkComparator::operator() by removing 4 redundant branches.

The previous implementation used multiple separate comparisons with explicit branches for greater-than and less-than cases, resulting in unnecessary code paths.

The new implementation consolidates comparisons into single inequality checks and reduces complexity while preserving its original behavior. This change is particularly beneficial for loading blocks from files and reindexing.

Benchmarks

taskset -c 1 ./bin/bench_bitcoin --filter="(CheckBlockIndex|LoadExternalBlockFile)" -output-csv=bench_old.csv --min-time=30000

Before:

ns/op	op/s	err%	total	benchmark
129,988.82	7,692.97	0.0%	33.02	`CheckBlockIndex`
21,661,396.96	46.17	0.5%	32.37	`LoadExternalBlockFile`

After:

ns/op	op/s	err%	total	benchmark
115,346.65	8,669.52	0.0%	33.00	`CheckBlockIndex`
20,389,679.85	49.04	0.4%	31.76	`LoadExternalBlockFile`

Compared to master:

CheckBlockIndex +12.7% faster
LoadExternalBlockFile +6.2% faster

DrahtBot · 2025-09-07T22:00:47Z

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/33334.

Reviews

See the guideline for information on the review process.

Type	Reviewers
Concept ACK	l0rinc

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#33637 (refactor: optimize block index comparisons (1.4-7.7x faster) by l0rinc)
#29640 (Fix tiebreak when loading blocks from disk (and add tests for comparing chain ties) by sr-gi)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

Refactor the comparator logic in CBlockIndexWorkComparator::operator() to reduce the amounts of branches and improve readability without changing semantics. The previous implementation used multiple separate comparisons with explicit branches for greater-than and less-than cases, resulting in unnecessary code paths. The new implementation consolidates comparisons into single inequality checks and reduces complexity while preserving its original behavior. This change is particularly beneficial for loading blocks from files and reindexing. taskset -c 1 ./bin/bench_bitcoin --filter="(CheckBlockIndex|LoadExternalBlockFile|BlockToJsonVerboseWrite)" -output-csv=bench_old.csv --min-time=30000 | ns/op | op/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 26,557,419.20 | 37.65 | 0.1% | 32.85 | `BlockToJsonVerboseWrite` | 129,988.82 | 7,692.97 | 0.0% | 33.02 | `CheckBlockIndex` | 21,661,396.96 | 46.17 | 0.5% | 32.37 | `LoadExternalBlockFile` taskset -c 1 ./bin/bench_bitcoin --filter="(CheckBlockIndex|LoadExternalBlockFile|BlockToJsonVerboseWrite|WalletIsMineDescriptors)" -output-csv=bench_new.csv --min-time=30000 | ns/op | op/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 27,930,130.95 | 35.80 | 0.1% | 32.96 | `BlockToJsonVerboseWrite` | 115,346.65 | 8,669.52 | 0.0% | 33.00 | `CheckBlockIndex` | 20,389,679.85 | 49.04 | 0.4% | 31.76 | `LoadExternalBlockFile`

l0rinc

Concept ACK - please consider my suggestion

l0rinc · 2025-09-09T21:07:37Z

src/node/blockstorage.cpp

 bool CBlockIndexWorkComparator::operator()(const CBlockIndex* pa, const CBlockIndex* pb) const
 {
    // First sort by most total work, ...
-    if (pa->nChainWork > pb->nChainWork) return false;
-    if (pa->nChainWork < pb->nChainWork) return true;
+    if (pa->nChainWork != pb->nChainWork) {
+        return pa->nChainWork < pb->nChainWork;
+    }

    // ... then by earliest time received, ...
-    if (pa->nSequenceId < pb->nSequenceId) return false;
-    if (pa->nSequenceId > pb->nSequenceId) return true;
+    if (pa->nSequenceId != pb->nSequenceId) {
+        return pa->nSequenceId > pb->nSequenceId;
+    }

    // Use pointer address as tie breaker (should only happen with blocks
    // loaded from disk, as those all have id 0).
-    if (pa < pb) return false;
-    if (pa > pb) return true;
-
-    // Identical blocks.
-    return false;
+    return pa > pb;
 }


All this does is compare by 3 parameters - wouldn't it be simpler to use std::tie for lexicographical comparison instead?

Suggested change

bool CBlockIndexWorkComparator::operator()(const CBlockIndex* pa, const CBlockIndex* pb) const

{

// First sort by most total work, ...

if (pa->nChainWork > pb->nChainWork) return false;

if (pa->nChainWork < pb->nChainWork) return true;

if (pa->nChainWork != pb->nChainWork) {

return pa->nChainWork < pb->nChainWork;

}

// ... then by earliest time received, ...

if (pa->nSequenceId < pb->nSequenceId) return false;

if (pa->nSequenceId > pb->nSequenceId) return true;

if (pa->nSequenceId != pb->nSequenceId) {

return pa->nSequenceId > pb->nSequenceId;

}

// Use pointer address as tie breaker (should only happen with blocks

// loaded from disk, as those all have id 0).

if (pa < pb) return false;

if (pa > pb) return true;

// Identical blocks.

return false;

return pa > pb;

}

bool CBlockIndexWorkComparator::operator()(const CBlockIndex* pa, const CBlockIndex* pb) const

{

// First sort by most total work (descending), then by earliest activatable time (ascending), then by pointer value (ascending)

return std::tie(pa->nChainWork, pb->nSequenceId, pb) < std::tie(pb->nChainWork, pa->nSequenceId, pa);

}

Note that pa & pb usages are mixed to follow the original algorithm.

Since https://corecheck.dev/bitcoin/bitcoin/pulls/33334 hasn't finished yet, I ran the benchmarks on my Mac with min-time of 100 seconds for stability several times, and took the fastest of each execution (since noise can only slow down execution), which reveals that:

BlockToJsonVerboseWrite speed is completely unaffected in all cases;

LoadExternalBlockFile is the same in the current implementation but would be ~7% faster with the suggested solution;

CheckBlockIndex is indeed 17% faster for the current solution, but would be 24% faster with the proposed one - all while making the code a lot simpler;

Before:

ns/op op/s err% total benchmark

25,322,734.53 39.49 0.2% 109.93 BlockToJsonVerboseWrite

39,049.20 25,608.72 0.2% 110.15 CheckBlockIndex

5,955,124.54 167.92 0.1% 110.26 LoadExternalBlockFile

After:

ns/op op/s err% total benchmark

25,333,028.32 39.47 0.2% 109.61 BlockToJsonVerboseWrite

33,173.57 30,144.48 0.3% 109.82 CheckBlockIndex

6,169,392.66 162.09 0.4% 110.79 LoadExternalBlockFile

Suggested std::tie:

ns/op op/s err% total benchmark

25,360,146.33 39.43 0.3% 109.90 BlockToJsonVerboseWrite

31,343.25 31,904.80 0.7% 104.88 CheckBlockIndex

5,535,101.51 180.67 0.7% 967.85 LoadExternalBlockFile

If you decide to take the suggestion, please add me as a coauthor.

I experimented with std::tie before. whether it is more readable is debatable. I'd say they're the same in terms of readability (one might not be familiar with std::tie). I ran the benchmarks on my x86 with gcc laptop (not the same machine as my original benchmarks) and got different results from yours.

taskset -c 0 ./bin/bench_bitcoin --filter="(CheckBlockIndex|LoadExternalBlockFile)" --min-time=10000

Master:

ns/op op/s err% total benchmark

200,279.21 4,993.03 0.1% 10.95 CheckBlockIndex

29,798,546.03 33.56 0.3% 10.82 LoadExternalBlockFile

This PR:

ns/op op/s err% total benchmark

166,134.37 6,019.22 0.1% 10.91 CheckBlockIndex

29,547,138.28 33.84 0.1% 10.59 LoadExternalBlockFile

CheckBlockIndex: +17% faster than master
LoadExternalBlockFile: same as master

@l0rinc suggestion:

ns/op op/s err% total benchmark

193,085.31 5,179.06 0.0% 11.00 CheckBlockIndex

30,456,267.03 32.83 0.3% 10.96 LoadExternalBlockFile

CheckBlockIndex: +3.6% faster than master
LoadExternalBlockFile: -2.2% slower than master

Overall very different results on different machines/compilers. probably std::tie would be the most standard implementation, giving room for smarter future compilers to optimize. but I genuinely don't know how this could be less than 2 branches.

but I genuinely don't know how this could be less than 2 branches

let's find out - can you create a godbold reproducer with latest compilers?

https://godbolt.org/z/6TjP137z1

Looks like the current version wins both on x86 and arm, but I'm no assembly expert.

I still think the std::tie version is better, pushed it as an alternative as part of a bigger related optimization I had. Added you as coauthor there: #33637

Refactor the comparator logic in CBlockIndexWorkComparator::operator() to reduce the amounts of branches and improve readability without changing semantics. The previous implementation used multiple separate comparisons with explicit branches for greater-than and less-than cases, resulting in unnecessary code paths. The new implementation consolidates comparisons into single inequality checks and reduces complexity while preserving its original behavior. This change is particularly beneficial for loading blocks from files and reindexing. taskset -c 1 ./bin/bench_bitcoin --filter="(CheckBlockIndex|LoadExternalBlockFile|BlockToJsonVerboseWrite)" -output-csv=bench_old.csv --min-time=30000 | ns/op | op/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 26,557,419.20 | 37.65 | 0.1% | 32.85 | `BlockToJsonVerboseWrite` | 129,988.82 | 7,692.97 | 0.0% | 33.02 | `CheckBlockIndex` | 21,661,396.96 | 46.17 | 0.5% | 32.37 | `LoadExternalBlockFile` taskset -c 1 ./bin/bench_bitcoin --filter="(CheckBlockIndex|LoadExternalBlockFile|BlockToJsonVerboseWrite|WalletIsMineDescriptors)" -output-csv=bench_new.csv --min-time=30000 | ns/op | op/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 27,930,130.95 | 35.80 | 0.1% | 32.96 | `BlockToJsonVerboseWrite` | 115,346.65 | 8,669.52 | 0.0% | 33.00 | `CheckBlockIndex` | 20,389,679.85 | 49.04 | 0.4% | 31.76 | `LoadExternalBlockFile` Github-Pull: bitcoin#33334 Rebased-From: 80ac046

Raimo33 · 2025-10-17T09:17:36Z

Follow up in #33637

DrahtBot mentioned this pull request Sep 9, 2025

Fix tiebreak when loading blocks from disk (and add tests for comparing chain ties) #29640

Merged

Raimo33 force-pushed the index-work-comparator-branchless branch from 5a9e710 to 80ac046 Compare September 9, 2025 16:30

l0rinc suggested changes Sep 9, 2025

View reviewed changes

janb84 mentioned this pull request Oct 14, 2025

doc: archive release notes for v29.2 #33610

Merged

l0rinc mentioned this pull request Oct 16, 2025

refactor: optimize block index comparisons (1.4-6.8x faster) #33637

Open

Raimo33 closed this Oct 17, 2025

Raimo33 deleted the index-work-comparator-branchless branch October 24, 2025 09:26

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

node: optimize CBlockIndexWorkComparator #33334

node: optimize CBlockIndexWorkComparator #33334

Uh oh!

Raimo33 commented Sep 7, 2025 •

edited

Loading

Uh oh!

DrahtBot commented Sep 7, 2025 •

edited

Loading

Uh oh!

l0rinc left a comment

Uh oh!

l0rinc Sep 9, 2025 •

edited

Loading

Uh oh!

Raimo33 Sep 9, 2025 •

edited

Loading

Uh oh!

l0rinc Sep 9, 2025

Uh oh!

Raimo33 Sep 10, 2025

Uh oh!

l0rinc Oct 16, 2025

Uh oh!

Raimo33 commented Oct 17, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

ns/op	op/s	err%	total	benchmark
25,322,734.53	39.49	0.2%	109.93	`BlockToJsonVerboseWrite`
39,049.20	25,608.72	0.2%	110.15	`CheckBlockIndex`
5,955,124.54	167.92	0.1%	110.26	`LoadExternalBlockFile`

ns/op	op/s	err%	total	benchmark
25,333,028.32	39.47	0.2%	109.61	`BlockToJsonVerboseWrite`
33,173.57	30,144.48	0.3%	109.82	`CheckBlockIndex`
6,169,392.66	162.09	0.4%	110.79	`LoadExternalBlockFile`

ns/op	op/s	err%	total	benchmark
25,360,146.33	39.43	0.3%	109.90	`BlockToJsonVerboseWrite`
31,343.25	31,904.80	0.7%	104.88	`CheckBlockIndex`
5,535,101.51	180.67	0.7%	967.85	`LoadExternalBlockFile`

ns/op	op/s	err%	total	benchmark
200,279.21	4,993.03	0.1%	10.95	`CheckBlockIndex`
29,798,546.03	33.56	0.3%	10.82	`LoadExternalBlockFile`

ns/op	op/s	err%	total	benchmark
166,134.37	6,019.22	0.1%	10.91	`CheckBlockIndex`
29,547,138.28	33.84	0.1%	10.59	`LoadExternalBlockFile`

ns/op	op/s	err%	total	benchmark
193,085.31	5,179.06	0.0%	11.00	`CheckBlockIndex`
30,456,267.03	32.83	0.3%	10.96	`LoadExternalBlockFile`

node: optimize CBlockIndexWorkComparator #33334

node: optimize CBlockIndexWorkComparator #33334

Uh oh!

Conversation

Raimo33 commented Sep 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Benchmarks

Uh oh!

DrahtBot commented Sep 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Coverage & Benchmarks

Reviews

Conflicts

Uh oh!

l0rinc left a comment

Choose a reason for hiding this comment

Uh oh!

l0rinc Sep 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Raimo33 Sep 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

l0rinc Sep 9, 2025

Choose a reason for hiding this comment

Uh oh!

Raimo33 Sep 10, 2025

Choose a reason for hiding this comment

Uh oh!

l0rinc Oct 16, 2025

Choose a reason for hiding this comment

Uh oh!

Raimo33 commented Oct 17, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Raimo33 commented Sep 7, 2025 •

edited

Loading

DrahtBot commented Sep 7, 2025 •

edited

Loading

l0rinc Sep 9, 2025 •

edited

Loading

Raimo33 Sep 9, 2025 •

edited

Loading