Skip to content

Conversation

@l0rinc
Copy link
Contributor

@l0rinc l0rinc commented Oct 16, 2025

Context

The comparator is often called either directly or for the sorting of setBlockIndexCandidates, a red-black tree-set containing valid block headers where the comparator is invoked extensively.
Besided the sorted set usages, the comparator is used directly in Chainstate in FindMostWorkChain, PruneBlockIndexCandidates, ActivateBestChain and InvalidateBlock; and in ChainstateManager in LoadBlockIndex, CheckBlockIndex, ActivateSnapshot and PopulateAndValidateSnapshot.

It was profiled during the investigation of #33618 (comment).

Testing

To ensure the optimized implementations are both fast and correct, the first commit adds a dedicated benchmark to measure CBlockIndexWorkComparator performance, and the second commit adds randomized tests comparing the new implementation with the original one.

Results

GCC 15.0.1:

  • CBlockIndexWorkComparator: 100,772,511 cmp/s → 656,674,205 cmp/s = 6.51x faster
  • CheckBlockIndex: 9,091 ops/s → 14,765 ops/s = 1.62x faster

Clang 22.0.0:

  • CBlockIndexWorkComparator: 100,451,893 cmp/s → 683,414,234 cmp/s = 6.8x faster
  • CheckBlockIndex: 10,322 ops/s → 14,376 ops/s = 1.39x faster

See https://corecheck.dev/bitcoin/bitcoin/pulls/33637 for the CI's CheckBlockIndex performance.

gcc and clang measurements

Compiler: gcc 15.0.1
b60450f bench: add benchmark to measure CBlockIndexWorkComparator performance

ns/cmp cmp/s err% ins/cmp cyc/cmp IPC bra/cmp miss% total benchmark
9.92 100,772,511.62 0.0% 63.98 35.64 1.795 14.17 1.9% 5.50 CBlockIndexWorkComparator
ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
109,996.46 9,091.20 0.2% 1,014,421.11 394,979.29 2.568 313,025.11 0.0% 5.50 CheckBlockIndex

e2e0217 refactor: inline arith_uint256 comparison operator

ns/cmp cmp/s err% ins/cmp cyc/cmp IPC bra/cmp miss% total benchmark
6.48 154,439,491.66 0.0% 31.01 23.25 1.334 7.16 3.8% 5.50 CBlockIndexWorkComparator
ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
105,754.86 9,455.83 0.1% 913,130.11 379,588.20 2.406 276,692.11 0.0% 5.50 CheckBlockIndex

85b74b0 refactor: optimize arith_uint256 comparison with spaceship operator

ns/cmp cmp/s err% ins/cmp cyc/cmp IPC bra/cmp miss% total benchmark
6.37 156,990,488.06 0.0% 28.85 22.87 1.261 8.61 3.2% 5.50 CBlockIndexWorkComparator
ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
83,803.10 11,932.73 0.0% 743,565.09 300,824.84 2.472 232,646.08 0.0% 5.56 CheckBlockIndex

deb58ee refactor: optimize CBlockIndexWorkComparator with std::tie

ns/cmp cmp/s err% ins/cmp cyc/cmp IPC bra/cmp miss% total benchmark
1.52 656,674,205.98 0.0% 13.18 5.47 2.410 3.06 0.1% 5.50 CBlockIndexWorkComparator
ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
67,726.84 14,765.19 0.2% 585,826.07 243,155.90 2.409 181,920.07 0.0% 5.54 CheckBlockIndex

Compiler: clang 22.0.0

b60450f bench: add benchmark to measure CBlockIndexWorkComparator performance

ns/cmp cmp/s err% ins/cmp cyc/cmp IPC bra/cmp miss% total benchmark
9.96 100,451,893.46 0.0% 61.28 35.75 1.714 13.62 2.1% 5.52 CBlockIndexWorkComparator
ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
96,878.70 10,322.19 0.1% 802,827.10 347,679.01 2.309 234,823.10 0.0% 5.50 CheckBlockIndex

e2e0217 refactor: inline arith_uint256 comparison operator

ns/cmp cmp/s err% ins/cmp cyc/cmp IPC bra/cmp miss% total benchmark
6.13 163,183,693.10 0.0% 25.74 22.01 1.170 6.10 4.6% 5.50 CBlockIndexWorkComparator
ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
86,307.94 11,586.42 1.2% 646,229.09 309,785.33 2.086 195,119.09 0.0% 5.54 CheckBlockIndex

85b74b0 refactor: optimize arith_uint256 comparison with spaceship operator

ns/cmp cmp/s err% ins/cmp cyc/cmp IPC bra/cmp miss% total benchmark
6.36 157,330,900.16 1.0% 26.20 22.61 1.159 6.55 4.4% 5.53 CBlockIndexWorkComparator
ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
75,031.66 13,327.71 1.0% 650,604.15 266,934.04 2.437 149,967.14 0.0% 5.54 CheckBlockIndex

deb58ee refactor: optimize CBlockIndexWorkComparator with std::tie

ns/cmp cmp/s err% ins/cmp cyc/cmp IPC bra/cmp miss% total benchmark
1.46 683,414,234.14 0.0% 16.12 5.25 3.067 4.02 0.1% 5.50 CBlockIndexWorkComparator
ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
69,559.70 14,376.14 0.1% 559,208.07 249,654.28 2.240 132,342.07 0.0% 5.51 CheckBlockIndex

Reproducer

Run the equivalence tests with:

cmake -B build && cmake --build build && ./build/bin/test_bitcoin --run_test=arith_uint256_tests,blockchain_tests

Each commit shows how it changes the relevant benchmarks.

Benchmark script
for compiler in gcc clang; do \
  if [ "$compiler" = "gcc" ]; then CC=gcc; CXX=g++; COMP_VER=$(gcc -dumpfullversion); \
  else CC=clang; CXX=clang++; COMP_VER=$(clang -dumpversion); fi && \
  echo "> Compiler: $compiler $COMP_VER" && \
  for commit in b60450fae83970daa9dc2da0706bf126a2f41515 e2e02177ba6f7fac34eda9696dad2e8ecd44e6cd 85b74b01de7e914d07630138eaa78f09a083b85b deb58eea2fdd2712a28aa6b81417087426b19f5b; do \
    git fetch origin $commit >/dev/null 2>&1 && git checkout $commit >/dev/null 2>&1 && git log -1 --pretty='%h %s' && \
    rm -rf build && \
    cmake -B build -DBUILD_BENCH=ON -DENABLE_IPC=OFF -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_C_COMPILER=$CC -DCMAKE_CXX_COMPILER=$CXX >/dev/null 2>&1 && \
    cmake --build build -j$(nproc) >/dev/null 2>&1 && \
    for i in 1; do \
      build/bin/bench_bitcoin -filter='CBlockIndexWorkComparator|CheckBlockIndex' -min-time=5000; \
    done; \
  done; \
done

Note: something similar was already started in #33334, but this is a broader optimization that doesn't use the same technique: added as coauthor regardless.

@DrahtBot
Copy link
Contributor

DrahtBot commented Oct 16, 2025

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/33637.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK optout21
Concept ACK mzumsande
Approach ACK Raimo33
Stale ACK laanwj

If your review is incorrectly listed, please copy-paste <!--meta-tag:bot-skip--> into the comment that the bot should ignore.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #34179 (refactor: Enable transparent lookup for setBlockIndexCandidates to remove const_cast by joaonevess)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

LLM Linter (✨ experimental)

Possible places where named args may be used (e.g. func(x, /*named_arg=*/0) in C++, and func(x, named_arg=0) in Python):

  • DoubleEquals(difficulty, expected_difficulty, 0.00001) in src/test/blockchain_tests.cpp

2025-11-25

@Raimo33
Copy link
Contributor

Raimo33 commented Oct 17, 2025

Approach ACK

I have tested the new block index comparator but I’ll refrain from acking the added benchmarks/tests

@l0rinc l0rinc force-pushed the l0rinc/block_index_comparators branch from f74572e to c15d839 Compare October 18, 2025 05:55
@l0rinc l0rinc changed the title refactor: optimize block index comparisons (1.4-7.7x faster) refactor: optimize block index comparisons (1.4-6.8x faster) Oct 18, 2025
@Christewart
Copy link
Contributor

Christewart commented Oct 18, 2025

I attempted to run the script, not really sure what these results indicate. Just pasting what the results were

Details
Darwin Chriss-MacBook-Pro.local 24.6.0 Darwin Kernel Version 24.6.0: Mon Jul 14 11:30:55 PDT 2025; root:xnu-11417.140.69~1/RELEASE_ARM64_T6031 arm64
Apple clang version 17.0.0 (clang-1700.3.19.1)
Target: arm64-apple-darwin24.6.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
|              ns/cmp |               cmp/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                4.97 |      201,237,014.91 |    0.4% |      5.49 | `CBlockIndexWorkComparator`

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|           43,777.21 |           22,842.94 |    0.3% |      5.41 | `CheckBlockIndex`
|              ns/cmp |               cmp/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                1.89 |      529,694,702.31 |    0.1% |      5.50 | `CBlockIndexWorkComparator`

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|           38,303.87 |           26,107.02 |    0.1% |      5.35 | `CheckBlockIndex`
|              ns/cmp |               cmp/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                2.31 |      432,334,219.11 |    0.6% |      5.45 | `CBlockIndexWorkComparator`

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|           33,598.60 |           29,763.15 |    0.2% |      5.32 | `CheckBlockIndex`
|              ns/cmp |               cmp/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                0.78 |    1,287,258,159.19 |    0.1% |      5.46 | `CBlockIndexWorkComparator`

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|           31,055.32 |           32,200.61 |    0.1% |      5.30 | `CheckBlockIndex`

@l0rinc
Copy link
Contributor Author

l0rinc commented Oct 19, 2025

Thanks for the measurements @Christewart, this is how your measurements compare to mine (but most importantly how it compares to master):
image

Copy link
Member

@laanwj laanwj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK c15d839aefba017e64f14cd9cd2779655352f4b6
i did not run the benchmarks but the code changes look good, using <=> makes sense.

@l0rinc l0rinc force-pushed the l0rinc/block_index_comparators branch from c15d839 to a558471 Compare October 28, 2025 23:18
@l0rinc
Copy link
Contributor Author

l0rinc commented Oct 28, 2025

Rebased after #29640 - only change was adjusting the comment

Copy link
Contributor

@mzumsande mzumsande left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK

Not 100% sure yet about 2dd0f2ced35a268bcab661c671e7c70271cdd91f, seems like inlining gives most of the speedup, whereas the gain of using the spaceship operator (which comes at the cost of readability) is marginal.

if (pa->nChainWork > pb->nChainWork) return false;
if (pa->nChainWork < pb->nChainWork) return true;

// ... then by earliest time received, ...
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should take the latest comments from the old function after #29640

Copy link
Contributor Author

@l0rinc l0rinc Oct 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have already done that, which parts do you think I'm missing? Edit: ah in the tests, good point, let me update it

@@ -117,6 +117,43 @@ BOOST_AUTO_TEST_CASE(num_chain_tx_max)
BOOST_CHECK_EQUAL(block_index.m_chain_tx_count, std::numeric_limits<uint64_t>::max());
}

BOOST_AUTO_TEST_CASE(cblockindex_comparator_equivalence)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks like a fuzz test, why not make it an actual fuzz target out of this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I look at fuzz tests as exploratory tools that are hard to write and run and debug and do code coverage on - not trivial to maintain (see #33731), it's definitely not my go-to way to test something. This is just a randomized property based unit test - easy to debug, easy to run and understand.
But looking at the tests again, I can make it more deterministic so that it hits all important branches which allows me to reduce the iteration count - code coverage and debugging shows this regularly tests each branch now (green line on left side):
image
Added you as coauthor.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I look at fuzz tests as exploratory tools that are hard to write and run and debug and do code coverage ... not trivial to maintain ... not my go-to way to test something

There is a learning curve to writing and using fuzz testing but it is 100% worth getting in to. The maintenance aspect only relates to running them using libFuzzer on macOS.

On average, coverage guided fuzzing will be magnitudes better at finding bugs than this style of unit test. For something as small as the comparator your approach is probably good enough, although you could just have a fixed list of test cases that enumerate all possibilities (for the old comparator) instead? (maybe you do already, didn't look)

@@ -117,6 +117,43 @@ BOOST_AUTO_TEST_CASE(num_chain_tx_max)
BOOST_CHECK_EQUAL(block_index.m_chain_tx_count, std::numeric_limits<uint64_t>::max());
}

BOOST_AUTO_TEST_CASE(cblockindex_comparator_equivalence)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would suggest to add a bit more explanation, e.g. mentioning that old_comparator is a snapshot of an old implementation, so that people who look at this in years understand where it's coming from.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I strongly dislike code comments, I prefer explaining with live code over dead comments - and if people disagree they can always do a blame which instantly reveals the purpose since I also over-explain in commit messages usually.
Are the names old_comparator in a cblockindex_comparator_equivalence not enough to make it obvious that? I
I have added a comment anyway, let me know if it helps.

@@ -87,12 +87,20 @@ static constexpr uint32_t UNDO_DATA_DISK_OVERHEAD{STORAGE_HEADER_BYTES + uint256
using BlockMap = std::unordered_map<uint256, CBlockIndex, BlockHasher>;

struct CBlockIndexWorkComparator {
bool operator()(const CBlockIndex* pa, const CBlockIndex* pb) const;
// First sort by most total work (descending), then by earliest activatable time (ascending), then by pointer value (ascending).
Copy link
Contributor

@mzumsande mzumsande Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find "ascending/descending" confusing here, I would have expected the opposites: "descending work" means most work first, but the strict weak ordering of C++ comparators would place the one with the lower work first.

Would suggest something like
"First compare by work (less first), then by earliest activatable time (higher sequence first), then by pointer value (higher first)."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to use the SQL order-by terminology here

Ascending order puts smaller values first, where “smaller” is defined in terms of the < operator.
Similarly, descending order is determined with the > operator.

But you're right, I inverted the terminology, added a dedicated test to check for hard-coded order (first param is increasing, second is decreasing), updated the comments (this is why I find comments a lazy solution, it's too easy to keep them up-to-date), added you as coauthor.

@l0rinc l0rinc force-pushed the l0rinc/block_index_comparators branch 2 times, most recently from da5eb75 to 5cbb9a4 Compare October 31, 2025 13:10
@l0rinc
Copy link
Contributor Author

l0rinc commented Oct 31, 2025

Pushed, thanks for the review, you had a few good points, added you as coauthor.

the gain of using the spaceship operator (which comes at the cost of readability) is marginal.

the spaceship might not be very familiar to us yet, but it's arguably simpler, having fewer moving parts, and some compilers prefer that over the manual comparisons.

@optout21
Copy link
Contributor

optout21 commented Nov 3, 2025

utACK 5cbb9a40873203ea5a4dd0aa7127c9756da2f607

An evident, localized micro-optimization in a hot-path. I haven't run measurements.

@DrahtBot DrahtBot requested review from laanwj and mzumsande November 3, 2025 12:44
@l0rinc l0rinc force-pushed the l0rinc/block_index_comparators branch from 5cbb9a4 to 237eec0 Compare November 3, 2025 12:48
@optout21
Copy link
Contributor

optout21 commented Nov 3, 2025

reACK 6e9061f

Rebase, no content changes. 237eec0f7ca095bc60e40ee94ba1a160a7064753..6e9061f
Prev: utACK 237eec0f7ca095bc60e40ee94ba1a160a7064753
Re-acked (after 7 mins :D ), only minor/formatting changes since last review

@maflcko
Copy link
Member

maflcko commented Nov 6, 2025

Profiling the performance regression in #33618 (comment) revealed that CBlockIndexWorkComparator and its underlying base_uint<256u>::CompareTo are hot paths during block validation, consuming ~4% of CPU time.

I don't follow here. The issue is about the additional CWallet::WriteBestBlock overhead? CBlockIndexWorkComparator seems unrelated to the issue, at least I can't see how it changed the result between 29.x and 30.x.

Instead of mentioning an unrelated issue, I think it could be better to mention what real-world effect this improvement has. I think that is -checkblockindex (default in regtest) and possibly LoadExternalBlockFile/LoadBlockIndex?

@l0rinc
Copy link
Contributor Author

l0rinc commented Nov 6, 2025

CBlockIndexWorkComparator seems unrelated to the issue

The original issue won't be fixed since it's not exactly a bug, but we could still speed up the regression by optimizing another bottleneck.
Since you found the etymology confusing, I have removed the details from the description, only mentioned #33618 (comment) for the flame graph that show this being one of the bottlenecks. Not the biggest one, but the one we can easily optimize and clean up.

I think that is -checkblockindex (default in regtest)

Yes, this was mentioned in the benchmarking section.

possibly LoadExternalBlockFile/LoadBlockIndex

Yes, it's used explicitly in Chainstate in FindMostWorkChain, PruneBlockIndexCandidates, ActivateBestChain and InvalidateBlock.
And ChainstateManager in LoadBlockIndex, CheckBlockIndex, ActivateSnapshot and PopulateAndValidateSnapshot.
But the setBlockIndexCandidates usages should already suffice as motivation, I'd say:

git grep 'setBlockIndexCandidates.' src/validation.cpp | wc -l 
   44

// Pointer tiebreak should only happen with blocks loaded from disk, as those share the same id: 0 for blocks on the best chain, 1 for all others.
bool operator()(const CBlockIndex* pa, const CBlockIndex* pb) const noexcept
{
return std::tie(pa->nChainWork, pb->nSequenceId, pb)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought of eliminating the worst case here (when pa == pb, where it has to do the comparison for every byte) with something like:

return pa != pb &&
       std::tie(pa->nChainWork, pb->nSequenceId, pb)
     < std::tie(pb->nChainWork, pa->nSequenceId, pa);

But the benchmarks indicate that it's not really faster.

benchmark with pointer duplicates
// Copyright (c) 2025-present The Bitcoin Core developers
// Distributed under the MIT software license, see the accompanying
// file COPYING or http://www.opensource.org/licenses/mit-license.php.

#include <arith_uint256.h>
#include <bench/bench.h>
#include <chain.h>
#include <node/blockstorage.h>
#include <random.h>
#include <uint256.h>

#include <algorithm>
#include <memory>
#include <vector>

static void CBlockIndexWorkComparator(benchmark::Bench& bench)
{
    FastRandomContext rng{/*fDeterministic=*/true};

    constexpr size_t n{1'000};
    std::vector<std::shared_ptr<CBlockIndex>> blocks;
    blocks.reserve(n);
    for (size_t i{0}; i < n; ++i) {
        auto block{std::make_shared<CBlockIndex>()};
        block->nChainWork = UintToArith256(rng.rand256());
        block->nSequenceId = int32_t(rng.rand32());
        if (i % 10 == 1) {
            // Have some duplicates
            if (rng.randbool()) {
                block = blocks.back();
            } else {
                if (rng.randbool()) block->nChainWork = blocks.back()->nChainWork;
                if (rng.randbool()) block->nSequenceId = blocks.back()->nSequenceId;
            }
        }
        blocks.push_back(std::move(block));
    }
    std::ranges::shuffle(blocks, rng);

    const node::CBlockIndexWorkComparator comparator;
    bench.batch(n * n).unit("cmp").run([&] {
        for (size_t i{0}; i < n; ++i) {
            const auto* lhs{blocks[i].get()};
            for (size_t j{0}; j < n; ++j) {
                const auto* rhs{blocks[j].get()};
                const bool result{comparator(lhs, rhs)};
                ankerl::nanobench::doNotOptimizeAway(result);
            }
        }
    });
}

BENCHMARK(CBlockIndexWorkComparator, benchmark::PriorityLevel::HIGH);

b17504b refactor: optimize CBlockIndexWorkComparator with std::tie

ns/cmp cmp/s err% ins/cmp cyc/cmp IPC bra/cmp miss% total benchmark
2.13 469,485,340.38 0.2% 15.07 5.09 2.959 4.02 0.1% 11.01 CBlockIndexWorkComparator
ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
90,265.10 11,078.48 0.1% 637,536.00 215,946.18 2.952 178,844.00 0.0% 10.84 CheckBlockIndex

2b77e56 refactor: short-circuit CBlockIndex comparator for equality

ns/cmp cmp/s err% ins/cmp cyc/cmp IPC bra/cmp miss% total benchmark
2.60 385,002,212.20 0.0% 18.00 6.22 2.894 5.00 0.0% 11.00 CBlockIndexWorkComparator
ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
94,110.53 10,625.80 0.0% 682,454.03 225,242.05 3.030 186,486.01 0.0% 11.00 CheckBlockIndex

l0rinc and others added 5 commits November 25, 2025 15:50
Profiling shows this comparator consumes a significant portion of `CheckBlockIndex()`:
... ChainstateManager::CheckBlockIndex()
    ... std::_Rb_tree<...>::find(...)
        ... node::CBlockIndexWorkComparator::operator()(...)
            ... base_uint<256u>::CompareTo(...) const

This commit introduces a benchmark that performs pairwise comparisons on 1000 randomly generated block indices (with some duplicates) to establish baseline performance metrics before further optimization.

|              ns/cmp |               cmp/s |    err% |         ins/cmp |         cyc/cmp |    IPC |        bra/cmp |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|                9.92 |      100,772,511.62 |    0.0% |           63.98 |           35.64 |  1.795 |          14.17 |    1.9% |      5.50 | `CBlockIndexWorkComparator`

|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|          109,996.46 |            9,091.20 |    0.2% |    1,014,421.11 |      394,979.29 |  2.568 |     313,025.11 |    0.0% |      5.50 | `CheckBlockIndex`
Add equivalence tests to verify behavioral compatibility when optimizing comparison operations.
These are duplicating behavior for now, but this way the reviewers can validate that they behave the same way before the optimizations.

The `arith_uint256` test verifies that the spaceship operator produces identical results to the original `CompareTo` method for all comparison operators (<, >, <=, >=, ==, !=).

The `CBlockIndexWorkComparator` test captures the current comparison logic in a lambda and verifies that optimized versions maintain identical sorting behavior for chain work, sequence ID, and pointer tiebreaking.

To make sure all paths are taken, we randomly assign previously encountered values.

You can run the tests with:
> cmake -B build && cmake --build build && ./build/bin/test_bitcoin --run_test=arith_uint256_tests,blockchain_tests

Co-authored-by: Martin Zumsande <[email protected]>
Remove the `CompareTo` method and inline its logic directly into `operator<=>`, updating related comments.
This eliminates function call overhead in the hot path during block generation and chain selection.

The comparison algorithm remains unchanged, iterating from most significant to least significant byte, but now returns `std::strong_ordering` directly, removing an extra conversion and enabling better inlining.

|              ns/cmp |               cmp/s |    err% |         ins/cmp |         cyc/cmp |    IPC |        bra/cmp |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|                6.48 |      154,439,491.66 |    0.0% |           31.01 |           23.25 |  1.334 |           7.16 |    3.8% |      5.50 | `CBlockIndexWorkComparator`

|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|          105,754.86 |            9,455.83 |    0.1% |      913,130.11 |      379,588.20 |  2.406 |     276,692.11 |    0.0% |      5.50 | `CheckBlockIndex`
Replace multiple comparisons with a single C++20 spaceship operator call directly.

|              ns/cmp |               cmp/s |    err% |         ins/cmp |         cyc/cmp |    IPC |        bra/cmp |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|                6.37 |      156,990,488.06 |    0.0% |           28.85 |           22.87 |  1.261 |           8.61 |    3.2% |      5.50 | `CBlockIndexWorkComparator`

|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|           83,803.10 |           11,932.73 |    0.0% |      743,565.09 |      300,824.84 |  2.472 |     232,646.08 |    0.0% |      5.56 | `CheckBlockIndex`
Replace manual comparison branches with a single tuple comparison, allowing the compilers to generate more efficient comparison code.
Also, moved it to the header to allow inlining.
For symmetry, `CBlockIndexHeightOnlyComparator` was also moved to the header.

|              ns/cmp |               cmp/s |    err% |         ins/cmp |         cyc/cmp |    IPC |        bra/cmp |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|                1.52 |      656,674,205.98 |    0.0% |           13.18 |            5.47 |  2.410 |           3.06 |    0.1% |      5.50 | `CBlockIndexWorkComparator`

|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|           67,726.84 |           14,765.19 |    0.2% |      585,826.07 |      243,155.90 |  2.409 |     181,920.07 |    0.0% |      5.54 | `CheckBlockIndex`

Co-authored-by: Raimo33 <[email protected]>
Co-authored-by: Martin Zumsande <[email protected]>
@l0rinc l0rinc force-pushed the l0rinc/block_index_comparators branch from 237eec0 to 6e9061f Compare November 25, 2025 14:52
@l0rinc
Copy link
Contributor Author

l0rinc commented Nov 25, 2025

Rebased after 4f65a1c#diff-f14e9423bd43609969886b2e42751c55606344e0b71780564cdb23515802cec0R12, ready for review again.

l0rinc referenced this pull request in l0rinc/bitcoin Nov 26, 2025
|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|              131.79 |        7,588,049.73 |    0.1% |     10.78 | `PrivateBroadcastBench`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants