-
Notifications
You must be signed in to change notification settings - Fork 38.7k
node: optimize CBlockIndexWorkComparator #33334
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code Coverage & BenchmarksFor details see: https://corecheck.dev/bitcoin/bitcoin/pulls/33334. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
Refactor the comparator logic in CBlockIndexWorkComparator::operator() to reduce the amounts of branches and improve readability without changing semantics. The previous implementation used multiple separate comparisons with explicit branches for greater-than and less-than cases, resulting in unnecessary code paths. The new implementation consolidates comparisons into single inequality checks and reduces complexity while preserving its original behavior. This change is particularly beneficial for loading blocks from files and reindexing. taskset -c 1 ./bin/bench_bitcoin --filter="(CheckBlockIndex|LoadExternalBlockFile|BlockToJsonVerboseWrite)" -output-csv=bench_old.csv --min-time=30000 | ns/op | op/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 26,557,419.20 | 37.65 | 0.1% | 32.85 | `BlockToJsonVerboseWrite` | 129,988.82 | 7,692.97 | 0.0% | 33.02 | `CheckBlockIndex` | 21,661,396.96 | 46.17 | 0.5% | 32.37 | `LoadExternalBlockFile` taskset -c 1 ./bin/bench_bitcoin --filter="(CheckBlockIndex|LoadExternalBlockFile|BlockToJsonVerboseWrite|WalletIsMineDescriptors)" -output-csv=bench_new.csv --min-time=30000 | ns/op | op/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 27,930,130.95 | 35.80 | 0.1% | 32.96 | `BlockToJsonVerboseWrite` | 115,346.65 | 8,669.52 | 0.0% | 33.00 | `CheckBlockIndex` | 20,389,679.85 | 49.04 | 0.4% | 31.76 | `LoadExternalBlockFile`
5a9e710 to
80ac046
Compare
l0rinc
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK - please consider my suggestion
| bool CBlockIndexWorkComparator::operator()(const CBlockIndex* pa, const CBlockIndex* pb) const | ||
| { | ||
| // First sort by most total work, ... | ||
| if (pa->nChainWork > pb->nChainWork) return false; | ||
| if (pa->nChainWork < pb->nChainWork) return true; | ||
| if (pa->nChainWork != pb->nChainWork) { | ||
| return pa->nChainWork < pb->nChainWork; | ||
| } | ||
|
|
||
| // ... then by earliest time received, ... | ||
| if (pa->nSequenceId < pb->nSequenceId) return false; | ||
| if (pa->nSequenceId > pb->nSequenceId) return true; | ||
| if (pa->nSequenceId != pb->nSequenceId) { | ||
| return pa->nSequenceId > pb->nSequenceId; | ||
| } | ||
|
|
||
| // Use pointer address as tie breaker (should only happen with blocks | ||
| // loaded from disk, as those all have id 0). | ||
| if (pa < pb) return false; | ||
| if (pa > pb) return true; | ||
|
|
||
| // Identical blocks. | ||
| return false; | ||
| return pa > pb; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All this does is compare by 3 parameters - wouldn't it be simpler to use std::tie for lexicographical comparison instead?
| bool CBlockIndexWorkComparator::operator()(const CBlockIndex* pa, const CBlockIndex* pb) const | |
| { | |
| // First sort by most total work, ... | |
| if (pa->nChainWork > pb->nChainWork) return false; | |
| if (pa->nChainWork < pb->nChainWork) return true; | |
| if (pa->nChainWork != pb->nChainWork) { | |
| return pa->nChainWork < pb->nChainWork; | |
| } | |
| // ... then by earliest time received, ... | |
| if (pa->nSequenceId < pb->nSequenceId) return false; | |
| if (pa->nSequenceId > pb->nSequenceId) return true; | |
| if (pa->nSequenceId != pb->nSequenceId) { | |
| return pa->nSequenceId > pb->nSequenceId; | |
| } | |
| // Use pointer address as tie breaker (should only happen with blocks | |
| // loaded from disk, as those all have id 0). | |
| if (pa < pb) return false; | |
| if (pa > pb) return true; | |
| // Identical blocks. | |
| return false; | |
| return pa > pb; | |
| } | |
| bool CBlockIndexWorkComparator::operator()(const CBlockIndex* pa, const CBlockIndex* pb) const | |
| { | |
| // First sort by most total work (descending), then by earliest activatable time (ascending), then by pointer value (ascending) | |
| return std::tie(pa->nChainWork, pb->nSequenceId, pb) < std::tie(pb->nChainWork, pa->nSequenceId, pa); | |
| } |
Note that pa & pb usages are mixed to follow the original algorithm.
Since https://corecheck.dev/bitcoin/bitcoin/pulls/33334 hasn't finished yet, I ran the benchmarks on my Mac with min-time of 100 seconds for stability several times, and took the fastest of each execution (since noise can only slow down execution), which reveals that:
BlockToJsonVerboseWritespeed is completely unaffected in all cases;LoadExternalBlockFileis the same in the current implementation but would be ~7% faster with the suggested solution;CheckBlockIndexis indeed 17% faster for the current solution, but would be 24% faster with the proposed one - all while making the code a lot simpler;
Before:
| ns/op | op/s | err% | total | benchmark |
|---|---|---|---|---|
| 25,322,734.53 | 39.49 | 0.2% | 109.93 | BlockToJsonVerboseWrite |
| 39,049.20 | 25,608.72 | 0.2% | 110.15 | CheckBlockIndex |
| 5,955,124.54 | 167.92 | 0.1% | 110.26 | LoadExternalBlockFile |
After:
| ns/op | op/s | err% | total | benchmark |
|---|---|---|---|---|
| 25,333,028.32 | 39.47 | 0.2% | 109.61 | BlockToJsonVerboseWrite |
| 33,173.57 | 30,144.48 | 0.3% | 109.82 | CheckBlockIndex |
| 6,169,392.66 | 162.09 | 0.4% | 110.79 | LoadExternalBlockFile |
Suggested
std::tie:
| ns/op | op/s | err% | total | benchmark |
|---|---|---|---|---|
| 25,360,146.33 | 39.43 | 0.3% | 109.90 | BlockToJsonVerboseWrite |
| 31,343.25 | 31,904.80 | 0.7% | 104.88 | CheckBlockIndex |
| 5,535,101.51 | 180.67 | 0.7% | 967.85 | LoadExternalBlockFile |
If you decide to take the suggestion, please add me as a coauthor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I experimented with std::tie before. whether it is more readable is debatable. I'd say they're the same in terms of readability (one might not be familiar with std::tie). I ran the benchmarks on my x86 with gcc laptop (not the same machine as my original benchmarks) and got different results from yours.
taskset -c 0 ./bin/bench_bitcoin --filter="(CheckBlockIndex|LoadExternalBlockFile)" --min-time=10000Master:
| ns/op | op/s | err% | total | benchmark |
|---|---|---|---|---|
| 200,279.21 | 4,993.03 | 0.1% | 10.95 | CheckBlockIndex |
| 29,798,546.03 | 33.56 | 0.3% | 10.82 | LoadExternalBlockFile |
This PR:
| ns/op | op/s | err% | total | benchmark |
|---|---|---|---|---|
| 166,134.37 | 6,019.22 | 0.1% | 10.91 | CheckBlockIndex |
| 29,547,138.28 | 33.84 | 0.1% | 10.59 | LoadExternalBlockFile |
CheckBlockIndex: +17% faster than master
LoadExternalBlockFile: same as master
@l0rinc suggestion:
| ns/op | op/s | err% | total | benchmark |
|---|---|---|---|---|
| 193,085.31 | 5,179.06 | 0.0% | 11.00 | CheckBlockIndex |
| 30,456,267.03 | 32.83 | 0.3% | 10.96 | LoadExternalBlockFile |
CheckBlockIndex: +3.6% faster than master
LoadExternalBlockFile: -2.2% slower than master
Overall very different results on different machines/compilers. probably std::tie would be the most standard implementation, giving room for smarter future compilers to optimize. but I genuinely don't know how this could be less than 2 branches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but I genuinely don't know how this could be less than 2 branches
let's find out - can you create a godbold reproducer with latest compilers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://godbolt.org/z/6TjP137z1
Looks like the current version wins both on x86 and arm, but I'm no assembly expert.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think the std::tie version is better, pushed it as an alternative as part of a bigger related optimization I had. Added you as coauthor there: #33637
Refactor the comparator logic in CBlockIndexWorkComparator::operator() to reduce the amounts of branches and improve readability without changing semantics. The previous implementation used multiple separate comparisons with explicit branches for greater-than and less-than cases, resulting in unnecessary code paths. The new implementation consolidates comparisons into single inequality checks and reduces complexity while preserving its original behavior. This change is particularly beneficial for loading blocks from files and reindexing. taskset -c 1 ./bin/bench_bitcoin --filter="(CheckBlockIndex|LoadExternalBlockFile|BlockToJsonVerboseWrite)" -output-csv=bench_old.csv --min-time=30000 | ns/op | op/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 26,557,419.20 | 37.65 | 0.1% | 32.85 | `BlockToJsonVerboseWrite` | 129,988.82 | 7,692.97 | 0.0% | 33.02 | `CheckBlockIndex` | 21,661,396.96 | 46.17 | 0.5% | 32.37 | `LoadExternalBlockFile` taskset -c 1 ./bin/bench_bitcoin --filter="(CheckBlockIndex|LoadExternalBlockFile|BlockToJsonVerboseWrite|WalletIsMineDescriptors)" -output-csv=bench_new.csv --min-time=30000 | ns/op | op/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 27,930,130.95 | 35.80 | 0.1% | 32.96 | `BlockToJsonVerboseWrite` | 115,346.65 | 8,669.52 | 0.0% | 33.00 | `CheckBlockIndex` | 20,389,679.85 | 49.04 | 0.4% | 31.76 | `LoadExternalBlockFile` Github-Pull: bitcoin#33334 Rebased-From: 80ac046
|
Follow up in #33637 |
Summary
This PR optimizes
CBlockIndexWorkComparator::operator()by removing 4 redundant branches.The previous implementation used multiple separate comparisons with explicit branches for greater-than and less-than cases, resulting in unnecessary code paths.
The new implementation consolidates comparisons into single inequality checks and reduces complexity while preserving its original behavior. This change is particularly beneficial for loading blocks from files and reindexing.
Benchmarks
taskset -c 1 ./bin/bench_bitcoin --filter="(CheckBlockIndex|LoadExternalBlockFile)" -output-csv=bench_old.csv --min-time=30000CheckBlockIndexLoadExternalBlockFileCheckBlockIndexLoadExternalBlockFileCompared to master:
CheckBlockIndex+12.7% fasterLoadExternalBlockFile+6.2% faster