-
Notifications
You must be signed in to change notification settings - Fork 38.8k
[IBD] specialize CheckBlock's input & coinbase checks #31682
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
[IBD] specialize CheckBlock's input & coinbase checks #31682
Conversation
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code Coverage & BenchmarksFor details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31682. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
8edc31d to
be627bc
Compare
|
🚧 At least one of the CI tasks failed. HintsTry to run the tests locally, according to the documentation. However, a CI failure may still
Leave a comment here, if you need help tracking down a confusing failure. |
d168906 to
0fae6e9
Compare
src/consensus/tx_check.cpp
Outdated
| if (tx.vin.size() == 2) { | ||
| if (tx.vin[0].prevout == tx.vin[1].prevout) [[unlikely]] return state.Invalid(TxValidationResult::TX_CONSENSUS, "bad-txns-inputs-duplicate"); | ||
| } else if (tx.vin.size() >= 3) { | ||
| std::vector<COutPoint> vInOutPoints; | ||
| vInOutPoints.reserve(tx.vin.size()); | ||
| for (const auto& txin : tx.vin) { | ||
| vInOutPoints.push_back(txin.prevout); | ||
| } | ||
| std::sort(vInOutPoints.begin(), vInOutPoints.end()); | ||
| if (std::ranges::adjacent_find(vInOutPoints) != vInOutPoints.end()) [[unlikely]] return state.Invalid(TxValidationResult::TX_CONSENSUS, "bad-txns-inputs-duplicate"); |
This comment was marked as abuse.
This comment was marked as abuse.
Sorry, something went wrong.
|
How much of an improvement does this translate into for a user? The bar should be pretty high for touching such critical code. |
0fae6e9 to
ec62db0
Compare
Absolutely, every change should have an extremely high bar here.
The primary goal of this change is to improve block validation performance while maintaining a low risk profile. The PR description includes detailed measurements of the improvements. If there are additional benchmarks or scenarios you’d like me to measure, please let me know, and I’ll be happy to provide them. |
mzumsande
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not just affecting CheckBlock, it is called whenever a tx is validated (so also during mempool acceptance).
The reason that this check is increased significantly percent-wise by these changes might just be that the non-contextual checks are extremely fast in the first place (compared with contextual checks such as script verification / input lookup).
Since the non-contextual checks aren't really ever performed in isolation, an interesting benchmark for me would be how much this improves transaction validation as a whole - whether it's in the context of a mempool acceptance or IBD benchmark.
CheckBlock's latency is critical both for quickly discarding invalid blocks
I don't think that this is the case, since it's guarded by PoW - it's very expensive to create blocks with a valid header that fail CheckBlock, so it's impossible to create them in bulk. So if we do encounter them, I'd say that performance is not really critical, unless the effect is extreme (in the order of seconds / minutes).
Updated the description, thanks.
I can only optimize small parts of these, step-by-step. This was the next part I found that could be cheaply sped up (having preexisting tests, benchmarks - signaling that others were also interested in it). I have created a ProcessTransaction resultsgit checkout bf090135d01076aa6d4fbfe7a185682dd6f9f489 >/dev/null 2>&1 \
&& cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release >/dev/null 2>&1 \
&& cmake --build build -j$(nproc) >/dev/null 2>&1 \
&& build/src/bench/bench_bitcoin -filter='ProcessTransactionBench' -min-time=10000 \
&& git checkout f25b91cd5a50e1035bbdb884ab11858702fadf33 >/dev/null 2>&1 \
&& cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release >/dev/null 2>&1 \
&& cmake --build build -j$(nproc) >/dev/null 2>&1 \
&& build/src/bench/bench_bitcoin -filter='ProcessTransactionBench' -min-time=10000
Similarly, I ran a few reindex-chainstates (as proxies for a stable IBD) and - as posted in the description -, got a similar ~1% speedup (2 runs per bench, |
This comment was marked as abuse.
This comment was marked as abuse.
|
@1440000bytes, this behavior isn't helpful - you're just reiterating what I've already explained, without suggesting workable solutions. Anyway, I've created an alternative PR that focuses solely on the tests and benchmarks, excluding the controversial optimizations. It validates the affected code and removes microbenchmarks that overemphasize this segment's significance, replacing it with a higher-level one, as suggested by @mzumsande. |
This comment was marked as abuse.
This comment was marked as abuse.
ec62db0 to
0a40b8c
Compare
|
I've also retriggered an IBD (simulated via multiple Benchmark resultshyperfine --runs 2 --parameter-list COMMIT dcc3fe7f9c4b68a4b4622ef958a3262cd8a4b08a,053ccbefa3563e776932bb847aae9c302d07de61 --prepare 'rm -rfd /mnt/my_storage/BitcoinData/debug.log build/ && git checkout {COMMIT} && cmake -B build -DCMAKE_BUILD_TYPE=Release -DBUILD_UTIL=OFF -DBUILD_TX=OFF -DBUILD_TESTS=OFF -DENABLE_WALLET=OFF -DINSTALL_MAN=OFF && cmake --build build -j$(nproc)' --cleanup 'mv /mnt/my_storage/BitcoinData/debug.log /mnt/my_storage/logs/debug-{COMMIT}-$(date +%s.log)' 'COMMIT={COMMIT} ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=800000 -dbcache=30000 -reindex-chainstate -connect=0 -printtoconsole=0'
Benchmark 1: COMMIT=dcc3fe7f9c4b68a4b4622ef958a3262cd8a4b08a ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=800000 -dbcache=30000 -reindex-chainstate -connect=0 -printtoconsole=0
Time (mean ± σ): 12537.220 s ± 7.591 s [User: 13362.903 s, System: 523.533 s]
Range (min … max): 12531.852 s … 12542.588 s 2 runs
Benchmark 2: COMMIT=053ccbefa3563e776932bb847aae9c302d07de61 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=800000 -dbcache=30000 -reindex-chainstate -connect=0 -printtoconsole=0
Time (mean ± σ): 12251.387 s ± 52.389 s [User: 13169.814 s, System: 543.757 s]
Range (min … max): 12214.343 s … 12288.432 s 2 runs
Summary
COMMIT=053ccbefa3563e776932bb847aae9c302d07de61 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=800000 -dbcache=30000 -reindex-chainstate -connect=0 -printtoconsole=0 ran
1.02 ± 0.00 times faster than COMMIT=dcc3fe7f9c4b68a4b4622ef958a3262cd8a4b08a ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=800000 -dbcache=30000 -reindex-chainstate -connect=0 -printtoconsole=0This reveals a ~2.5% speedup (taking the best runs out of 2). |
CheckBlock input duplicate detectionCheckBlock input duplicate detection
CheckBlock input duplicate detectionCheckBlock input duplicate detection
This comment was marked as abuse.
This comment was marked as abuse.
19fbd5e to
f159a77
Compare
f159a77 to
d28302f
Compare
|
Rebased to fix conflicts with #33116 |
|
Concept ACK. will review soon. I'm a fan of the simpler std::vector approach. |
I agree but I argue the changes proposed are small enough that this should be seriously considered for merge. the diff makes it hard to grasp, but few actually changed, and for the best. I've ran the tests and double checked the logic. taskset -c 1 ./bench_bitcoin --filter="(ProcessTransactionBench)" --min-time=10000
- else if (tx.vin.size() == 2) {
- if (tx.vin[0].prevout == tx.vin[1].prevout) {
- return state.Invalid(TxValidationResult::TX_CONSENSUS, "bad-txns-inputs-duplicate");
- }
- if (tx.vin[0].prevout.IsNull() || tx.vin[1].prevout.IsNull()) {
- return state.Invalid(TxValidationResult::TX_CONSENSUS, "bad-txns-prevout-null");
- }
-}
+1.6% faster Looks like the gain comes from handling txs with 1-2 inputs separately. |
The `CheckTransaction` validation function in https://github.com/bitcoin/bitcoin/blob/master/src/consensus/tx_check.cpp#L41-L45 relies on a correct ordering relation for detecting duplicate transaction inputs. These test updates ensure: * Accurate detection of duplicates beyond trivial cases (e.g., two identical inputs); * Consistent behavior between sorted‑set and hash‑set duplicate checks when detecting duplicates for `COutPoint` and related values; * The function maintains expected behavior for ordering and equality checks. Using randomized testing with shuffled inputs (to avoid any remaining bias introduced), the enhanced test validates that `CheckTransaction` remains robust and reliable across various input configurations. It confirms identical behavior to a hashing-based duplicate detection mechanism, ensuring consistency and correctness. To make sure the new branches in the follow-up commits will be covered, `basic_transaction_tests` was extended a randomized test one comparing against the old implementation (and also an alternative duplicate). The iterations and ranges were chosen such that every new branch is expected to be hit once.
> cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release && cmake --build build -j$(nproc) && build/src/bench/bench_bitcoin -filter='CheckBlockBench|DuplicateInputs' -min-time=10000 > C++ compiler .......................... AppleClang 16.0.0.16000026 | ns/block | block/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 372,743.63 | 2,682.81 | 1.1% | 10.99 | `CheckBlockBench` | ns/op | op/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 3,304,694.54 | 302.60 | 0.5% | 11.05 | `DuplicateInputs` > C++ compiler .......................... GNU 13.3.0 | ns/block | block/s | err% | ins/block | cyc/block | IPC | bra/block | miss% | total | benchmark |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:---------- | 1,096,261.84 | 912.19 | 0.1% | 7,963,390.88 | 3,487,375.26 | 2.283 | 1,266,941.00 | 1.8% | 11.03 | `CheckBlockBench` | ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:---------- | 8,366,309.48 | 119.53 | 0.0% | 23,865,177.67 | 26,620,160.23 | 0.897 | 5,972,887.41 | 4.0% | 10.78 | `DuplicateInputs`
`IsCoinBase` means a single input with Null prevout, so restrict duplicate checks to non-coinbase transactions. Behavior is unchanged, except that single-input transactions aren’t checked for duplicates anymore (~70–90% of cases, see https://transactionfee.info/charts/transactions-1in). Added braces to conditions and loops to simplify review of follow‑up commits. > cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release && cmake --build build -j$(nproc) && build/src/bench/bench_bitcoin -filter='CheckBlockBench|DuplicateInputs|ProcessTransactionBench' -min-time=10000 > C++ compiler .......................... AppleClang 16.0.0.16000026 | ns/block | block/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 335,917.12 | 2,976.92 | 1.3% | 11.01 | `CheckBlockBench` | ns/op | op/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 3,286,337.42 | 304.29 | 1.1% | 10.90 | `DuplicateInputs` | 9,561.02 | 104,591.35 | 0.2% | 11.02 | `ProcessTransactionBench`
No need to create a set to check duplicates for two-input transactions. > cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release && cmake --build build -j$(nproc) && build/src/bench/bench_bitcoin -filter='CheckBlockBench|DuplicateInputs|ProcessTransactionBench' -min-time=10000 > C++ compiler .......................... AppleClang 16.0.0.16000026 | ns/block | block/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 314,137.30 | 3,183.32 | 1.2% | 11.04 | `CheckBlockBench` | ns/op | op/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 3,220,592.73 | 310.50 | 1.3% | 10.92 | `DuplicateInputs` | 9,425.98 | 106,089.77 | 0.3% | 11.00 | `ProcessTransactionBench`
A pre-sized vector retains locality (enabling SIMD operations), speeding up sorting and equality checks. It's also simpler (and therefore more reliable) than a sorted set, and causes less memory fragmentation. > cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release && cmake --build build -j$(nproc) && build/src/bench/bench_bitcoin -filter='CheckBlockBench|DuplicateInputs|ProcessTransactionBench' -min-time=10000 > C++ compiler .......................... AppleClang 16.0.0.16000026 | ns/block | block/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 181,922.54 | 5,496.85 | 0.2% | 10.98 | `CheckBlockBench` | ns/op | op/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 997,739.30 | 1,002.27 | 1.0% | 10.94 | `DuplicateInputs` | 9,449.28 | 105,828.15 | 0.3% | 10.99 | `ProcessTransactionBench` Co-authored-by: Pieter Wuille <[email protected]>
For the two-input case, we simply check both inputs, as we did with equality. For the general case, take advantage of sorting, making invalid prevout detection constant time instead of worst-case linear. > cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release && cmake --build build -j$(nproc) && build/src/bench/bench_bitcoin -filter='CheckBlockBench|DuplicateInputs|ProcessTransactionBench' -min-time=10000 > C++ compiler .......................... AppleClang 16.0.0.16000026 | ns/block | block/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 179,971.00 | 5,556.45 | 0.3% | 11.02 | `CheckBlockBench` | ns/op | op/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 963,177.98 | 1,038.23 | 1.7% | 10.92 | `DuplicateInputs` | 9,410.90 | 106,259.75 | 0.3% | 11.01 | `ProcessTransactionBench` > C++ compiler .......................... GNU 13.3.0 | ns/block | block/s | err% | ins/block | cyc/block | IPC | bra/block | miss% | total | benchmark |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:---------- | 834,855.94 | 1,197.81 | 0.0% | 6,518,548.86 | 2,656,039.78 | 2.454 | 919,160.84 | 1.5% | 10.78 | `CheckBlockBench` | ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:---------- | 4,261,492.75 | 234.66 | 0.0% | 17,379,823.40 | 13,559,793.33 | 1.282 | 4,265,714.28 | 3.4% | 11.00 | `DuplicateInputs` | 55,819.53 | 17,914.88 | 0.1% | 227,828.15 | 177,520.09 | 1.283 | 15,184.36 | 0.4% | 10.91 | `ProcessTransactionBench`
|
Rebased after #33629, removed the conflicting |
d28302f to
1fb312f
Compare
|
🐙 This pull request conflicts with the target branch and needs rebase. |
This change is part of [IBD] - Tracking PR for speeding up Initial Block Download
Summary
CheckBlock's latency is critical for efficiently validating correct inputs during transaction validation, including mempool acceptance and new block creation.This PR improves performance and maintainability by introducing the following changes:
std::setwith sortedstd::vectorfor improved locality.Context, concerns
As noted by Sipa in #31682 (review), this is consensus-critical code (especially since we've had an actual inflation bug related to duplicate checking).
However, the changes here are very confined to a single function, easily reviewed, and I have a general preference for avoiding
std::setwhen sortedstd::vectorworks too as well.The goal of this change is to make block validation faster via a low-risk alternative.
Measurements
C++ compiler .......................... AppleClang 16.0.0.16000026
CheckBlockBenchDuplicateInputsProcessTransactionBenchCheckBlockBenchDuplicateInputsProcessTransactionBenchCheckBlockBenchDuplicateInputsProcessTransactionBenchC++ compiler .......................... GNU 13.3.0
CheckBlockBenchDuplicateInputsProcessTransactionBenchCheckBlockBenchDuplicateInputsProcessTransactionBenchCheckBlockBenchDuplicateInputsProcessTransactionBenchWhile the point of the change isn't necessarily to speed up IBD, but you can see the measurements at #31682 (comment)
Related PRs: