optimization: buffer reads(23%)/writes(290%) in [undo]block [de]serialization, 6% faster IBD #31539

l0rinc · 2024-12-19T11:09:38Z

An alternative implementation of #31551, where the block is read into memory in chunks, resulting in similar gains, but is slightly more complicated, but uses less memory.

This is a precursor to #31144.

Currently, obfuscation operations are performed byte-by-byte during serialization. Buffering the reads allows batching these operations (implemented in #31144) and improves file access efficiency by reducing fread calls and associated locking overhead. Testing with various buffer sizes showed that 16 KiB provides the best performance.

For writes, buffering enables batched obfuscations directly on the internal buffer, avoiding the need to copy input spans for obfuscation. Xor key offsets are calculated based on the file position, and the batched obfuscation is applied before writing to disk.

BufferedFile was avoided for both reads and writes due to its unrelated operations and potential deprecation.

2 full IBD runs against master for 870k blocks (seeded from real nodes) indicates a 6% total speedup.

Details

hyperfine --runs 2 --export-json /mnt/my_storage/ibd-xor-buffered.json --parameter-list COMMIT d73f37dda221835b5109ede1b84db2dc7c4b74a1,6853b2740851befffa3ca0b24d94212ed1e48d66 --prepare 'rm -rf /mnt/my_storage/BitcoinData/* && git checkout {COMMIT} && git clean -fxd && git reset --hard && cmake -B build -DCMAKE_BUILD_TYPE=Release -DBUILD_UTIL=OFF -DBUILD_TX=OFF -DBUILD_TESTS=OFF -DENABLE_WALLET=OFF -DINSTALL_MAN=OFF && cmake --build build -j$(nproc)' 'COMMIT={COMMIT} ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=870000 -printtoconsole=0'

Benchmark 1: COMMIT=d73f37dda221835b5109ede1b84db2dc7c4b74a1 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=870000 -printtoconsole=0
  Time (mean ± σ):     40216.674 s ± 113.132 s    [User: 51496.289 s, System: 3541.340 s]
  Range (min … max):   40136.678 s … 40296.671 s    2 runs

Benchmark 2: COMMIT=6853b2740851befffa3ca0b24d94212ed1e48d66 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=870000 -printtoconsole=0
  Time (mean ± σ):     37964.015 s ± 624.115 s    [User: 49086.037 s, System: 3375.072 s]
  Range (min … max):   37522.699 s … 38405.331 s    2 runs

Summary
  COMMIT=6853b2740851befffa3ca0b24d94212ed1e48d66 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=870000 -printtoconsole=0 ran
    1.06 ± 0.02 times faster than COMMIT=d73f37dda221835b5109ede1b84db2dc7c4b74a1 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=870000 -printtoconsole=0

Microbenchmarks show a ~23%/290% speedup (the followup XOR batching improves this further):

cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release && cmake --build build -j$(nproc) && build/src/bench/bench_bitcoin -filter='ReadBlockBench|SaveBlockBench' -min-time=10000

Before:

ns/op	op/s	err%	total	benchmark
2,288,264.16	437.01	0.2%	11.00	`ReadBlockBench`
5,260,934.43	190.08	1.2%	11.08	`SaveBlockBench`

After optimization: Buffer serialization reads in UndoRead and ReadBlock (23% faster read):

ns/op	op/s	err%	total	benchmark
1,847,259.64	541.34	0.2%	11.03	`ReadBlockBench`

After optimization: Buffer serialization writes in UndoWrite and WriteBlock (290% faster write):

ns/op	op/s	err%	total	benchmark
1,804,208.61	554.26	1.4%	10.89	`SaveBlockBench`

DrahtBot · 2024-12-19T11:09:42Z

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31539.

Reviews

See the guideline for information on the review process.
A summary of reviews will appear here.

Conflicts

No conflicts as of last run.

…k` to reduce serialization calls 223081e scripted-diff: rename block and undo functions for consistency (Lőrinc) baaa3b2 refactor,blocks: remove costly asserts and modernize affected logs (Lőrinc) fa39f27 refactor,blocks: deduplicate block's serialized size calculations (Lőrinc) dfb2f9d refactor,blocks: inline `WriteBlockToDisk` (Lőrinc) 42bc491 refactor,blocks: inline `UndoWriteToDisk` (Lőrinc) 86b85bb bench: add SaveBlockBench (Lőrinc) 34f9a01 refactor,bench: rename bench/readblock.cpp to bench/readwriteblock.cpp (Lőrinc) Pull request description: `UndoWriteToDisk` and `WriteBlockToDisk` were delegating a subset of their functionality to single-use methods that didn't optimally capture a meaningful chunk of the algorithm, resulting in calculating things twice (serialized size, header size). This change inlines the awkward methods (asserting that all previous behavior was retained), and in separate commits makes the usages less confusing. Besides making the methods slightly more intuitive, the refactorings reduce duplicate calculations as well. The speed difference is insignificant for now (~0.5% for the new `SaveBlockToDiskBench`), but are a cleanup for follow-ups such as #31539 ACKs for top commit: ryanofsky: Code review ACK 223081e. Since last review, "Save" was renamed to "Write", uint32_t references were dropped, some log statements and comments were improved as suggested, and a lot of tweaks made to commits and commit messages which should make this easier to review. hodlinator: ACK 223081e TheCharlatan: ACK 223081e andrewtoth: ACK 223081e Tree-SHA512: 951bc8ad3504c510988afd95c561e3e259c6212bd14f6536fe56e8eb5bf5c35c32a368bbdb1d5aea1acc473d7e5bd9cdcde02008a148b05af1f955e413062d5c

l0rinc · 2025-01-23T09:37:55Z

Not sure what the failure means:

100% tests passed, 0 tests failed out of 138

...

Error: Process completed with exit code 143.

maflcko · 2025-01-23T10:09:06Z

Not sure what the failure means:

100% tests passed, 0 tests failed out of 138

...

Error: Process completed with exit code 143.

The unit tests are different from the functional tests. I guess the failure is due to a GHA VM corruption or OOM?

l0rinc · 2025-01-23T17:39:58Z

I'll push a few more drafts to find out why CI is crashing - bear with me.

maflcko · 2025-01-23T19:11:07Z

The unit tests are not failing. This can be seen in the output (100% tests passed, 0 tests failed out of 138). So skipping them should have no effect.

maflcko · 2025-01-23T20:14:01Z

My recommendation would be to measure the asan CI max memory usage before and after this change. Maybe it is not largely different, but just the last drip to tip an OOM?

l0rinc · 2025-01-23T20:47:20Z

You may be right, though I don't yet see how this memory saving option keeps crashing after the rebase (it was passing before), while the memory hogging alternative is passing.
bitcoin-dev-tools#123 (comment) shows the before/after memory to be the same.

I suspected a stuck file or process or infinite loop...

asan CI max memory usage

I don't yet know how to run that on a Mac (and my linux boxes are running IBDs for now).
I'll push a few more rounds here to try to understand which part is causing the problem - and why only for ASan + LSan + UBSan + integer, if they won't reveal anything, I'll investigate how to replicate the CI behavior locally.

The Obfuscation (XOR) operations are currently done byte-by-byte during serialization, buffering the reads will enable batching the obfuscation operations later (not yet done here). Also, different operating systems seem to handle file caching differently, so reading bigger batches (and processing those from memory) is also a bit faster (likely because of fewer native fread calls or less locking). Buffered reading was done by first exhausting the buffer, and if more data is still needed, reading directly into the destination (to avoid copying into the buffer when we have huge destination Spans, such as the many megabyte vectors in later blocks), and lastly refilling the buffer completely. Running the `ReadBlockBench` benchmarks with different buffer sizes indicated that 16 KiB is the optimal buffer size (roughly 25% faster than master). Testing was done by randomly reading data from the same file with `AutoFile` and `BufferedReadOnlyFile` and making sure that the exact same data is read. `BufferedFile` wasn't used here because it does too many unrelated operations (and might be removed in the future). Before: | ns/op | op/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 2,288,264.16 | 437.01 | 0.2% | 11.00 | `ReadBlockBench` After: | ns/op | op/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 1,847,259.64 | 541.34 | 0.2% | 11.03 | `ReadBlockBench`

…eBlock` Similarly to the serialization reads, buffered writes will enable batched xor calculations - especially since currently we need to copy the write inputs Span to do the obfuscation on it, batching will enable doing the xor on the internal buffer instead. All write operations are delegated to `AutoFile`. Xor key offsets are also calculated based on where we are in the underlying file. To avoid the buffered write of 4096 in `AutoFile.write`, we're disabling obfuscation for the underlying `m_dest` and doing it ourselves for the whole buffer (instead of byte-by-byte) before writing it to file. We can't obfuscate the write's src directly since it would mutate the method's input (for very big writes it would avoid copying), but we can fill our own buffer and xor all of that safely. Running the `SaveBlockBench` benchmarks with different buffer sizes indicated that 1 MiB was the optimal buffer size. `BufferedFile` wasn't used here because it does too many unrelated operations (and might be removed in the future). Before: | ns/op | op/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 5,260,934.43 | 190.08 | 1.2% | 11.08 | `SaveBlockBench` After: | ns/op | op/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 1,804,208.61 | 554.26 | 1.4% | 10.89 | `SaveBlockBench`

…and `SaveBlock`" This reverts commit b1b55a2.

maflcko · 2025-01-24T10:50:21Z

I'll investigate how to replicate the CI behavior locally

It may be sufficient to just copy-paste the cxx flags into a normal build. In the future a cmake preset could be added, see also #31199 (comment)

l0rinc · 2025-01-24T11:07:33Z

It looks like the problem was with BufferedWriteOnlyFile - if needed, I will investigate further.

Given that there is an alternative PR to this (reading the whole block into memory (instead of smaller chunks) - which we're already doing in other places, so shouldn't be controversial), I'll close this PR. If the memory allocations come up in that, I'll consider reopening.

l0rinc mentioned this pull request Dec 19, 2024

refactor: inline UndoWriteToDisk and WriteBlockToDisk to reduce serialization calls #31490

Merged

l0rinc changed the title ~~optimization: buffer reads/writes in [undo]blocks~~ optimization: buffer reads/writes in [undo]block [de]serialization Dec 19, 2024

This was referenced Dec 20, 2024

[IBD] multi-byte block obfuscation #31144

Merged

util: explicitly close all AutoFiles that have been written #29307

Merged

reduce cs_main scope, guard block index 'nFile' under a local mutex #27006

Closed

ismaelsadeeq mentioned this pull request Dec 20, 2024

Faster way to get block with prevouts in JSON-RPC #30495

Open

l0rinc force-pushed the l0rinc/buffered-block-read-write branch from 6853b27 to 4ef1507 Compare December 21, 2024 10:08

l0rinc changed the title ~~optimization: buffer reads/writes in [undo]block [de]serialization~~ optimization: buffer reads(23%)/writes(19%) in [undo]block [de]serialization Dec 21, 2024

l0rinc mentioned this pull request Dec 21, 2024

[IBD] batch block reads/writes during AutoFile serialization #31551

Merged

l0rinc force-pushed the l0rinc/buffered-block-read-write branch from 4ef1507 to 1d99994 Compare December 21, 2024 14:53

l0rinc changed the title ~~optimization: buffer reads(23%)/writes(19%) in [undo]block [de]serialization~~ optimization: buffer reads(23%)/writes(290%) in [undo]block [de]serialization Dec 21, 2024

l0rinc changed the title ~~optimization: buffer reads(23%)/writes(290%) in [undo]block [de]serialization~~ optimization: buffer reads(23%)/writes(290%) in [undo]block [de]serialization, 6% faster IBD Dec 21, 2024

bitcoin deleted a comment from sebm123 Jan 4, 2025

DrahtBot added the Needs rebase label Jan 22, 2025

l0rinc force-pushed the l0rinc/buffered-block-read-write branch 2 times, most recently from c2fa31f to 0caf94a Compare January 22, 2025 20:15

DrahtBot removed the Needs rebase label Jan 22, 2025

l0rinc force-pushed the l0rinc/buffered-block-read-write branch from 0caf94a to b5eeb50 Compare January 22, 2025 22:53

l0rinc marked this pull request as ready for review January 22, 2025 22:53

l0rinc marked this pull request as draft January 23, 2025 10:41

l0rinc force-pushed the l0rinc/buffered-block-read-write branch from b5eeb50 to 05ff5ee Compare January 23, 2025 15:48

l0rinc force-pushed the l0rinc/buffered-block-read-write branch from 05ff5ee to 19543d9 Compare January 23, 2025 17:49

l0rinc force-pushed the l0rinc/buffered-block-read-write branch from c2640e5 to 75e85be Compare January 23, 2025 20:52

l0rinc added 2 commits January 23, 2025 22:55

l0rinc force-pushed the l0rinc/buffered-block-read-write branch from 75e85be to b1b55a2 Compare January 23, 2025 21:56

Revert "optimization: Buffer serialization writes in SaveBlockUndo …

9f8a93e

…and `SaveBlock`" This reverts commit b1b55a2.

l0rinc closed this Jan 24, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

optimization: buffer reads(23%)/writes(290%) in [undo]block [de]serialization, 6% faster IBD #31539

optimization: buffer reads(23%)/writes(290%) in [undo]block [de]serialization, 6% faster IBD #31539

Uh oh!

l0rinc commented Dec 19, 2024 •

edited

Loading

Uh oh!

DrahtBot commented Dec 19, 2024 •

edited

Loading

Uh oh!

l0rinc commented Jan 23, 2025

Uh oh!

maflcko commented Jan 23, 2025

Uh oh!

l0rinc commented Jan 23, 2025

Uh oh!

maflcko commented Jan 23, 2025

Uh oh!

maflcko commented Jan 23, 2025

Uh oh!

l0rinc commented Jan 23, 2025

Uh oh!

maflcko commented Jan 24, 2025 •

edited

Loading

Uh oh!

l0rinc commented Jan 24, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

optimization: buffer reads(23%)/writes(290%) in [undo]block [de]serialization, 6% faster IBD #31539

optimization: buffer reads(23%)/writes(290%) in [undo]block [de]serialization, 6% faster IBD #31539

Uh oh!

Conversation

l0rinc commented Dec 19, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

DrahtBot commented Dec 19, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Coverage & Benchmarks

Reviews

Conflicts

Uh oh!

l0rinc commented Jan 23, 2025

Uh oh!

maflcko commented Jan 23, 2025

Uh oh!

l0rinc commented Jan 23, 2025

Uh oh!

maflcko commented Jan 23, 2025

Uh oh!

maflcko commented Jan 23, 2025

Uh oh!

l0rinc commented Jan 23, 2025

Uh oh!

maflcko commented Jan 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

l0rinc commented Jan 24, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

l0rinc commented Dec 19, 2024 •

edited

Loading

DrahtBot commented Dec 19, 2024 •

edited

Loading

maflcko commented Jan 24, 2025 •

edited

Loading