blocks: add -reobfuscate-blocks argument to enable (de)obfuscating existing blocks
#33324
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks: for details see https://corecheck.dev/bitcoin/bitcoin/pulls/33324.
Reviews: see the guideline for information on the review process. If your review is incorrectly listed, please copy-paste
Conflicts: no conflicts as of last run.
LLM Linter (✨ experimental): possible places where named args for integral literals may be used (e.g.
2025-12-10
Force-pushed from e66f04d to bb50372
Force-pushed from 2933b17 to d3962f6
Concept ACK
Concept ACK, this seems like useful functionality to expose.
I don't like using startup options for one-time operations (I feel the same about e.g.
With this PR, IIUC we'd have
Force-pushed from d3962f6 to aa587f3
ajtowns left a comment:
Having it be a startup option like -reindex seems fine to me.
src/init.cpp (outdated):

```cpp
auto migrate_single_blockfile{[&](const fs::path& file, const Obfuscation& delta_obfuscation, std::vector<std::byte>& buf) -> bool {
    AutoFile old_blocks{fsbridge::fopen(file, "rb"), delta_obfuscation}; // deobfuscate & reobfuscate with a single combined key
    buf.resize(fs::file_size(file)); // reuse buffer
    old_blocks.read(buf);
```
Rather than reading the entire blockfile into memory at once, consider chunking it:

```cpp
size_t left = fs::file_size(file);
while (left > 0) {
    size_t chunk = std::min<size_t>(left, 2 * MAX_BLOCK_SERIALIZED_SIZE);
    buf.resize(chunk);
    old_blocks.read(buf);
    new_blocks.write_buffer(buf);
    left -= chunk;
}
```
We could do that with the recently introduced buffered readers - but that's considerably slower.
Is it a problem to read all of it in memory when we don't even have dbcache yet? The total memory usage is just 160 MB during migration, we should be fine until 1 GB at least, right?
I don't think buffered readers are the right thing (that's for when you want to process small amounts of data while still reading it from the file in large chunks), and trying the above didn't seem particularly slow to me.
I guess it could be simplified a bit to:

```cpp
buf.resize(2 * MAX_BLOCK_SERIALIZED_SIZE);
while (true) {
    size_t size = old_blocks.detail_fread(buf);
    if (size == 0) break;
    new_blocks.write_buffer(std::span(buf).first(size));
}
```
What problem would chunking solve, in your opinion? I don't mind doing it, but the current version is slightly simpler and slightly faster, so I need at least some justification for giving up both :)
Loading a large file entirely into memory when it's not necessary is just bad practice. What if we changed to .blk files of 1GB each? What if we're running on a node that's memory constrained and configured dbcache down to 4MB?
If we're worried about speed, then doing it in parallel helps on my system since obfuscation ends up CPU bound when single-threaded -- with the current code, it takes 238s (4min); with 8 threads it's 65s; with 16 threads it's 47s. Using 16MB chunks (BLOCKFILE_CHUNK_SIZE), 8 threads is also ~128MB of memory, but a user on a severely memory constrained system could reduce the thread count if they wanted. Here's roughly what I'm thinking: https://github.com/ajtowns/bitcoin/commits/202509-reobfus/
> Loading a large file

This is run on request, before anything else loads, and it's not that large: only 160 MB of memory is needed.
For reference, applying the mentioned dbcache=4 (which isn't used here yet) still makes the node use > 1 GB memory:

Edit: doing an actual massif memory measurement with dbcache=4 and -blocksonly reveals that the actual memory usage is lower than that (but still higher than the 160MB needed for a single blockfile):
```
Command: ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=200000 -dbcache=4 -blocksonly -printtoconsole=1
Massif arguments: --time-unit=ms --massif-out-file=/mnt/my_storage/logs/massif-e66f04d0131b8c2db13ddd649e9eb20910eb6d1d-200000-4.out
ms_print arguments: /mnt/my_storage/logs/massif-e66f04d0131b8c2db13ddd649e9eb20910eb6d1d-200000-4.out
```
[massif graph: heap usage peaks at 383.1 MB at startup, then stays roughly flat for the rest of the ~2.5 h run]
> Here's roughly what I'm thinking: ajtowns/bitcoin@202509-reobfus (commits)
Multithreading is indeed a very good argument for chunking, thanks a lot for the patch, I'll apply it soon and add you as coauthor!
Thanks again for the review. I have pushed a change to fix the CI and took a few suggestions from your branch (chunking, code simplifications), but kept the original file iteration with a progress indicator for now.
The parallelization complicates the situation considerably, I will see if I can find a simpler way or if single-threaded execution is also acceptable.
Edit: grouped the block and undo files for more uniform iteration instead of shuffling
I have pushed a new version (rebased, extended test), let me know what you think.
I have implemented a very simple multithreaded version but I couldn't convince it to achieve any speedup whatsoever - I guess xor operations are a lot cheaper than disk reads/writes. The total CPU usage was at 20% even with 50 threads.
I have pushed my threaded solution to https://github.com/l0rinc/bitcoin/pull/40/files#diff-b1e19192258d83199d8adaa5ac31f067af98f63554bfdd679bd8e8073815e69dR1361-R1379, but I kept the single-threaded version here.
Force-pushed from aa587f3 to ffb0221
tidy wants emplace_back over push_back
I agree a separate utility for this seems better - this requires very little of the existing codebase, in theory. Also, I suggest making the files with the same names but in a new directory, and then atomically renaming the directory when complete, rather than renaming every single file.
Can you quote what you're agreeing with specifically? I'm not sure who suggested that.
I will think about it; it could make sense, but in that case unrelated files should also be copied over (maybe duplicated, to be safe), and listing the directory content wouldn't make the progress obvious. What's wrong with the current approach?
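For reference, a minimal sketch of the directory-swap idea suggested above, with illustrative path names (`blocks.reobfuscated` and `blocks.old` are assumptions, not names from this PR):

```cpp
#include <filesystem>

namespace fs = std::filesystem;

// Stage every re-obfuscated file under a sibling directory with its final
// name, then publish the whole set with two renames instead of one per file.
// fs::rename is atomic on POSIX when both paths are on the same filesystem.
void PublishStagedBlocksDir(const fs::path& blocks_dir)
{
    const fs::path staged{blocks_dir.string() + ".reobfuscated"};
    const fs::path old_dir{blocks_dir.string() + ".old"};
    fs::rename(blocks_dir, old_dir); // step 1: move the original aside
    fs::rename(staged, blocks_dir);  // step 2: promote the staged copy
    // A crash between the two renames leaves no blocks/ directory but both
    // blocks.old/ and blocks.reobfuscated/, which startup code could detect
    // and finish.
    fs::remove_all(old_dir);
}
```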
Force-pushed from af1d2ec to 3afb11f
Force-pushed from 3afb11f to 86c5afc
Added kernel notifications (thanks @ryanofsky) and improved crash resistance at the very last step (final rename back to old names) - try it out with
Force-pushed from 86c5afc to d1f2cfc
Title changed: "-reobfuscate-blocks arg to xor existing blk/rev on startup" → "-reobfuscate-blocks argument to enable (de)obfuscating existing blocks"

### Context

Recent discussions highlighted that many nodes which synced before Bitcoin Core v28 have their block and undo files stored effectively in the clear (zero XOR key). This patch adds a simple, resumable maintenance tool to obfuscate previously raw block files, rotate an existing key to a fresh random one, or de-obfuscate (set the key to zero) if consciously chosen, all without requiring a resync. The operation can be cancelled and restarted safely.

### Implementation

The new startup option `-reobfuscate-blocks[=VALUE]` accepts either 16 hex characters as an exact 8-byte XOR key (little-endian in-memory layout) or a boolean to generate a random 64-bit key, e.g. `-reobfuscate-blocks=0000000000000000` sets the key to zero, effectively removing obfuscation. If we detect unobfuscated blocks at start time, we suggest this new option in a warning.

At startup, we iterate over all undo and block files (grouping the block and undo files for more uniform iteration), read them with the old XOR key and write them back with the new key (`<name>.reobfuscated`). The implementation actually combines the two keys and reads directly into the new obfuscated version, so only a single iteration over the data is needed; this works whether the old key, the new key, or both are zero. After a successful write, we immediately delete the old file. Once all files are staged, we rename them back, atomically swap `xor.dat.reobfuscated` → `xor.dat`, and continue operation.

We log the old and new keys and print progress roughly per percent as files complete (i.e. max 100 progress logs).
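Since XOR is associative and commutative, de-obfuscating with the old key and obfuscating with the new one in a single pass reduces to XOR-ing with `old_key ^ new_key`. A minimal sketch of that combination (the helper name is illustrative, not this PR's actual code):

```cpp
#include <array>
#include <cstddef>

using XorKey = std::array<std::byte, 8>;

// (data ^ old_key) ^ new_key == data ^ (old_key ^ new_key), so one pass with
// the combined key removes the old obfuscation and applies the new one.
// A zero old key (raw files) or a zero new key (de-obfuscation) are just
// special cases of the same operation.
XorKey CombineKeys(const XorKey& old_key, const XorKey& new_key)
{
    XorKey combined;
    for (std::size_t i{0}; i < combined.size(); ++i) combined[i] = old_key[i] ^ new_key[i];
    return combined;
}
```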
### Constraints

* Reobfuscation resumes automatically (detected via `xor.dat.reobfuscated`) even without the flag. In the worst case a crash should only force us to redo previous work.
* Single-threaded, processing one file at a time, to keep the code simple and avoid the complexity of interleaving renames and key swaps across threads.
* Fast in practice with a sequential read/modify/write per blockfile: after the recent obfuscation vectorization, this path is very quick.

### Reproducer

```bash
cmake -B build -DENABLE_IPC=ON -DBUILD_GUI=ON
cmake --build build -j$(nproc)
./build/bin/bitcoin-qt -reobfuscate-blocks -stopatheight=1
```

### Single-threaded Performance

| cpu | hdd/ssd | block count | size | files | time (min) | blocks/min |
|---|---|---|---|---|---|---|
| Apple M4 Max laptop | SSD | ~909k | ~707 GB | 9,982 | 8.4 | 146,613 |
| Intel Core i9 | SSD | ~909k | ~725 GB | 10,238 | 23.1 | 39,351 |
| Raspberry Pi 5 | SSD | ~914k | ~728 GB | 10,276 | 72.78 | 12,558 |
| Intel Core i7 | HDD | ~909k | ~720 GB | 10,156 | 208.7 | 4,356 |
| Raspberry Pi 4B | HDD | ~915k | ~730 GB | 10,304 | 1467 | 624 |

Similar work: bitcoin#32451 and andrewtoth/blocks-xor

Co-authored-by: Andrew Toth <[email protected]>
Co-authored-by: Murch <[email protected]>
Co-authored-by: Anthony Towns <[email protected]>
Co-authored-by: Ryan Ofsky <[email protected]>