blocks: add -reobfuscate-blocks argument to enable (de)obfuscating existing blocks
#33324
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks: for details see https://corecheck.dev/bitcoin/bitcoin/pulls/33324.
Reviews: see the guideline for information on the review process. If your review is incorrectly listed, please copy-paste
Conflicts: no conflicts as of last run.
LLM Linter (✨ experimental): possible places where named args for integral literals may be used (e.g.
2025-12-10
Force-pushed from e66f04d to bb50372
Force-pushed from 2933b17 to d3962f6
Concept ACK
Concept ACK, this seems like useful functionality to expose.
I don't like using startup options for one-time operations (I feel the same about e.g.
With this PR, IIUC we'd have
Force-pushed from d3962f6 to aa587f3
ajtowns left a comment:
Having it be a startup option like -reindex seems fine to me.
src/init.cpp (outdated):

```cpp
auto migrate_single_blockfile{[&](const fs::path& file, const Obfuscation& delta_obfuscation, std::vector<std::byte>& buf) -> bool {
    AutoFile old_blocks{fsbridge::fopen(file, "rb"), delta_obfuscation}; // deobfuscate & reobfuscate with a single combined key
    buf.resize(fs::file_size(file)); // reuse buffer
    old_blocks.read(buf);
```
Rather than reading the entire blockfile into memory at once, consider chunking it:

```cpp
size_t left = fs::file_size(file);
while (left > 0) {
    size_t chunk = std::min<size_t>(left, 2 * MAX_BLOCK_SERIALIZED_SIZE);
    buf.resize(chunk);
    old_blocks.read(buf);
    new_blocks.write_buffer(buf);
    left -= chunk;
}
```
We could do that with the recently introduced buffered readers - but that's considerably slower.
Is it a problem to read all of it in memory when we don't even have dbcache yet? The total memory usage is just 160 MB during migration, we should be fine until 1 GB at least, right?
I don't think buffered readers are the right thing (that's for when you want to process small amounts of data while still reading it from the file in large chunks), and trying the above didn't seem particularly slow to me.
I guess it could be simplified a bit to:

```cpp
buf.resize(2 * MAX_BLOCK_SERIALIZED_SIZE);
while (true) {
    size_t size = old_blocks.detail_fread(buf);
    if (size == 0) break;
    new_blocks.write_buffer(std::span(buf).first(size));
}
```
What problem would chunking solve, in your opinion? I don't mind doing it, but the current version is slightly simpler and slightly faster, so I need at least some justification for giving up both :)
Loading a large file entirely into memory when it's not necessary is just bad practice. What if we changed to .blk files of 1GB each? What if we're running on a node that's memory constrained and configured dbcache down to 4MB?
If we're worried about speed, then doing it in parallel helps on my system since obfuscation ends up CPU bound when single-threaded -- with the current code, it takes 238s (4min); with 8 threads it's 65s; with 16 threads it's 47s. Using 16MB chunks (BLOCKFILE_CHUNK_SIZE), 8 threads is also ~128MB of memory, but a user on a severely memory constrained system could reduce the thread count if they wanted. Here's roughly what I'm thinking: https://github.com/ajtowns/bitcoin/commits/202509-reobfus/
> Loading a large file

This is run on request, before anything else loads, and it's not that large: only 160 MB of memory is needed.
For reference, applying the mentioned dbcache=4 (which isn't used here yet) still makes the node use > 1 GB memory:

Edit: doing an actual massif memory measurement with dbcache=4 and -blocksonly reveals that the actual memory usage is lower than that (but still higher than the 160MB needed for a single blockfile):
```
Command: ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=200000 -dbcache=4 -blocksonly -printtoconsole=1
Massif arguments: --time-unit=ms --massif-out-file=/mnt/my_storage/logs/massif-e66f04d0131b8c2db13ddd649e9eb20910eb6d1d-200000-4.out
ms_print arguments: /mnt/my_storage/logs/massif-e66f04d0131b8c2db13ddd649e9eb20910eb6d1d-200000-4.out
```
[massif graph: heap usage peaks at 383.1 MB at startup, then stays roughly flat for the rest of the ~2.5 h run]
> Here's roughly what I'm thinking: ajtowns/bitcoin@202509-reobfus (commits)
Multithreading is indeed a very good argument for chunking, thanks a lot for the patch, I'll apply it soon and add you as coauthor!
Thanks again for the review. I have pushed a change to fix the CI and took a few suggestions from your branch (chunking, code simplifications), but kept the original file iteration with a progress indicator for now.
The parallelization complicates the situation considerably, I will see if I can find a simpler way or if single-threaded execution is also acceptable.
Edit: grouped the block and undo files for more uniform iteration instead of shuffling
I have pushed a new version (rebased, extended test), let me know what you think.
I have implemented a very simple multithreaded version but I couldn't convince it to achieve any speedup whatsoever - I guess xor operations are a lot cheaper than disk reads/writes. The total CPU usage was at 20% even with 50 threads.
I have pushed my threaded solution to https://github.com/l0rinc/bitcoin/pull/40/files#diff-b1e19192258d83199d8adaa5ac31f067af98f63554bfdd679bd8e8073815e69dR1361-R1379, but I kept the single-threaded version here.
Force-pushed from aa587f3 to ffb0221
tidy wants emplace_back over push_back
I agree a separate utility for this seems better - this requires very little of the existing codebase, in theory. Also, I suggest making the files with the same names but in a new directory, and then atomically renaming the directory when complete, rather than renaming every single file.
Can you quote what you're agreeing with specifically? I'm not sure who suggested that.
I will think about it; it could make sense, but in that case unrelated files should also be copied over (maybe duplicated, to be safe), and listing the directory content wouldn't make the progress obvious. What's wrong with the current approach?
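For reference, a minimal sketch of the directory-swap idea suggested above, with illustrative path names (`blocks.reobfuscated` and `blocks.old` are assumptions, not names from this PR):

```cpp
#include <filesystem>

namespace fs = std::filesystem;

// Stage every re-obfuscated file under a sibling directory with its final
// name, then publish the whole set with two renames instead of one per file.
// fs::rename is atomic on POSIX when both paths are on the same filesystem.
void PublishStagedBlocksDir(const fs::path& blocks_dir)
{
    const fs::path staged{blocks_dir.string() + ".reobfuscated"};
    const fs::path old_dir{blocks_dir.string() + ".old"};
    fs::rename(blocks_dir, old_dir); // step 1: move the original aside
    fs::rename(staged, blocks_dir);  // step 2: promote the staged copy
    // A crash between the two renames leaves no blocks/ directory but both
    // blocks.old/ and blocks.reobfuscated/, which startup code could detect
    // and finish.
    fs::remove_all(old_dir);
}
```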
Force-pushed from af1d2ec to 3afb11f
Force-pushed from 3afb11f to 86c5afc
Added kernel notifications (thanks @ryanofsky) and improved crash resistance at the very last step (final rename back to old names) - try it out with
Force-pushed from 86c5afc to d1f2cfc
Title changed: "-reobfuscate-blocks arg to xor existing blk/rev on startup" → "-reobfuscate-blocks argument to enable (de)obfuscating existing blocks"

### Context

Recent discussions highlighted that many nodes which synced before Bitcoin Core v28 have their block and undo files stored effectively in the clear (zero XOR key). This patch adds a simple, resumable maintenance tool to obfuscate previously raw block files, rotate an existing key to a fresh random one, or de-obfuscate (set the key to zero) if consciously chosen, all without requiring a resync. The operation can be cancelled and restarted safely.

### Implementation

The new startup option `-reobfuscate-blocks[=VALUE]` accepts either 16 hex characters as an exact 8-byte XOR key (little-endian in-memory layout) or a boolean to generate a random 64-bit key, e.g. `-reobfuscate-blocks=0000000000000000` sets the key to zero, effectively removing obfuscation. If we detect unobfuscated blocks at start time, we suggest this new option in a warning.

At startup, we iterate over all undo and block files (grouping the block and undo files for more uniform iteration), read them with the old XOR key and write them back with the new key (`<name>.reobfuscated`). The implementation actually combines the two keys and reads directly into the new obfuscated version, so only a single iteration over the data is needed; this works whether the old key, the new key, or both are zero. After a successful write, we immediately delete the old file. Once all files are staged, we rename them back, atomically swap `xor.dat.reobfuscated` → `xor.dat`, and continue operation.

We log the old and new keys and print progress roughly per percent as files complete (i.e. max 100 progress logs).
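Since XOR is associative and commutative, de-obfuscating with the old key and obfuscating with the new one in a single pass reduces to XOR-ing with `old_key ^ new_key`. A minimal sketch of that combination (the helper name is illustrative, not this PR's actual code):

```cpp
#include <array>
#include <cstddef>

using XorKey = std::array<std::byte, 8>;

// (data ^ old_key) ^ new_key == data ^ (old_key ^ new_key), so one pass with
// the combined key removes the old obfuscation and applies the new one.
// A zero old key (raw files) or a zero new key (de-obfuscation) are just
// special cases of the same operation.
XorKey CombineKeys(const XorKey& old_key, const XorKey& new_key)
{
    XorKey combined;
    for (std::size_t i{0}; i < combined.size(); ++i) combined[i] = old_key[i] ^ new_key[i];
    return combined;
}
```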
### Constraints

* Reobfuscation resumes automatically (detected via `xor.dat.reobfuscated`) even without the flag. In the worst case a crash should only force us to redo previous work.
* Single-threaded, processing one file at a time, to keep the code simple and avoid the complexity of interleaving renames and key swaps across threads.
* Fast in practice with a sequential read/modify/write per blockfile: after the recent obfuscation vectorization, this path is very quick.

### Reproducer

```bash
cmake -B build -DENABLE_IPC=ON -DBUILD_GUI=ON
cmake --build build -j$(nproc)
./build/bin/bitcoin-qt -reobfuscate-blocks -stopatheight=1
```

### Single-threaded Performance

| cpu | hdd/ssd | block count | size | files | time (min) | blocks/min |
|---|---|---|---|---|---|---|
| Apple M4 Max laptop | SSD | ~909k | ~707 GB | 9,982 | 8.4 | 146,613 |
| Intel Core i9 | SSD | ~909k | ~725 GB | 10,238 | 23.1 | 39,351 |
| Raspberry Pi 5 | SSD | ~914k | ~728 GB | 10,276 | 72.78 | 12,558 |
| Intel Core i7 | HDD | ~909k | ~720 GB | 10,156 | 208.7 | 4,356 |
| Raspberry Pi 4B | HDD | ~915k | ~730 GB | 10,304 | 1467 | 624 |

Similar work: bitcoin#32451 and andrewtoth/blocks-xor

Co-authored-by: Andrew Toth <[email protected]>
Co-authored-by: Murch <[email protected]>
Co-authored-by: Anthony Towns <[email protected]>
Co-authored-by: Ryan Ofsky <[email protected]>