@l0rinc
Contributor

@l0rinc l0rinc commented Sep 5, 2025

Context

Recent discussions highlighted that many nodes which synced before Bitcoin Core v28 have their block and undo files stored effectively in the clear (zero XOR key). This patch adds a simple, resumable maintenance tool to obfuscate previously raw block files, rotate an existing key to a fresh random one, or de-obfuscate (set key to zero) if consciously chosen, all without requiring resync. The operation can be cancelled and restarted safely.

Implementation

The new startup option -reobfuscate-blocks[=VALUE] accepts either 16 hex characters as an exact 8-byte XOR key (little-endian in-memory layout) or a boolean to generate a random 64-bit key. e.g. -reobfuscate-blocks=0000000000000000 sets the key to zero, effectively removing obfuscation.
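To make the explicit-key form concrete, here is a minimal sketch of turning 16 hex characters into an 8-byte key. `ParseHexKey` and `XorKey` are hypothetical names, not taken from the patch, and the sketch assumes the hex string lists the key bytes in the order they sit in memory; the actual parsing in the PR may differ.

```cpp
#include <array>
#include <cstddef>
#include <optional>
#include <string_view>

using XorKey = std::array<std::byte, 8>; // hypothetical alias for an 8-byte XOR key

// Hypothetical helper: accept exactly 16 hex characters and return the key bytes
// in string order, or std::nullopt if the input is not a valid key.
std::optional<XorKey> ParseHexKey(std::string_view hex)
{
    if (hex.size() != 16) return std::nullopt;
    const auto nibble = [](char c) -> int {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1; // not a hex digit
    };
    XorKey key{};
    for (std::size_t i = 0; i < key.size(); ++i) {
        const int hi = nibble(hex[2 * i]);
        const int lo = nibble(hex[2 * i + 1]);
        if (hi < 0 || lo < 0) return std::nullopt;
        key[i] = static_cast<std::byte>((hi << 4) | lo);
    }
    return key;
}
```

With this sketch, `ParseHexKey("0000000000000000")` yields the all-zero key, which corresponds to removing obfuscation.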

If we detect unobfuscated blocks at startup, we suggest this new option in a warning.

At startup, we iterate over all undo and block files (grouping the block and undo files for more uniform iteration), read them with the old XOR key and write them back with the new key (<name>.reobfuscated). The implementation actually combines the two keys and reads directly into the new obfuscated version to only do a single iteration over the data. This works if the original blocks aren't obfuscated or if the new blocks aren't or if both are.
After successful write, we immediately delete the old file. Once all files are staged, we rename them back and atomically swap xor.dat.reobfuscated → xor.dat and continue operation.
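The single-pass trick relies on XOR being its own inverse: applying `old ^ new` to data obfuscated with `old` gives data obfuscated with `new`. A minimal sketch of that identity follows; `XorInPlace` and `CombineKeys` are hypothetical helpers, not the `Obfuscation` class used in the patch.

```cpp
#include <array>
#include <cstddef>
#include <vector>

using XorKey = std::array<std::byte, 8>; // hypothetical alias for an 8-byte XOR key

// XOR `data` in place with a repeating key; `offset` is the file position of data[0].
void XorInPlace(std::vector<std::byte>& data, const XorKey& key, std::size_t offset = 0)
{
    for (std::size_t i = 0; i < data.size(); ++i) {
        data[i] ^= key[(offset + i) % key.size()];
    }
}

// The combined ("delta") key removes the old obfuscation and applies the new one in one pass.
XorKey CombineKeys(const XorKey& old_key, const XorKey& new_key)
{
    XorKey delta{};
    for (std::size_t i = 0; i < delta.size(); ++i) delta[i] = old_key[i] ^ new_key[i];
    return delta;
}
```

If either key is all zeros, the delta is simply the other key, which is why the same code path covers obfuscating raw files, rotating a key, and de-obfuscating.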

We log the old and new keys and print progress roughly per-percent as files complete (i.e. max 100 progress logs).
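One way to get "at most 100 progress logs" is to log only when the integer percentage increases after a file completes. The sketch below is an illustration under that assumption; `ProgressMilestones` is a hypothetical name and the patch may compute progress differently.

```cpp
#include <cstddef>
#include <vector>

// Returns the percentages that would be logged while processing `total_files` files,
// logging only when the whole-number percentage increases.
std::vector<int> ProgressMilestones(std::size_t total_files)
{
    std::vector<int> logged;
    int last_percent{0};
    for (std::size_t done = 1; done <= total_files; ++done) {
        const int percent = static_cast<int>(done * 100 / total_files);
        if (percent > last_percent) {
            logged.push_back(percent);
            last_percent = percent;
        }
    }
    return logged;
}
```

For roughly 10,000 files this produces exactly 100 entries; for fewer than 100 files it produces one entry per file.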

Constraints

  • Re-obfuscation resumes automatically (detected via xor.dat.reobfuscated) even without the flag. In the worst case, a crash should only force us to redo previous work.
  • Single-threaded, processing one file at a time to keep code simple and avoid complexity of interleaving renames and key swaps across threads.
  • Fast in practice with sequential read/modify/write per blockfile - after recent obfuscation vectorization, this path is very quick.

Reproducer

cmake -B build -DENABLE_IPC=ON -DBUILD_GUI=ON
cmake --build build -j$(nproc)
# command line
./build/bin/bitcoind -reobfuscate-blocks -stopatheight=1
# same with GUI
./build/bin/bitcoin-qt -reobfuscate-blocks -stopatheight=1

Single-threaded Performance

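 cpu                 | hdd/ssd | block count | size    | files  | time (min) | blocks/min
---------------------|---------|-------------|---------|--------|------------|------------
 Apple M4 Max laptop | SSD     | ~909k       | ~707 GB | 9,982  | 8.4        | 146,613
 Intel Core i9       | SSD     | ~909k       | ~725 GB | 10,238 | 23.1       | 39,351
 Raspberry Pi 5      | SSD     | ~914k       | ~728 GB | 10,276 | 72.78      | 12,558
 Intel Core i7       | HDD     | ~909k       | ~720 GB | 10,156 | 208.7      | 4,356
 Raspberry Pi 4B     | HDD     | ~915k       | ~730 GB | 10,304 | 1467       | 624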

Similar work: bitcoin#32451 and andrewtoth/blocks-xor

@DrahtBot
Contributor

DrahtBot commented Sep 5, 2025

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/33324.

Reviews

See the guideline for information on the review process.

Type Reviewers
Concept ACK sedited, stickies-v

If your review is incorrectly listed, please copy-paste <!--meta-tag:bot-skip--> into the comment that the bot should ignore.

Conflicts

No conflicts as of last run.

LLM Linter (✨ experimental)

Possible places where named args for integral literals may be used (e.g. func(x, /*named_arg=*/0) in C++, and func(x, named_arg=0) in Python):

  • notifications.progress(_("Reobfuscating blocks…"), 0, false) in src/init.cpp
  • notifications.progress(_("Reobfuscating blocks…"), percentage, false) in src/init.cpp
  • notifications.progress(_("Reobfuscating blocks…"), 100, false) in src/init.cpp

2025-12-10

@l0rinc l0rinc force-pushed the l0rinc/reobfuscate-blocks branch from e66f04d to bb50372 Compare September 6, 2025 00:29
@l0rinc l0rinc force-pushed the l0rinc/reobfuscate-blocks branch 3 times, most recently from 2933b17 to d3962f6 Compare September 8, 2025 04:47
@sedited
Contributor

sedited commented Sep 8, 2025

Concept ACK

@stickies-v
Contributor

Concept ACK, this seems like useful functionality to expose.

Should we split ObfuscateBlocks out of init? I have split it into many local lambdas, but we may want to find better home for those methods...

I don't like using startup options for one-time operations (I feel the same about e.g. -reindex). Without having thought it through too much yet, maybe we can bundle this e.g. as part of bitcoin-util or a separate bitcoin-xor-blocks utility?

Should we repurpose the existing -blocksxor arg instead?

With this PR, IIUC we'd have -blocksxor, -reobfuscate-blocks, and the existence of the xor.dat file that all have some redundancy and thus potential for conflict (e.g. blocksxor=0, reobfuscate-blocks=1, and a non-zero xor.dat file). Reducing that complexity seems like it would be useful.

Contributor

@ajtowns ajtowns left a comment

Having it be a startup option like -reindex seems fine to me.

src/init.cpp Outdated
auto migrate_single_blockfile{[&](const fs::path& file, const Obfuscation& delta_obfuscation, std::vector<std::byte>& buf) -> bool {
AutoFile old_blocks{fsbridge::fopen(file, "rb"), delta_obfuscation}; // deobfuscate & reobfuscate with a single combined key
buf.resize(fs::file_size(file)); // reuse buffer
old_blocks.read(buf);
Contributor

Rather than reading the entire blockfile into memory at once, consider chunking it:

        size_t left = fs::file_size(file);
        while (left > 0) {
            size_t chunk = std::min<size_t>(left, 2 * MAX_BLOCK_SERIALIZED_SIZE);
            buf.resize(chunk);
            old_blocks.read(buf);
            new_blocks.write_buffer(buf);
            left -= chunk;
        }

Contributor Author

We could do that with the recently introduced buffered readers - but that's considerably slower.
Is it a problem to read all of it in memory when we don't even have dbcache yet? The total memory usage is just 160 MB during migration, we should be fine until 1 GB at least, right?

Contributor

I don't think buffered readers is the right thing (that's for when you want to process small amounts of data while still reading it from the file in large chunks), and trying the above didn't seem particularly slow to me.

I guess it could be simplified a bit to:

buf.resize(2 * MAX_BLOCK_SERIALIZED_SIZE);
while (true) {
    size_t size = old_blocks.detail_fread(buf);
    if (size == 0) break;
    new_blocks.write_buffer(std::span{buf}.first(size));
}
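
For readers who want to try the chunked approach outside the codebase, here is a self-contained sketch using standard streams instead of `AutoFile`. `XorCopyChunked` is a hypothetical helper and the chunk size is a free parameter; the point it illustrates is that the key position must continue across chunk boundaries, so the output does not depend on the chunk size.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

using XorKey = std::array<std::uint8_t, 8>; // hypothetical alias for an 8-byte XOR key

// Hypothetical helper: copy `src` to `dst` in fixed-size chunks, XOR-ing every byte
// with the repeating `delta` key. Returns false if either file cannot be used.
bool XorCopyChunked(const std::string& src, const std::string& dst, const XorKey& delta, std::size_t chunk_size)
{
    assert(chunk_size > 0);
    std::ifstream in{src, std::ios::binary};
    if (!in) return false;
    std::ofstream out{dst, std::ios::binary | std::ios::trunc};
    if (!out) return false;

    std::vector<char> buf(chunk_size); // memory use is bounded by the chunk size
    std::size_t offset{0};             // file position of buf[0], keeps the key aligned
    while (true) {
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        const auto got = static_cast<std::size_t>(in.gcount());
        if (got == 0) break;
        for (std::size_t i = 0; i < got; ++i) {
            const auto k = delta[(offset + i) % delta.size()];
            buf[i] = static_cast<char>(static_cast<unsigned char>(buf[i]) ^ k);
        }
        out.write(buf.data(), static_cast<std::streamsize>(got));
        offset += got;
    }
    out.flush();
    return in.eof() && out.good();
}
```

Calling it with a 3-byte chunk and with a 4096-byte chunk on the same input should produce identical output files.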

Contributor Author

What problem would chunking solve in your opinion? I don't mind doing it, but the current version is slightly simpler and slightly faster, so I need at least some justification for giving up both :)

Contributor

Loading a large file entirely into memory when it's not necessary is just bad practice. What if we changed to .blk files of 1GB each? What if we're running on a node that's memory constrained and configured dbcache down to 4MB?

If we're worried about speed, then doing it in parallel helps on my system since obfuscation ends up CPU bound when single-threaded -- with the current code, it takes 238s (4min); with 8 threads it's 65s; with 16 threads it's 47s. Using 16MB chunks (BLOCKFILE_CHUNK_SIZE), 8 threads is also ~128MB of memory, but a user on a severely memory constrained system could reduce the thread count if they wanted. Here's roughly what I'm thinking: https://github.com/ajtowns/bitcoin/commits/202509-reobfus/

Contributor Author

@l0rinc l0rinc Sep 12, 2025

Loading a large file

This is run on request, before anything else loads, it's not that large, only 160 MB memory is needed.

For reference, applying the mentioned dbcache=4 (which isn't used here yet) still makes the node use > 1 GB memory:
[screenshot: node memory usage with dbcache=4]

Edit: doing an actual massif memory measurement with dbcache=4 and -blocksonly reveals that the actual memory usage is lower than that (but still higher than the 160MB needed for a single blockfile):

Command:            ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=200000 -dbcache=4 -blocksonly -printtoconsole=1                                               
Massif arguments:   --time-unit=ms --massif-out-file=/mnt/my_storage/logs/massif-e66f04d0131b8c2db13ddd649e9eb20910eb6d1d-200000-4.out                                                    
ms_print arguments: /mnt/my_storage/logs/massif-e66f04d0131b8c2db13ddd649e9eb20910eb6d1d-200000-4.out                                                                                     
--------------------------------------------------------------------------------                                                                                                          
                                                                                                                                                                                          
                                                                                                                                                                                          
    MB                                                                                                                                                                                    
383.1^#                                                                                                                                                                                   
     |#                                                                                                                                                                                   
     |#                                                                                                                                                                                   
     |#                                                                                                                                                                                   
     |#                                                                                                                                                                                   
     |#                                                                                                                                                                                   
     |#                                   :  : :::      :  :    ::::  @                                                                                                                   
     |#   :@: :::@::::::::::::::@:::@::::::::::::::::::@:::::::@::::::@:::::::                                                                                                            
     |#::::@::: :@::: ::::::::::@:::@::::::::::::::::::@:::::::@::::::@:::::::                                                                                                            
     |#:: :@::: :@::: ::::::::::@:::@::::::::::::::::::@:::::::@::::::@:::::::                                                                                                            
     |#:: :@::: :@::: ::::::::::@:::@::::::::::::::::::@:::::::@::::::@:::::::                                                                                                            
     |#:: :@::: :@::: ::::::::::@:::@::::::::::::::::::@:::::::@::::::@:::::::                                                                                                            
     |#:: :@::: :@::: ::::::::::@:::@::::::::::::::::::@:::::::@::::::@:::::::                                                                                                            
     |#:: :@::: :@::: ::::::::::@:::@::::::::::::::::::@:::::::@::::::@:::::::                                                                                                            
     |#:: :@::: :@::: ::::::::::@:::@::::::::::::::::::@:::::::@::::::@:::::::                                                                                                            
     |#:: :@::: :@::: ::::::::::@:::@::::::::::::::::::@:::::::@::::::@:::::::                                                                                                            
     |#:: :@::: :@::: ::::::::::@:::@::::::::::::::::::@:::::::@::::::@:::::::                                                                                                            
     |#:: :@::: :@::: ::::::::::@:::@::::::::::::::::::@:::::::@::::::@:::::::                                                                                                            
     |#:: :@::: :@::: ::::::::::@:::@::::::::::::::::::@:::::::@::::::@:::::::
     |#:: :@::: :@::: ::::::::::@:::@::::::::::::::::::@:::::::@::::::@:::::::
   0 +----------------------------------------------------------------------->h
     0                                                                   2.506

Here's roughly what I'm thinking: ajtowns/bitcoin@202509-reobfus (commits)

Multithreading is indeed a very good argument for chunking, thanks a lot for the patch, I'll apply it soon and add you as coauthor!

Contributor Author

@l0rinc l0rinc Sep 20, 2025

Thanks again for the review, I have pushed a change to fix the CI and took a few suggestions from your branch (chunking, code simplifications), but kept the original file iteration with progress indicator for now.
The parallelization complicates the situation considerably, I will see if I can find a simpler way or if single-threaded execution is also acceptable.
Edit: grouped the block and undo files for more uniform iteration instead of shuffling

Contributor Author

@l0rinc l0rinc Sep 26, 2025

I have pushed a new version (rebased, extended test), let me know what you think.

I have implemented a very simple multithreaded version but I couldn't convince it to achieve any speedup whatsoever - I guess xor operations are a lot cheaper than disk reads/writes. The total CPU usage was at 20% even with 50 threads.

I have pushed my threaded solution to https://github.com/l0rinc/bitcoin/pull/40/files#diff-b1e19192258d83199d8adaa5ac31f067af98f63554bfdd679bd8e8073815e69dR1361-R1379, but I kept the single-threaded version here.

@l0rinc l0rinc force-pushed the l0rinc/reobfuscate-blocks branch from aa587f3 to ffb0221 Compare September 12, 2025 03:55
@ajtowns
Contributor

ajtowns commented Sep 15, 2025

tidy wants emplace_back over push_back

@luke-jr
Member

luke-jr commented Sep 18, 2025

I agree a separate utility for this seems better - this requires very little of the existing codebase, in theory.

Also suggest making the files with the same names, but in a new directory, and then atomically rename the directory when complete, rather than every single file.

@l0rinc
Contributor Author

l0rinc commented Sep 18, 2025

I agree a separate utility for this seems better

Can you quote what you're agreeing with specifically? I'm not sure who suggested that.
Besides, @andrewtoth already has a tool for that, it was mentioned in the PR description.

but in a new directory, and then atomically rename

I will think about it, could make sense, but in that case unrelated files should also be copied over (maybe duplicated to be safe) - and listing the directory content wouldn't make the progress obvious. What's wrong with the current approach?

@l0rinc l0rinc force-pushed the l0rinc/reobfuscate-blocks branch 4 times, most recently from af1d2ec to 3afb11f Compare September 26, 2025 03:29
@l0rinc l0rinc force-pushed the l0rinc/reobfuscate-blocks branch from 3afb11f to 86c5afc Compare October 3, 2025 00:59
@l0rinc
Contributor Author

l0rinc commented Oct 3, 2025

Added kernel notifications (thanks @ryanofsky) and improved crash resistance at the very last step (final rename back to old names) - try it out with ./build/bin/bitcoin-qt -reobfuscate-blocks -stopatheight=1.

@l0rinc l0rinc marked this pull request as ready for review October 3, 2025 01:01
@l0rinc l0rinc changed the title RFC: blocks: add -reobfuscate-blocks arg to xor existing blk/rev on startup blocks: add -reobfuscate-blocks arg to xor existing blk/rev on startup Oct 3, 2025
l0rinc and others added 3 commits December 10, 2025 23:20
### Context

Recent discussions highlighted that many nodes which synced before Bitcoin Core v28 have their block and undo files stored effectively in the clear (zero XOR key). This patch adds a simple, resumable maintenance tool to obfuscate previously raw block files, rotate an existing key to a fresh random one, or deobfuscate (set key to zero) if consciously chosen, all without requiring resync. The operation can be cancelled and restarted safely.

### Implementation

The new startup option `-reobfuscate-blocks[=VALUE]` accepts either 16 hex characters as an exact 8-byte XOR key (little-endian in-memory layout) or a boolean to generate a random 64-bit key. e.g. `-reobfuscate-blocks=0000000000000000` sets the key to zero, effectively removing obfuscation.

If we detect unobfuscated blocks at start time, we suggest this new option in a warning.

At startup, we iterate over all undo and block files (grouping the block and undo files for more uniform iteration), read them with the old XOR key and write them back with the new key (`<name>.reobfuscated`). The implementation actually combines the two keys and reads directly into the new obfuscated version to only do a single iteration over the data. This works if the original blocks aren't obfuscated or if the new blocks aren't or if both are.
After successful write, we immediately delete the old file. Once all files are staged, we rename them back and atomically swap `xor.dat.reobfuscated` → `xor.dat` and continue operation.

We log the old and new keys and print progress roughly per-percent as undo and block files complete (i.e. max 2 * 100 progress logs).

### Constraints

* Reobfuscation resumes automatically (detected via `xor.dat.reobfuscated`) even without the flag. In worst-case a crash should only force us to redo previous work.
* Single-threaded, processing one file at a time to keep code simple and avoid complexity of interleaving renames and key swaps across threads.
* Fast in practice with sequential read/modify/write per blockfile - after recent obfuscation vectorization, this path is very quick.

### Reproducer

```bash
cmake -B build -DENABLE_IPC=ON -DBUILD_GUI=ON
cmake --build build -j$(nproc)
./build/bin/bitcoind -reobfuscate-blocks -stopatheight=1
```

### Single-threaded Performance

 cpu                 | hdd/ssd | block count | size    | files  | time (min) | blocks/min
---------------------|---------|-------------|---------|--------|------------|------------
 Apple M4 Max laptop | SSD     | ~909k       | ~707 GB | 9,982  | 8.4        | 146,613
 Intel Core i9       | SSD     | ~909k       | ~725 GB | 10,238 | 23.1       | 39,351
 Raspberry Pi 5      | SSD     | ~914k       | ~728 GB | 10,276 | 72.78      | 12,558
 Intel Core i7       | HDD     | ~909k       | ~720 GB | 10,156 | 208.7      | 4,356
 Raspberry Pi 4B     | HDD     | ~915k       | ~730 GB | 10,304 | 1467       | 624

-----

Similar work: bitcoin#32451 and andrewtoth/blocks-xor

Co-authored-by: Andrew Toth <[email protected]>
Co-authored-by: Murch <[email protected]>
Co-authored-by: Anthony Towns <[email protected]>
### Reproducer

```bash
cmake -B build -DENABLE_IPC=ON -DBUILD_GUI=ON
cmake --build build -j$(nproc)
./build/bin/bitcoin-qt -reobfuscate-blocks -stopatheight=1
```

Co-authored-by: Ryan Ofsky <[email protected]>
@l0rinc l0rinc force-pushed the l0rinc/reobfuscate-blocks branch from 86c5afc to d1f2cfc Compare December 10, 2025 22:21
@l0rinc l0rinc changed the title blocks: add -reobfuscate-blocks arg to xor existing blk/rev on startup blocks: add -reobfuscate-blocks argument to enable (de)obfuscating existing blocks Dec 11, 2025