-
Notifications
You must be signed in to change notification settings - Fork 38.7k
[29.x] Backport wallets directory deletion fixes #34222
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code Coverage & BenchmarksFor details see: https://corecheck.dev/bitcoin/bitcoin/pulls/34222. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please copy-paste |
|
ACK 5b8e09b |
5b8e09b to
c6877e3
Compare
Github-Pull: bitcoin#31423 Rebased-From: d04f6a9
c6877e3 to
5a1e79d
Compare
|
@davidgumberg discovered that a form of the deletion bug is also reachable if the user migrates a wallet at a relative path, resulting the entire directory at that relative path being deleted on a migration failure. This is problematic when that relative path contains more than just a wallet file. For example, with a wallet.dat in the datadir, migrating the wallet named "../" would result in deleting the entire datadir. To resolve this, I've backported the entirety of #34156 and #34226 which adds tests for this case. These backports are largely unclean as well. |
5a1e79d to
1bfcac1
Compare
1bfcac1 to
3606219
Compare
src/wallet/wallet.cpp
Outdated
| // Before deleting the wallet's directory, copy the backup file to the top-level wallets dir | ||
| fs::path temp_backup_location = fsbridge::AbsPathJoin(GetWalletDir(), backup_filename); | ||
| fs::copy_file(backup_path, temp_backup_location, fs::copy_options::none); | ||
| wallet_files_to_remove.insert(backup_path); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the wallet originally was in the wallet dir directly, we'll copy it over itself and then delete it?
Why not just move the file here (and never delete the backup)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doing that breaks a bunch of tests and it got increasingly complicated to fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, it looks like copy_file is throwing an exception: "filesystem error: cannot copy file: File exists"
So it's not deleting the backup, but neither is it doing the intended cleanup at all. Probably want to extend the test to verify the cleanup happens, and then fix this.
(to be clear, for test "Test failure during migration of wallet named: unnamed (default)")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tests already check that this error is being hit and that the cleanup does not occur.
However this was needed for the other tests where cleanup is occurring. It is not universally true that the cleanup does not occur. It should only be for the unnamed wallet and for wallets by absolute path.
src/wallet/bdb.h
Outdated
| files.emplace_back(env->Directory() / ".walletlock"); | ||
| files.emplace_back(env->Directory() / "database" / "log.0000000001"); | ||
| files.emplace_back(env->Directory() / "database"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there are other wallets in the same directory, these may be needed to avoid corrupting them, and are not safe to delete with this wallet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Added a check that env->m_databases.size() == 1 to return these.
It's only the unnamed wallet and direct files in the walletsdir that can possibly run into this issue. For wallets in a subdirectory, it will be one wallet.dat per environment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
m_databases only includes open wallets. Other wallets dependent on these files might not be open.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wallets are compacted on shutdown so they should not have any of these files.
This also matches the behavior of BerkeleyDatabase::Flush.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if it worth discussing this further. The Files() implementation for BDB exists just for completeness, it is not being called anywhere during migration right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is not being called anywhere during migration right now.
Oh right, only SQLiteDatabase::Files() is called by migration. BerkeleyDatabase::Files() is only called by createfromdump's cleanup if the user creates into a BDB file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wallets are compacted on shutdown so they should not have any of these files.
Only on clean shutdown. It's possible the node crashed or power failed, etc.
This also matches the behavior of BerkeleyDatabase::Flush.
Yes, this is an existing bug: bitcoinknots#242 (happy to PR here if there's interest)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to focus on the high priority issue first. The goal of the backport is to fix the top-level directory deletion issue. Any edge case during the createfromdump cleanup is really minimal in comparison. createfromdump is a niche external wallet utility intended for development mostly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is an existing bug: bitcoinknots#242 (happy to PR here if there's interest)
Briefly glancing through that pr, I don't think it solves any of the issues discussed in this thread. According to BDB's docs, the only error that lsn_reset, txn_checkpoint, and log_archive can return is EINVAL, so I don't think adding error handling there is meaningfully useful. It looks like that behavior has also existed since the initial commit of this project.
As best as I can tell, there is no way to prevent BDB from possibly becoming corrupted when the log files are removed. Even with log_archive(DB_ARCH_REMOVE), it still keeps the most recent log file, and there does not seem to be any way to determine whether that log file is actually still useful. The only way to avoid corruption there is to never delete the log files. However, the purpose of deleting the log files is to allow people to move between BDB versions (or more specifically, between a self compiled binary against a different BDB version and the release binaries) without the log file incompatibility issue, and that is probably still useful behavior. This behavior has also existed for nearly 13 years now.
I think this corruption is extremely unlikely to occur anyways. I'm finding it hard to even trigger these corruptions even with knowing the conditions they should occur in. Given that BDB was removed from the project several months ago, I doubt there is any appetite to review fixes for these issues, especially as the versions they affect go EOL.
| /** Return path to main database filename */ | ||
| std::string Filename() override { return fs::PathToString(env->Directory() / m_filename); } | ||
|
|
||
| std::vector<fs::path> Files() override |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In what scenario is .Files() ever called on a BerkeleyDatabase? Just createfromdump?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is now used by the changes to migration.
It's used by RestoreWallet for cleanup as well.
Github-Pull: bitcoin#31423 Rebased-From: 1de423e
Github-Pull: bitcoin#34215 Rebased-From: f78f6f1
Track what RestoreWallet creates so only those files and directories are removed during a failure and nothing else. Preexisting paths must be left untouched. Note: Using fs::remove_all() instead of fs::remove() in RestoreWallet does not cause any problems currently, but the change is necessary for the next commit which extends RestoreWallet to work with existing directories, which may contain files that must not be deleted. Github-Pull: bitcoin#34156 Rebased-From: 4ed0693
3606219 to
f159bb6
Compare
| # Check migration failure | ||
| mocked_time = int(time.time()) | ||
| self.master_node.setmocktime(mocked_time) | ||
| assert_raises_rpc_error(-4, "last wallet synchronisation goes beyond pruned data. You need to -reindex (download the whole blockchain again in case of pruned node)", self.master_node.migratewallet, wallet_name=wallet_name) | ||
| self.master_node.setmocktime(0) | ||
|
|
||
| # Verify the wallet is stil BDB and backup is there. | ||
| self.restart_node(0, ["-reindex", "-nowallet"]) | ||
| self.connect_nodes(0, 1) | ||
| self.master_node.loadwallet(wallet_name) | ||
| migrated = self.master_node.get_wallet_rpc(wallet_name) | ||
| info = migrated.getwalletinfo() | ||
| assert_equal(info["descriptors"], False) | ||
| assert_equal(info["format"], "bdb") | ||
| backup_path = self.master_node.wallets_path / wallet_name / f"{wallet_name}_{mocked_time}.legacy.bak" | ||
| assert backup_path.exists() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this test passing?
We first attempt to migrate the wallet and fail, then call loadwallet with the same bdb wallet when v29 has them disabled by default.
The master_node is not restarted with -deprecatedrpc=create_bdb so master_node.loadwallet should be throwing a "Build does not support Berkeley DB database format".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update: please see #34222 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-deprecatedrpc=create_bdb is only necessary for createwallet, not loadwallet.
test/functional/wallet_migration.py
Outdated
| def unsynced_wallet_on_pruned_node_fails(self): | ||
| self.log.info("Test migration of an unsynced wallet on a pruned node fails gracefully") | ||
| wallet_name = "pruned" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In 4831bfa:
Why not the unnamed wallet? The error was there, not in a named one.
update: please see #34222 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the unnamed wallet will hit the filesystem error and not the pruned error.
|
Created a branch with all fixes in separate commits. https://github.com/furszy/bitcoin-core/tree/pr34222
|
When migrating any legacy unnamed wallet, a failed migration would cause the cleanup logic to remove its parent directory. Since this type of legacy wallet lives directly in the main '/wallets/' folder, this resulted in unintentionally erasing all wallets, including the backup file. To be fully safe, we will no longer call `fs::remove_all`. Instead, we only erase the individual db files we have created, leaving everything else intact. The created wallets parent directories are erased only if they are empty. As part of this last change, `RestoreWallet` was modified to allow an existing directory as the destination, since we no longer remove the original wallet directory (we only remove the files we created inside it). This also fixes the restore of top-level default wallets during failures, which were failing due to the directory existence check that always returns true for the /wallets/ directory. This bug started after: bitcoin@f6ee59b Previously, the `fs::copy_file` call was failing for top-level wallets, which prevented the `fs::remove_all` call from being reached. Github-Pull: bitcoin#34156 Rebased-From: f4c7e28
f159bb6 to
367c140
Compare
|
Taken @furszy's suggestions |
| fs::copy_file(backup_path, temp_backup_location, fs::copy_options::none); | ||
| fs::rename(backup_path, temp_backup_location); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that by changing this line, the error that 29.x would previously hit on an unnamed wallet migration failure is removed, and would introduce the bug present on 30.0 here. However, this is safe to do because of the other changes in this commit which prevent the deletions.
367c140 to
6fb06b8
Compare
Verifies that a failed migration of the unnamed (default) wallet does not erase the main /wallets/ directory, and also that the backup file exists. Github-Pull: bitcoin#34156 Rebased-From: 36093bd
…rune failure The first test verifies that restoring into an existing empty directory or a directory with no .dat db files succeeds, while restoring into a dir with a .dat file fails. The second test covers restoring into the default unnamed wallet (wallet.dat), which also implicitly exercises the recovery path used after a failed migration. The third test covers failure during restore on a prune node. When the wallet last sync was beyond the pruning height. Github-Pull: bitcoin#34156 Rebased-From: f011e0f
Right now, after migration the last message users see is "migration completed", but the migration isn't actually finished yet. We still need to load the new wallets to ensure consistency, and if that fails, the migration will be rolled back. This can be confusing for users. This change logs the post-migration loading step and if a wallet fails to load and the migration will be rolled back. Github-Pull: bitcoin#34156 Rebased-From: d70b159
Because the default wallet has no name, the watch-only and solvables wallets created during migration end up having no name either. This fixes it by applying the same prefix name we use for the backup file for an unnamed default wallet. Before: watch-only wallet named "_watchonly" After: watch-only wallet named "default_wallet_watchonly" Github-Pull: bitcoin#34156 Rebased-From: 82caa81
…eight Github-Pull: bitcoin#34156 Rebased-From: b7c34d0
test/functional/wallet_migration.py
Outdated
|
|
||
| mocked_time = int(time.time()) | ||
| self.master_node.setmocktime(mocked_time) | ||
| assert_raises_rpc_error(-1, "filesystem error: cannot copy file: File exists", self.migrate_and_get_rpc, "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In c95b112: this should be:
assert_raises_rpc_error(-4, "Failed to create database", self.migrate_and_get_rpc, "")
Already fixing that
Also, should pull cbf0bd3 from #34221.
Remember that this is setting the mock time twice, one outside
migrate_and_get_rpcand another time inside it.
migrate_and_get_rpc in 29.x does not do setmocktime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
migrate_and_get_rpc in 29.x does not do setmocktime.
👍🏼 .
Refactor a common way to perform the failed migration test that exists for default wallets, and add relative-path wallets and absolute-path wallets. Github-Pull: 34226 Rebased-From: eeaf28d
6fb06b8 to
76cdeb7
Compare
furszy
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK 76cdeb7
brunoerg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
light code review ACK 76cdeb7 + backported the functional tests without the fixes and all of them failed accordingly.
bruno@bruno-MS-Challenger-B850M-PLUS:~/projects/bitcoin((HEAD detached at v29.2))$ ./build/test/functional/tool_wallet.py
...
2026-01-12T18:20:20.949000Z TestFramework (ERROR): Unexpected exception
Traceback (most recent call last):
File "/home/bruno/projects/bitcoin/test/functional/test_framework/test_framework.py", line 135, in main
self.run_test()
File "/home/bruno/projects/bitcoin/./build/test/functional/tool_wallet.py", line 853, in run_test
self.test_dump_createfromdump()
File "/home/bruno/projects/bitcoin/./build/test/functional/tool_wallet.py", line 601, in test_dump_createfromdump
assert self.nodes[0].wallets_path.exists()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionErrorbruno@bruno-MS-Challenger-B850M-PLUS:~/projects/bitcoin((HEAD detached at v29.2))$ ./build/test/functional/wallet_backup.py
...
2026-01-12T18:21:24.258000Z TestFramework (ERROR): Unexpected exception
Traceback (most recent call last):
File "/home/bruno/projects/bitcoin/test/functional/test_framework/util.py", line 160, in try_rpc
fun(*args, **kwds)
File "/home/bruno/projects/bitcoin/test/functional/test_framework/coverage.py", line 50, in __call__
return_val = self.auth_service_proxy_instance.__call__(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bruno/projects/bitcoin/test/functional/test_framework/authproxy.py", line 151, in __call__
raise JSONRPCException(response['error'], status)
test_framework.authproxy.JSONRPCException: Failed to create database path '/tmp/bitcoin_func_test_h2_o1g4s/node3/regtest/wallets/res0'. Database already exists. (-36)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/bruno/projects/bitcoin/test/functional/test_framework/test_framework.py", line 135, in main
self.run_test()
File "/home/bruno/projects/bitcoin/./build/test/functional/wallet_backup.py", line 359, in run_test
self.restore_wallet_existent_name()
File "/home/bruno/projects/bitcoin/./build/test/functional/wallet_backup.py", line 160, in restore_wallet_existent_name
assert_raises_rpc_error(
File "/home/bruno/projects/bitcoin/test/functional/test_framework/util.py", line 151, in assert_raises_rpc_error
assert try_rpc(code, message, fun, *args, **kwds), "No exception raised"
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bruno/projects/bitcoin/test/functional/test_framework/util.py", line 166, in try_rpc
raise AssertionError(
AssertionError: Expected substring not found in error message:
substring: 'Failed to restore wallet. Database file exists in '/tmp/bitcoin_func_test_h2_o1g4s/node3/regtest/wallets/res0/wallet.dat'.'
error message: 'Failed to create database path '/tmp/bitcoin_func_test_h2_o1g4s/node3/regtest/wallets/res0'. Database already exists.'.bruno@bruno-MS-Challenger-B850M-PLUS:~/projects/bitcoin((HEAD detached at v29.2))$ ./build/test/functional/wallet_migration.py
...
2026-01-12T18:23:08.626000Z TestFramework (ERROR): Unexpected exception
Traceback (most recent call last):
File "/home/bruno/projects/bitcoin/test/functional/test_framework/test_framework.py", line 135, in main
self.run_test()
File "/home/bruno/projects/bitcoin/./build/test/functional/wallet_migration.py", line 1873, in run_test
self.test_migration_failure(wallet_name=wallet_name)
File "/home/bruno/projects/bitcoin/./build/test/functional/wallet_migration.py", line 773, in test_migration_failure
assert_raises_rpc_error(
File "/home/bruno/projects/bitcoin/test/functional/test_framework/util.py", line 151, in assert_raises_rpc_error
assert try_rpc(code, message, fun, *args, **kwds), "No exception raised"
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: No exception raised
glozow
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
light review ACK 76cdeb7.
Did a manual cherry-pick, diff is substantial but nothing seemed strange.
Done |
Backports:
Note that this backport is unclean and several changes have to be made to most commits to accommodate BDB and the differences in migration cleanup behavior.