-
Notifications
You must be signed in to change notification settings - Fork 38.7k
wallet: fast rescan with BIP157 block filters for descriptor wallets #25957
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
wallet: fast rescan with BIP157 block filters for descriptor wallets #25957
Conversation
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsNo conflicts as of last run. |
|
Concept ACK Limiting the functionality to descriptor wallets is fine. We could add an argument to Is there anything in @jamesob's #23549 you can reuse here? The I tried a rescan from genesis on a simple mainnet wallet. It was blazing fast (in comparison) and it found all transactions. It's also slightly faster than the little helper script I wrote for #23549. Bonus feature for a followup: fetch any matching pruned blocks. |
8f9687d to
d0114e0
Compare
|
Rebased on master, with a few changes and an extra commit (see below). Thanks for your initial feedback, much appreciated!
Agree, I think this sounds like a useful follow-up idea. Not that I could see; on contrast to scanblocks, the filter set is created from wallet-internal source rather than external user input and also has to be updated continuously.
Done, added an extra commit that documents the speedup possibilty for all RPCs that work on descriptor wallet and (possibly) involve a rescan (
🚀 🚀 🚀
Sounds like a charming idea 👌 Note that this PR is covered in the PR review club for next wednesday, hosted by LarryRuane. |
stickies-v
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
It might be too controversial to be part of this PR, but I wonder (not suggesting, yet) if after this PR we should enable blockfilterindex by default when a user has at least one descriptor wallet?
LarryRuane
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK, I'll try to do a more complete code review in the next few days.
d0114e0 to
f9052af
Compare
|
Force-pushed with most of the suggestions from @stickies-v and @LarryRuane. Thanks for your reviews, very good points! |
|
Suggest renaming to "turbo rescan" for more review :-) |
|
Performance test report: With the master branch, and an empty wallet (which is the best case for this PR), on mainnet, the On signet, the PR branch is actually slower; this is probably due to the fact that so many blocks are empty or nearly empty; the time needed to check the filter is greater than the time to check the block directly. But of course, mainnet is what really matters. |
amovfx
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK,
This looks like it is over my head but I'm going to give this a shot for the pr review club.
e95d9c8 to
22486ea
Compare
|
Force-pushed again with the following changes:
|
22486ea to
137347f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have the skill to really review this. But I ran this over testnet and def got a speed improvement. 3:08.55 vs 3:50.80. This looks like a win.
blockfilterindex=1
{ "start_height": 0, "stop_height": 2345883 } /Users/andrewoseen/git/bitcoin/build/src/bitcoin-cli -chain=test 0.00s user 0.00s system 0% cpu 3:08.55 total
blockfilterindex=0
{ "start_height": 0, "stop_height": 2345883 } /Users/andrewoseen/git/bitcoin/build/src/bitcoin-cli -chain=test 0.00s user 0.00s system 0% cpu 3:50.80 total
Feels like a footgun. Would rather a param that migrates without "loading". But I suspect extending this to support legacy wallets (outside the scope of this PR) wouldn't be very hard either. |
src/wallet/wallet.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about putting this in a separate function that returns bool skip_block? CWallet::ScanForWalletTransactions() is already quite bloated, I think we should try not to add to it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thought about that already and agree that bloat should be avoided, but couldn't come up with a satisfying function name yet (CanBlockBeSkippedDueToFastRescanFilter?) and was concerned that the "fast rescan" functionality in the wallet would become too scattered. One class and once place where to use this class seems to be more compact and self-contained than class + function (that passes the class instance) + another function where this is called.
Any conrete suggestions there? Happy to follow-up. (Another possible variant would be to put the check directly into FastWalletRescanFilter, as a method? Not sure.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of moving (part of the logic) to FastWalletRescanFilter, yeah. What do you think about the below diff? I didn't test it very carefully (tests pass though) but it seems equivalent and imo more readable?
Details
diff --git a/src/wallet/wallet.cpp b/src/wallet/wallet.cpp
index d3f1fb418..fdc5fdc7f 100644
--- a/src/wallet/wallet.cpp
+++ b/src/wallet/wallet.cpp
@@ -282,6 +282,14 @@ public:
}
}
+ bool BlockNotInFilter(uint256 block_hash)
+ {
+ UpdateIfNeeded();
+ auto block_in_filter{MatchesBlock(block_hash)};
+ return block_in_filter.value_or(true) == false;
+ }
+
+private:
void UpdateIfNeeded()
{
// repopulate filter with new scripts if top-up has happened since last iteration
@@ -300,7 +308,6 @@ public:
return m_wallet.chain().blockFilterMatchesAny(BlockFilterType::BASIC, block_hash, m_filter_set);
}
-private:
const CWallet& m_wallet;
/** Map for keeping track of each active descriptor's last seen end range.
* This information is used to detect whether new addresses were derived
@@ -1843,16 +1850,7 @@ CWallet::ScanResult CWallet::ScanForWalletTransactions(const uint256& start_bloc
WalletLogPrintf("Still rescanning. At block %d. Progress=%f\n", block_height, progress_current);
}
- bool skip_block{false};
- if (fast_rescan_filter) {
- fast_rescan_filter->UpdateIfNeeded();
- auto match_result{fast_rescan_filter->MatchesBlock(block_hash)};
- if (match_result.has_value() && !match_result.value()) {
- result.last_scanned_block = block_hash;
- result.last_scanned_height = block_height;
- skip_block = true;
- }
- }
+ bool skip_block{fast_rescan_filter ? fast_rescan_filter->BlockNotInFilter(block_hash) : false};
// Read block data
CBlock block;
@@ -1865,7 +1863,10 @@ CWallet::ScanResult CWallet::ScanForWalletTransactions(const uint256& start_bloc
uint256 next_block_hash;
chain().findBlock(block_hash, FoundBlock().inActiveChain(block_still_active).nextBlock(FoundBlock().inActiveChain(next_block).hash(next_block_hash)));
- if (!skip_block) {
+ if (skip_block) {
+ result.last_scanned_block = block_hash;
+ result.last_scanned_height = block_height;
+ } else {
if (!block.IsNull()) {
LOCK(cs_wallet);
if (!block_still_active) {
Edit: not blocking of course, can be a follow-up too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would prefer that for a follow-up, could possibly even be a larger one that generally tries to divide up CWallet::ScanForWalletTransactions() into more functions to reduce bloat. Just one thing about naming, though I also unfortunately don't have a better idea: BlockNotInFilter suggests that this method only retrieves information and as such doesn't have any side-effects (but it does, due to the UpdateIfNeeded call).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initially I agreed with your view (and had the same consideration when proposing) about BlockNotInFilter having side-effects. However, in this case upon further thinking I'm not sure that's a problem:
- both
BlockNotInFilter()andUpdateIfNeeded()are idempotent, we don't need to be concerned about running it "too often".UpdateIfNeeded()only has internal side-effects. UpdateIfNeeded()is a separate/manual function to improve performance, so we don't continuously need to monitor/update the filter when it's not needed. All it does, is ensure the filter works as expected. From that perspective, I don't thinkBlockNotInFilter()has the kind of side-effects we should normally be worried about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good valid points, I tend to agree that in this case the side-effect is acceptable and we shouldn't worry. With the current version of the PR that introduced logging messages on two cases of the tri-state result (block filter not found or wallet filter matches block), as suggested by Sjors, it get's maybe a bit more difficult to put all this logic into a single method though; the logging would also need to be part of the method, if we want to distinguish between the concrete reason of why we don't skip a block. That said, I'm of course still open though for a potential follow-up that reduces ScanForWalletTransaction bloat.
137347f to
f871d46
Compare
This is useful for speeding up wallet rescans and is based on an earlier version from PR bitcoin#15845 ("wallet: Fast rescan with BIP157 block filters"), which was never merged. Co-authored-by: MacroFake <[email protected]>
…index This extra method will be needed for updating the filter set for faster wallet rescans; after an internal top-up has happened, we only want to add the newly created scriptPubKeys.
This only supports wallet descriptors right now.
Can be reviewed with `--ignore-all-space`.
For that purpose, a new logging category BCLog::SCAN is introduced.
6fc1f5f to
0582932
Compare
|
Force-pushed with the review suggestions from @furszy and @achow101 taken into account (range-diff):
Re-review would be much appreciated! |
|
ACK 0582932 |
aureleoules
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 0582932 - minor changes, documentation and updated test since last review
|
re-utACK 0582932 |
w0xlt
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re-ACK 0582932
7832883 doc: add release note for #25957 (fast wallet rescan) (Sebastian Falbesoner) Pull request description: This PR adds a missing release note for #25957. ACKs for top commit: Sjors: ACK 7832883 Tree-SHA512: 817aa3d27b3f839de3975ace7c8ec59bcc4dbe4b5628bf64153e503cd143599d8923bd7e181ad5b196dacf1a9078347825bc40d4de5c6e2df9ed12e752217094
…llet rescan) 7832883 doc: add release note for bitcoin#25957 (fast wallet rescan) (Sebastian Falbesoner) Pull request description: This PR adds a missing release note for bitcoin#25957. ACKs for top commit: Sjors: ACK 7832883 Tree-SHA512: 817aa3d27b3f839de3975ace7c8ec59bcc4dbe4b5628bf64153e503cd143599d8923bd7e181ad5b196dacf1a9078347825bc40d4de5c6e2df9ed12e752217094
* script, test: fix python linter E275 errors with flake8 5.0.4 * script: update python linter dependencies * wallet: simplify ListCoins implementation Can remove the locked coins lookup if we include them directly inside the AvailableCoins result * wallet: unify FindNonChangeParentOutput functions The function is only used in ListCoins. * Use same Python executable for subprocesses as for all-lint.py * tests: Use unique port for ZMQ tests The ZMQ interface tests should use unique ports as we do for the p2p and rpc ports so that multiple instances of the test can be run at the same time. * test: add coverage for cross-chain wallet restore * refactor: use braced init for integer constants instead of c style casts * rpc: Sanitize label name in various RPCs - importprivkey - importaddress - importpubkey - listtransactions - listsinceblock - importmulti - importdescriptors * compat: use STDIN_FILENO over 0 This is already used throughout this file, and is self-documenting. * debian: remove nonexistent files from copyright * refactor: Add sanity checks in LabelFromValue * test: Invalid label name coverage * refactor: use convenience fn to auto parse non-string parameters Minimizes code duplication and improves function naming by having a single (overloaded) convenience function that both checks if the parameter is a non-string parameter and automatically parses the value if so. * refactor: add kernel/cs_main.* Co-authored-by: Anthony Towns <[email protected]> * test: Fix feature_startupnotify intermittent issue * refactor: Make `ThreadHTTP` return void The `bool` return value was introduced in 755aa05. It has been not used since 8d3f46e. No behavior change. * build: remove already tested headers from AC_CHECK_HEADERS These headers are already included in a default set which are checked early during configure. We already use at least sys/types.h and unistd.h unconditionally in configure. * refactor: Work around Werror=free-nonheap-object in AssumeCalculateMemPoolAncestors * build: fix configuring with only bitcoin-util Fixes the issue presented in bitcoin#25037 in a single (easily backportable) diff, with no additional refactoring/changes. Can be tested with: ```bash ./configure \ --disable-tests \ --disable-bench \ --without-libs \ --without-daemon \ --without-gui \ --disable-fuzz-binary \ --without-utils \ --enable-util-util ``` * random: remove windows-only compat.h include in randomenv Note that this was probably only here to indirectly receive windows.h via another include in compat.h (windows.h or winreg.h aren't included there). Also note that compat.h is already pulled in here for everyone via util/time.h, so including inside a windows only ifdef is secondarily redundant. * randomenv: consolidate WIN32 #ifdefs Order includes. Remove // for xyz comments * test: Add signs P2TR and RAWSCRIPT to MiniWallet sign_tx * test: Add "include immature coinbase" flag to MiniWallet get_utxos * test: Add "include mempool" flag to MiniWallet rescan_utxos * test: Run feature_bip68_sequence.py with MiniWallet * doc: Update Boost version in doc/dependencies.md * scripts: add PE Canary check to security-check * rpc: Return accurate results for scanblocks This makes use of undo data to accurately verify results from blockfilters. * p2p, rpc: don't allow past absolute timestamp in `setban` * test: add coverage for absolute timestamp in `setban` * [validation] when quitting early in AcceptPackage, set package_state and tx result Bug: not setting package_state means package_state.IsValid() == true and the caller does not know that this failed. We won't be validating this transaction again, so it makes sense to return this failure to the caller. Rename package_state to package_state_quit_early to make it more clear what this variable is used for and what its scope is. Co-authored-by: Greg Sanders <[email protected]> * [test] package validation quits early due to non-policy, non-missing-inputs failure * [validation] return effective feerate from mempool validation * ci: Fix ci_native_fuzz_msan CONTAINER_NAME * ci: Create named symbol for BINS_SCRATCH_DIR Also, create the dir a bit earlier, right after it was put in the PATH. * ci: Remove unused busybox workaround The find workaround is no longer needed after commit d3d547c * ci: Run one task with all tests on credits * doc: clarify -i2pacceptincoming help documentation and also hoist the default setting to a constexpr and remove unused f-string operators in a related functional test. * doc: update bandwidth section of I2P documentation * doc: update/clarify/de-emphasize I2P transient address section * doc: remove recommended I2P router versions as these go stale and users will generally install the current versions available. * [validation] return wtxids of other transactions whose fees were used * [rpc] return effective-feerate in testmempoolaccept and submitpackage * [rpc] return effective-includes in testmempoolaccept and submitpackage * [validation] remove PackageMempoolAcceptResult::m_package_feerate This value creates an extremely confusing interface as its existence is dependent upon implementation details (whether something was submitted on its own, etc). MempoolAcceptResult::m_effective_feerate is much more helpful, as it always exists for submitted transactions. * [doc] release note effective-feerate and effective-includes RPC results No release note for submitpackage because it is regtest-only. * [refactor] rename variables in AcceptPackage for clarity * [validation] return MempoolAcceptResult for every tx on PCKG_TX failure This makes the interface more predictable and useful. The caller understands one or more transactions failed, and can learn what happened with each transaction. We already have this information, so we might as well return it. It doesn't make sense to do this for other PackageValidationResult values because: - PCKG_RESULT_UNSET: this means everything succeeded, so the individual failures are no longer accurate. - PCKG_MEMPOOL_ERROR: something went wrong with the mempool logic; transaction failures might not be meaningful. - PCKG_POLICY: this means something was wrong with the package as a whole. The caller should use the PackageValidationState to find the error, rather than looking at individual MempoolAcceptResults. * doc: net: fix link to onion address encoding scheme [ONIONADDRESS] Instead of referring to a fixed line number to a file in master (which is obviously always quickly outdated), use a permalink tied to the latest commit. * doc: fix up -netinfo relaytxes help Co-authored-by: "MarcoFalke <*~=`'#}+{/-|&$^[email protected]>" * scripted-diff: ci: Rework docker naming DOCKER in names is confusingly used as synonym for "image", "container", and "ci". Fix the confusion by picking the term that fits the context. -BEGIN VERIFY SCRIPT- ren() { sed -i "s:$1:$2:g" $( git grep -l "$1" ) ; } ren DOCKER_PACKAGES CI_BASE_PACKAGES # This better reflects that they are the common base for all CI # containers. ren DOCKER_ID CI_CONTAINER_ID # This is according to the documentation of "--detach , -d: Run # container in background and print container ID". ren DOCKER_NAME_TAG CI_IMAGE_NAME_TAG # This avoids confusing with CONTAINER_NAME and clarifies that it is an # image. ren DOCKER_ADMIN CI_CONTAINER_CAP # This clarifies that it is a capability added to the container. ren DOCKER_CI_CMD_PREFIX CI_EXEC_CMD_PREFIX # This brings it in line with the CI_EXEC naming. -END VERIFY SCRIPT- * ci: Stop and remove CI container * test: Fix intermittent timeout in p2p_permissions.py * test: Fix wrong types passed to RPCs * rpc: Run type check against RPCArgs * doc: add databases/py-sqlite3 to FreeBSD test suite deps * Wallet/RPC: Allow specifying min & max chain depth for inputs used by fund calls Enables users to craft BIP-125 replacements with changes to the output list, ensuring that if additional funds are needed they will be added. * rpc: add minconf and maxconf options to sendall * doc: move errant release note to doc/ * test: skip sqlite3 tests if it isn't available Fixes bitcoin#26819. Related too bitcoin#26873. * doc: remove usages of C++11 Now it's just the standard library. * Update to mention restoring wallet via GUI * Change dots to an ellipsis and fix capitalization Matches ellipsis usage in the "Restore" section. * test: remove `-spendzeroconfchange` setting from mempool_limit.py Since this test was changed to use MiniWallet instead of the Bitcoin Core wallet (see commit d447ded), the setting doesn't have any effect and hence can be removed. * ci: Bump vcpkg to the latest version 2023.01.09 Dependency changes (2022.09.27 - 2023.01.09): - boost 1.80.0#0 -> 1.81.0#0 - libevent 2.1.12#6 -> libevent 2.1.12#7 - sqlite3 3.39.2#0 -> 3.40.0#1 * ci: Bump `ccache` version to the latest 4.7.4 in "Win64 native" task * test: wallet: add coverage for `-spendzeroconfchange` setting * build: allow NO_BOOST=1 in depends * build: allow NO_LIBEVENT=1 in depends * test: refactor: simplify p2p_permissions.py by using MiniWallet Also, use the pre-mined chain of the test framework rather than mining 100 blocks manually on each run. * test: Remove redundant function call * ci: Use pyenv's `python-build` to install Python in lint task * ci: Bump lint task image to Ubuntu Jammy * test: add `rescan_utxos` in MiniWallet's initialization this simplifies usage when MiniWallet is used with a pre-mined chain. * test: remove redundant blocks generation logic those tests already have enough mature utxos from the pre-mined chain. * test: simplify tests by using the pre-mined chain * test: Return wtxid from create_self_transfer_multi This is not used right now, but may be in the future. Also, it simplifies the create_self_transfer return logic * test: Refactor MiniWallet sign_tx To make the code less verbose and easier to read. * test: Return chain of MiniWallet txs from MiniWallet chain method * test: Return fee from MiniWallet * test: Run mempool_packages.py with MiniWallet * build: move rpc/request from util lib to common This is JSON RPC request code that doesn't need to be in util, and should not be required by the kernel. * test: Add test for missing and omitted required arg * doc: Fix incorrect sendmany RPC doc This enables the type check and fixes the wrong docs. Otherwise the enabled check would lead to test errors, such as: > "wallet_labels.py", line 96, in run_test > node.sendmany( > > test_framework.authproxy.JSONRPCException: > JSON value of type null is not of expected type string (-3) * refactor: Remove const to fix performance-move-const-arg clang-tidy errors The warnings look like: src/rpc/util.h:192:19: error: std::move of the const variable 'name' has no effect; remove std::move() or make the variable non-const [performance-move-const-arg,-warnings-as-errors] : m_names{std::move(name)}, ^~~~~~~~~~ ~ * refactor: Introduce is_top_level_arg * doc: Properly report optional RPC args * refactor: Remove duplication of clang-tidy's check names * ci: Add missing lint dependency * hash: add HashedSourceWriter This class is the counterpart to CHashVerifier, in that it writes data to an underlying source stream, while keeping a hash of the written data. * addrdb: Only call Serialize() once The previous logic would call it once for serializing into the filestream, and then again for serializing into the hasher. If AddrMan was changed in between these calls by another thread, the resulting peers.dat would be corrupt with non-matching checksum and data. Fix this by using HashedSourceWriter, which writes the data to the underlying stream and keeps track of the hash in one go. * ci: Bump --combinedlogslen to debug intermittent issues * doc: Clarify debian copyright comment * Bump minimum python version to 3.7 * Revert "contrib: Fix capture_output in getcoins.py" This reverts commit be59bd1 because the changes are no longer needed. * scripted-diff: Use new python 3.7 keywords -BEGIN VERIFY SCRIPT- sed -i 's/universal_newlines/text/g' $(git grep -l universal_newlines) -END VERIFY SCRIPT- * test: add an easy way to run linters locally Adds a Dockerfile configuration that allows straightforward running of linters with compatible versions locally. This removes a ton of annoyance when trying to appease CI, because many of the linter versions are quite old and difficult to maintain locally. I realize that people may not be thrilled to more ancillary tooling to the repo, but I think this makes a lot of sense given the linter versions listed in this container configuration are dictated by this repo (within the CI configuration), so having these things live in two separate places is a recipe for version mismatches. Eventually we can likely just use this container on CI directly to avoid any chance of inconsistencies between local dev experience and CI. * lint: specify the right commit range when running locally When running lints on Cirrus, a special envvar is set ($CIRRUS_PR); emulate this when running linters locally by setting $LOCAL_BRANCH to any value. * clang-tidy: Fix `performance-move-const-arg` in headers See https://clang.llvm.org/extra/clang-tidy/checks/performance/move-const-arg.html * clang-tidy: Fix `performance-no-automatic-move` in headers See https://clang.llvm.org/extra/clang-tidy/checks/performance/no-automatic-move.html * doc: add release note for bitcoin#25957 (fast wallet rescan) * test: refactor: simplify p2p_tx_download.py by using MiniWallet * test: refactor: simplify p2p_eviction.py by using MiniWallet Also, use the pre-mined chain of the test framework rather than mining 100 blocks manually on each run. * Add missing includes to fix gcc-13 compile error * RPC: make RPCResult::MatchesType return useful errors * [fuzz] Actually use mocked mempool in tx_pool target * rpc: Throw more user friendly arg type check error * mempool: Don't share mempool with dbcache in blocksonly When -blockonly is set, reduce mempool size to 5MB unless -maxmempool is also set. See bitcoin#9569 * doc: Update blocksonly behaviour in reduce-memory Changes to the default mempool allocation size now documented. Provides users with guidance on the mempool implications of -blocksonly mode, along with instructions on how to re-enable old behaviour. * doc: release note on mempool size in -blocksonly Adds a release note detailing the new mempool sizing behaviour when running in blocksonly mode, and instruction on how to override the new defaults. * Add unit test for ComputeTapleafHash * test: Fix intermittent feature_rbf issue * scripted-diff: use RPCArg::Optional::OMITTED over OMITTED_NAMED_ARG -BEGIN VERIFY SCRIPT- sed -i -e "/Deprecated alias for OMITTED, can be removed/d" src/rpc/util.h src/rpc/util.cpp sed -i -e "s/OMITTED_NAMED_ARG/OMITTED/g" $(git grep -l "OMITTED_NAMED_ARG" src/) -END VERIFY SCRIPT- * doc: improve doc for RPCArg::Optional::OMITTED * depends: fix systemtap download URL * depends: systemtap 4.8 * build: use more recommended sqlite3 compile options See https://www.sqlite.org/compile.html. DSQLITE_DQS > This setting disables the double-quoted string literal misfeature. DSQLITE_DEFAULT_MEMSTATUS > This setting causes the sqlite3_status() interfaces that track > memory usage to be disabled. > This helps the sqlite3_malloc() routines run much faster, and since > SQLite uses sqlite3_malloc() internally, this helps to make the > entire library faster. DSQLITE_OMIT_DEPRECATED > Omitting deprecated interfaces and features will not help SQLite > to run any faster. > It will reduce the library footprint, however. And it is the > right thing to do. DSQLITE_OMIT_SHARED_CACHE > Omitting the possibility of using shared cache allows many > conditionals in performance-critical sections of the code to be > eliminated. This can give a noticeable improvement in performance. Also: https://www.sqlite.org/sharedcache.html > Shared-cache mode is an obsolete feature. > The use of shared-cache mode is discouraged. > Most use cases for shared-cache are better served by WAL mode. > Applications that build their own copy of SQLite from source code > are encouraged to use the -DSQLITE_OMIT_SHARED_CACHE compile-time > option, as the resulting binary will be both smaller and faster. DSQLITE_OMIT_JSON Starting with sqlite 3.38.0 the JSON extension became opt-out rather than opt-in, so we disable it here. --disable-rtree > An R-Tree is a special index that is designed for doing range queries. > R-Trees are most commonly used in geospatial systems... https://www.sqlite.org/rtree.html --disable-fts4 --disable-fts5 > FTS5 is an SQLite virtual table module that provides full-text > search functionality to database applications. DSQLITE_LIKE_DOESNT_MATCH_BLOBS > simplifies the implementation of the LIKE optimization and allows > queries that use the LIKE optimization to run faster. DSQLITE_OMIT_DECLTYPE > By omitting the (seldom-needed) ability to return the declared type of > columns from the result set of query, prepared statements can be made > to consume less memory. DSQLITE_OMIT_PROGRESS_CALLBACK > By omitting this interface, a single conditional is removed from the > inner loop of the bytecode engine, helping SQL statements to run slightly > faster. DSQLITE_OMIT_AUTOINIT > with the SQLITE_OMIT_AUTOINIT option, the automatic initialization is omitted. > This helps many API calls to run a little faster > it also means that the application must call sqlite3_initialize() manually. * build: pass --enable-debug to sqlite when DEBUG=1 * wallet: permit mintxfee=0 Fixes bitcoin#26797 Permit nodes to use a mintxfee of `0` if they choose. Values below 0 are handled by the ParseMoney() check. * test: Avoid rpc timeout in p2p_headers_sync_with_minchainwork * [block encodings] Make CheckBlock mockable for PartiallyDownloadedBlock * [block encodings] Avoid fuzz blocking asserts in PartiallyDownloadedBlock * [fuzz] Add PartiallyDownloadedBlock target * [fuzz] Assert that omitting missing transactions always fails block reconstruction * build: fix usage of -Wloop-analysis Looks like I introduced this in 5ced925. * depends: systemtap: remove variadic params that trigger compiler warnings * init: Remove sensitive flag from rpcbind * refactor: Remove c_str from util/check * streams: Add DataStream without ser-type and ser-version The moved parts can be reviewed with "--color-moved=dimmed-zebra". The one-char changes can be reviewed with "--word-diff-regex=.". * ci: Fix APPEND_APT_SOURCES_LIST trying to modify the host system * txorphange: Drop redundant originator arg from GetTxToReconsider * net_processing: only process orphans before messages Previously, when we processed a new tx we would attempt to ATMP any orphans that considered the new tx a parent immediately, but would only accept at most one such tx, leaving any others to be considered on a future run of ProcessMessages(). With this patch, we don't attempt any orphan processing immediately after receiving a tx, instead deferring all of them until the next call to ProcessMessages(). * net_processing: Don't process tx after processing orphans If we made progress on orphans, consider that enough work for this peer for this round of ProcessMessages. This also allows cleaning up the api for TxOrphange:GetTxToReconsider(). * net_processing: indicate more work to do when orphans are ready to reconsider When PR#15644 made orphan processing interruptible, it also introduced a potential 100ms delay between processing of the first and second newly reconsiderable orphan, because it didn't check if the orphan work set was non-empty after invoking ProcessMessage(). This adds that check, so that ProcessMessages() will return true if there are orphans to process, usually avoiding the 100ms delay in CConnman::ThreadMessageHandler(). * Use DataStream where possible * Remove unused CDataStream::SetType The last use was removed in the previous commit. * Comment mcl/bls libs * fix linter * fix linter * fix Makefile.am --------- Co-authored-by: Andrew Chow <[email protected]> Co-authored-by: Jon Atack <[email protected]> Co-authored-by: furszy <[email protected]> Co-authored-by: MarcoFalke <*~=`'#}+{/-|&$^[email protected]> Co-authored-by: Kristaps Kaupe <[email protected]> Co-authored-by: Sebastian Falbesoner <[email protected]> Co-authored-by: Pasta <[email protected]> Co-authored-by: glozow <[email protected]> Co-authored-by: fanquake <[email protected]> Co-authored-by: Aurèle Oulès <[email protected]> Co-authored-by: stickies-v <[email protected]> Co-authored-by: Anthony Towns <[email protected]> Co-authored-by: Hennadii Stepanov <[email protected]> Co-authored-by: Miles Liu <[email protected]> Co-authored-by: brunoerg <[email protected]> Co-authored-by: Greg Sanders <[email protected]> Co-authored-by: Juan Pablo Civile <[email protected]> Co-authored-by: ishaanam <[email protected]> Co-authored-by: John Moffett <[email protected]> Co-authored-by: Kolby ML <[email protected]> Co-authored-by: kouloumos <[email protected]> Co-authored-by: Martin Zumsande <[email protected]> Co-authored-by: James O'Beirne <[email protected]> Co-authored-by: dergoegge <[email protected]> Co-authored-by: willcl-ark <[email protected]> Co-authored-by: Cory Fields <[email protected]> Co-authored-by: alex v <[email protected]>
Description
This PR is another take of using BIP 157 block filters (enabled by
-blockfilterindex=1) for faster wallet rescans and is a modern revival of #15845. For reviewers new to this topic I can highly recommend to read the corresponding PR review club (https://bitcoincore.reviews/15845).The basic idea is to skip blocks for deeper inspection (i.e. looking at every single tx for matches) if our block filter doesn't match any of the block's spent or created UTXOs are relevant for our wallet. Note that there can be false-positives (see https://bitcoincore.reviews/15845#l-199 for a PR review club discussion about false-positive rates), but no false-negatives, i.e. it is safe to skip blocks if the filter doesn't match; if the filter does match even though there are no wallet-relevant txs in the block, no harm is done, only a little more time is spent extra.
In contrast to #15845, this solution only supports descriptor wallets, which are way more widespread now than back in the time >3 years ago. With that approach, we don't have to ever derive the relevant scriptPubKeys ourselves from keys before populating the filter, and can instead shift the full responsibility to that to the
DescriptorScriptPubKeyManwhich already takes care of that automatically. Compared to legacy wallets, theIsMinelogic for descriptor wallets is as trivial as checking if a scriptPubKey is included in the ScriptPubKeyMan's set of scriptPubKeys (m_map_script_pub_keys):bitcoin/src/wallet/scriptpubkeyman.cpp
Lines 1703 to 1710 in e191fac
One of the unaddressed issues of #15845 was that the filter was only created once outside the loop and as such didn't take into account possible top-ups that have happened. This is solved here by keeping a state of ranged
DescriptorScriptPubKeyMan's descriptor end ranges and check at each iteration whether that range has increased since last time. If yes, we update the filter with all scriptPubKeys that have been added since the last filter update with a range index equal or higher than the last end range. Note that finding new scriptPubKeys could be made more efficient than linearly iterating through the wholem_script_pub_keysmap (e.g. by introducing a bidirectional map), but this would mean introducing additional complexity and state and it's probably not worth it at this time, considering that the performance gain is already significant.Output scripts from non-ranged
DescriptorScriptPubKeyMans (i.e. ones with a fixed set of output scripts that is never extended) are added only once when the filter is created first.Benchmark results
Obviously, the speed-up indirectly correlates with the wallet tx frequency in the scanned range: the more blocks contain wallet-related transactions, the less blocks can be skipped due to block filter detection.
In a simple benchmark, a regtest chain with 1008 blocks (corresponding to 1 week) is mined with 20000 scriptPubKeys contained (25 txs * 800 outputs) each. The blocks each have a weight of ~2500000 WUs and hence are about 62.5% full. A global constant
WALLET_TX_BLOCK_FREQUENCYdefines how often wallet-related txs are included in a block. The created descriptor wallet (default setting ofkeypool=1000, we have 8*1000 = 8000 scriptPubKeys at the start) is backuped via thebackupwalletRPC before the mining starts and imported viarestorewalletRPC after. The measured time for taking this import process (which involves a rescan) once with block filters (-blockfilterindex=1) and once without block filters (-blockfilterindex=0) yield the relevant result numbers for the benchmark.The following table lists the results, sorted from worst-case (all blocks contain wallte-relevant txs, 0% can be skipped) to best-case (no blocks contain walltet-relevant txs, 100% can be skipped) where the frequencies have been picked arbitrarily:
Since even the (rather unrealistic) worst-case scenario of having wallet-related txs in every block of the rescan range obviously doesn't take significantly longer, I'd argue it's reasonable to always take advantage of block filters if they are available and there's no need to provide an option for the user.
Feedback about the general approach (but also about details like naming, where I struggled a lot) would be greatly appreciated. Thanks fly out to furszy for discussing this subject and patiently answering basic question about descriptor wallets!