-
Notifications
You must be signed in to change notification settings - Fork 38.7k
Batch write imported stuff in importmulti #15741
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Batch write imported stuff in importmulti #15741
Conversation
|
utACK although it would be good to factor out the counting into a WalletWriteCache or something, that flushes itself at intervals so you don't have to do it manually. |
src/wallet/rpcdump.cpp
Outdated
| const CPubKey& pubkey = entry->second; | ||
| CPubKey temp; | ||
| if (!pwallet->GetPubKey(id, temp) && !pwallet->AddWatchOnly(GetScriptForRawPubKey(pubkey), timestamp)) { | ||
| if (!pwallet->GetPubKey(id, temp) && !pwallet->AddWatchOnlyWithDB(*batch, GetScriptForRawPubKey(pubkey), timestamp)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a weird indentation glitch here, although it looks like it was preexisting.
src/wallet/rpcdump.cpp
Outdated
| pwallet->SetAddressBook(dest, label, "receive"); | ||
| } | ||
| } | ||
| batch.reset(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this necessary? It's about to go out of scope, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably not. I did that just as a belt-and-suspenders thing.
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
|
Empact@dd5b0aa Empact@c5d9eec How about moving this functionality into Lines 750 to 753 in ba54342
Practically speaking, this also explicitly limits the scope of the batch operations on Empact@b363aa7 Could also apply the batch treatment to |
src/wallet/wallet.h
Outdated
|
|
||
| /** Add a KeyOriginInfo to the wallet */ | ||
| bool AddKeyOrigin(const CPubKey& pubkey, const KeyOriginInfo& info); | ||
| bool AddKeyOriginWithDB(WalletBatch& batch, const CPubKey& pubkey, const KeyOriginInfo& info); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: May be a bad example to follow, but AddKeypoolPubkeyWithDB has batch as the last argument, which makes as much sense to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was following the AddKeyPubKeyWithDB way of argument ordering.
|
Concept ACK |
promag
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK, nice improvement.
src/wallet/rpcdump.cpp
Outdated
| throw JSONRPCError(RPC_WALLET_ERROR, "Error adding key to wallet"); | ||
| } | ||
| pwallet->UpdateTimeFirstKey(timestamp); | ||
| if (++cnt % 1000 == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the problem of one batch only?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The batch can get very large in memory if lots of keys are involved, so we limit the size to avoid blowing up memory during imports.
|
See also:
|
508c486 to
7455c9d
Compare
|
I've pulled in @Empact's changes (Empact/bitcoin@dd5b0aa was squashed into 59ec412, cherry-picked the other two commits). |
|
Especially if there are now multiple places we're manually batching writes like this, and more where we're considering it: Let me suggest again and more strongly that the batch object should be handling autoflushing after every X calls, instead of open-coding it in the caller like this. If it's appropriate for every caller, it could go into WalletBatch, else into a new class that's an extremely thin wrapper around it. I bet the total number of lines added in this PR would actually go down. |
|
Concept ACK. I tried the first two commits (so without address book change) with #14145 and importing 2x 1000 keys seems about 6 times faster. |
|
@gwillen Agree it should be done but would be appropriate as a follow-up |
|
Looks like |
7455c9d to
27fdf12
Compare
|
I've implemented @gwillen's suggestion.
Done |
27fdf12 to
72ec512
Compare
src/wallet/wallet.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in commit ebb0ea85539b3e2f515fca89231c20ac86fb52d5:
Why is this no longer needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When batch goes out of scope, it will write. This was just a belt and suspenders.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment suggests that in this case -- unlike the batch write in importmulti -- there's a specific reason to force it to write before the end of the scope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reverted
|
I'm unsure why tests are currently failing. |
72ec512 to
f2b9b32
Compare
| return false; | ||
| } | ||
| m_database.IncrementUpdateCounter(); | ||
| if (m_database.nUpdateCounter % 1000 == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get that this makes the code simple, but IMO this defeats the batch purpose. Like, it shouldn't be something to worry about here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what you mean by 'defeats the batch purpose'. Are there cases where the caller is depending on the batch not being flushed?
I think it is far better engineering practice to deal with periodic flushing once, inside the batch-management code, than to deal with it independently at each callsite. If there's some situation where periodic flushing is undesirable, I think it would be better to offer both versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gwillen I mean you should be able to create a batch with 1001 changes no? I was suggesting something on top of WalletBatch to do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, you can add as many changes as you want, it just flushes after every 1000. This should be completely transparent to the user. Unless there is some reason that flushing can hurt, like the caller is using the batch as though it were an atomic transaction?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless there is some reason that flushing can hurt, like the caller is using the batch as though it were an atomic transaction?
There is only one place where an atomic transaction is used, and if an atomic transaction is active, then Flush() will do nothing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well I may be wrong but I see the batch as an atomic transaction. Making 1500 updates and then aborting the batch should not commit 1000 updates IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well I may be wrong but I see the batch as an atomic transaction. Making 1500 updates and then aborting the batch should not commit 1000 updates IMO.
Except that's not how the batch works normally. Batches are not atomic unless you explicitly make them so by starting a transaction with the underlying BerkeleyBatch. In that case, then the flushing won't do anything. In the normal case, each update in a batch itself is atomic and should be expected to be written to the wallet file. There is no aborting. As soon as a batch goes out of scope, every single update is committed to the file. If you make 1500 updates, the first 1000 will be written after you do the 1000th write operation, and then the last 500 will be written when the batch object is deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So is the description of WalletBatch out of date? It says This represents a single transaction at the database. It will be committed when the object goes out of scope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that is out of date
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the comment.
src/wallet/db.cpp
Outdated
| nMinutes = 1; | ||
|
|
||
| env->dbenv->txn_checkpoint(nMinutes ? gArgs.GetArg("-dblogsize", DEFAULT_WALLET_DBLOGSIZE) * 1024 : 0, nMinutes, 0); | ||
| if (env) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So that tests pass. When there's a dummy database, env will be nulltpr and this causes a segfault in tests which use a dummy database.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we make env is defined instead? Otherwise a comment would make sense as this doesn't really happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we make
envis defined instead?
No. env is nullptr for dummy databases which are used in tests. A check like this is used in many other places in this file in the form of BerkeleyDatabase::IsDummy().
Otherwise a comment would make sense as this doesn't really happen.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: could deal with it earlier (i.e. early return), and use IsDummy to make it more explicit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IsDummy is done by BerkeleyDatabase, not BerkeleyBatch so is not available to do here (although the check is completely identical).
f2b9b32 to
2811f37
Compare
2811f37 to
ffafc25
Compare
ffafc25 to
a14a5ba
Compare
|
I noticed that importing scripts would also be slow so I've added Also updated OP. |
|
tACK 87f7befca04a095a06af003f9fe20d2e3c155d2f sooo fast. tested a rescan with my all my trezor keys from HWI and it worked great. thanks @achow101, hw wallet + core is actually usable now (at least for watching :)! |
|
Restarted travis, it was failing with 😕 |
|
utACK 87f7bef. nit, ampersand should be before space. |
src/wallet/wallet.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: could be private
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/wallet/wallet.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: could be private
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
utACK 87f7bef |
This maintains encapsulation of CWallet::database in the face of batching, e.g. allows making the `WithDB` methods private.
87f7bef to
0db94e5
Compare
|
utACK 0db94e5 |
|
Tested ACK 0db94e5 Drop import time for 10000 keys ranged descriptor from 495.57s to 3.626s with 2.8 GHz Intel Core i7. |
|
re-utACK 0db94e5 only change is re the privacy of |
0db94e5 wallet: Pass WalletBatch to CWallet::UnsetWalletFlag (João Barbosa) 6cb888b Apply the batch treatment to CWallet::SetAddressBook via ImportScriptPubKeys (Ben Woosley) 6154a09 Move some of ProcessImport into CWallet::Import* (Ben Woosley) ccb26cf Batch writes for importmulti (Andrew Chow) d6576e3 Have WalletBatch automatically flush every 1000 updates (Andrew Chow) 366fe0b Add AddWatchOnlyWithDB, AddKeyOriginWithDB, AddCScriptWithDB functions (Andrew Chow) Pull request description: Instead of writing each item to the wallet database individually, do them in batches so that the import runs faster. This was tested by importing a ranged descriptor for 10,000 keys. Current master ``` $ time src/bitcoin-cli -regtest -rpcwallet=importbig importmulti '[{"desc": "sh(wpkh([73111820/44h/1h/0h]tpubDDoT2SgEjaU5rerQpfcRDWPAcwyZ5g7xxHgVAfPwidgPDKVjm89d6jJ8AQotp35Np3m6VaysfUY1C2g68wFqUmraGbzhSsMF9YBuTGxpBaW/1/*))#3w7php47", "range": [0, 10000], "timestamp": "now", "internal": true, "keypool": false, "watchonly": true}]' ... real 7m45.29s ``` This PR: ``` $ time src/bitcoin-cli -regtest -rpcwallet=importbig4 importmulti '[{"desc": "pkh([73111820/44h/1h/0h]tpubDDoT2SgEjaU5rerQpfcRDWPAcwyZ5g7xxHgVAfPwidgPDKVjm89d6jJ8AQotp35Np3m6VaysfUY1C2g68wFqUmraGbzhSsMF9YBuTGxpBaW/1/*)#v65yjgmc", "range": [0, 10000], "timestamp": "now", "internal": true, "keypool": false, "watchonly": true}]' ... real 3.93s ``` Fixes #15739 ACKs for commit 0db94e: jb55: utACK 0db94e5 ariard: Tested ACK 0db94e5 Empact: re-utACK 0db94e5 only change is re the privacy of `UnsetWalletFlagWithDB` and `AddCScriptWithDB`. Tree-SHA512: 3481308a64c99b6129f7bd328113dc291fe58743464628931feaebdef0e6ec770ddd5c19e4f9fbc1249a200acb04aaf62a8d914d53b0a29ac1e557576659c0cc
0db94e5 wallet: Pass WalletBatch to CWallet::UnsetWalletFlag (João Barbosa) 6cb888b Apply the batch treatment to CWallet::SetAddressBook via ImportScriptPubKeys (Ben Woosley) 6154a09 Move some of ProcessImport into CWallet::Import* (Ben Woosley) ccb26cf Batch writes for importmulti (Andrew Chow) d6576e3 Have WalletBatch automatically flush every 1000 updates (Andrew Chow) 366fe0b Add AddWatchOnlyWithDB, AddKeyOriginWithDB, AddCScriptWithDB functions (Andrew Chow) Pull request description: Instead of writing each item to the wallet database individually, do them in batches so that the import runs faster. This was tested by importing a ranged descriptor for 10,000 keys. Current master ``` $ time src/bitcoin-cli -regtest -rpcwallet=importbig importmulti '[{"desc": "sh(wpkh([73111820/44h/1h/0h]tpubDDoT2SgEjaU5rerQpfcRDWPAcwyZ5g7xxHgVAfPwidgPDKVjm89d6jJ8AQotp35Np3m6VaysfUY1C2g68wFqUmraGbzhSsMF9YBuTGxpBaW/1/*))#3w7php47", "range": [0, 10000], "timestamp": "now", "internal": true, "keypool": false, "watchonly": true}]' ... real 7m45.29s ``` This PR: ``` $ time src/bitcoin-cli -regtest -rpcwallet=importbig4 importmulti '[{"desc": "pkh([73111820/44h/1h/0h]tpubDDoT2SgEjaU5rerQpfcRDWPAcwyZ5g7xxHgVAfPwidgPDKVjm89d6jJ8AQotp35Np3m6VaysfUY1C2g68wFqUmraGbzhSsMF9YBuTGxpBaW/1/*)#v65yjgmc", "range": [0, 10000], "timestamp": "now", "internal": true, "keypool": false, "watchonly": true}]' ... real 3.93s ``` Fixes bitcoin#15739 ACKs for commit 0db94e: jb55: utACK 0db94e5 ariard: Tested ACK 0db94e5 Empact: re-utACK bitcoin@0db94e5 only change is re the privacy of `UnsetWalletFlagWithDB` and `AddCScriptWithDB`. Tree-SHA512: 3481308a64c99b6129f7bd328113dc291fe58743464628931feaebdef0e6ec770ddd5c19e4f9fbc1249a200acb04aaf62a8d914d53b0a29ac1e557576659c0cc
40ad2f6 Have importwallet use ImportPrivKeys and ImportScripts (Andrew Chow) 78941da Optionally allow ImportScripts to set script creation timestamp (Andrew Chow) 94bf156 Have importaddress use ImportScripts and ImportScriptPubKeys (Andrew Chow) a00d1e5 Have importpubkey use CWallet's ImportScriptPubKeys and ImportPubKeys functions (Andrew Chow) c6a8274 Have importprivkey use CWallet's ImportPrivKeys, ImportScripts, and ImportScriptPubKeys (Andrew Chow) fae7a5b Log when an import is being skipped because we already have it (Andrew Chow) ab28e31 Change ImportScriptPubKeys' internal to apply_label (Andrew Chow) Pull request description: #15741 introduced `ImportPrivKeys`, `ImportPubKeys`, `ImportScripts`, and `ImportScriptPubKeys` in `CWallet` which are used by `importmulti`. This PR changes the remaining `import*` RPCs (`importaddress`, `importprivkey`, `importpubkey`, and `importwallet`) to use these functions as well instead of directly adding the imported items to the wallet. ACKs for top commit: MarcoFalke: ACK 40ad2f6 (checked that behavior changes are mentioned in the commit body) ryanofsky: utACK 40ad2f6. Only change since last review is a tweaked commit message (mentioning label update in importpubkey commit) Sjors: ACK 40ad2f6. Those extra tests also pass. Tree-SHA512: 910e3bbe20b6f8809a47b7293775db234125615d886c7fd99c194f4cdf00c765eb1e24b1799260f1213b98c88f9bbe696796f36087c182925e567d44e9194c98
40ad2f6 Have importwallet use ImportPrivKeys and ImportScripts (Andrew Chow) 78941da Optionally allow ImportScripts to set script creation timestamp (Andrew Chow) 94bf156 Have importaddress use ImportScripts and ImportScriptPubKeys (Andrew Chow) a00d1e5 Have importpubkey use CWallet's ImportScriptPubKeys and ImportPubKeys functions (Andrew Chow) c6a8274 Have importprivkey use CWallet's ImportPrivKeys, ImportScripts, and ImportScriptPubKeys (Andrew Chow) fae7a5b Log when an import is being skipped because we already have it (Andrew Chow) ab28e31 Change ImportScriptPubKeys' internal to apply_label (Andrew Chow) Pull request description: bitcoin#15741 introduced `ImportPrivKeys`, `ImportPubKeys`, `ImportScripts`, and `ImportScriptPubKeys` in `CWallet` which are used by `importmulti`. This PR changes the remaining `import*` RPCs (`importaddress`, `importprivkey`, `importpubkey`, and `importwallet`) to use these functions as well instead of directly adding the imported items to the wallet. ACKs for top commit: MarcoFalke: ACK 40ad2f6 (checked that behavior changes are mentioned in the commit body) ryanofsky: utACK 40ad2f6. Only change since last review is a tweaked commit message (mentioning label update in importpubkey commit) Sjors: ACK 40ad2f6. Those extra tests also pass. Tree-SHA512: 910e3bbe20b6f8809a47b7293775db234125615d886c7fd99c194f4cdf00c765eb1e24b1799260f1213b98c88f9bbe696796f36087c182925e567d44e9194c98
Summary: * Add AddWatchOnlyWithDB, AddKeyOriginWithDB, AddCScriptWithDB functions AddWatchOnlyWithDB, AddKeyOriginWithDB, and AddCScriptWithDB add their respective data to the wallet using the provided WalletBatch instead of creating a new WalletBatch object every time. This allows for batching writes to the database. * Have WalletBatch automatically flush every 1000 updates Since it now automatically flushes, we don't need to have UpgradeKeyMetadata count and flush separately * Batch writes for importmulti When writing all of the imported data to the wallet, use a common WalletBatch object so that batch writes are done and the writes finish more quickly. AddKeypoolPubkey is no longer needed so it is also removed * Move some of ProcessImport into CWallet::Import* This maintains encapsulation of CWallet::database in the face of batching, e.g. allows making the `WithDB` methods private. * Apply the batch treatment to CWallet::SetAddressBook via ImportScriptPubKeys * wallet: Pass WalletBatch to CWallet::UnsetWalletFlag This is a backport of Core [[bitcoin/bitcoin#15741 | PR15741]] I did it all in one block even though it's quite large because intermediate commits do not work individualy. Depends on D6385 Test Plan: ninja all check-all Reviewers: #bitcoin_abc, majcosta Reviewed By: #bitcoin_abc, majcosta Subscribers: majcosta Differential Revision: https://reviews.bitcoinabc.org/D6387
Instead of writing each item to the wallet database individually, do them in batches so that the import runs faster.
This was tested by importing a ranged descriptor for 10,000 keys.
Current master
This PR:
Fixes #15739