-
Notifications
You must be signed in to change notification settings - Fork 38.7k
rpc: improve getreceivedby{address,label} performance
#23662
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rpc: improve getreceivedby{address,label} performance
#23662
Conversation
JoinMarket users labels to mark addresses in a Bitcoin Core watchonly wallet as belonging to a specific JoinMarket wallet. All addresses of the same JoinMarket wallet will have the same label in Bitcoin Core watchonly wallet. It used accounts before, until they were deprecated. |
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
|
ACK! Great, thanks a lot for addressing this so quickly! |
maflcko
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
d1f056f to
7589305
Compare
shaavan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
crACK 7589305280a7b0f43a4dd500b9fc72e070104bd2
Thanks for addressing it. These updates make the code more clear to understand and straightforward to run.
as it could actually hurt performance if users put a huge number of addresses under the same label.
Though I think this refactoring would be very advantageous for an average user, this statement still worries me a little. Maybe if someone might have some insights into how frequent a user is to put multiple addresses under the same label, they could share them.
7589305 to
f336ff7
Compare
|
Rebased on master (necessary as the function modified in this PR has moved from @shaavan: Thanks for your review! The performance worries on the second commit concerns transactions that are in Would be very interested in further opinions. |
amadeuszpawlik
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept & approach ACK
stratospher
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK.
-
- Output scripts are used instead of addresses to perform the operations.
- For the 'getreceivedbylabel' case, all the addresses with the given label are iterated through to insert all the
output_scripts. For the 'getreceivedbyaddress' case, the required address's output script is inserted intooutput_scripts. - When tallying all the transactions,
output scriptscan be directly used to compute the total amount at an address/label in the wallet.
-
- For the 'getreceivedbylabel' case, when iterating through all the addresses with the given label to insert the
output_scripts, onlyoutput_scriptsthat belong to the wallet are added. - When tallying all the transactions,
output scriptsare used to compute the total amount at an address/label(and it's already known that they belong to the wallet).
- For the 'getreceivedbylabel' case, when iterating through all the addresses with the given label to insert the
|
I don't think it's even necessary to use |
Thanks for the hint! In the current version of the PR though, |
|
Can this please be merged? It would help a lot! |
|
I'm worried about "the second commit is potentially more controversial, as it could actually hurt performance if users put a huge number of addresses under the same label", as this is how JoinMarket uses Core wallets. Guess some benchmarking is needed here, how much is performance gains / losses in different scenarios. |
For the second commit this may be true. But the version that has now been on master for several years is FAR slower under any scenario than the solution in this initial commit. And just pushing the merge forward until some benchmarking is set up or not will just mean the slow solution will stay in all upcoming versions. |
|
Finally did some benchmarking. The functional test wallet_pr23662.py (not part of this PR) creates transactions with 15000 different addresses:
Then, the time is measured for calling
Seems like my worries ("the second commit is potentially more controversial, as it could actually hurt performance if users put a huge number of addresses under the same label") are not justified and most of the performance gain is indeed achieved by the second commit. Improvement ideas to the benchmark are very welcome, though I'd consider this pretty safe to merge at this point. |
Performance would only be hurt if a label had a huge number of addressees that did not belong to the wallet and for that to matter, there would need to be way more addresses than outputs stored in the wallet. Even for JoinMarket users (probably especially for JoinMarket users too), it will be faster because instead of doing |
|
ACK f336ff7 |
w0xlt
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK f336ff7
This change makes the code clearer and better performing.
furszy
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code ACK f336ff7
Plus, probably something like furszy@e8d8607 would be good to include here or on a follow-up PR.
Basically, inside GetReceived, we can return early if the label provided by getreceivedbylabel is not in the address book. Otherwise, we walk through all the wallet txs + outputs for no reason (output_scripts is empty).
|
@furszy: Thanks for reviewing!
This seems to be an optimization for a very special case (passing a label that is in the address book, but doesn't apply to any address that is ours), hence I think this would fit better for a follow-up PR. Would you mind opening one? If you want, you can already do that now, branching off of this PR. |
|
Hey, sure. Will do it.
Actually, can happen when the user inputs a label that is not in the address book as well (eg: |
Good point. So it would be an improvement of the user-interface, which is maybe even more important; right now if the passed label doesn't exist (due to e.g. a typo from the user), the amount of 0 is returned instead of an error. |
|
yeah, for that I added an error there "Address/es not found in wallet" (a general error for both But we could customize it a bit more too. Have an error string targeting In any case, just opened #25122. Will rebase it on master once this one gets merged. |
… no addresses were found in the address book baa3ddc doc: add release notes about `getreceivedbylabel` returning an error if the label is not in the address book. (furszy) 8897a21 rpc: getreceivedbylabel, don't loop over the entire wallet txs map if no destinations were found for the input label. (furszy) Pull request description: Built on top of #23662, coming from comment bitcoin/bitcoin#23662 (review). If `wallet.GetLabelAddresses()` returns an empty vector (the wallet does not have stored destinations with that label in the addressbook) or if none of the returned destinations are from the wallet, we can return the function right away. Otherwise, we are walking through all the wallet txs + outputs for no reason (`output_scripts` is empty). ACKs for top commit: achow101: ACK baa3ddc theStack: re-ACK baa3ddc w0xlt: ACK bitcoin/bitcoin@baa3ddc Tree-SHA512: 00e10365b179bf008da2f3ef8fbb3ee04a330426374020e3f2d0151b16991baba4ef2b944e4659452f3e4d6cb20f128d0918ddf0453933a25a4d9fd8414a1911
…ormance f336ff7 rpc: avoid expensive `IsMine` calls in `GetReceived` tally (Sebastian Falbesoner) a7b65af rpc: avoid scriptPubKey<->CTxDestination conversions in `GetReceived` tally (Sebastian Falbesoner) Pull request description: The RPC calls `getreceivedbyaddress`/`getreceivedbylabel` both use the internal helper function `GetReceived` which was introduced in PR bitcoin#17579 to deduplicate tallying code. For every wallet-related transaction output, the following unnecessary operations are currently performed in the tally loop, leading to a quite bad performance (as reported in bitcoin#23645): - converting from CScript -> TxDestination (`ExtractDestination(...)`), converting from TxDestination -> CScript (`CWallet::IsMine(const CTxDestination& dest)`); this can be avoided by directly using output scripts in the search set instead of addresses (first commit) - checking if the iterated output script belongs to the wallet by calling `IsMine`; this can be avoided by only adding addresses to the search set which fulfil `IsMine` in the first place (second commit) ### Benchmark results The functional test [wallet_pr23662.py](https://github.com/theStack/bitcoin/blob/pr23662_benchmarks/test/functional/wallet_pr23662.py) (not part of this PR) creates transactions with 15000 different addresses: - 5000 outputs (500 txs with 10 outputs each) with label set, IsMine set (received) - 5000 outputs (500 txs with 10 outputs each) with label set, IsMine not set (sent) - 5000 outputs (500 txs with 10 outputs each) without label set, IsMine not set (sent) Then, the time is measured for calling `getreceivedbyaddress` and `getreceivedbylabel`, the latter with two variants. Results on my machine: | branch | `getreceivedbyaddress` (single) | `getreceivedbylabel` (single) | `getreceivedbylabel` (10000) | |--------------------|---------------------------------|-------------------------------|------------------------------| | master | 406.13ms | 425.33ms | 446.58ms | | PR (first commit) | 367.18ms | 365.81ms | 426.33ms | | PR (second commit) | 3.96ms | 4.83ms | 339.69ms | Fixes bitcoin#23645. ACKs for top commit: achow101: ACK f336ff7 w0xlt: ACK bitcoin@f336ff7 furszy: Code ACK f336ff7 Tree-SHA512: 9cbf402b9e269713bd3feda9e31719d9dca8a0dfd526de12fd3d561711589195d0c50143432c65dae279c4eab90a4fc3f99e29fbc0452fefe05113e92d129b8f
…esses were found in the address book baa3ddc doc: add release notes about `getreceivedbylabel` returning an error if the label is not in the address book. (furszy) 8897a21 rpc: getreceivedbylabel, don't loop over the entire wallet txs map if no destinations were found for the input label. (furszy) Pull request description: Built on top of bitcoin#23662, coming from comment bitcoin#23662 (review). If `wallet.GetLabelAddresses()` returns an empty vector (the wallet does not have stored destinations with that label in the addressbook) or if none of the returned destinations are from the wallet, we can return the function right away. Otherwise, we are walking through all the wallet txs + outputs for no reason (`output_scripts` is empty). ACKs for top commit: achow101: ACK baa3ddc theStack: re-ACK baa3ddc w0xlt: ACK bitcoin@baa3ddc Tree-SHA512: 00e10365b179bf008da2f3ef8fbb3ee04a330426374020e3f2d0151b16991baba4ef2b944e4659452f3e4d6cb20f128d0918ddf0453933a25a4d9fd8414a1911
The RPC calls
getreceivedbyaddress/getreceivedbylabelboth use the internal helper functionGetReceivedwhich was introduced in PR #17579 to deduplicate tallying code. For every wallet-related transaction output, the following unnecessary operations are currently performed in the tally loop, leading to a quite bad performance (as reported in #23645):ExtractDestination(...)), converting from TxDestination -> CScript (CWallet::IsMine(const CTxDestination& dest)); this can be avoided by directly using output scripts in the search set instead of addresses (first commit)IsMine; this can be avoided by only adding addresses to the search set which fulfilIsMinein the first place (second commit)Benchmark results
The functional test wallet_pr23662.py (not part of this PR) creates transactions with 15000 different addresses:
Then, the time is measured for calling
getreceivedbyaddressandgetreceivedbylabel, the latter with two variants. Results on my machine:getreceivedbyaddress(single)getreceivedbylabel(single)getreceivedbylabel(10000)Fixes #23645.