Skip to content

Conversation

@theStack
Copy link
Contributor

@theStack theStack commented Dec 3, 2021

The RPC calls getreceivedbyaddress/getreceivedbylabel both use the internal helper function GetReceived which was introduced in PR #17579 to deduplicate tallying code. For every wallet-related transaction output, the following unnecessary operations are currently performed in the tally loop, leading to a quite bad performance (as reported in #23645):

  • converting from CScript -> TxDestination (ExtractDestination(...)), converting from TxDestination -> CScript (CWallet::IsMine(const CTxDestination& dest)); this can be avoided by directly using output scripts in the search set instead of addresses (first commit)
  • checking if the iterated output script belongs to the wallet by calling IsMine; this can be avoided by only adding addresses to the search set which fulfil IsMine in the first place (second commit)

Benchmark results

The functional test wallet_pr23662.py (not part of this PR) creates transactions with 15000 different addresses:

  • 5000 outputs (500 txs with 10 outputs each) with label set, IsMine set (received)
  • 5000 outputs (500 txs with 10 outputs each) with label set, IsMine not set (sent)
  • 5000 outputs (500 txs with 10 outputs each) without label set, IsMine not set (sent)

Then, the time is measured for calling getreceivedbyaddress and getreceivedbylabel, the latter with two variants. Results on my machine:

branch getreceivedbyaddress (single) getreceivedbylabel (single) getreceivedbylabel (10000)
master 406.13ms 425.33ms 446.58ms
PR (first commit) 367.18ms 365.81ms 426.33ms
PR (second commit) 3.96ms 4.83ms 339.69ms

Fixes #23645.

@kristapsk
Copy link
Contributor

if users put a huge number of addresses under the same label. Not sure how that is used in production, I always assumed that there is a one-by-one label->address relation and having more than one address referred to by a label doesn't make too much sense.

JoinMarket users labels to mark addresses in a Bitcoin Core watchonly wallet as belonging to a specific JoinMarket wallet. All addresses of the same JoinMarket wallet will have the same label in Bitcoin Core watchonly wallet. It used accounts before, until they were deprecated.

@DrahtBot
Copy link
Contributor

DrahtBot commented Dec 4, 2021

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #25122 (rpc: getreceivedbylabel, return early if no addresses were found in the address book. by furszy)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@beegater
Copy link

beegater commented Dec 4, 2021

ACK!

Great, thanks a lot for addressing this so quickly!

@bitcoin bitcoin deleted a comment from jhay666 Dec 6, 2021
Copy link
Member

@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@theStack theStack force-pushed the 202112-rpc-improve_getreceivedby_performance branch from d1f056f to 7589305 Compare December 6, 2021 10:15
Copy link
Contributor

@shaavan shaavan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

crACK 7589305280a7b0f43a4dd500b9fc72e070104bd2

Thanks for addressing it. These updates make the code more clear to understand and straightforward to run.

as it could actually hurt performance if users put a huge number of addresses under the same label.

Though I think this refactoring would be very advantageous for an average user, this statement still worries me a little. Maybe if someone might have some insights into how frequent a user is to put multiple addresses under the same label, they could share them.

@theStack theStack force-pushed the 202112-rpc-improve_getreceivedby_performance branch from 7589305 to f336ff7 Compare December 8, 2021 17:16
@theStack
Copy link
Contributor Author

theStack commented Dec 8, 2021

Rebased on master (necessary as the function modified in this PR has moved from ./src/wallet/rpcwallet.cpp to ./src/wallet/rpc/coins.cpp).

@shaavan: Thanks for your review! The performance worries on the second commit concerns transactions that are in wallet.mapWallet but don't satisfy the IsMine predicate (which should be all txs sent from the wallet to an address not owned by the wallet) ; if the search in the output script set (output_scripts.count(...)) takes longer than the IsMine check, then on master the loop iteration would be faster for those transactions. On the other hand a search in std::set takes O(log n), so it shouldn't be that bad.

Would be very interested in further opinions.

Copy link
Contributor

@amadeuszpawlik amadeuszpawlik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept & approach ACK

Copy link
Contributor

@stratospher stratospher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK.

  • a7b65af

    • Output scripts are used instead of addresses to perform the operations.
    • For the 'getreceivedbylabel' case, all the addresses with the given label are iterated through to insert all the output_scripts. For the 'getreceivedbyaddress' case, the required address's output script is inserted into output_scripts.
    • When tallying all the transactions, output scripts can be directly used to compute the total amount at an address/label in the wallet.
  • f336ff7

    • For the 'getreceivedbylabel' case, when iterating through all the addresses with the given label to insert the output_scripts, only output_scripts that belong to the wallet are added.
    • When tallying all the transactions, output scripts are used to compute the total amount at an address/label(and it's already known that they belong to the wallet).

@achow101
Copy link
Member

I don't think it's even necessary to use IsMine. We already filter for addresses that are ours within the address book itself by setting CAddressBookData.purpose to receive. So you can just check for that string rather than doing IsMine. It will certainly be more efficient.

@theStack
Copy link
Contributor Author

I don't think it's even necessary to use IsMine. We already filter for addresses that are ours within the address book itself by setting CAddressBookData.purpose to receive. So you can just check for that string rather than doing IsMine. It will certainly be more efficient.

Thanks for the hint! In the current version of the PR though, IsMine is not called in the hot spot (mapWallet/CTxOut iteration loop) anymore. It may still make sense to check for the CAddressBookData.purpose instead of using IsMine in the preparation loop, which creates the "needles" of the search. Will take a deeper look on the weekend.

@beegater
Copy link

beegater commented Apr 1, 2022

Can this please be merged? It would help a lot!

@kristapsk
Copy link
Contributor

I'm worried about "the second commit is potentially more controversial, as it could actually hurt performance if users put a huge number of addresses under the same label", as this is how JoinMarket uses Core wallets. Guess some benchmarking is needed here, how much is performance gains / losses in different scenarios.

@beegater
Copy link

beegater commented Apr 5, 2022

I'm worried about "the second commit is potentially more controversial, as it could actually hurt performance if users put a huge number of addresses under the same label", as this is how JoinMarket uses Core wallets. Guess some benchmarking is needed here, how much is performance gains / losses in different scenarios.

For the second commit this may be true. But the version that has now been on master for several years is FAR slower under any scenario than the solution in this initial commit. And just pushing the merge forward until some benchmarking is set up or not will just mean the slow solution will stay in all upcoming versions.
Since nobody else has risen awareness to this bug since all that time it is on master, I'd say that the 10x degraded performance of this rpc was not relevant to anyone anyway, so merging this PR which provides a speed up for most scenarios should be fine.

theStack added a commit to theStack/bitcoin that referenced this pull request Apr 14, 2022
@theStack
Copy link
Contributor Author

Finally did some benchmarking. The functional test wallet_pr23662.py (not part of this PR) creates transactions with 15000 different addresses:

  • 5000 outputs (500 txs with 10 outputs each) with label set, IsMine set (received)
  • 5000 outputs (500 txs with 10 outputs each) with label set, IsMine not set (sent)
  • 5000 outputs (500 txs with 10 outputs each) without label set, IsMine not set (sent)

Then, the time is measured for calling getreceivedbyaddress and getreceivedbylabel, the latter with two variants. Results on my machine:

branch getreceivedbyaddress (single) getreceivedbylabel (single) getreceivedbylabel (10000)
master 406.13ms 425.33ms 446.58ms
PR (first commit) 367.18ms 365.81ms 426.33ms
PR (second commit) 3.96ms 4.83ms 339.69ms

Seems like my worries ("the second commit is potentially more controversial, as it could actually hurt performance if users put a huge number of addresses under the same label") are not justified and most of the performance gain is indeed achieved by the second commit. Improvement ideas to the benchmark are very welcome, though I'd consider this pretty safe to merge at this point.

@achow101
Copy link
Member

achow101 commented Apr 19, 2022

I'm worried about "the second commit is potentially more controversial, as it could actually hurt performance if users put a huge number of addresses under the same label", as this is how JoinMarket uses Core wallets. Guess some benchmarking is needed here, how much is performance gains / losses in different scenarios.

Performance would only be hurt if a label had a huge number of addressees that did not belong to the wallet and for that to matter, there would need to be way more addresses than outputs stored in the wallet. Even for JoinMarket users (probably especially for JoinMarket users too), it will be faster because instead of doing IsMine on every single output of every single transaction in a wallet (which for coinjoins the vast majority will be irrelevant to the wallet), it is doing IsMine on things that are in the address book. For most users, the set of addresses in the address book will be smaller than the number of transaction outputs in a wallet.

@achow101
Copy link
Member

ACK f336ff7

Copy link
Contributor

@w0xlt w0xlt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK f336ff7

This change makes the code clearer and better performing.

@kristapsk
Copy link
Contributor

Comments by @theStack and @achow101 calmed me down, but will try to benchmark myself just in case too.

Copy link
Member

@furszy furszy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code ACK f336ff7

Plus, probably something like furszy@e8d8607 would be good to include here or on a follow-up PR.

Basically, inside GetReceived, we can return early if the label provided by getreceivedbylabel is not in the address book. Otherwise, we walk through all the wallet txs + outputs for no reason (output_scripts is empty).

@theStack
Copy link
Contributor Author

theStack commented May 12, 2022

@furszy: Thanks for reviewing!

Plus, probably something like furszy@e8d8607 would be good to include here or on a follow-up PR.

Basically, inside GetReceived, we can return early if the label provided by getreceivedbylabel is not in the address book. Otherwise, we walk through all the wallet txs + outputs for no reason (output_scripts is empty).

This seems to be an optimization for a very special case (passing a label that is in the address book, but doesn't apply to any address that is ours), hence I think this would fit better for a follow-up PR. Would you mind opening one? If you want, you can already do that now, branching off of this PR.

@furszy
Copy link
Member

furszy commented May 12, 2022

Hey, sure. Will do it.

This seems to be an optimization for a very special case (passing a label that is in the address book, but doesn't apply to any address that is ours)

Actually, can happen when the user inputs a label that is not in the address book as well (eg: wallet.GetLabelAddresses(label) returning an empty set).

@theStack
Copy link
Contributor Author

This seems to be an optimization for a very special case (passing a label that is in the address book, but doesn't apply to any address that is ours)

Actually, can happen when the user inputs a label that is not in the address book as well (eg: wallet.GetLabelAddresses(label) returning an empty set).

Good point. So it would be an improvement of the user-interface, which is maybe even more important; right now if the passed label doesn't exist (due to e.g. a typo from the user), the amount of 0 is returned instead of an error.

@furszy
Copy link
Member

furszy commented May 12, 2022

yeah, for that I added an error there "Address/es not found in wallet" (a general error for both getreceivedbylabel and getreceivedbyaddress if the address/label is not in the wallet).

But we could customize it a bit more too. Have an error string targeting getreceivedbyaddress -> "Address not found in wallet" and another one for getreceivedbylabel -> "label not found in wallet".

In any case, just opened #25122. Will rebase it on master once this one gets merged.

@achow101 achow101 merged commit 187504b into bitcoin:master May 16, 2022
achow101 added a commit to bitcoin-core/gui that referenced this pull request May 23, 2022
… no addresses were found in the address book

baa3ddc doc: add release notes about `getreceivedbylabel` returning an error if the label is not in the address book. (furszy)
8897a21 rpc: getreceivedbylabel, don't loop over the entire wallet txs map if no destinations were found for the input label. (furszy)

Pull request description:

  Built on top of #23662, coming from comment bitcoin/bitcoin#23662 (review).

  If `wallet.GetLabelAddresses()` returns an empty vector (the wallet does not have stored destinations with that label in the addressbook) or if none of the returned destinations are from the wallet, we can return the function right away.
  Otherwise, we are walking through all the wallet txs + outputs for no reason (`output_scripts` is empty).

ACKs for top commit:
  achow101:
    ACK baa3ddc
  theStack:
    re-ACK baa3ddc
  w0xlt:
    ACK bitcoin/bitcoin@baa3ddc

Tree-SHA512: 00e10365b179bf008da2f3ef8fbb3ee04a330426374020e3f2d0151b16991baba4ef2b944e4659452f3e4d6cb20f128d0918ddf0453933a25a4d9fd8414a1911
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request May 28, 2022
…ormance

f336ff7 rpc: avoid expensive `IsMine` calls in `GetReceived` tally (Sebastian Falbesoner)
a7b65af rpc: avoid scriptPubKey<->CTxDestination conversions in `GetReceived` tally (Sebastian Falbesoner)

Pull request description:

  The RPC calls `getreceivedbyaddress`/`getreceivedbylabel` both use the internal helper function `GetReceived` which was introduced in PR bitcoin#17579 to deduplicate tallying code. For every wallet-related transaction output, the following unnecessary operations are currently performed in the tally loop, leading to a quite bad performance (as reported in bitcoin#23645):
  - converting from CScript -> TxDestination (`ExtractDestination(...)`), converting from TxDestination -> CScript (`CWallet::IsMine(const CTxDestination& dest)`); this can be avoided by directly using output scripts in the search set instead of addresses (first commit)
  - checking if the iterated output script belongs to the wallet by calling `IsMine`; this can be avoided by only adding addresses to the search set which fulfil `IsMine` in the first place (second commit)

  ### Benchmark results
  The functional test [wallet_pr23662.py](https://github.com/theStack/bitcoin/blob/pr23662_benchmarks/test/functional/wallet_pr23662.py) (not part of this PR) creates transactions with 15000 different addresses:
  - 5000 outputs (500 txs with 10 outputs each) with label set, IsMine set (received)
  - 5000 outputs (500 txs with 10 outputs each) with label set, IsMine not set (sent)
  - 5000 outputs (500 txs with 10 outputs each) without label set, IsMine not set (sent)

  Then, the time is measured for calling `getreceivedbyaddress` and `getreceivedbylabel`, the latter with two variants. Results on my machine:

  | branch             | `getreceivedbyaddress` (single) | `getreceivedbylabel` (single) | `getreceivedbylabel` (10000) |
  |--------------------|---------------------------------|-------------------------------|------------------------------|
  | master             |             406.13ms            |          425.33ms             |          446.58ms            |
  | PR (first commit)  |             367.18ms            |          365.81ms             |          426.33ms            |
  | PR (second commit) |               3.96ms            |            4.83ms             |          339.69ms            |

  Fixes bitcoin#23645.

ACKs for top commit:
  achow101:
    ACK f336ff7
  w0xlt:
    ACK bitcoin@f336ff7
  furszy:
    Code ACK f336ff7

Tree-SHA512: 9cbf402b9e269713bd3feda9e31719d9dca8a0dfd526de12fd3d561711589195d0c50143432c65dae279c4eab90a4fc3f99e29fbc0452fefe05113e92d129b8f
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request May 28, 2022
…esses were found in the address book

baa3ddc doc: add release notes about `getreceivedbylabel` returning an error if the label is not in the address book. (furszy)
8897a21 rpc: getreceivedbylabel, don't loop over the entire wallet txs map if no destinations were found for the input label. (furszy)

Pull request description:

  Built on top of bitcoin#23662, coming from comment bitcoin#23662 (review).

  If `wallet.GetLabelAddresses()` returns an empty vector (the wallet does not have stored destinations with that label in the addressbook) or if none of the returned destinations are from the wallet, we can return the function right away.
  Otherwise, we are walking through all the wallet txs + outputs for no reason (`output_scripts` is empty).

ACKs for top commit:
  achow101:
    ACK baa3ddc
  theStack:
    re-ACK baa3ddc
  w0xlt:
    ACK bitcoin@baa3ddc

Tree-SHA512: 00e10365b179bf008da2f3ef8fbb3ee04a330426374020e3f2d0151b16991baba4ef2b944e4659452f3e4d6cb20f128d0918ddf0453933a25a4d9fd8414a1911
@bitcoin bitcoin locked and limited conversation to collaborators May 16, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

getreceivedby[address, label] rpc performance proposal