-
Notifications
You must be signed in to change notification settings - Fork 38.8k
rpc: Prefer to use txindex if available for GetTransaction #22383
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
It looks like an empty line needs to be added to the commit message after the subject line for the linter to pass. |
|
I think this would change the behavior of The case I'm thinking about, which you noted, is when an incorrect Similarly, I'm not sure if checking the mempool first has any unintended consequences (other than deviating from current docs). That said, I'm not sure the |
2a3fe2a to
dabd776
Compare
|
I think a fundamental question is whether or not the block_index is intended to be used as a hint or as a validation parameter. Given that the current logic of this function has various fallbacks, it seems to me that block_index is a hint and that the goal of the function is to try to return the requested transaction regardless of where it is stored. So it seems to me that reorganizing the logic for performance gains is a suitable goal. |
|
FindTx returns the block hash, so it's very easy to check if it matches. If you do so, the question of whether the provided block is a hint or a search restriction doesn't apply. |
dabd776 to
5ab22ef
Compare
promag
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a behavior change, and no test fails which is unfortunate. If we want to do this then a release note would be nice.
5ab22ef to
6dd28e7
Compare
|
I've added additional checks that should ensure there is no behavior change, only performance improvements. Now it should return nullptr faster in certain conditions. |
6dd28e7 to
5dd11ea
Compare
|
review ACK 5dd11ea1406f4926ab9061702b69b409459723b4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK 5dd11ea1406f4926ab9061702b69b409459723b4
ISTM we could use additional test coverage. Edit: Am adding some, will propose separately.
5dd11ea to
78f4c8b
Compare
|
ACK 78f4c8b |
|
Concept ACK |
|
NACK -- In general, I think performance optimizations should come with data that shows they improve things. Especially in the case the code isn't being cleaned up or simplified (I think this PR makes the code slightly more complicated and harder to follow so I'd want to be sure the performance improvements warrant it). I also have other concerns which I lay out below. I can quickly come around on my review with some explanation/correction to my understanding if I'm off base anywhere.
However, from #22382 which this PR fixes, the issue is reported as:
Was the issue misread? Because the issue expects scanning the entire txindex causes a slowdown, while this PR claims excessive disk reads (in the case that txindex is enabled) causes a slowdown. So to me the two seem to contradict each other. To me it seems both
If I'm understanding this correctly, the main benefit of txindex being enabled is that we don't need Now the measurements provided in the issue seem to show some performance discrepancy (which maybe @jlopp just looked at and knew what was going on right away). But I also don't know how much variation is normal in the timing of rpc calls, and whether one measurement (under unknown conditions) is good enough to warrant a performance optimization. On top of that it's only one data point - what if this tx reported in the issue happens to be at the beginning/end of txindex or who knows what? |
@mjdietzx txindex stores the transaction position in the block, so it just seeks there to parse the transaction. Without txindex it needs to load the block - parse transactions, build hashes,... - and then search in those transactions. |
Makes sense, and thanks for the explanation @promag. So is it safe to say scanning through the entire txindex and then "seeking to the tx's position in the block and reading it from disk" is much faster than " load the block - parse transactions, build hashes,... - and then search in those transactions"? Do we have any intuitions here, like big-O and/or worst-case runtimes? If so, is it worth adding some good comments to the source code (not sure whether they are added in But here's where I'm coming from - how do I know the test measurements provided in #22382 are on mainnet (what if they are on some testnet where the txindex is extremely small? I can't find the transaction or block hashes from that issue in any block explorers). How do I even know -txindex is enabled for these measurements (and it's not just scanning through his wallet's transactions, and that's where the performance discrepancy comes from)? Without some source code comments and data/measurments showing the performance differences here, it makes me wonder if the reason people pass |
|
I don't believe that "scanning through the entire txindex" is an accurate way to describe how txindex->FindTx functions. I expect it to be more of a O(1) constant time operation while ReadBlockFromDisk() is more of a O(n) operation that is directly correlated to block size. |
|
Right, it doesn't scan the entire txindex as keys are indexed. I think leveldb lookup is O(log n). #22382 shows the performance difference from each approach. |
|
FWIW, this is the compared performance with @jlopp's modifications |
|
I cherry-picked this change onto #22437 that adds a lot of |
mjdietzx
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK
Based on the issue being clarified and some measurements being provided after this PR, and tests in #22437 I'm signing-off on this. I do think this PR would have been a good opportunity to benchmark performance with a few different test cases (instead of times on just one transaction that @orweinberger provided), but that's just my opinion. Don't want to hold things up bc I'm convinced this is a good optimization
| "is in a block in the blockchain.\n" | ||
|
|
||
| "\nBy default, this call only returns a transaction if it is in the mempool. If -txindex is enabled\n" | ||
| "and no blockhash argument is passed, it will return the transaction if it is in the mempool or any block.\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: there is the case where -txindex is enabled and a blockhash argument is passed, where it will return the transaction only if the block it's found in matches the blockhash argument passed
theStack
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK 78f4c8b
Using -txindex if available makes perfect sense, regardless of whether a blockhash has been passed or not. To my understanding, the PR in its current form doesn't change any behaviour (in particular, if the passed blockhash doesn't match the one from the index, still the slower path of reading the block from disk is taken), i.e. the initial objections by reviewers (#22383 (comment), #22383 (review)) have been sorted out. Follow-up PR #22437 should further increase the confidence that this is the case, will also put that on my review list.
Also, the structure of GetTransaction looks more logical now, as the code blocks for getting a transaction appear in the order of "fast" (mempool) to "slow" (read block from disk).
LarryRuane
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tACK 78f4c8b
suggestions are optional
| * | ||
| * @param[in] block_index The block to read from disk, or nullptr | ||
| * @param[in] mempool If block_index is not provided, look in the mempool, if provided | ||
| * @param[in] mempool If provided, check mempool for tx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| * @param[in] mempool If provided, check mempool for tx | |
| * @param[in] mempool If provided and block_index is not, check mempool for tx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest just replacing this with Optional pointer to the node's mempool to avoid duplicating what's already documented above.
| if (!block_index || block_index->GetBlockHash() == block_hash) { | ||
| hashBlock = block_hash; | ||
| return tx; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| } | |
| } | |
| return nullptr; |
Nit: This improves the performance of the (unlikely) case where a block hash is specified, but the transaction is found in the txindex but in a different block. The if at line 1170 would be false, so we would call ReadBlockFromDisk() below, but would not find the transaction there either (assuming the txindex is accurate).
I cherry-picked this PR with this suggestion onto #22437 as @jonatack described above, and the tests all pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this different from the initial solution: #22383 (comment)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@LarryRuane the idea is to allow fetching the transaction from a block out of the active chain - which -txindex doesn't take into account. Before this was possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That could be checked by seeing if the specified block index is in the main chain.
But I suppose even that has edge cases (eg, index out of sync), so maybe it's better to just leave it alone.
luke-jr
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
jnewbery
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK 78f4c8b
Couple of minor documentation suggestions inline.
| "If -txindex is not enabled and a blockhash argument is passed, it will return the transaction if\n" | ||
| "the specified block is available and the transaction is found in that block.\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the following change addresses @mjdietzx's comment: https://github.com/bitcoin/bitcoin/pull/22383/files#r664172064
| "If -txindex is not enabled and a blockhash argument is passed, it will return the transaction if\n" | |
| "the specified block is available and the transaction is found in that block.\n" | |
| "If a blockhash argument is passed, it will return the transaction if\n" | |
| "the specified block is available and the transaction is in that block.\n" |
| if (!block_index || block_index->GetBlockHash() == block_hash) { | ||
| hashBlock = block_hash; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could do with a comment:
| if (!block_index || block_index->GetBlockHash() == block_hash) { | |
| hashBlock = block_hash; | |
| if (!block_index || block_index->GetBlockHash() == block_hash) { | |
| // Don't return the transaction if the provided block hash doesn't match. | |
| // The case where a transaction appears in multiple blocks (e.g. reorgs or | |
| // BIP30) is handled by the block lookup below. | |
| hashBlock = block_hash; |
rajarshimaitra
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK 78f4c8b
Doc updates as mentioned by @jnewbery #22383 (comment) and #22383 (comment) seems appropriate.
Would test #22437 and report back..
lsilva01
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review ACK and Tested ACK 78f4c8b on Ubuntu 20.04
|
|
||
| CTransactionRef GetTransaction(const CBlockIndex* const block_index, const CTxMemPool* const mempool, const uint256& hash, const Consensus::Params& consensusParams, uint256& hashBlock) | ||
| { | ||
| LOCK(cs_main); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just realized after merge that cs_main is not needed to read from the mempool (only to write to it)
…ransaction 78f4c8b prefer to use txindex if available for GetTransaction (Jameson Lopp) Pull request description: Fixes bitcoin#22382 Motivation: prevent excessive disk reads if txindex is enabled. Worth noting that this could be argued to be less of a bug and more of an issue of undefined behavior. If a user calls GetTransaction with the wrong block hash, what should happen? ACKs for top commit: jonatack: ACK 78f4c8b theStack: Code review ACK 78f4c8b LarryRuane: tACK 78f4c8b luke-jr: utACK 78f4c8b jnewbery: utACK 78f4c8b rajarshimaitra: Code review ACK bitcoin@78f4c8b lsilva01: Code Review ACK and Tested ACK bitcoin@78f4c8b on Ubuntu 20.04 Tree-SHA512: af7db5b98cb2ae4897b28476b2fa243bf7e6f850750d9347062fe8013c5720986d1a3c808f80098e5289bd84b085de03c81a44e584dc28982f721c223651bfe0
f685a13 doc: GetTransaction()/getrawtransaction follow-ups to #22383 (John Newbery) abc57e1 refactor: move `GetTransaction(...)` to node/transaction.cpp (Sebastian Falbesoner) Pull request description: ~This PR is based on #22383, which should be reviewed first~ (merged by now). In [yesterday's PR review club session to PR 22383](https://bitcoincore.reviews/22383), the idea of moving the function `GetTransaction(...)` from src/validation.cpp to src/node/transaction.cpp came up. With this, the circular dependency "index/txindex -> validation -> index/txindex" is removed (see change in `lint-circular-dependencies.sh`). Thanks to jnewbery for suggesting and to sipa for providing historical background. Relevant IRC log: ``` 17:52 <jnewbery> Was anyone surprised that GetTransaction() is in validation.cpp? It seems to me that node/transaction.cpp would be a more appropriate place for it. 17:53 <raj_> jnewbery, +1 17:53 <stickies-v> agreed! 17:54 <glozow> jnewbery ya 17:54 <jnewbery> seems weird that validation would call into txindex. I wonder if we remove this function, then validation would no longer need to #include txindex 17:54 <sipa> GetTransaction predates node/transaction.cpp, and even the generic index framework itself :) 17:55 <sipa> (before 0.8, validation itself used the txindex) 17:55 <jnewbery> (and GetTransaction() seems like a natural sibling to BroadcastTransaction(), which is already in node/transaction.cpp) 17:55 <jnewbery> sipa: right, this is not meant as a criticism of course. Just wondering if we can organize things a bit more rationally now that we have better separation between things. 17:55 <sipa> jnewbery: sure, just providing background 17:56 <sipa> seems very reasonable to move it elsewhere now ``` The commit should be trivial to review with `--color-moved`. ACKs for top commit: jnewbery: Code review ACK f685a13 rajarshimaitra: tACK f685a13 mjdietzx: crACK f685a13 LarryRuane: Code review, test ACK f685a13 Tree-SHA512: 0e844a6ecb1be04c638b55bc4478c2949549a4fcae01c984eee078de74d176fb19d508fc09360a62ad130677bfa7daf703b67870800e55942838d7313246248c
|
I was curious about response times so I ran some tests: I picked up 5 random transactions and fetched them with and without the block hash against both latest release (v0.21.1) and 78f4c8b v0.21.1 results 78f4c8b results System information |
… lock acquire 4a1b2a7 [GetTransaction] remove unneeded `cs_main` lock acquire (Sebastian Falbesoner) Pull request description: This PR is a follow-up to #22383. For reading from the mempool, only `mempool.cs` needs to be locked (see [suggestion by MarcoFalke](bitcoin/bitcoin#22383 (comment))): https://github.com/bitcoin/bitcoin/blob/b620b2d58a55a88ad21da70cb2000863ef17b651/src/txmempool.h#L554-L558 `CTxMemPool::get()` acquires this lock: https://github.com/bitcoin/bitcoin/blob/b620b2d58a55a88ad21da70cb2000863ef17b651/src/txmempool.cpp#L822-L829 so we don't need to acquire any lock ourselves in `GetTransaction()`, as the other functions called in the remaining parts also don't need to have `cs_main` locked. ACKs for top commit: tryphe: Concept ACK. tested 4a1b2a7 but not extensively. jnewbery: Code review ACK 4a1b2a7 Tree-SHA512: 60e869f72e65cf72cb144be1900ea7f3d87c12f322756994f6a3ed8cd975230b36c7c90c34b60bbf41f9186f4add36decaac1d4f0d0749fb5451b3938a8aa78c
…quire 4a1b2a7 [GetTransaction] remove unneeded `cs_main` lock acquire (Sebastian Falbesoner) Pull request description: This PR is a follow-up to bitcoin#22383. For reading from the mempool, only `mempool.cs` needs to be locked (see [suggestion by MarcoFalke](bitcoin#22383 (comment))): https://github.com/bitcoin/bitcoin/blob/b620b2d58a55a88ad21da70cb2000863ef17b651/src/txmempool.h#L554-L558 `CTxMemPool::get()` acquires this lock: https://github.com/bitcoin/bitcoin/blob/b620b2d58a55a88ad21da70cb2000863ef17b651/src/txmempool.cpp#L822-L829 so we don't need to acquire any lock ourselves in `GetTransaction()`, as the other functions called in the remaining parts also don't need to have `cs_main` locked. ACKs for top commit: tryphe: Concept ACK. tested 4a1b2a7 but not extensively. jnewbery: Code review ACK 4a1b2a7 Tree-SHA512: 60e869f72e65cf72cb144be1900ea7f3d87c12f322756994f6a3ed8cd975230b36c7c90c34b60bbf41f9186f4add36decaac1d4f0d0749fb5451b3938a8aa78c
…erage, improve rpc_rawtransaction 387355b test, refactor: rpc_rawtransaction PEP8 (Jon Atack) 7d5cec2 refactor: separate the rpc_rawtransaction tests into functions (Jon Atack) 409779d move-only: regroup similar rpc_rawtransaction tests together (Jon Atack) d861040 test: remove no longer needed (ASCII art) comments (Jon Atack) 14398b3 test: add and harmonize getrawtransaction logging (Jon Atack) 85d8869 test: run 2nd getrawtransaction section with/without -txindex (Jon Atack) 0097740 refactor: txid to constant in rpc_rawtransaction to isolate tests (Jon Atack) 8c19d13 refactor: dedup/reorg createrawtransaction sequence number tests (Jon Atack) 7f07359 Test src/node/transaction::GetTransaction() without -txindex (Jon Atack) Pull request description: Following up on bitcoin/bitcoin#22383 (review), this pull adds missing `src/node/transaction::GetTransaction()` test coverage for combinations of `-txindex` and `blockhash` and does some refactoring of the test file. ACKs for top commit: mjdietzx: reACK 387355b josibake: reACK bitcoin/bitcoin@387355b MarcoFalke: Approach ACK 387355b 🔆 Tree-SHA512: b47c4ff87d69c61434e5729c954b338bc13744eddaba0879ca9f5f42243ba2cb4640d94c5f74de9f2735a8bf5e66b3d1c3bd3b7c26cd7324da7d3270ce87c6fd
Summary: ``` Motivation: prevent excessive disk reads if txindex is enabled. ``` This will also benefit to `isfinaltransaction`. Backport of [[bitcoin/bitcoin#22383 | core#22383]]. Test Plan: ninja all check-all Reviewers: #bitcoin_abc, PiRK Reviewed By: #bitcoin_abc, PiRK Differential Revision: https://reviews.bitcoinabc.org/D12208
Fixes #22382
Motivation: prevent excessive disk reads if txindex is enabled.
Worth noting that this could be argued to be less of a bug and more of an issue of undefined behavior. If a user calls GetTransaction with the wrong block hash, what should happen?