-
Notifications
You must be signed in to change notification settings - Fork 38.6k
refactor: inline UndoWriteToDisk and WriteBlockToDisk to reduce serialization calls
#31490
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code Coverage & BenchmarksFor details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31490. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
|
Concept ACK |
andrewtoth
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
Should we not prefer the more modern and explicit uint32_t vs unsigned int?
I was hoping someone will recommend that - done: https://github.com/bitcoin/bitcoin/compare/e913e773926ecb72e327acf60c68655b5611cb7a..d69766164d177707ec7be19c4c188bd79ba3e4a3 Edit: Added block size calculation deduplication as well to the PR |
maflcko
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about the changes.
What is the goal? Is this an optimization? If yes, how can it be observed?
Also, the changes seem to be based on a misunderstanding of the log and check macros.
ryanofsky
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK 9a7e1ced7c3bb17193c7401181365c4075d45ec2. This seems like a safer, more efficient approach, but I still think the API is very confusing.
No need to address here, but for a followup I think it would make more sense for WriteBlockToDisk instead of SaveBlockToDisk call FindNextBlockPos, and for UndoWriteToDisk instead of WriteUndoDataForBlock to be call FindUndoPos, so higher level SaveBlockToDisk / WriteUndoDataForBlock functions don't contain any logic dealing with header fields.
I also find names of these functions to be inconsistent, overlong and confusing. Would rename:
- SaveBlockToDisk to SaveBlock
- WriteBlockToDisk to WriteBlock
- WriteUndoDataForBlock to SaveBlockUndo
- UndoWriteToDisk to WriteBlockUndo
maflcko
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
concept ack, if this improves a benchmark. In the future, it would be good to mention a speedup in the pull request description, so that reviewers can see the motivation and goal for the change.
We've been surprised multiple times by `Assume` doing heavy computations in `release`: * bitcoin#31178 (comment) * bitcoin#31490 (comment) Since `Assume` is a macro, it could have been written differently to avoid parameter evaluation (similarly to `LogDebug`, which doesn't evaluate the parameters). Co-Authored-By: Anthony Towns <[email protected]>
BrandonOdiwuor
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
Agree with this. If this actually shows up as significant in real workloads (IBD), concept ACK and we could potentially take this further by caching the size even earlier (as part of deserializing over the wire) to avoid the need for the calculation at all. But if there's no noticeable speedup, I'm not a fan of muddying the api. |
Thanks for the reviews, I'm running full IBD benchmarks currently, we'll see the results shortly (can't just do a quick reindex-chainstate since the changes are undo and block writing related). I have two other changes in queue that will be based on this refactor (#31539 and #31144, which I've drafted until these are sorted). |
Absolutely, but that's a bigger change (would cache the serialized sizes in |
Co-authored-by: Ryan Ofsky <[email protected]> Co-authored-by: Hodlinator <[email protected]> -BEGIN VERIFY SCRIPT- grep -r -wE 'WriteBlock|ReadRawBlock|ReadBlock|WriteBlockUndo|ReadBlockUndo' $(git ls-files src/ ':!src/leveldb') && \ echo "Error: One or more target names already exist!" && exit 1 sed -i \ -e 's/\bSaveBlockToDisk/WriteBlock/g' \ -e 's/\bReadRawBlockFromDisk/ReadRawBlock/g' \ -e 's/\bReadBlockFromDisk/ReadBlock/g' \ -e 's/\bWriteUndoDataForBlock/WriteBlockUndo/g' \ -e 's/\bUndoReadFromDisk/ReadBlockUndo/g' \ $(git ls-files src/ ':!src/leveldb') -END VERIFY SCRIPT-
ryanofsky
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK 223081e. Since last review, "Save" was renamed to "Write", uint32_t references were dropped, some log statements and comments were improved as suggested, and a lot of tweaks made to commits and commit messages which should make this easier to review.
| return block; | ||
| } | ||
|
|
||
| static void SaveBlockBench(benchmark::Bench& bench) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "bench: add SaveBlockBench" (86b85bb)
Could rename SaveBlock to WriteBlock here too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, if I need to edit, I'll rename this as well
hodlinator
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 223081e
Thanks for reorganizing the first commits!
Confirmed that git -c log.showSignature=false log --oneline --follow src/bench/readwriteblock.cpp shows 7 commits.
Cool with the sanity-check in the scripted diff, not sure I've seen that before.
Commit message dfb2f9d
Might add some more context:
"Similarly +to UndoWriteToDisk in parent commit+, WriteBlockToDisk wasn't really extracting"
Commit message 42bc491
(What's the inspiration for all the semicolons? Doesn't appear to be one of the uses described here: https://grammarist.com/punctuation/how-to-use-semicolons-in-a-list/)
Benchmarked with new bench target
₿ build/src/bench/bench_bitcoin -filter=SaveBlockBench -min-time=10000
At second commit (86b85bb)
| ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 3,172,375.74 | 315.22 | 0.6% | 20,053,788.28 | 9,071,225.51 | 2.211 | 3,133,287.73 | 0.5% | 11.12 | `SaveBlockBench`
(Median result of 3 runs).
At final commit (223081e)
| ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 3,159,241.92 | 316.53 | 1.4% | 19,805,232.51 | 8,963,495.75 | 2.210 | 3,080,238.64 | 0.4% | 11.08 | `SaveBlockBench`
(Median result of 3 runs).
Conclusion
Unfortunately this confirms that serializing blocks is insanely fast, making this PR more of a refactor than an optimization.
| if (!FlushUndoFile(_pos.nFile, true)) { | ||
| LogPrintLevel(BCLog::BLOCKSTORAGE, BCLog::Level::Warning, "Failed to flush undo file %05i\n", _pos.nFile); | ||
| if (!FlushUndoFile(pos.nFile, true)) { | ||
|
LogPrintLevel(BCLog::BLOCKSTORAGE, BCLog::Level::Warning, "Failed to flush undo file %05i\n", pos. |
UndoWriteToDiskandWriteBlockToDiskwere delegating a subset of their functionality to single-use methods that didn't optimally capture a meaningful chunk of the algorithm, resulting in calculating things twice (serialized size, header size).This change inlines the awkward methods (asserting that all previous behavior was retained), and in separate commits makes the usages less confusing.
Besides making the methods slightly more intuitive, the refactorings reduce duplicate calculations as well.
The speed difference is insignificant for now (~0.5% for the new
SaveBlockToDiskBench), but are a cleanup for follow-ups such as #31539