Skip to content

Conversation

@l0rinc
Copy link
Contributor

@l0rinc l0rinc commented Dec 13, 2024

UndoWriteToDisk and WriteBlockToDisk were delegating a subset of their functionality to single-use methods that didn't optimally capture a meaningful chunk of the algorithm, resulting in calculating things twice (serialized size, header size).
This change inlines the awkward methods (asserting that all previous behavior was retained), and in separate commits makes the usages less confusing.
Besides making the methods slightly more intuitive, the refactorings reduce duplicate calculations as well.

The speed difference is insignificant for now (~0.5% for the new SaveBlockToDiskBench), but are a cleanup for follow-ups such as #31539

@DrahtBot
Copy link
Contributor

DrahtBot commented Dec 13, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31490.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK ryanofsky, hodlinator, TheCharlatan, andrewtoth
Concept ACK BrandonOdiwuor, theuni

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #31551 (optimization: bulk reads(27%)/writes(290%) in [undo]block [de]serialization, 6% faster IBD by l0rinc)
  • #31539 (optimization: buffer reads(23%)/writes(290%) in [undo]block [de]serialization, 6% faster IBD by l0rinc)
  • #31533 (fuzz: Add fuzz target for block index tree and related validation events by mzumsande)
  • #31144 (optimization: batch XOR operations 12% faster IBD by l0rinc)
  • #29641 (scripted-diff: Use LogInfo over LogPrintf [WIP, NOMERGE, DRAFT] by maflcko)
  • #29307 (util: explicitly close all AutoFiles that have been written by vasild)
  • #26966 (index: initial sync speedup, parallelize process by furszy)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@sedited
Copy link
Contributor

sedited commented Dec 13, 2024

Concept ACK

Copy link
Contributor

@andrewtoth andrewtoth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK

Should we not prefer the more modern and explicit uint32_t vs unsigned int?

@l0rinc
Copy link
Contributor Author

l0rinc commented Dec 15, 2024

prefer the more modern and explicit uint32_t

I was hoping someone will recommend that - done: https://github.com/bitcoin/bitcoin/compare/e913e773926ecb72e327acf60c68655b5611cb7a..d69766164d177707ec7be19c4c188bd79ba3e4a3

Edit: Added block size calculation deduplication as well to the PR

@l0rinc l0rinc changed the title refactor: Cache blockundo serialized size for consecutive calls refactor: Cache block[undo] serialized size for consecutive calls Dec 15, 2024
Copy link
Member

@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about the changes.

What is the goal? Is this an optimization? If yes, how can it be observed?

Also, the changes seem to be based on a misunderstanding of the log and check macros.

@l0rinc l0rinc changed the title refactor: Cache block[undo] serialized size for consecutive calls optimization: Cache block[undo] serialized size for consecutive calls Dec 17, 2024
Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK 9a7e1ced7c3bb17193c7401181365c4075d45ec2. This seems like a safer, more efficient approach, but I still think the API is very confusing.

No need to address here, but for a followup I think it would make more sense for WriteBlockToDisk instead of SaveBlockToDisk call FindNextBlockPos, and for UndoWriteToDisk instead of WriteUndoDataForBlock to be call FindUndoPos, so higher level SaveBlockToDisk / WriteUndoDataForBlock functions don't contain any logic dealing with header fields.

I also find names of these functions to be inconsistent, overlong and confusing. Would rename:

  • SaveBlockToDisk to SaveBlock
  • WriteBlockToDisk to WriteBlock
  • WriteUndoDataForBlock to SaveBlockUndo
  • UndoWriteToDisk to WriteBlockUndo

Copy link
Member

@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

concept ack, if this improves a benchmark. In the future, it would be good to mention a speedup in the pull request description, so that reviewers can see the motivation and goal for the change.

l0rinc added a commit to l0rinc/bitcoin that referenced this pull request Dec 18, 2024
We've been surprised multiple times by `Assume` doing heavy computations in `release`:
* bitcoin#31178 (comment)
* bitcoin#31490 (comment)

Since `Assume` is a macro, it could have been written differently to avoid parameter evaluation (similarly to `LogDebug`, which doesn't evaluate the parameters).

Co-Authored-By: Anthony Towns <[email protected]>
Copy link
Contributor

@BrandonOdiwuor BrandonOdiwuor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK

@theuni
Copy link
Member

theuni commented Dec 18, 2024

concept ack, if this improves a benchmark. In the future, it would be good to mention a speedup in the pull request description, so that reviewers can see the motivation and goal for the change.

Agree with this. If this actually shows up as significant in real workloads (IBD), concept ACK and we could potentially take this further by caching the size even earlier (as part of deserializing over the wire) to avoid the need for the calculation at all.

But if there's no noticeable speedup, I'm not a fan of muddying the api.

@l0rinc
Copy link
Contributor Author

l0rinc commented Dec 18, 2024

If this actually shows up as significant in real workloads

Thanks for the reviews, I'm running full IBD benchmarks currently, we'll see the results shortly (can't just do a quick reindex-chainstate since the changes are undo and block writing related).

I have two other changes in queue that will be based on this refactor (#31539 and #31144, which I've drafted until these are sorted).
The 3 changes together seem to result in >5% speedup for IBD (every kind, regardless of dbcache or prunedness) - but those benchmarks are also still running.

@l0rinc l0rinc changed the title optimization: Cache block[undo] serialized size for consecutive calls optimization: cache block[undo] serialized size for consecutive calls Dec 19, 2024
@l0rinc
Copy link
Contributor Author

l0rinc commented Dec 19, 2024

we could potentially take this further by caching the size even earlier

Absolutely, but that's a bigger change (would cache the serialized sizes in CBlock, guarding against any other mutation (which requires better encapsulation), storing GetSerializeSize for TX_NO_WITNESS() and TX_WITH_WITNESS() lazily, similarly to the existing checked* flags)... but that's a big change, affecting a lot of consensus code, I'm still working on that and will push in a separate PR.

Co-authored-by: Ryan Ofsky <[email protected]>
Co-authored-by: Hodlinator <[email protected]>

-BEGIN VERIFY SCRIPT-
grep -r -wE 'WriteBlock|ReadRawBlock|ReadBlock|WriteBlockUndo|ReadBlockUndo' $(git ls-files src/ ':!src/leveldb') && \
    echo "Error: One or more target names already exist!" && exit 1
sed -i \
    -e 's/\bSaveBlockToDisk/WriteBlock/g' \
    -e 's/\bReadRawBlockFromDisk/ReadRawBlock/g' \
    -e 's/\bReadBlockFromDisk/ReadBlock/g' \
    -e 's/\bWriteUndoDataForBlock/WriteBlockUndo/g' \
    -e 's/\bUndoReadFromDisk/ReadBlockUndo/g' \
    $(git ls-files src/ ':!src/leveldb')
-END VERIFY SCRIPT-
Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK 223081e. Since last review, "Save" was renamed to "Write", uint32_t references were dropped, some log statements and comments were improved as suggested, and a lot of tweaks made to commits and commit messages which should make this easier to review.

return block;
}

static void SaveBlockBench(benchmark::Bench& bench)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit "bench: add SaveBlockBench" (86b85bb)

Could rename SaveBlock to WriteBlock here too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, if I need to edit, I'll rename this as well

@l0rinc l0rinc requested a review from maflcko January 9, 2025 16:09
Copy link
Contributor

@hodlinator hodlinator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 223081e

Thanks for reorganizing the first commits!
Confirmed that git -c log.showSignature=false log --oneline --follow src/bench/readwriteblock.cpp shows 7 commits.

Cool with the sanity-check in the scripted diff, not sure I've seen that before.

Commit message dfb2f9d

Might add some more context:
"Similarly +to UndoWriteToDisk in parent commit+, WriteBlockToDisk wasn't really extracting"

Commit message 42bc491

(What's the inspiration for all the semicolons? Doesn't appear to be one of the uses described here: https://grammarist.com/punctuation/how-to-use-semicolons-in-a-list/)

Benchmarked with new bench target

₿ build/src/bench/bench_bitcoin -filter=SaveBlockBench -min-time=10000

At second commit (86b85bb)

|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|        3,172,375.74 |              315.22 |    0.6% |   20,053,788.28 |    9,071,225.51 |  2.211 |   3,133,287.73 |    0.5% |     11.12 | `SaveBlockBench`

(Median result of 3 runs).

At final commit (223081e)

|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|        3,159,241.92 |              316.53 |    1.4% |   19,805,232.51 |    8,963,495.75 |  2.210 |   3,080,238.64 |    0.4% |     11.08 | `SaveBlockBench`

(Median result of 3 runs).

Conclusion

Unfortunately this confirms that serializing blocks is insanely fast, making this PR more of a refactor than an optimization.

if (!FlushUndoFile(_pos.nFile, true)) {
LogPrintLevel(BCLog::BLOCKSTORAGE, BCLog::Level::Warning, "Failed to flush undo file %05i\n", _pos.nFile);
if (!FlushUndoFile(pos.nFile, true)) {
LogPrintLevel(BCLog::BLOCKSTORAGE, BCLog::Level::Warning, "Failed to flush undo file %05i\n", pos.