-
Notifications
You must be signed in to change notification settings - Fork 38.7k
refactor: C++20: Use std::rotl #29085
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code CoverageFor detailed information about the code coverage, see the test coverage report. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsNo conflicts as of last run. |
|
Not sure. This needs a benchmark to check against slowdown, see #29057 (comment) Also, C++20 is enabled for years, so I don't understand the other commit. I think you either forgot to compile locally, or your compiler doesn't emit any warnings that the changes are wrong? |
txmempool.cpp:88:61: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
88 | mapTx.modify(mapTx.iterator_to(descendant), [=, this](CTxMemPoolEntry& e) {
| ~~^~~~
txmempool.cpp:99:32: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
99 | mapTx.modify(updateIt, [=, this](CTxMemPoolEntry& e) { e.UpdateDescendantState(modifySize, modifyFee, modifyCount); });
| ~~^~~~
txmempool.cpp:300:38: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
300 | mapTx.modify(ancestorIt, [=, this](CTxMemPoolEntry& e) { e.UpdateDescendantState(updateSize, updateFee, updateCount); });
| ~~^~~~
txmempool.cpp:315:26: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
315 | mapTx.modify(it, [=, this](CTxMemPoolEntry& e){ e.UpdateAncestorState(updateSize, updateFee, updateCount, updateSigOpsCost); });
| ~~^~~~
txmempool.cpp:345:39: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
345 | mapTx.modify(dit, [=, this](CTxMemPoolEntry& e){ e.UpdateAncestorState(modifySize, modifyFee, -1, modifySigOps); });
| ~~^~~~
txmempool.cpp:903:46: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
903 | mapTx.modify(ancestorIt, [=, this](CTxMemPoolEntry& e){ e.UpdateDescendantState(0, nFeeDelta, 0);});
| ~~^~~~
txmempool.cpp:910:48: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
910 | mapTx.modify(descendantIt, [=, this](CTxMemPoolEntry& e){ e.UpdateAncestorState(0, nFeeDelta, 0, 0); });
| ~~^~~~ |
I don't understand this comment. What's the 'other' commit?
I compiled locally just fine and ran all test using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the issue is that most of the spots that were changed don't actually make use of the implicit capture of this (e.g. by accessing a class or struct member in the body of the lambda) and therefore don't need to be changed.
Only the example in the scheduler does that (which is a doc). Not the use of [=] in general is deprecated, just using [=] in combination with accessing this via implicit capture.
See https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0806r2.html for a longer explanation.
|
Seems right that the capture doesn't always need to be explicit, I dropped the second commit. I don't see any performance impact on chacha20 from the first commit: MASTER$ src/bench/bench_bitcoin -filter="CHACHA20_1MB|CHACHA20_256BYTES|CHACHA20_64BYTES"
$ src/bench/bench_bitcoin -filter="CHACHA20_1MB|CHACHA20_256BYTES|CHACHA20_64BYTES"
$ src/bench/bench_bitcoin -filter="CHACHA20_1MB|CHACHA20_256BYTES|CHACHA20_64BYTES"
THIS BRANCH$ src/bench/bench_bitcoin -filter="CHACHA20_1MB|CHACHA20_256BYTES|CHACHA20_64BYTES"
$ src/bench/bench_bitcoin -filter="CHACHA20_1MB|CHACHA20_256BYTES|CHACHA20_64BYTES"
$ src/bench/bench_bitcoin -filter="CHACHA20_1MB|CHACHA20_256BYTES|CHACHA20_64BYTES"
|
Done, thanks! I grepped for it didn't find this one, probably because of the all-caps. |
|
I guess this is fine, because the shift is constant and the compiler can skip the expensive modulo? |
|
@maflcko Constant propagation + strength reduction should make that code not emit an actual modulo operation in any compiler from the last 20 years (it'll use But hopefully that pattern in libc++ also matches whatever pattern matching clang has to make sure it instead emits an actual rot instruction on platforms where it exists. |
|
lgtm ACK 842c288852c4d66de359e6907d9c82b7e618ab65 |
|
Any reason not to do |
I don't see any, apparently I just need to learn to grep patterns better... Done! |
|
Concept ACK. |
fanquake
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 6044628
Had a look at the difference in master vs this PR (rebased) on aarch64. Seems like for the most part, this gives the same output +- a handful of instructions. (Identical is identical minus difference in addresses / argument ordering etc).
| Function | GCC 13.2 | Clang 17.0.6 |
|---|---|---|
| MurmurHash3 | Identical | PR gives 5 less instructions: https://www.diffchecker.com/Jmlpl4rG/ |
| ChaCha20Aligned::Keystream | PR gives 1 less instruction: https://www.diffchecker.com/d6dp5tsk/ | Identical |
| KeccakF | Identical | Identical |
| CSipHasher:write (unsigned long) | Identical | Identical |
So this seems ok, unless anyone can find any major regressions.
The output on corecheck also seems ok: https://corecheck.dev/bitcoin/bitcoin/pulls/29085.
|
@fanquake Which side of the diff is which version? |
|
Looks like the improvement is a bit more pronouced for x86_64, at least when using Clang (17.0.6). For example, comparing |
While exploring some C++20 changes and checking against our code I found this potential improvement:
rotl32in crypto/chacha20 withstd::rotlfrom the newbitheader.