Skip to content

Conversation

@fjahr
Copy link
Contributor

@fjahr fjahr commented Dec 14, 2023

While exploring some C++20 changes and checking against our code I found this potential improvement:

  1. We can replace our custom implementation of rotl32 in crypto/chacha20 with std::rotl from the new bit header.

@DrahtBot
Copy link
Contributor

DrahtBot commented Dec 14, 2023

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK fanquake
Concept ACK hebasto
Stale ACK maflcko

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

No conflicts as of last run.

@maflcko
Copy link
Member

maflcko commented Dec 15, 2023

Not sure. This needs a benchmark to check against slowdown, see #29057 (comment)

Also, C++20 is enabled for years, so I don't understand the other commit. I think you either forgot to compile locally, or your compiler doesn't emit any warnings that the changes are wrong?

@fanquake
Copy link
Member

txmempool.cpp:88:61: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
   88 |             mapTx.modify(mapTx.iterator_to(descendant), [=, this](CTxMemPoolEntry& e) {
      |                                                           ~~^~~~
txmempool.cpp:99:32: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
   99 |     mapTx.modify(updateIt, [=, this](CTxMemPoolEntry& e) { e.UpdateDescendantState(modifySize, modifyFee, modifyCount); });
      |                              ~~^~~~
txmempool.cpp:300:38: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
  300 |         mapTx.modify(ancestorIt, [=, this](CTxMemPoolEntry& e) { e.UpdateDescendantState(updateSize, updateFee, updateCount); });
      |                                    ~~^~~~
txmempool.cpp:315:26: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
  315 |     mapTx.modify(it, [=, this](CTxMemPoolEntry& e){ e.UpdateAncestorState(updateSize, updateFee, updateCount, updateSigOpsCost); });
      |                        ~~^~~~
txmempool.cpp:345:39: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
  345 |                 mapTx.modify(dit, [=, this](CTxMemPoolEntry& e){ e.UpdateAncestorState(modifySize, modifyFee, -1, modifySigOps); });
      |                                     ~~^~~~
txmempool.cpp:903:46: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
  903 |                 mapTx.modify(ancestorIt, [=, this](CTxMemPoolEntry& e){ e.UpdateDescendantState(0, nFeeDelta, 0);});
      |                                            ~~^~~~
txmempool.cpp:910:48: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
  910 |                 mapTx.modify(descendantIt, [=, this](CTxMemPoolEntry& e){ e.UpdateAncestorState(0, nFeeDelta, 0, 0); });
      |                                              ~~^~~~

@fjahr
Copy link
Contributor Author

fjahr commented Dec 15, 2023

Also, C++20 is enabled for years, so I don't understand the other commit.

I don't understand this comment. What's the 'other' commit?

I think you either forgot to compile locally, or your compiler doesn't emit any warnings that the changes are wrong?

I compiled locally just fine and ran all test using Apple clang version 15.0.0 as the compiler. I will look into the warnings.

Copy link
Contributor

@mzumsande mzumsande left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the issue is that most of the spots that were changed don't actually make use of the implicit capture of this (e.g. by accessing a class or struct member in the body of the lambda) and therefore don't need to be changed.
Only the example in the scheduler does that (which is a doc). Not the use of [=] in general is deprecated, just using [=] in combination with accessing this via implicit capture.
See https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0806r2.html for a longer explanation.

@fjahr
Copy link
Contributor Author

fjahr commented Dec 15, 2023

Seems right that the capture doesn't always need to be explicit, I dropped the second commit.

I don't see any performance impact on chacha20 from the first commit:

MASTER

$ src/bench/bench_bitcoin -filter="CHACHA20_1MB|CHACHA20_256BYTES|CHACHA20_64BYTES"

ns/byte byte/s err% total benchmark
1.92 522,177,619.15 0.1% 0.02 CHACHA20_1MB
1.94 514,584,551.79 0.3% 0.01 CHACHA20_256BYTES
2.02 495,631,174.53 0.1% 0.01 CHACHA20_64BYTES

$ src/bench/bench_bitcoin -filter="CHACHA20_1MB|CHACHA20_256BYTES|CHACHA20_64BYTES"

ns/byte byte/s err% total benchmark
1.91 522,318,337.55 0.1% 0.02 CHACHA20_1MB
1.94 514,750,575.59 0.2% 0.01 CHACHA20_256BYTES
2.02 495,682,263.12 0.0% 0.01 CHACHA20_64BYTES

$ src/bench/bench_bitcoin -filter="CHACHA20_1MB|CHACHA20_256BYTES|CHACHA20_64BYTES"

ns/byte byte/s err% total benchmark
1.92 521,106,902.91 0.3% 0.02 CHACHA20_1MB
1.94 515,778,495.78 0.0% 0.01 CHACHA20_256BYTES
2.02 495,731,465.14 0.0% 0.01 CHACHA20_64BYTES
THIS BRANCH

$ src/bench/bench_bitcoin -filter="CHACHA20_1MB|CHACHA20_256BYTES|CHACHA20_64BYTES"

ns/byte byte/s err% total benchmark
1.92 520,578,875.51 0.3% 0.02 CHACHA20_1MB
1.94 515,833,155.08 0.1% 0.01 CHACHA20_256BYTES
2.02 495,727,504.36 0.0% 0.01 CHACHA20_64BYTES

$ src/bench/bench_bitcoin -filter="CHACHA20_1MB|CHACHA20_256BYTES|CHACHA20_64BYTES"

ns/byte byte/s err% total benchmark
1.91 522,209,865.84 0.1% 0.02 CHACHA20_1MB
1.94 515,822,396.86 0.0% 0.01 CHACHA20_256BYTES
2.02 495,715,040.19 0.0% 0.01 CHACHA20_64BYTES

$ src/bench/bench_bitcoin -filter="CHACHA20_1MB|CHACHA20_256BYTES|CHACHA20_64BYTES"

ns/byte byte/s err% total benchmark
1.92 522,069,205.88 0.1% 0.02 CHACHA20_1MB
1.94 514,707,650.50 0.2% 0.01 CHACHA20_256BYTES
2.02 495,706,912.69 0.0% 0.01 CHACHA20_64BYTES

@fjahr fjahr changed the title refactor: C++20: Use lambda implicit capture and std::rotl refactor: C++20: Use std::rotl Dec 16, 2023
@fanquake
Copy link
Member

If we are going to change it here, should also switch in hash.cpp:

bitcoin/src/hash.cpp

Lines 12 to 15 in c840dea

inline uint32_t ROTL32(uint32_t x, int8_t r)
{
return (x << r) | (x >> (32 - r));
}

Also cc @theuni.

@fjahr
Copy link
Contributor Author

fjahr commented Dec 24, 2023

If we are going to change it here, should also switch in hash.cpp

Done, thanks! I grepped for it didn't find this one, probably because of the all-caps.

@maflcko
Copy link
Member

maflcko commented Dec 25, 2023

@sipa
Copy link
Member

sipa commented Dec 25, 2023

@maflcko Constant propagation + strength reduction should make that code not emit an actual modulo operation in any compiler from the last 20 years (it'll use x & (d-1) instead of x % d, if d is a known power of two).

But hopefully that pattern in libc++ also matches whatever pattern matching clang has to make sure it instead emits an actual rot instruction on platforms where it exists.

@DrahtBot
Copy link
Contributor

Guix builds (on x86_64)

File commit 4b1196a
(master)
commit 1defdd46bbed418ebe19ed087d9d499c5fe7e12c
(master and this pull)
SHA256SUMS.part 7aefd803a584a152... f66270f8381da78c...
*-aarch64-linux-gnu-debug.tar.gz 735f38fb37b239f8... 0178f4f1e25a9ccd...
*-aarch64-linux-gnu.tar.gz 83a257c74e184b96... e70f485ce7c509e1...
*-arm-linux-gnueabihf-debug.tar.gz 513acf435b18fab2... 66dcefcdfcdb8171...
*-arm-linux-gnueabihf.tar.gz 581f911fcca5ae02... 419f97951c22103a...
*-arm64-apple-darwin-unsigned.tar.gz 7a3f2b4e01f27896... ee665ca81967f2e3...
*-arm64-apple-darwin-unsigned.zip dff45b014236e990... d8316d5dc9bd85c5...
*-arm64-apple-darwin.tar.gz 0d9696b7bd76b36f... 0a812fb90bc27826...
*-powerpc64-linux-gnu-debug.tar.gz dfe6cfdcd6ad3c6f... a5e96360b63b9c9b...
*-powerpc64-linux-gnu.tar.gz 54c013201daad1ff... 13216629ab10e2fa...
*-powerpc64le-linux-gnu-debug.tar.gz 8e944b002273b678... 6213e437a6763983...
*-powerpc64le-linux-gnu.tar.gz 1ba8199984d7c43a... 75447353ea712f0a...
*-riscv64-linux-gnu-debug.tar.gz e9abe48aeb7acc68... a884e5a19ce37301...
*-riscv64-linux-gnu.tar.gz f4d1484dec65b967... 341b54b9c84ca385...
*-x86_64-apple-darwin-unsigned.tar.gz 915650aa35e6e322... 8928011cd2a21165...
*-x86_64-apple-darwin-unsigned.zip 6a6139cb415de743... d136c4f68df7be28...
*-x86_64-apple-darwin.tar.gz c479d099d1df9a85... 226bc38c63f84332...
*-x86_64-linux-gnu-debug.tar.gz 9e27161179f18a4a... e6abee7e9248190f...
*-x86_64-linux-gnu.tar.gz f69632ca3dc98896... a7bce0cff5b25503...
*.tar.gz c04997c9f9e51d80... a25b31625606e2cf...
guix_build.log 412e8bd624c6cbcf... e6a2ef83e981021e...
guix_build.log.diff 55ad9bb7e18060bd...

@maflcko
Copy link
Member

maflcko commented Jan 4, 2024

lgtm ACK 842c288852c4d66de359e6907d9c82b7e618ab65

@theuni
Copy link
Member

theuni commented Jan 4, 2024

Any reason not to do Rotl in sha3 and ROTL in siphash as well?

@fjahr
Copy link
Contributor Author

fjahr commented Jan 5, 2024

Any reason not to do Rotl in sha3 and ROTL in siphash as well?

I don't see any, apparently I just need to learn to grep patterns better...

Done!

@hebasto
Copy link
Member

hebasto commented Jan 5, 2024

Concept ACK.

Copy link
Member

@fanquake fanquake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 6044628

Had a look at the difference in master vs this PR (rebased) on aarch64. Seems like for the most part, this gives the same output +- a handful of instructions. (Identical is identical minus difference in addresses / argument ordering etc).

Function GCC 13.2 Clang 17.0.6
MurmurHash3 Identical PR gives 5 less instructions: https://www.diffchecker.com/Jmlpl4rG/
ChaCha20Aligned::Keystream PR gives 1 less instruction: https://www.diffchecker.com/d6dp5tsk/ Identical
KeccakF Identical Identical
CSipHasher:write (unsigned long) Identical Identical

So this seems ok, unless anyone can find any major regressions.
The output on corecheck also seems ok: https://corecheck.dev/bitcoin/bitcoin/pulls/29085.

@DrahtBot DrahtBot requested review from hebasto and maflcko January 16, 2024 16:23
@sipa
Copy link
Member

sipa commented Jan 16, 2024

@fanquake Which side of the diff is which version?

@fanquake
Copy link
Member

Which side of the diff is which version?

Left is master @ f1fcc96 , right was the PR based on f1fcc96.

@fanquake
Copy link
Member

Looks like the improvement is a bit more pronouced for x86_64, at least when using Clang (17.0.6). For example, comparing MurmurHash3, master (left), and this PR (right), we get less code and more rol instructions: https://www.diffchecker.com/I7s5JeTr/. For GCC 13.2, this PR makes no difference to the generated assembly: https://www.diffchecker.com/nM5DgKfN/.

@fanquake fanquake merged commit 03c5b00 into bitcoin:master Jan 18, 2024
@bitcoin bitcoin locked and limited conversation to collaborators Jan 17, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants