ARMv8 SHA2 Intrinsics #24115

prusnak · 2022-01-20T18:17:31Z

This PR adds support for ARMv8 SHA2 Intrinsics.

Integration part was done by me.
The original SHA2 NI code comes from https://github.com/noloader/SHA-Intrinsics/blob/master/sha256-arm.c
Minor optimizations from https://github.com/rollmeister/bitcoin-armv8/blob/master/src/crypto/sha256.cpp are applied too.
The 2-way transform was added by @sipa.

laanwj · 2022-01-20T18:55:57Z

Concept ACK!

> detection when the feature can be used

On Linux (the only system we care about for ARM, i guess), the following would be the way to do detection:

#include <sys/auxv.h>
#include <asm/hwcap.h>
…
#ifdef __arm__
/* ARM 32 bit */
if (getauxval(AT_HWCAP2) & HWCAP2_SHA2) {
    have_arm_shani = true;
}
#endif
#ifdef __aarch64__
/* ARM 64 bit */
if (getauxval(AT_HWCAP) & HWCAP_SHA2) {
    have_arm_shani = true;
}
#endif

Note that the capability bit is on a different HWCAP word on 32 and 64 bit (dunno if you even want to support 32 bit here).
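
For reference, the fragment above can be folded into a single self-contained helper. This is only a sketch: the function name `DetectArmSha2` is made up for illustration, and it assumes the Linux `getauxval` interface and the `HWCAP_SHA2` / `HWCAP2_SHA2` bits quoted above. Platforms other than Linux on ARM are not covered and simply report `false`.

```cpp
#if defined(__linux__) && (defined(__arm__) || defined(__aarch64__))
#include <sys/auxv.h>
#include <asm/hwcap.h>
#endif

// Hypothetical helper (the name is not from the PR): returns true when the
// Linux kernel reports the ARMv8 SHA2 instructions as usable.
bool DetectArmSha2()
{
#if defined(__linux__) && defined(__aarch64__) && defined(HWCAP_SHA2)
    // 64-bit ARM: the capability bit is in the AT_HWCAP word.
    return (getauxval(AT_HWCAP) & HWCAP_SHA2) != 0;
#elif defined(__linux__) && defined(__arm__) && defined(HWCAP2_SHA2)
    // 32-bit ARM: the same capability is reported in the AT_HWCAP2 word.
    return (getauxval(AT_HWCAP2) & HWCAP2_SHA2) != 0;
#else
    // Anything else is outside the scope of this sketch.
    return false;
#endif
}
```

Doing the check at runtime keeps one binary usable on cores with and without the extension; the compile-time `-march` flag only decides whether the intrinsics can be built at all.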

prusnak · 2022-01-20T19:26:59Z

> the following would be the way to do detection:

Added in f7dd1ef

sipa · 2022-01-20T20:43:36Z

On commit f7dd1efae715593f5c9ff8186d518d25d1c9023c

On a Linux aarch64 Cortex-A53 system with:

$ cat /proc/cpuinfo 
processor       : 0
BogoMIPS        : 200.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0xd03
CPU revision    : 4

which I presume means it has the necessary SHA2 extensions.

The GCC 9.3.0 compiler used supports the extensions (crypto/libbitcoin_crypto_arm_shani.a is being built):

checking for x86 SHA-NI intrinsics... no
checking whether C++ compiler accepts -march=armv8-a+crc+crypto... yes
checking whether C++ compiler accepts -march=armv8-a+crc+crypto... (cached) yes
checking for AArch64 CRC32 intrinsics... yes
checking for AArch64 SHA-NI intrinsics... yes

Still, the extension doesn't seem to be detected. debug.log says:

2022-01-20T20:37:05Z Using the 'standard' SHA256 implementation

prusnak · 2022-01-20T21:42:26Z

@sipa should be fixed in c0849fc

sipa · 2022-01-20T22:16:18Z

On c0849fc:

2022-01-20T22:15:44Z Using the 'arm_shani(1way)' SHA256 implementation

sipa · 2022-01-20T23:01:30Z

This PR (c0849fc):

ns/byte	byte/s	err%	total	benchmark
2.56	390,886,099.70	0.9%	0.03	`SHA256`
10.27	97,329,584.86	0.0%	0.01	`SHA256D64_1024`
14.60	68,478,670.64	0.4%	0.01	`SHA256_32b`

On master (e3ce019):

ns/byte	byte/s	err%	total	benchmark
15.69	63,715,155.54	0.0%	0.17	`SHA256`
43.42	23,029,615.98	0.5%	0.03	`SHA256D64_1024`
41.67	23,995,549.21	0.1%	0.01	`SHA256_32b`

PastaPastaPasta · 2022-01-21T05:08:06Z

> On Linux (the only system we care about for ARM, i guess)

M1 macs would like a word with you.

PastaPastaPasta · 2022-01-21T05:31:50Z

Speaking of m1, I was able to compile this locally on my m1 pro 10 core, ./configure realized that SHA2 intrinsics could be used. See benchmarks below.

on c0849fc

ns/byte	byte/s	err%	total	benchmark
0.47	2,148,996,364.97	0.3%	0.01	`SHA256`
1.48	676,354,727.08	0.3%	0.01	`SHA256D64_1024`
1.15	873,060,380.08	0.1%	0.01	`SHA256_32b`

on master

ns/byte	byte/s	err%	total	benchmark
3.10	322,550,263.01	1.3%	0.03	`SHA256`
9.26	107,941,088.30	0.7%	0.01	`SHA256D64_1024`
6.22	160,743,377.26	1.6%	0.01	`SHA256_32b`

prusnak · 2022-01-21T10:22:03Z

> Speaking of m1, I was able to compile this locally on my m1 pro 10 core, ./configure realized that SHA2 intrinsics could be used. See benchmarks below.

Yes, support for Apple Silicon is included in this PR.

hebasto · 2022-01-21T22:45:09Z

Tested c0849fc on Mac mini (M1, 2020):

% time ./src/bitcoind -datadir=/Users/hebasto/SHANI -assumevalid=0 -stopatheight=719700 -prune=550
2022-01-21T07:44:20Z Bitcoin Core version v22.99.0-c0849fc4fd9a (release build)
2022-01-21T07:44:20Z Validating signatures for all blocks.
2022-01-21T07:44:20Z Setting nMinimumChainWork=00000000000000000000000000000000000000001fa4663bbbe19f82de910280
2022-01-21T07:44:20Z Prune configured to target 550 MiB on disk for block and undo files.
2022-01-21T07:44:20Z Using the 'arm_shani(1way)' SHA256 implementation
...
2022-01-21T22:38:17Z Shutdown: done
./src/bitcoind -datadir=/Users/hebasto/SHANI -assumevalid=0  -prune=550  149587.28s user 11456.17s system 300% cpu 14:53:56.52 total

UPDATE. The same for the master branch (e3ce019):

% time ./src/bitcoind -datadir=/Users/hebasto/MASTER -assumevalid=0 -stopatheight=719700 -prune=550
2022-01-21T22:49:25Z Bitcoin Core version v22.99.0-e3ce019667fb (release build)
2022-01-21T22:49:25Z Validating signatures for all blocks.
2022-01-21T22:49:25Z Setting nMinimumChainWork=00000000000000000000000000000000000000001fa4663bbbe19f82de910280
2022-01-21T22:49:25Z Prune configured to target 550 MiB on disk for block and undo files.
2022-01-21T22:49:25Z Using the 'standard' SHA256 implementation
...
2022-01-22T14:37:08Z Shutdown: done
./src/bitcoind -datadir=/Users/hebasto/MASTER -assumevalid=0  -prune=550  174110.07s user 11526.30s system 326% cpu 15:47:43.83 total

About 54 min, or roughly 6%, faster IBD.

sipa · 2022-01-21T23:50:11Z

See https://github.com/sipa/bitcoin/commits/pr24115, which adds a 2-way 64-byte optimized variant. On my Cortex-A53 it's roughly a 2x speedup for the SHA256D64_1024 benchmark (relevant for Merkle root computation) compared to this PR. For more modern architectures I could imagine it's more:

ns/byte	byte/s	err%	total	benchmark
2.60	384,105,263.28	0.3%	0.03	`SHA256`
5.35	187,019,153.94	0.1%	0.01	`SHA256D64_1024`
14.61	68,437,280.69	0.0%	0.01	`SHA256_32b`

For reference, master again:

ns/byte	byte/s	err%	total	benchmark
15.69	63,715,155.54	0.0%	0.17	`SHA256`
43.42	23,029,615.98	0.5%	0.03	`SHA256D64_1024`
41.67	23,995,549.21	0.1%	0.01	`SHA256_32b`

PastaPastaPasta · 2022-01-22T05:04:25Z

@sipa's branch on m1:

ns/byte	byte/s	err%	total	benchmark
0.46	2,174,603,243.64	0.6%	0.01	`SHA256`
0.95	1,053,985,898.82	0.8%	0.01	`SHA256D64_1024`
1.15	871,857,965.44	0.4%	0.01	`SHA256_32b`

previous results on c0849

ns/byte	byte/s	err%	total	benchmark
0.47	2,148,996,364.97	0.3%	0.01	`SHA256`
1.48	676,354,727.08	0.3%	0.01	`SHA256D64_1024`
1.15	873,060,380.08	0.1%	0.01	`SHA256_32b`

prusnak · 2022-01-22T11:55:51Z

I confirm the numbers on M1:

before @sipa's improvements (f06f46c):

ns/byte	byte/s	err%	total	benchmark
0.45	2,198,307,083.71	0.3%	0.01	`SHA256`
1.49	671,303,457.11	0.3%	0.01	`SHA256D64_1024`
1.15	866,670,334.07	0.2%	0.01	`SHA256_32b`

after @sipa's improvements (0e72995):

ns/byte	byte/s	err%	total	benchmark
0.46	2,197,198,571.82	0.3%	0.01	`SHA256`
0.94	1,059,405,097.65	0.3%	0.01	`SHA256D64_1024`
1.17	857,291,152.82	0.5%	0.01	`SHA256_32b`

prusnak · 2022-01-22T11:56:47Z

> I confirm the numbers on M1:

I merged @sipa's improvements into this branch => 0e72995 ❤️

PastaPastaPasta · 2022-01-22T14:11:41Z

I'm not able to build this branch on m1 at the moment

config.status: creating libbitcoinconsensus.pc
config.status: creating Makefile
config.status: creating src/Makefile
config.status: creating doc/man/Makefile
config.status: creating share/setup.nsi
config.status: creating share/qt/Info.plist
config.status: creating test/config.ini
config.status: creating contrib/devtools/split-debug.sh
config.status: creating src/config/bitcoin-config.h
config.status: src/config/bitcoin-config.h is unchanged
config.status: executing depfiles commands
config.status: executing libtool commands
 cd . && /bin/sh /Users/pasta/workspace/bitcoin/build-aux/missing automake-1.16 --foreign
/bin/sh: /Users/pasta/workspace/bitcoin/build-aux/missing: No such file or directory
make: *** [Makefile.in] Error 1

I just checked out sipa's branch here: sipa@0e72995 and compilation worked trivially

sipa · 2022-01-22T14:30:42Z

@prusnak @PastaPastaPasta Perhaps you want to also benchmark with the two last commits removed (so at "Optimization: precompute a few 3rd transform intermediaries"). Whether the last two help may be very architecture-dependent. For me they contribute a ~30% speedup, but maybe on M1 that is not the case.

prusnak · 2022-01-22T15:04:57Z

@sipa benchmark of 38ed75f Optimization: precompute a few 3rd transform intermediaries on M1:

ns/byte	byte/s	err%	total	benchmark
0.46	2,163,137,327.85	0.0%	0.01	`SHA256`
1.28	780,225,941.01	0.3%	0.01	`SHA256D64_1024`
1.18	850,467,547.93	0.2%	0.01	`SHA256_32b`

The improvement from 0e72995 is there on M1 as well.

sipa · 2022-01-22T15:06:40Z

Looks like the 2-way version is a clear win on M1 as well, thanks!

prusnak · 2022-01-28T08:45:56Z

@fanquake rebased on top of current master

hebasto · 2022-01-28T22:43:23Z

Guix builds:

$ find guix-build-$(git rev-parse --short=12 HEAD)/output/ -type f -print0 | env LC_ALL=C sort -z | xargs -r0 sha256sum
4a309ef27036065f787330a50659c85323f78f7b5d3c69a79e3eca232f4d3e55  guix-build-aaa1d03d3ace/output/aarch64-linux-gnu/SHA256SUMS.part
cfedbd51f5bf65d57fe9200e22e56f3a38f44308eea565b201483811996d957b  guix-build-aaa1d03d3ace/output/aarch64-linux-gnu/bitcoin-aaa1d03d3ace-aarch64-linux-gnu-debug.tar.gz
85c9195cf594fbbf3ce6c6b76370f746dd4342c793b54b8b5bdb650a53761eaf  guix-build-aaa1d03d3ace/output/aarch64-linux-gnu/bitcoin-aaa1d03d3ace-aarch64-linux-gnu.tar.gz
280234b6a20bbcddb847e0bbaf27f35ac30d292ea5dd096f8c8096220f8ef29a  guix-build-aaa1d03d3ace/output/arm-linux-gnueabihf/SHA256SUMS.part
48c0fd0a6a4c1e86e942f6907804ed87527f329954e81e5bdc22beed83cbe6b3  guix-build-aaa1d03d3ace/output/arm-linux-gnueabihf/bitcoin-aaa1d03d3ace-arm-linux-gnueabihf-debug.tar.gz
c435b7a53606f33f9d23f28e6c98ce3d55f50f61fd67bf2b4ebe65261fe68aba  guix-build-aaa1d03d3ace/output/arm-linux-gnueabihf/bitcoin-aaa1d03d3ace-arm-linux-gnueabihf.tar.gz
e0d61af7b471ba5135a848be6d4d6cf0caa8042dab9d929f6578de17e47e40c2  guix-build-aaa1d03d3ace/output/arm64-apple-darwin/SHA256SUMS.part
e9eda618bf90d5d1522005bd13bd296ae89aec9c3b0b2528b9c9f67e70178828  guix-build-aaa1d03d3ace/output/arm64-apple-darwin/bitcoin-aaa1d03d3ace-arm64-apple-darwin.tar.gz
a6540dcea7c2a2562edd0f2232f511da97339d388cdca686e084d03e381d6fa2  guix-build-aaa1d03d3ace/output/arm64-apple-darwin/bitcoin-aaa1d03d3ace-osx-unsigned.dmg
6d970ae7c94b78c5f97e978718862e85c3375c48d3b8ade6fafce39e7c8e0f0e  guix-build-aaa1d03d3ace/output/arm64-apple-darwin/bitcoin-aaa1d03d3ace-osx-unsigned.tar.gz
cdbc9eb5281b14ecbecc2956a5a41709fda93762115ce3cee9516a68874676b8  guix-build-aaa1d03d3ace/output/dist-archive/bitcoin-aaa1d03d3ace.tar.gz
d063099449e40036d15ed5b023f602ba824edf303ecbe78bcd3f01feeabb535f  guix-build-aaa1d03d3ace/output/powerpc64-linux-gnu/SHA256SUMS.part
3e6da4026d466039cd361e48a492ea780021bc0efb8e67e948097871ce4226cc  guix-build-aaa1d03d3ace/output/powerpc64-linux-gnu/bitcoin-aaa1d03d3ace-powerpc64-linux-gnu-debug.tar.gz
669291a4767509f053bdd8789f4631ff6d28cd4ad62156105bc98cdb7d3a295a  guix-build-aaa1d03d3ace/output/powerpc64-linux-gnu/bitcoin-aaa1d03d3ace-powerpc64-linux-gnu.tar.gz
868cbe0f73cd786d91dd6d1990550e6c9c673ef3d874c545150b6e030951a24d  guix-build-aaa1d03d3ace/output/powerpc64le-linux-gnu/SHA256SUMS.part
90157bdc29262abd421535e4d6921e72aff7f3d43ca5634bc76598d8daf3a1ec  guix-build-aaa1d03d3ace/output/powerpc64le-linux-gnu/bitcoin-aaa1d03d3ace-powerpc64le-linux-gnu-debug.tar.gz
8275665e19f85193de4e7e5ee7b451d6a4f8e414a33586e7066575812e878eda  guix-build-aaa1d03d3ace/output/powerpc64le-linux-gnu/bitcoin-aaa1d03d3ace-powerpc64le-linux-gnu.tar.gz
2723b48d06e8217adb41bcc640417cba3470b234cce7815104e878866d775046  guix-build-aaa1d03d3ace/output/riscv64-linux-gnu/SHA256SUMS.part
f63822813587ec9e4bcc044c4b7918b1330d6b16be09f28bd95fedfd3dcdb147  guix-build-aaa1d03d3ace/output/riscv64-linux-gnu/bitcoin-aaa1d03d3ace-riscv64-linux-gnu-debug.tar.gz
3ec47d6968e2e430e3ff1629f07243f8589bd30406c0de916c2bbc6d5d88e0e8  guix-build-aaa1d03d3ace/output/riscv64-linux-gnu/bitcoin-aaa1d03d3ace-riscv64-linux-gnu.tar.gz
2120e8021edfe8170e318d82e44e405af68bb04c3fe3a3cd0407a17507cd74a0  guix-build-aaa1d03d3ace/output/x86_64-apple-darwin/SHA256SUMS.part
11b173cbeff7b20c717fd880446904eec2d33b30076348c0e9698e876f5be4a4  guix-build-aaa1d03d3ace/output/x86_64-apple-darwin/bitcoin-aaa1d03d3ace-osx-unsigned.dmg
af86643730d769ddf6635e9e240b50bd5b94aa576cf7450b4289d8ec1fc883ed  guix-build-aaa1d03d3ace/output/x86_64-apple-darwin/bitcoin-aaa1d03d3ace-osx-unsigned.tar.gz
3c3591954aaf6b0557b74baf156f892b9b90ec63c35f3495889431afe0a17f93  guix-build-aaa1d03d3ace/output/x86_64-apple-darwin/bitcoin-aaa1d03d3ace-osx64.tar.gz
553d227d7e32c72390e259a46f37e4fd98f5781bd4b6d7ed8d83c5a9ba04f65b  guix-build-aaa1d03d3ace/output/x86_64-linux-gnu/SHA256SUMS.part
479f3597f1577a33dad6634b97515f4ceac0ff163d64e7fa00b8685793d36231  guix-build-aaa1d03d3ace/output/x86_64-linux-gnu/bitcoin-aaa1d03d3ace-x86_64-linux-gnu-debug.tar.gz
c273a93458f2afa04c66d7e23346835dc6201f8e7b878ffbf661ae104c8746e7  guix-build-aaa1d03d3ace/output/x86_64-linux-gnu/bitcoin-aaa1d03d3ace-x86_64-linux-gnu.tar.gz

UPDATE: build artifacts are available in https://github.com/hebasto/artefacts/tree/master/pr24115/guix-build-aaa1d03d3ace/output

prusnak · 2022-01-29T14:18:56Z

Sjors · 2022-01-31T20:04:21Z

Concept ACK. I find it near-impossible to follow what sha256d64_arm_shani is doing, but that's mainly because our c++ TransformD64 is undocumented (introduced in #13191). In particular I don't understand how the algorithm follows from a single sha256. I assume it's an optimization. Otherwise they seem similar enough, with sha256d64_arm_shani splitting the input to take advantage of the 2-way instructions. And the tests pass :-)

sipa · 2022-01-31T20:27:01Z

@Sjors That's quite possibly worth documenting in general (for all D64 code).

What these functions do:

Take a pointer to an input N*64 bytes buffer, and an output N*32 bytes buffer (N=1 for 1-way, N=2 for 2-way, etc.).
Treat the input as the concatenation of N 64-byte inputs, compute SHA256(SHA256(input)) for each, and concatenate those outputs in the output buffer.
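
To make the buffer contract concrete, here is a rough sketch of how a caller could walk a batch of 64-byte inputs using a 2-way routine plus a 1-way fallback. The names `TransformD64Fn` and `HashD64Batch`, and the idea of passing the routines in as function pointers, are assumptions made for illustration; this is not the dispatcher from the PR.

```cpp
#include <cassert>
#include <cstddef>

// Signature assumed for illustration: hash N consecutive 64-byte inputs at
// `in` into N consecutive 32-byte outputs at `out` (N is fixed per routine).
using TransformD64Fn = void (*)(unsigned char* out, const unsigned char* in);

// Hypothetical dispatcher: use the 2-way routine while at least two inputs
// remain, then fall back to the 1-way routine for a leftover input.
void HashD64Batch(unsigned char* out, const unsigned char* in, std::size_t blocks,
                  TransformD64Fn one_way, TransformD64Fn two_way)
{
    assert(one_way != nullptr);
    if (two_way != nullptr) {
        while (blocks >= 2) {
            two_way(out, in);
            out += 2 * 32; // two 32-byte digests written
            in += 2 * 64;  // two 64-byte inputs consumed
            blocks -= 2;
        }
    }
    while (blocks > 0) {
        one_way(out, in);
        out += 32;
        in += 64;
        --blocks;
    }
}
```

With five inputs, for example, this calls the 2-way routine twice and the 1-way routine once.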

A bit about SHA256's structure. SHA256(bytes) is really the following algorithm:

Append padding to input (between 9 and 72 bytes); the result is always a multiple of 64 bytes.
Initialize the state (a 32-byte value, typically represented as 8 32-bit integers) to the initial state, a constant.
Then split the input into blocks of 64 bytes, and for each do state = Transform(state, block), where Transform is the SHA256 transformation function at a high level.
The hash is equal to the final state.
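
The "between 9 and 72 bytes" rule can be written down as a one-line formula. A small sketch, with `PaddingLength` as an illustrative name rather than anything from the code base:

```cpp
#include <cstdint>

// Number of padding bytes SHA-256 appends to an n-byte message: one 0x80
// byte, as few zero bytes as possible, and an 8-byte big-endian bit length,
// so that the padded message is a multiple of 64 bytes.
constexpr uint64_t PaddingLength(uint64_t n)
{
    return 1 + (119 - n % 64) % 64 + 8;
}

static_assert(PaddingLength(55) == 9, "shortest possible padding");
static_assert(PaddingLength(56) == 72, "longest possible padding");
static_assert(PaddingLength(64) == 64, "a 64-byte input gets a full extra block");
static_assert(PaddingLength(32) == 32, "a 32-byte input fills up one block");
```

The last two cases are the ones that matter for SHA256(SHA256(64 bytes)).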

In case of SHA256(SHA256(64 bytes)), there are 3 Transforms being invoked:

The first operates on the 64 bytes of input, starting with initial state.
The second continues on the resulting state, processing 64 bytes of padding. That padding is a constant (it's just a function of the length of the input).
The third operates on the 32 bytes of output produced by the second transform, followed by 32 bytes of padding, which is again constant, and starting with a new initial state.

There are 3 types of optimizations we can do in this case:

Start by inlining the 3 transforms into one function body, together with all the initializations. The intermediary conversion to bytes after the second transform and then back to integers for the 3rd transform can be bypassed (serializing & deserializing is a no-op).
Observing that lots of intermediary values now actually become known at compile time. In particular, lots of values occurring during the 2nd transform (whose input is 100% fixed). I did this by simply writing the code up to this point, adding printf statements on these intermediaries, then turning the printed values into constants in the code and skipping their computation.
Taking advantage of vectorization and/or instruction level parallelism. In the case of x86 and ARM SHA instructions, we literally just duplicate every line of code (after doing the operations above), alternating between working on variables relating to a first or a second 64-byte input. This works because these instructions have a long pipeline, and there are sufficient registers available in hardware to store (most) of the data relating to two instances at once. This improves the throughput.

The individual commits in https://github.com/sipa/bitcoin/commits/pr24115 show the process.

Note that I don't think it's really required for verifying correctness to see these steps (otherwise I'd have argued for including them in this PR), but it may help understand how it came to be.

Sjors · 2022-02-01T09:29:20Z

I think this is the step that confuses me:

> The second continues on the resulting state, processing 64 bytes of padding. That padding is a constant (it's just a function of the length of the input).

If the first transform is the equivalent of a single sha256(64 bytes) and the third is the equivalent of a second sha256() on the 32 byte result of the first, what is the second transform doing?

> I did this by simply writing the code up to this point, adding printf statements on these intermediaries, then turning the printed values into constants in the code and skipping their computation.

This is definitely worth documenting (can be another PR). Even nicer if we can generate the values in a Python script (for manual comparison, not code generation).

sipa · 2022-02-01T13:51:01Z

@Sjors

There are two SHA256 invocations:

H1 = SHA256(input)
H2 = SHA256(H1)

Input is 64 bytes, which means it gets 64 bytes of padding (because the padding is always between 9 and 72 bytes long, and the result is always a multiple of 64).

For H2, SHA256(H1) just gets a 32-byte input, so it also only gets a 32-byte padding, and the result just needs one transform.

So we can write it this way:

H1 = Transform(Transform(Init(), input), Pad(64))
H2 = Transform(Init(), H1 + Pad(32))

The first transform is the inner one for H1, the second the outer one for H1. The third transform is the H2 one.
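
The decomposition can be checked with a plain, unoptimized reference. The sketch below is not the PR's code: it uses no intrinsics and precomputes nothing, and every name in it (`State`, `Init`, `Transform`, `Pad`, `ToBytes`, `Sha256`, `DoubleSha256_64`) is chosen here for illustration. It only shows that the three-transform form gives the same result as hashing twice the generic way.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using State = std::array<uint32_t, 8>;

static const uint32_t K[64] = {
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2};

static State Init()
{
    return {0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
            0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19};
}

static uint32_t Rotr(uint32_t x, int n) { return (x >> n) | (x << (32 - n)); }

// One SHA-256 compression step: returns Transform(state, 64-byte block).
static State Transform(State s, const uint8_t* block)
{
    uint32_t w[64];
    for (int i = 0; i < 16; ++i) {
        w[i] = uint32_t(block[4 * i]) << 24 | uint32_t(block[4 * i + 1]) << 16 |
               uint32_t(block[4 * i + 2]) << 8 | uint32_t(block[4 * i + 3]);
    }
    for (int i = 16; i < 64; ++i) {
        uint32_t s0 = Rotr(w[i - 15], 7) ^ Rotr(w[i - 15], 18) ^ (w[i - 15] >> 3);
        uint32_t s1 = Rotr(w[i - 2], 17) ^ Rotr(w[i - 2], 19) ^ (w[i - 2] >> 10);
        w[i] = w[i - 16] + s0 + w[i - 7] + s1;
    }
    State v = s; // working variables a..h
    for (int i = 0; i < 64; ++i) {
        uint32_t t1 = v[7] + (Rotr(v[4], 6) ^ Rotr(v[4], 11) ^ Rotr(v[4], 25)) +
                      ((v[4] & v[5]) ^ (~v[4] & v[6])) + K[i] + w[i];
        uint32_t t2 = (Rotr(v[0], 2) ^ Rotr(v[0], 13) ^ Rotr(v[0], 22)) +
                      ((v[0] & v[1]) ^ (v[0] & v[2]) ^ (v[1] & v[2]));
        v = State{t1 + t2, v[0], v[1], v[2], v[3] + t1, v[4], v[5], v[6]};
    }
    for (int i = 0; i < 8; ++i) s[i] += v[i];
    return s;
}

// Pad(n): the padding for an n-byte message (0x80, zeros, 64-bit bit length).
static std::vector<uint8_t> Pad(uint64_t n)
{
    std::vector<uint8_t> p(1 + (119 - n % 64) % 64 + 8, 0);
    p[0] = 0x80;
    for (int i = 0; i < 8; ++i) p[p.size() - 1 - i] = uint8_t((n * 8) >> (8 * i));
    return p;
}

// Serialize the state as 32 big-endian bytes.
static std::array<uint8_t, 32> ToBytes(const State& s)
{
    std::array<uint8_t, 32> out;
    for (int i = 0; i < 8; ++i)
        for (int j = 0; j < 4; ++j) out[4 * i + j] = uint8_t(s[i] >> (24 - 8 * j));
    return out;
}

// Generic SHA-256: pad, then one Transform per 64-byte block.
static State Sha256(const uint8_t* data, std::size_t len)
{
    std::vector<uint8_t> msg(data, data + len);
    const std::vector<uint8_t> pad = Pad(len);
    msg.insert(msg.end(), pad.begin(), pad.end());
    State s = Init();
    for (std::size_t i = 0; i < msg.size(); i += 64) s = Transform(s, msg.data() + i);
    return s;
}

// SHA256(SHA256(64 bytes)) written as exactly three transforms.
static std::array<uint8_t, 32> DoubleSha256_64(const uint8_t* input)
{
    // H1 = Transform(Transform(Init(), input), Pad(64))
    const State h1 = Transform(Transform(Init(), input), Pad(64).data());
    // H2 = Transform(Init(), H1 + Pad(32))
    const std::array<uint8_t, 32> h1_bytes = ToBytes(h1);
    std::vector<uint8_t> block(h1_bytes.begin(), h1_bytes.end());
    const std::vector<uint8_t> pad32 = Pad(32);
    block.insert(block.end(), pad32.begin(), pad32.end());
    assert(block.size() == 64);
    return ToBytes(Transform(Init(), block.data()));
}
```

In this form `Pad(64)` and `Pad(32)` are visibly input-independent, which is what makes precomputing parts of the second and third transforms possible.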

Sjors · 2022-02-01T14:08:31Z

Ah that makes sense.

> Input is 64 bytes, which means it gets 64 bytes of padding

I naively assumed a 64 byte message wasn't padded, but it is: https://datatracker.ietf.org/doc/html/rfc6234#section-4.1

mutatrum · 2022-02-01T15:28:56Z

IBD up to block 700000 on a Rock Pi 4a w/ NVMe SSD, assumevalid=0, dbcache=2000:

master (bd482b3): 68H52M
shani (4abca94): 65H29M

Improvement ~5%

sipa · 2022-02-01T15:30:58Z

@Sjors

> I naively assumed a 64 byte message wasn't padded, but it is: https://datatracker.ietf.org/doc/html/rfc6234#section-4.1

Yes, it has to be. Otherwise you'd have a trivial 2nd preimage attack between hash(X) and hash(X || padding(len(X))), for non-multiple-of-64-bytes X.

DrahtBot · 2022-02-03T12:08:58Z

Guix builds

File	commit `133f73e` (master)	commit 19a5c3f8cc5c75da931d491340c1de68867934a1 (master and this pull)
SHA256SUMS.part	`4d29ebeb3309d60d...`	`b7844e9c678b97f6...`
*-aarch64-linux-gnu-debug.tar.gz	`c29ba0d0426063e8...`	`89ff765de2630eb7...`
*-aarch64-linux-gnu.tar.gz	`e52c846b1841b3eb...`	`d6ea7915e264d3be...`
*-arm-linux-gnueabihf-debug.tar.gz	`5d3f9731cf88da5b...`	`d254cb4dc971d678...`
*-arm-linux-gnueabihf.tar.gz	`282b33b13dd1f8a0...`	`ea7a0cf5a93cb32a...`
*-arm64-apple-darwin.tar.gz	`3f0dba0a6549c410...`	`3400772805824d22...`
*-osx-unsigned.dmg	`6692809452b6cb62...`	`25da88b4e7f96778...`
*-osx-unsigned.tar.gz	`4366f453672f800d...`	`3ad587987969546b...`
*-osx64.tar.gz	`5387d0cdc36be37a...`	`198a4d9b96a87a5f...`
*-powerpc64-linux-gnu-debug.tar.gz	`1ed57908059941ef...`	`e3a27aa697de6985...`
*-powerpc64-linux-gnu.tar.gz	`21079e1bae0f459b...`	`7c91ce0fa9a40f23...`
*-powerpc64le-linux-gnu-debug.tar.gz	`6b441c519f2d3185...`	`2bfee29cc7203662...`
*-powerpc64le-linux-gnu.tar.gz	`132f95573d0fd005...`	`48a553082d2d03b8...`
*-riscv64-linux-gnu-debug.tar.gz	`3c5e1f8e3d9aa92a...`	`64dc9f50e4993c2e...`
*-riscv64-linux-gnu.tar.gz	`b17efbca76425fc5...`	`78bba78a4dd894cb...`
*-x86_64-linux-gnu-debug.tar.gz	`9f61cd7fca8d6425...`	`edbfec791406e681...`
*-x86_64-linux-gnu.tar.gz	`323dc8d7d0aa8290...`	`b38a112a72acb36a...`
*.tar.gz	`5887839fbd29cd1c...`	`58afa778369cde02...`
guix_build.log	`53868781dafe6675...`	`8cae536cac2f270c...`
guix_build.log.diff		`a14b825c6610d5c4...`

DrahtBot · 2022-02-11T22:23:29Z

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#24322 ([kernel 1/n] Introduce initial libbitcoinkernel by dongcarl)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

laanwj · 2022-02-14T19:46:33Z

Code review and lightly tested ACK aaa1d03
I have checked

that the code gets compiled (bitcoind contains the instructions)
on an old ARM64 device without the instruction set, that it correctly doesn't enable the code.
on a recent ARM64 device with the instruction set, that it enables and uses the code.

hebasto · 2022-02-16T11:22:52Z

src/crypto/sha256_arm_shani.cpp

+        MSG3 = vreinterpretq_u32_u8(vrev32q_u8(vld1q_u8(chunk + 48)));
+        chunk += 64;
+
+        // Original implemenation preloaded message and constant addition which was 1-3% slower.


typo: implemenation ==> implementation
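
For readers less familiar with NEON: SHA-256 treats each 64-byte block as sixteen big-endian 32-bit words, so on a little-endian ARM core the load in the excerpt above has to byte-swap every word. A minimal sketch of that step with a portable fallback, where `LoadBE32x4` is a made-up helper name and not part of the PR:

```cpp
#include <array>
#include <cstdint>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical helper: read 16 bytes as four big-endian 32-bit words.
std::array<uint32_t, 4> LoadBE32x4(const uint8_t* p)
{
    std::array<uint32_t, 4> w;
#if defined(__ARM_NEON) && !defined(__ARM_BIG_ENDIAN)
    // Same chain as in the diff: load 16 bytes, reverse the bytes inside each
    // 32-bit lane, then view the register as four 32-bit lanes.
    const uint32x4_t v = vreinterpretq_u32_u8(vrev32q_u8(vld1q_u8(p)));
    vst1q_u32(w.data(), v);
#else
    // Portable equivalent, used when NEON is not available.
    for (int i = 0; i < 4; ++i) {
        w[i] = uint32_t(p[4 * i]) << 24 | uint32_t(p[4 * i + 1]) << 16 |
               uint32_t(p[4 * i + 2]) << 8 | uint32_t(p[4 * i + 3]);
    }
#endif
    return w;
}
```

Four such loads (offsets 0, 16, 32 and 48) cover one 64-byte block, which matches the `chunk + 48` and `chunk += 64` lines in the excerpt.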

@sipa

aaa1d03 Add optimized sha256d64_arm_shani::Transform_2way (Pieter Wuille)
fe06298 Implement sha256_arm_shani::Transform (Pavol Rusnak)
48a72fa Add sha256_arm_shani to build system (Pavol Rusnak)
c2b7934 Rename SHANI to X86_SHANI to allow future implementation of ARM_SHANI (Pavol Rusnak)

Pull request description:

This PR adds support for ARMv8 SHA2 Intrinsics.

Fixes bitcoin#13401 and bitcoin#17414

* Integration part was done by me.
* The original SHA2 NI code comes from https://github.com/noloader/SHA-Intrinsics/blob/master/sha256-arm.c
* Minor optimizations from https://github.com/rollmeister/bitcoin-armv8/blob/master/src/crypto/sha256.cpp are applied too.
* The 2-way transform added by @sipa

ACKs for top commit:
laanwj: Code review and lightly tested ACK aaa1d03

Tree-SHA512: 9689d6390c004269cb1ee79ed05430d7d35a6efef2554a2b6732f7258a11e7e959b3306c04b4e8637a9623fb4c12d1c1b3592da0ff0dc6d737932db302509669

# Conflicts:
# configure.ac
# src/Makefile.am
# src/crypto/sha256.cpp

…shani}

7fd0860 Bugfix: configure: Define defaults for enable_arm_{crc,shani} (Luke Dashjr)

Pull request description:

Fix for #17398 and #24115

Trivial, mostly for consistency (you'd have to *try* to break this)

ACKs for top commit:
pk-b2: ACK 7fd0860
seejee: ACK 7fd0860
vincenzopalazzo: ACK 7fd0860

Tree-SHA512: 51c389787c369f431ca57071f03392438bff9fd41f128c63ce74ca30d2257213f8be225efcb5c1329ad80b714f44427d721215d4f848cc8e63060fa5bc8f1f2e

prusnak force-pushed the armv8-shani branch from 28995a6 to 283b2ed Compare January 20, 2022 18:30

prusnak marked this pull request as draft January 20, 2022 18:30

laanwj added the Utils/log/libs label Jan 20, 2022

prusnak force-pushed the armv8-shani branch from 283b2ed to ed9710c Compare January 20, 2022 19:26

prusnak force-pushed the armv8-shani branch 2 times, most recently from 3d77517 to f7dd1ef Compare January 20, 2022 19:57

prusnak force-pushed the armv8-shani branch from f7dd1ef to c0849fc Compare January 20, 2022 21:42

prusnak marked this pull request as ready for review January 20, 2022 22:18

prusnak mentioned this pull request Jan 20, 2022

ARMv8 sha2 support #13401

Closed

sipa reviewed Jan 21, 2022

PastaPastaPasta reviewed Jan 22, 2022

prusnak force-pushed the armv8-shani branch from 41bb53e to 1cfacec Compare January 22, 2022 17:34

prusnak force-pushed the armv8-shani branch from 3abdb59 to aaa1d03 Compare January 28, 2022 08:46

DrahtBot removed the DrahtBot Guix build requested label Feb 3, 2022

DrahtBot mentioned this pull request Feb 11, 2022

[kernel 0/n] Introduce bitcoin-chainstate #24304

Merged

DrahtBot mentioned this pull request Feb 13, 2022

[kernel 1/n] Introduce initial libbitcoinkernel #24322

Merged

laanwj merged commit c23bf06 into bitcoin:master Feb 14, 2022

prusnak deleted the armv8-shani branch February 14, 2022 22:17

sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Feb 15, 2022

Merge bitcoin#24115: ARMv8 SHA2 Intrinsics

42ca852

hebasto reviewed Feb 16, 2022

hebasto mentioned this pull request Apr 1, 2022

v23.0 testing #24501

Closed

luke-jr mentioned this pull request May 2, 2022

Bugfix: configure: Define defaults for enable_arm_{crc,shani} #25051

Merged

str4d mentioned this pull request Jul 15, 2022

Backport more recent SHA-256 assembly optimisations zcash/zcash#6080

Open

barton2526 mentioned this pull request Dec 5, 2022

ARMv8 SHA2 Intrinsics gridcoin-community/Gridcoin-Research#2612

Merged

bitcoin locked and limited conversation to collaborators Feb 16, 2023
