sha256: small speedup for sse4 path. #13400

theuni · 2018-06-05T19:06:33Z

This is an optimization described in the Intel SHA-256 whitepaper (page 10). It speeds up the SHA256D64_1024 bench (SSE4 path, no AVX2) for me by ~6%.

sipa · 2018-06-05T19:07:47Z

ACK, benchmarked to be around 5% faster for SSE4 (when disabling the AVX2 code on my i7-7820HQ).

theuni · 2018-06-05T20:17:46Z

Sadly, @laanwj saw a 25% penalty for this change on pre-avx2 AMD. Closing :(

The algorithm used to calculate Sigma0/Sigma1 here is the same as the one used in bitcoin#13400, which resulted in a ~5% speedup on Intel CPUs, but close to a 25% penalty on AMD. So unfortunately, this should result in a ~5% slowdown on Intel. However, bitcoin#13400 was operating on 128-bit registers and these are 32-bit, so it's possible that there's no penalty on AMD and this change is not needed.
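For readers unfamiliar with the trade-off, here is a scalar sketch of two equivalent ways to compute Sigma0/Sigma1. The nested form is an assumption about the general shape of this kind of optimization (chaining rotations so fewer copies of the input are live at once); it is not taken from the diff in this PR. The names `Rotr`, `Sigma0Nested`, `Sigma1Nested` and `SigmaFormsAgree` are hypothetical, and the real SSE4 code works on 128-bit vectors rather than single `uint32_t` words.

```cpp
#include <cassert>
#include <cstdint>

// Rotate a 32-bit word right by n bits (valid for 0 < n < 32).
inline uint32_t Rotr(uint32_t x, unsigned n) { return (x >> n) | (x << (32 - n)); }

// Textbook forms from the SHA-256 specification: three independent rotations.
inline uint32_t Sigma0(uint32_t x) { return Rotr(x, 2) ^ Rotr(x, 13) ^ Rotr(x, 22); }
inline uint32_t Sigma1(uint32_t x) { return Rotr(x, 6) ^ Rotr(x, 11) ^ Rotr(x, 25); }

// Nested forms (illustrative assumption): rotation distributes over XOR, so
// the rotations can be chained. For Sigma0: 9 + 11 + 2 = 22, 11 + 2 = 13, 2.
inline uint32_t Sigma0Nested(uint32_t x) { return Rotr(Rotr(Rotr(x, 9) ^ x, 11) ^ x, 2); }
// For Sigma1: 14 + 5 + 6 = 25, 5 + 6 = 11, 6.
inline uint32_t Sigma1Nested(uint32_t x) { return Rotr(Rotr(Rotr(x, 14) ^ x, 5) ^ x, 6); }

// Returns true when both formulations agree for the given word.
inline bool SigmaFormsAgree(uint32_t x)
{
    return Sigma0(x) == Sigma0Nested(x) && Sigma1(x) == Sigma1Nested(x);
}
```

Both forms give the same result for every input; which one is faster depends on the CPU, which is consistent with the Intel vs. AMD numbers reported above.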

theuni added 2 commits June 4, 2018 17:52

crypto: split out Rotations

a2724af

crypto: sha256 optim: reduce register copies

ea3ed0c

maflcko added the Refactoring label Jun 5, 2018

theuni closed this Jun 5, 2018

theuni mentioned this pull request Jun 5, 2018

SHA256 implementations based on Intel SHA Extensions #13386

Merged

bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021
