theuni (Member) commented Jun 5, 2018

This is an optimization described in the Intel SHA-256 whitepaper (page 10). It speeds up the SHA256D64_1024 bench (SSE4 path, no AVX2) for me by ~6%.
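A minimal C++ sketch of the kind of rewrite involved, assuming the whitepaper technique is the nested-rotation form of Sigma0/Sigma1 (the commit message further down confirms the change concerns Sigma0/Sigma1, but the exact code from this PR is not reproduced here). All function names are hypothetical, and the sketch is scalar: the SSE4 path works on 128-bit vectors of four 32-bit lanes, and since SSE4.1 has no packed 32-bit rotate instruction, each `Ror` there has to be built from two shifts and an OR.

```cpp
#include <cassert>
#include <cstdint>

// Rotate a 32-bit word right by n bits (requires 0 < n < 32 to avoid UB).
static inline uint32_t Ror(uint32_t x, int n) { return (x >> n) | (x << (32 - n)); }

// Textbook SHA-256 definitions: three independent rotations of the input, XORed.
static inline uint32_t Sigma0Ref(uint32_t a) { return Ror(a, 2) ^ Ror(a, 13) ^ Ror(a, 22); }
static inline uint32_t Sigma1Ref(uint32_t e) { return Ror(e, 6) ^ Ror(e, 11) ^ Ror(e, 25); }

// Nested form (hypothetical names): rotate by the *differences* between the
// rotation amounts, XORing the input back in at each step, so only one copy of
// the input needs to stay live. Same count (3 rotates, 2 XORs), but each step
// now depends on the previous one.
// Sigma0: 22 - 13 = 9, 13 - 2 = 11, then 2.
static inline uint32_t Sigma0Nested(uint32_t a) { return Ror(Ror(Ror(a, 9) ^ a, 11) ^ a, 2); }
// Sigma1: 25 - 11 = 14, 11 - 6 = 5, then 6.
static inline uint32_t Sigma1Nested(uint32_t e) { return Ror(Ror(Ror(e, 14) ^ e, 5) ^ e, 6); }

// Rotation distributes over XOR, so both forms must agree for every input.
static inline bool SigmaFormsAgree(uint32_t x) {
    return Sigma0Ref(x) == Sigma0Nested(x) && Sigma1Ref(x) == Sigma1Nested(x);
}

// Debug-build self-check for a single input.
static inline void AssertSigmaFormsAgree(uint32_t x) { assert(SigmaFormsAgree(x)); }
```

Expanding the nested form for Sigma1 shows why it is equivalent: `Ror(e,14) ^ e` rotated by 5 gives `Ror(e,19) ^ Ror(e,5)`; XORing `e` and rotating by 6 gives `Ror(e,25) ^ Ror(e,11) ^ Ror(e,6)`. The operation count is unchanged; only the data flow differs, and which shape a given microarchitecture prefers is an empirical question, as the AMD result below shows.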

sipa (Member) commented Jun 5, 2018

ACK, benchmarked to be around 5% faster for SSE4 (when disabling the AVX2 code on my i7-7820HQ).

theuni (Member, Author) commented Jun 5, 2018

Sadly, @laanwj saw a 25% penalty for this change on pre-AVX2 AMD CPUs. Closing :(

theuni closed this Jun 5, 2018
theuni added a commit to theuni/bitcoin that referenced this pull request Jun 12, 2018
The algorithm used to calculate Sigma0/Sigma1 here is the same as the one used
in bitcoin#13400, which resulted in a ~5% speedup on Intel CPUs but a penalty of
close to 25% on AMD.

So unfortunately, this should result in a ~5% slowdown on Intel.

However, bitcoin#13400 operated on 128-bit registers, while these are 32-bit, so
it's possible that there's no penalty on AMD and this change is not needed.

bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021