Improve OpenSSL digest performance #118613
Conversation
|
Tagging subscribers to this area: @dotnet/area-system-security, @bartonjs, @vcsjones |
|
Wouldn't we get basically the same performance improvement by just caching the answer in managed and not doing anything here in native? |
I don't think so. I attempted to address that here:
We are already doing the memoization. For example: Lines 43 to 44 in f30c40e
The problem is what is actually getting memoized. |
|
Weird, it looks like stable info to me. But perhaps EVP_ORIG_GLOBAL means something like "ask fetch first, but use the LEGACY_EVP_MD_METH_TABLE as a fallback".
bartonjs left a comment
It looks simple enough to not bother holding for 11.
Basically, it'll work, and be faster, or not, and explode on RH.old; but I don't see very much need for bake time.
Pull Request Overview
This PR improves OpenSSL digest algorithm performance by switching from implicit to explicit fetching in OpenSSL 3.x. The change addresses performance issues where OpenSSL 3.x's compatibility functions for legacy algorithms perform expensive implicit fetches on each call.
Key changes:
- Replaces individual hash algorithm functions with macro-based implementations that use explicit fetching
- Introduces proper memoization using pthread_once to ensure fetched algorithms are cached
- Falls back to implicit fetching when explicit fetching fails or is unavailable (see the sketch below)
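The pattern those key changes describe might look roughly like the following. This is a minimal sketch, not the PR's actual code: the macro name and accessor names are illustrative, and it assumes OpenSSL 3.x headers (the real change must also cope with OpenSSL 1.x, which this sketch omits).

```c
#include <pthread.h>
#include <openssl/evp.h>

// One macro per algorithm: fetch the EVP_MD exactly once via pthread_once,
// falling back to the legacy implicit-fetch accessor if the fetch fails.
#define DEFINE_EVP_MD_ACCESSOR(Alias, osslName, legacyFn)                     \
    static pthread_once_t Alias##Once = PTHREAD_ONCE_INIT;                    \
    static const EVP_MD* Alias##Md;                                           \
    static void Alias##Populate(void)                                         \
    {                                                                         \
        /* Fetched once for the process lifetime; never freed on purpose. */ \
        Alias##Md = EVP_MD_fetch(NULL, osslName, NULL);                       \
        if (Alias##Md == NULL)                                                \
            Alias##Md = legacyFn(); /* fall back to implicit fetching */      \
    }                                                                         \
    const EVP_MD* Alias(void)                                                 \
    {                                                                         \
        pthread_once(&Alias##Once, Alias##Populate);                          \
        return Alias##Md;                                                     \
    }

DEFINE_EVP_MD_ACCESSOR(GetSha256Md, "SHA2-256", EVP_sha256)
DEFINE_EVP_MD_ACCESSOR(GetSha512Md, "SHA2-512", EVP_sha512)
```

Routing every algorithm through one macro keeps the once-only initialization boilerplate in a single place, while the NULL check preserves the legacy behavior wherever an explicit fetch cannot succeed.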
Comments suppressed due to low confidence (1)
src/native/libs/System.Security.Cryptography.Native/pal_evp.c:24
- The backslash continuation should have a space before it for consistency with the other macro lines. All other continuation lines in the macro have a space before the backslash.
\
|
/azp run runtime-libraries-coreclr outerloop |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
Outerloop has more exotic OpenSSL configurations, so let's see if anything fails to build or run there. |
|
/azp run runtime-extra-platforms |
|
Azure Pipelines failed to run 1 pipeline(s). |
|
/azp run runtime-extra-platforms |
|
Azure Pipelines successfully started running 1 pipeline(s). |
From the docs:
|
/ba-g failures are unrelated, and tracked elsewhere. |
In OpenSSL 1.x, hash algorithm functions like EVP_sha256 are basically static variable lookups into a function table, making them fairly efficient. OpenSSL 3, however, has a different model: it uses a fetch mechanism, EVP_MD_fetch, which is the preferred way to "get me an EVP_MD". This is called an explicit fetch. For compatibility, the old OpenSSL 1.x functions were kept, but instead of EVP_sha256 being a simple "get me this variable's value", it returns a shim that does an EVP_MD_fetch for you every time it is used. This is called an implicit fetch. The OpenSSL documentation recommends using explicit fetching and memoizing the value yourself: https://docs.openssl.org/master/man7/ossl-guide-libcrypto-introduction/#performance
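As a concrete illustration of that recommended pattern, here is a minimal sketch assuming OpenSSL 3.x; error handling and thread-safe one-time initialization (e.g. pthread_once) are elided for brevity:

```c
#include <openssl/evp.h>

// Explicit fetch, memoized once, then reused for every digest operation.
static EVP_MD* s_sha256;

void sha256_oneshot(const unsigned char* data, size_t len, unsigned char* out)
{
    if (s_sha256 == NULL)
        s_sha256 = EVP_MD_fetch(NULL, "SHA2-256", NULL); // fetched exactly once

    EVP_MD_CTX* ctx = EVP_MD_CTX_new();
    EVP_DigestInit_ex(ctx, s_sha256, NULL); // no per-call implicit fetch
    EVP_DigestUpdate(ctx, data, len);
    EVP_DigestFinal_ex(ctx, out, NULL);
    EVP_MD_CTX_free(ctx);
}
```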
This has worthwhile performance benefits. For "single block" inputs into the hash algorithm, this reduces the call time by 200-300 ns, which is anywhere from a 25% to 33% reduction for empty input and for a couple of blocks of input data.
Even though we memoize the value over on the managed side in the HashAlgorithmDispenser, that memoization only really avoids the p/invoke boundary. What is actually getting memoized is (more or less) a lookup function.
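Concretely, that is why caching the legacy pointer alone cannot avoid the per-use cost. A sketch of the pitfall, assuming OpenSSL 3.x behavior:

```c
#include <openssl/evp.h>

// Memoizing the legacy pointer only saves the p/invoke: EVP_sha256() returns
// the same static compatibility EVP_MD every time, but each use of it still
// performs an implicit fetch inside EVP_DigestInit_ex.
void sha256_init_with_cached_legacy_md(EVP_MD_CTX* ctx)
{
    static const EVP_MD* s_cached;
    if (s_cached == NULL)
        s_cached = EVP_sha256();            // stable pointer; cheap to cache

    EVP_DigestInit_ex(ctx, s_cached, NULL); // implicit fetch still happens here
}
```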
Since the logic for this is a little more complicated now, everything is buttoned up in a few macros.
Full Benchmark Table