Conversation

@vcsjones
Member

@vcsjones vcsjones commented Aug 11, 2025

In OpenSSL 1.x, hash algorithm functions like EVP_sha256 are basically static variable lookups to a function table, making them fairly efficient.

OpenSSL 3, however, has a different model. It has a fetch mechanism, EVP_MD_fetch, which is the preferred way to "get me an EVP_MD". This is called an explicit fetch. For compatibility, the old OpenSSL 1.x functions were kept, but instead of EVP_sha256 being a simple "get me this variable's value", it returns a shim that does the EVP_MD_fetch for you every time it is used. This is called an implicit fetch.

The OpenSSL documentation recommends using explicit fetching, and memoizing the value yourself. https://docs.openssl.org/master/man7/ossl-guide-libcrypto-introduction/#performance
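To make the difference concrete, here is a minimal sketch in C that counts provider lookups under each model. It does not call OpenSSL: `digest_t`, `provider_fetch`, `legacy_sha256`, and `digest_init` are hypothetical stand-ins for EVP_MD, an explicit fetch, a legacy accessor such as EVP_sha256, and the start of a hash operation. The real library's internals are more involved than this.

```c
#include <assert.h>

/* Hypothetical stand-in for EVP_MD; not an OpenSSL type. */
typedef struct {
    const char *name;
    int resolved; /* 1 = real implementation, 0 = shim that only carries a name */
} digest_t;

static int g_fetch_count = 0; /* number of simulated provider lookups */

/* Stand-in for an explicit fetch: resolves a name to an implementation. */
static digest_t provider_fetch(const char *name)
{
    digest_t md = { name, 1 };
    g_fetch_count++;
    return md;
}

/* Stand-in for a legacy accessor: returns an unresolved shim. */
static digest_t legacy_sha256(void)
{
    digest_t shim = { "SHA256", 0 };
    return shim;
}

/* Stand-in for starting a hash: an unresolved shim forces a lookup by name. */
static void digest_init(digest_t md)
{
    if (!md.resolved) {
        (void)provider_fetch(md.name); /* the implicit fetch */
    }
}

/* Lookups performed by n hash operations using the legacy shim. */
int fetches_for_implicit(int n)
{
    assert(n >= 0);
    g_fetch_count = 0;
    digest_t shim = legacy_sha256();
    for (int i = 0; i < n; i++) {
        digest_init(shim);
    }
    return g_fetch_count;
}

/* Lookups performed by n hash operations after one memoized explicit fetch. */
int fetches_for_explicit(int n)
{
    assert(n >= 0);
    g_fetch_count = 0;
    digest_t md = provider_fetch("SHA256"); /* fetched once, then reused */
    for (int i = 0; i < n; i++) {
        digest_init(md);
    }
    return g_fetch_count;
}
```

With these stand-ins, `fetches_for_implicit(1000)` performs 1000 lookups and `fetches_for_explicit(1000)` performs one, which is the saving that explicit fetching plus memoization is after.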

This has worthwhile performance benefits. For "single block" inputs to the hash algorithm, this reduces the call time by 200-300 ns. That is roughly a 25% to 33% reduction for inputs ranging from empty to a couple of blocks.
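As a worked example of how these figures relate to the Ratio column in the benchmark table, take the Sha512ComputeHash row at DataSize 0 (420.4 ns on the branch, 618.4 ns on main). The helper names below are illustrative only.

```c
#include <assert.h>

/* Ratio as reported in the table: branch time divided by main time. */
double ratio(double branch_ns, double main_ns)
{
    assert(main_ns > 0.0);
    return branch_ns / main_ns;
}

/* Absolute time saved per call, in nanoseconds. */
double saved_ns(double branch_ns, double main_ns)
{
    return main_ns - branch_ns;
}

/* Relative reduction in call time, as a percentage of the main time. */
double reduction_percent(double branch_ns, double main_ns)
{
    assert(main_ns > 0.0);
    return 100.0 * (main_ns - branch_ns) / main_ns;
}
```

For that row, `saved_ns(420.4, 618.4)` is 198.0 ns, `ratio(420.4, 618.4)` rounds to 0.68, and `reduction_percent(420.4, 618.4)` is about 32%.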

Even though we memoize the value over on the managed side in the HashAlgorithmDispenser, this memoization only really avoids the p/invoke boundary. What is actually getting memoized is a lookup function (more or less).

Since the logic for this is a little more complicated now, everything is buttoned up in a few macros.

Full Benchmark Table
Method Toolchain DataSize Mean Error StdDev Ratio
Sha1HashData branch 0 368.1 ns 0.95 ns 0.88 ns 0.56
Sha1HashData main 0 659.1 ns 0.79 ns 0.70 ns 1.00
Sha1ComputeHash branch 0 368.7 ns 0.23 ns 0.18 ns 0.65
Sha1ComputeHash main 0 565.1 ns 0.55 ns 0.43 ns 1.00
Sha256HashData branch 0 371.7 ns 0.50 ns 0.45 ns 0.52
Sha256HashData main 0 720.9 ns 3.65 ns 3.42 ns 1.00
Sha256ComputeHash branch 0 284.3 ns 0.34 ns 0.28 ns 0.62
Sha256ComputeHash main 0 456.4 ns 0.35 ns 0.28 ns 1.00
Sha384HashData branch 0 533.3 ns 1.10 ns 0.98 ns 0.75
Sha384HashData main 0 714.8 ns 1.46 ns 1.29 ns 1.00
Sha384ComputeHash branch 0 337.1 ns 0.72 ns 0.64 ns 0.65
Sha384ComputeHash main 0 516.1 ns 1.06 ns 0.89 ns 1.00
Sha512HashData branch 0 588.3 ns 0.85 ns 0.75 ns 1.03
Sha512HashData main 0 570.2 ns 1.74 ns 1.63 ns 1.00
Sha512ComputeHash branch 0 420.4 ns 0.30 ns 0.24 ns 0.68
Sha512ComputeHash main 0 618.4 ns 0.63 ns 0.56 ns 1.00
Sha3_256HashData branch 0 683.4 ns 0.69 ns 0.61 ns 0.74
Sha3_256HashData main 0 925.9 ns 1.16 ns 1.03 ns 1.00
Sha3_256ComputeHash branch 0 470.0 ns 0.62 ns 0.55 ns 0.61
Sha3_256ComputeHash main 0 775.7 ns 1.19 ns 0.99 ns 1.00
Sha3_384HashData branch 0 676.6 ns 0.88 ns 0.73 ns 0.74
Sha3_384HashData main 0 908.4 ns 1.11 ns 1.04 ns 1.00
Sha3_384ComputeHash branch 0 535.0 ns 0.71 ns 0.63 ns 0.68
Sha3_384ComputeHash main 0 782.5 ns 2.69 ns 2.52 ns 1.00
Sha3_512HashData branch 0 573.2 ns 2.51 ns 2.35 ns 0.61
Sha3_512HashData main 0 937.2 ns 0.66 ns 0.55 ns 1.00
Sha3_512ComputeHash branch 0 541.0 ns 5.75 ns 5.09 ns 0.71
Sha3_512ComputeHash main 0 761.7 ns 1.31 ns 1.16 ns 1.00
Shake128HashData branch 0 700.2 ns 4.10 ns 3.83 ns 0.74
Shake128HashData main 0 950.3 ns 3.18 ns 2.97 ns 1.00
Shake128GetHashAndReset branch 0 499.9 ns 0.62 ns 0.52 ns 0.67
Shake128GetHashAndReset main 0 749.6 ns 1.04 ns 0.87 ns 1.00
Shake256HashData branch 0 758.2 ns 2.13 ns 1.88 ns 0.81
Shake256HashData main 0 937.1 ns 1.37 ns 1.21 ns 1.00
Shake256GetHashAndReset branch 0 444.3 ns 1.28 ns 1.13 ns 0.60
Shake256GetHashAndReset main 0 736.8 ns 0.44 ns 0.39 ns 1.00
Sha1HashData branch 64 528.8 ns 1.24 ns 1.10 ns 0.69
Sha1HashData main 64 765.3 ns 1.37 ns 1.22 ns 1.00
Sha1ComputeHash branch 64 332.5 ns 0.34 ns 0.29 ns 0.57
Sha1ComputeHash main 64 586.7 ns 3.02 ns 2.67 ns 1.00
Sha256HashData branch 64 497.6 ns 0.87 ns 0.73 ns 0.67
Sha256HashData main 64 743.5 ns 1.09 ns 0.91 ns 1.00
Sha256ComputeHash branch 64 304.5 ns 0.29 ns 0.24 ns 0.52
Sha256ComputeHash main 64 584.1 ns 0.62 ns 0.51 ns 1.00
Sha384HashData branch 64 600.4 ns 0.69 ns 0.57 ns 0.78
Sha384HashData main 64 770.1 ns 0.83 ns 0.69 ns 1.00
Sha384ComputeHash branch 64 437.1 ns 0.98 ns 0.82 ns 0.70
Sha384ComputeHash main 64 621.7 ns 0.84 ns 0.65 ns 1.00
Sha512HashData branch 64 614.3 ns 0.81 ns 0.72 ns 1.09
Sha512HashData main 64 565.1 ns 1.33 ns 1.18 ns 1.00
Sha512ComputeHash branch 64 412.8 ns 0.73 ns 0.61 ns 0.65
Sha512ComputeHash main 64 635.1 ns 0.94 ns 0.83 ns 1.00
Sha3_256HashData branch 64 771.7 ns 3.40 ns 2.84 ns 0.83
Sha3_256HashData main 64 924.8 ns 0.83 ns 0.74 ns 1.00
Sha3_256ComputeHash branch 64 495.3 ns 0.92 ns 0.77 ns 0.62
Sha3_256ComputeHash main 64 797.3 ns 0.98 ns 0.81 ns 1.00
Sha3_384HashData branch 64 684.3 ns 0.58 ns 0.45 ns 0.77
Sha3_384HashData main 64 886.9 ns 0.48 ns 0.40 ns 1.00
Sha3_384ComputeHash branch 64 567.6 ns 0.85 ns 0.76 ns 0.72
Sha3_384ComputeHash main 64 785.0 ns 0.78 ns 0.65 ns 1.00
Sha3_512HashData branch 64 768.6 ns 0.63 ns 0.53 ns 0.81
Sha3_512HashData main 64 954.2 ns 1.23 ns 1.03 ns 1.00
Sha3_512ComputeHash branch 64 566.1 ns 1.85 ns 1.64 ns 0.71
Sha3_512ComputeHash main 64 799.3 ns 1.67 ns 1.57 ns 1.00
Shake128HashData branch 64 778.8 ns 0.62 ns 0.55 ns 0.88
Shake128HashData main 64 888.2 ns 0.48 ns 0.37 ns 1.00
Shake128GetHashAndReset branch 64 523.2 ns 0.63 ns 0.50 ns 0.76
Shake128GetHashAndReset main 64 684.9 ns 1.53 ns 1.35 ns 1.00
Shake256HashData branch 64 715.3 ns 0.80 ns 0.71 ns 0.77
Shake256HashData main 64 931.5 ns 0.86 ns 0.67 ns 1.00
Shake256GetHashAndReset branch 64 524.6 ns 0.87 ns 0.73 ns 0.68
Shake256GetHashAndReset main 64 773.0 ns 0.94 ns 0.78 ns 1.00
Sha1HashData branch 128 446.7 ns 0.83 ns 0.74 ns 0.57
Sha1HashData main 128 787.2 ns 0.97 ns 0.86 ns 1.00
Sha1ComputeHash branch 128 417.7 ns 0.31 ns 0.26 ns 0.67
Sha1ComputeHash main 128 623.9 ns 1.21 ns 1.08 ns 1.00
Sha256HashData branch 128 590.0 ns 0.59 ns 0.46 ns 0.85
Sha256HashData main 128 690.9 ns 1.47 ns 1.38 ns 1.00
Sha256ComputeHash branch 128 323.0 ns 0.61 ns 0.54 ns 0.49
Sha256ComputeHash main 128 664.2 ns 0.71 ns 0.59 ns 1.00
Sha384HashData branch 128 510.4 ns 1.23 ns 1.09 ns 0.62
Sha384HashData main 128 828.0 ns 3.59 ns 3.36 ns 1.00
Sha384ComputeHash branch 128 458.1 ns 0.49 ns 0.44 ns 0.70
Sha384ComputeHash main 128 650.5 ns 2.67 ns 2.23 ns 1.00
Sha512HashData branch 128 610.4 ns 1.47 ns 1.23 ns 0.75
Sha512HashData main 128 814.9 ns 1.86 ns 1.55 ns 1.00
Sha512ComputeHash branch 128 428.9 ns 1.57 ns 1.47 ns 0.72
Sha512ComputeHash main 128 594.6 ns 0.90 ns 0.75 ns 1.00
Sha3_256HashData branch 128 759.6 ns 0.62 ns 0.52 ns 0.79
Sha3_256HashData main 128 956.2 ns 2.02 ns 1.79 ns 1.00
Sha3_256ComputeHash branch 128 565.4 ns 1.96 ns 1.74 ns 0.72
Sha3_256ComputeHash main 128 786.7 ns 0.61 ns 0.51 ns 1.00
Sha3_384HashData branch 128 967.0 ns 4.84 ns 4.53 ns 0.85
Sha3_384HashData main 128 1,140.2 ns 3.28 ns 3.07 ns 1.00
Sha3_384ComputeHash branch 128 758.5 ns 0.74 ns 0.58 ns 0.76
Sha3_384ComputeHash main 128 995.0 ns 3.98 ns 3.32 ns 1.00
Sha3_512HashData branch 128 970.4 ns 2.98 ns 2.49 ns 0.86
Sha3_512HashData main 128 1,127.0 ns 1.22 ns 1.08 ns 1.00
Sha3_512ComputeHash branch 128 758.1 ns 0.87 ns 0.73 ns 0.76
Sha3_512ComputeHash main 128 991.6 ns 2.06 ns 1.82 ns 1.00
Shake128HashData branch 128 708.7 ns 1.76 ns 1.65 ns 0.74
Shake128HashData main 128 958.3 ns 0.98 ns 0.77 ns 1.00
Shake128GetHashAndReset branch 128 540.5 ns 1.20 ns 1.06 ns 0.86
Shake128GetHashAndReset main 128 631.6 ns 1.23 ns 1.09 ns 1.00
Shake256HashData branch 128 578.8 ns 1.13 ns 1.00 ns 0.61
Shake256HashData main 128 950.0 ns 3.88 ns 3.63 ns 1.00
Shake256GetHashAndReset branch 128 522.7 ns 0.89 ns 0.75 ns 0.83
Shake256GetHashAndReset main 128 628.4 ns 0.41 ns 0.34 ns 1.00

@github-actions github-actions bot added the needs-area-label label Aug 11, 2025
@vcsjones vcsjones added area-System.Security and removed needs-area-label labels Aug 11, 2025
@dotnet-policy-service
Contributor

Tagging subscribers to this area: @dotnet/area-system-security, @bartonjs, @vcsjones
See info in area-owners.md if you want to be subscribed.

@bartonjs
Member

Wouldn't we get basically the same performance improvement by just caching the answer in managed and not doing anything here in native?

@vcsjones
Member Author

vcsjones commented Aug 11, 2025

Wouldn't we get basically the same performance improvement by just caching the answer in managed and not doing anything here in native?

I don't think so.

I attempted to address that here:

Even though we memoize the value over on the managed side in the HashAlgorithmDispenser, this memoization only really avoids the p/invoke boundary. What is actually getting memoized is a lookup function (more or less).

We are already doing the memoization. For example:

private static IntPtr EvpSha256() =>
    s_evpSha256 != IntPtr.Zero ? s_evpSha256 : (s_evpSha256 = CryptoNative_EvpSha256());

The problem is that what EVP_sha256 returns on OpenSSL 3 is itself a lazy lookup: a shim that does an EVP_MD_fetch when it is used. We are not memoizing a real EVP_MD. It's an EVP_MD that looks up another EVP_MD (an implicit fetcher).

@bartonjs
Member

Weird, it looks like stable info to me:

https://github.com/openssl/openssl/blob/076f7b24fee1b80a5cda898f385ae813217c823f/crypto/evp/legacy_sha.c#L92-L105

But perhaps EVP_ORIG_GLOBAL means something like "ask fetch first, but use the LEGACY_EVP_MD_METH_TABLE as a fallback"

Member

@bartonjs bartonjs left a comment

It looks simple enough to not bother holding for 11.

Basically, it'll work, and be faster, or not, and explode on RH.old; but I don't see very much need for bake time.

@vcsjones vcsjones marked this pull request as ready for review August 11, 2025 22:47
Copilot AI review requested due to automatic review settings August 11, 2025 22:47
Contributor

Copilot AI left a comment

Pull Request Overview

This PR improves OpenSSL digest algorithm performance by switching from implicit to explicit fetching in OpenSSL 3.x. The change addresses performance issues where OpenSSL 3.x's compatibility functions for legacy algorithms perform expensive implicit fetches on each call.

Key changes:

  • Replaces individual hash algorithm functions with macro-based implementations that use explicit fetching
  • Introduces proper memoization using pthread_once to ensure fetched algorithms are cached
  • Falls back to implicit fetching when explicit fetching fails or is unavailable
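
The memoize-with-fallback pattern described in these bullets can be sketched roughly as follows. This is not the code from pal_evp.c: `digest_t`, `try_explicit_fetch`, the legacy shim objects, and the `DEFINE_CACHED_DIGEST` macro are hypothetical names, and the fetch is simulated so the block builds without OpenSSL. Only `pthread_once` is a real API here.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for EVP_MD; not an OpenSSL type. */
typedef struct {
    const char *name;
    int explicit_fetch; /* 1 = from explicit fetch, 0 = legacy shim */
} digest_t;

static int g_fetch_calls = 0; /* how many times a fetch was attempted */

/* Stand-ins for what the legacy accessors would return. */
static const digest_t legacy_sha256 = { "SHA256", 0 };
static const digest_t legacy_md5 = { "MD5", 0 };

/* Simulated explicit fetch: pretends only SHA256 is available and
 * returns NULL for anything else, to exercise the fallback path. */
static const digest_t *try_explicit_fetch(const char *name)
{
    static const digest_t fetched_sha256 = { "SHA256", 1 };
    g_fetch_calls++;
    return strcmp(name, "SHA256") == 0 ? &fetched_sha256 : NULL;
}

/* Defines func(), which fetches once, caches the result, and falls
 * back to the legacy shim when the explicit fetch fails. */
#define DEFINE_CACHED_DIGEST(func, algname, legacy)              \
    static pthread_once_t func##_once = PTHREAD_ONCE_INIT;       \
    static const digest_t *func##_cached = NULL;                 \
    static void func##_init(void)                                \
    {                                                            \
        const digest_t *md = try_explicit_fetch(algname);        \
        func##_cached = (md != NULL) ? md : (legacy);            \
    }                                                            \
    const digest_t *func(void)                                   \
    {                                                            \
        pthread_once(&func##_once, func##_init);                 \
        assert(func##_cached != NULL);                           \
        return func##_cached;                                    \
    }

DEFINE_CACHED_DIGEST(get_sha256, "SHA256", &legacy_sha256)
DEFINE_CACHED_DIGEST(get_md5, "MD5", &legacy_md5)
```

In this sketch, repeated calls to `get_sha256()` return the same cached pointer after a single fetch attempt, and `get_md5()` returns the legacy shim because the simulated fetch fails for it. `pthread_once` guarantees the initializer runs at most once even if several threads make the first call concurrently.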
Comments suppressed due to low confidence (1)

src/native/libs/System.Security.Cryptography.Native/pal_evp.c:24

  • The backslash continuation should have a space before it for consistency with the other macro lines. All other continuation lines in the macro have a space before the backslash.
\

@vcsjones
Member Author

/azp run runtime-libraries-coreclr outerloop

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@vcsjones
Member Author

Outerloop has more exotic OpenSSL configurations, so let's see if anything fails to build or run there.

@vcsjones
Member Author

/azp run runtime-extra-platforms

@azure-pipelines

Azure Pipelines failed to run 1 pipeline(s).

@vcsjones
Member Author

/azp run runtime-extra-platforms

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@vcsjones
Member Author

Weird, it looks like stable info to me

From the docs:

Prior to OpenSSL 3.0, constant method tables (such as EVP_sha256()) were used directly to access methods. If you pass one of these convenience functions to an operation the fixed methods are ignored, and only the name is used to internally fetch methods from a provider.

@vcsjones
Member Author

/ba-g failures are unrelated, and tracked elsewhere.

@vcsjones vcsjones merged commit 7bc40a5 into dotnet:main Aug 13, 2025
91 of 97 checks passed
@vcsjones vcsjones deleted the evpmd-perf branch August 13, 2025 14:36
@vcsjones vcsjones added this to the 10.0.0 milestone Aug 13, 2025
@vcsjones vcsjones added the tenet-performance label Aug 13, 2025
@github-actions github-actions bot locked and limited conversation to collaborators Sep 13, 2025
@vcsjones vcsjones added the cryptographic-docs-impact label Sep 25, 2025
@bartonjs bartonjs added the tracking label Oct 24, 2025
Labels

area-System.Security cryptographic-docs-impact Issues impacting cryptographic docs. Cleared and reused after documentation is updated each release. tenet-performance Performance related issue tracking This issue is tracking the completion of other related issues.

3 participants