Improve OpenSSL digest performance #118613
Conversation
|
Tagging subscribers to this area: @dotnet/area-system-security, @bartonjs, @vcsjones |
|
Wouldn't we get basically the same performance improvement by just caching the answer in managed and not doing anything here in native? |
I don't think so. I attempted to address that here:
We are already doing the memoization. For example: Lines 43 to 44 in f30c40e
The problem is what is actually getting memoized. |
|
Weird, it looks like stable info to me. But perhaps EVP_ORIG_GLOBAL means something like "ask fetch first, but use the LEGACY_EVP_MD_METH_TABLE as a fallback".
bartonjs left a comment
It looks simple enough to not bother holding for 11.
Basically, it'll work, and be faster, or not, and explode on RH.old; but I don't see very much need for bake time.
Pull Request Overview
This PR improves OpenSSL digest algorithm performance by switching from implicit to explicit fetching in OpenSSL 3.x. The change addresses performance issues where OpenSSL 3.x's compatibility functions for legacy algorithms perform expensive implicit fetches on each call.
Key changes:
- Replaces individual hash algorithm functions with macro-based implementations that use explicit fetching
- Introduces proper memoization using pthread_once to ensure fetched algorithms are cached
- Falls back to implicit fetching when explicit fetching fails or is unavailable (see the sketch below)
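The pattern those key changes describe might look roughly like the following. This is a minimal sketch, not the PR's actual code: the macro name and accessor names are illustrative, and it assumes OpenSSL 3.x headers (the real change must also cope with OpenSSL 1.x, which this sketch omits).

```c
#include <pthread.h>
#include <openssl/evp.h>

// One macro per algorithm: fetch the EVP_MD exactly once via pthread_once,
// falling back to the legacy implicit-fetch accessor if the fetch fails.
#define DEFINE_EVP_MD_ACCESSOR(Alias, osslName, legacyFn)                     \
    static pthread_once_t Alias##Once = PTHREAD_ONCE_INIT;                    \
    static const EVP_MD* Alias##Md;                                           \
    static void Alias##Populate(void)                                         \
    {                                                                         \
        /* Fetched once for the process lifetime; never freed on purpose. */ \
        Alias##Md = EVP_MD_fetch(NULL, osslName, NULL);                       \
        if (Alias##Md == NULL)                                                \
            Alias##Md = legacyFn(); /* fall back to implicit fetching */      \
    }                                                                         \
    const EVP_MD* Alias(void)                                                 \
    {                                                                         \
        pthread_once(&Alias##Once, Alias##Populate);                          \
        return Alias##Md;                                                     \
    }

DEFINE_EVP_MD_ACCESSOR(GetSha256Md, "SHA2-256", EVP_sha256)
DEFINE_EVP_MD_ACCESSOR(GetSha512Md, "SHA2-512", EVP_sha512)
```

Routing every algorithm through one macro keeps the once-only initialization boilerplate in a single place, while the NULL check preserves the legacy behavior wherever an explicit fetch cannot succeed.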
Comments suppressed due to low confidence (1)
src/native/libs/System.Security.Cryptography.Native/pal_evp.c:24
- The backslash continuation should have a space before it for consistency with the other macro lines. All other continuation lines in the macro have a space before the backslash.
\
|
/azp run runtime-libraries-coreclr outerloop |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
Outerloop has more exotic OpenSSL configurations, so let's see if anything fails to build or run there. |
|
/azp run runtime-extra-platforms |
|
Azure Pipelines failed to run 1 pipeline(s). |
|
/azp run runtime-extra-platforms |
|
Azure Pipelines successfully started running 1 pipeline(s). |
From the docs:
|
/ba-g failures are unrelated, and tracked elsewhere. |
In OpenSSL 1.x, hash algorithm functions like EVP_sha256 are basically static variable lookups into a function table, making them fairly efficient. OpenSSL 3, however, has a different model: it uses a fetch mechanism, EVP_MD_fetch, which is the preferred way to "get me an EVP_MD". This is called an explicit fetch. For compatibility, the old OpenSSL 1.x functions were kept, but instead of EVP_sha256 being a simple "get me this variable's value", it returns a shim that does an EVP_MD_fetch for you every time it is used. This is called an implicit fetch. The OpenSSL documentation recommends using explicit fetching and memoizing the value yourself: https://docs.openssl.org/master/man7/ossl-guide-libcrypto-introduction/#performance
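As a concrete illustration of that recommended pattern, here is a minimal sketch assuming OpenSSL 3.x; error handling and thread-safe one-time initialization (e.g. pthread_once) are elided for brevity:

```c
#include <openssl/evp.h>

// Explicit fetch, memoized once, then reused for every digest operation.
static EVP_MD* s_sha256;

void sha256_oneshot(const unsigned char* data, size_t len, unsigned char* out)
{
    if (s_sha256 == NULL)
        s_sha256 = EVP_MD_fetch(NULL, "SHA2-256", NULL); // fetched exactly once

    EVP_MD_CTX* ctx = EVP_MD_CTX_new();
    EVP_DigestInit_ex(ctx, s_sha256, NULL); // no per-call implicit fetch
    EVP_DigestUpdate(ctx, data, len);
    EVP_DigestFinal_ex(ctx, out, NULL);
    EVP_MD_CTX_free(ctx);
}
```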
This has worthwhile performance benefits. For "single block" inputs into the hash algorithm, this reduces the call time by 200-300 ns, which is anywhere from a 25% to 33% reduction for empty input and for a couple of blocks of input data.
Even though we memoize the value over on the managed side in the HashAlgorithmDispenser, that memoization only really avoids the p/invoke boundary. What is actually getting memoized is (more or less) a lookup function.
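Concretely, that is why caching the legacy pointer alone cannot avoid the per-use cost. A sketch of the pitfall, assuming OpenSSL 3.x behavior:

```c
#include <openssl/evp.h>

// Memoizing the legacy pointer only saves the p/invoke: EVP_sha256() returns
// the same static compatibility EVP_MD every time, but each use of it still
// performs an implicit fetch inside EVP_DigestInit_ex.
void sha256_init_with_cached_legacy_md(EVP_MD_CTX* ctx)
{
    static const EVP_MD* s_cached;
    if (s_cached == NULL)
        s_cached = EVP_sha256();            // stable pointer; cheap to cache

    EVP_DigestInit_ex(ctx, s_cached, NULL); // implicit fetch still happens here
}
```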
Since the logic for this is a little more complicated now, everything is buttoned up in a few macros.
Full Benchmark Table