DRBG: Use the EVP layer to do AES encryption by kroeckx · Pull Request #5580 · openssl/openssl

kroeckx · 2018-03-10T11:31:55Z

This is WIP because there are still style issues (line too long) in it, and I want to properly check errors. But it passes the regression tests.

To generate 1000000000 bytes of random data with "openssl rand", I need this amount of time:
1.1.0: 20.233s
HEAD: 12.944s
This PR: 3.024s

t-j-h · 2018-03-10T11:36:11Z

Can you add in your equivalent timing for 1.0.2 - that would help.

mspncp · 2018-03-10T11:38:08Z

Interesting test results! But why is master much faster than 1.1.0? How does this fit together with the regression reported in #5559?

kroeckx · 2018-03-10T12:13:02Z

Note that that was one of the regressions, he started at 24K, and that commit changed it from 19K to 15K, so there are clearly other regressions too.

levitte · 2018-03-10T12:12:57Z

crypto/rand/drbg_ctr.c

I don't see you free this context anywhere, and that might explain why Travis thinks this leaks...

I was thinking I needed to free that and then forgot all about it. :)

Should be fixed

kroeckx · 2018-03-10T12:17:07Z

On 1.0.2 it takes 41.48 s

kroeckx · 2018-03-10T12:21:36Z

On a 1.0.2 shared build it takes 26.862, the 1.1 numers where all for a shared build too

kroeckx · 2018-03-10T17:38:18Z

The 1.1.0 numbers I've reported before were from the Debian version which is at least build with -O2 instead of -O3. All build with a "./config shared" currently gives me:
1.0.2: 26.86
1.1.0: 19.13
master: 13.1
This PR: 2.95

mspncp · 2018-03-10T17:42:22Z

I have two questions:

There was a discussion with @p-steuer about pre-increment vs. post-increment in #4530 (comment). Did you have that in your mind? How was this problem addressed?

It was my first attempt to implement ctr_generate cleanly using EVP_aes_x_ctr. However, all other drbg functions turned out not to be a natural fit for the aes-ctr evp interfaces, best example is probably ctr_update. After all i found it cleaner to "export" the required functions from the e_aes.c module such that the subsequent changes in drbg_rand.c become trivial.

Looking at these things again, i noticed i could use CRYPTO_ctr128_encrypt_ctr32 from modes.h which already implements the 32-to-128 bit counter-overflow instead of re-implementing this. However, the functions from e_aes.c would still be needed.

In general, using aes-ctr functions to implement ctr_drbg generate requires the drbg counter to be incremented after operations (instead of before operations, as in the nist standard's pseudo code). I put these changes in an extra patch.

Patrick added the CAVS tests in 7aba2e4. Did you run them before and after your modifications? IMHO adding the CAVS tests to the testsuite and verifying them should be part of this pull request.

kroeckx · 2018-03-10T17:48:12Z

I used the ECB mode instead of a counter mode. Everything related to the counter is still in the DRBG, it didn't move to the EVP layer. I'll look into those CAVS tests.

mspncp · 2018-03-10T18:10:46Z

Ah, I see!

kroeckx · 2018-03-10T18:37:44Z

I've added the CAVS tests. It also had extra kat tests in that pull request by they fail both before and after my patch.

kroeckx · 2018-03-10T19:13:44Z

Switching back to aes-128-ctr gives this:
master: 9.2
This PR: 2.3

levitte

Picking on style for a bit...

levitte · 2018-03-10T22:43:22Z

crypto/rand/drbg_ctr.c

Style: indent

levitte · 2018-03-10T22:43:59Z

crypto/rand/drbg_ctr.c

Style: indent

levitte · 2018-03-10T22:44:29Z

crypto/rand/drbg_ctr.c

levitte · 2018-03-10T22:45:11Z

crypto/rand/drbg_ctr.c

Small indentation

levitte · 2018-03-10T22:45:22Z

crypto/rand/drbg_ctr.c

intent? :) I at least got some right!

levitte · 2018-03-10T22:45:44Z

crypto/rand/drbg_ctr.c

levitte · 2018-03-10T22:45:54Z

crypto/rand/drbg_ctr.c

... you know the drill

levitte · 2018-03-10T22:46:03Z

crypto/rand/drbg_ctr.c

mspncp · 2018-03-10T23:31:22Z

crypto/rand/drbg_ctr.c

What makes you think that this is an error?

No idea why that's there, fixed.

Now all execution branches return success. So you might want to change the return value of ctr_XOR() back to void again and remove the return value checks in ctr_update().

Signed-off-by: Patrick Steuer <[email protected]>

mspncp · 2018-03-11T23:23:56Z

I compared the performances with and without e6f5902 on my computer and strangely performance was exactly the same, about 25 sec. Does anyone see my mistake, or did I miss something?

Here's my current git log:

419a3af688 (HEAD -> pr-5580, github/pr-5580) aes ctr_drbg: add cavs tests
e6f5902d21 DRBG: Use the EVP layer to do AES encryption
78a50c7524 Configurations/10-main.conf: VC-<target> cleanups.

I tested 419a3af vs. 78a50c7, using the following commands:

make clean
perl ./Configure linux-x86_64 no-idea no-camellia threads shared --strict-warnings --prefix=/home/msp/install/openssl
make -j 8
time util/shlib_wrap.sh openssl rand  -out /dev/null  1000000000

Performance test for `419a3af`

msp@msppc:~/src/openssl$ time util/shlib_wrap.sh openssl rand  -out /dev/null  1000000000

real    0m24,795s
user    0m24,759s
sys     0m0,037s

Performance test for `78a50c7`

msp@msppc:~/src/openssl$ time util/shlib_wrap.sh openssl rand  -out /dev/null  1000000000

real    0m24,797s
user    0m24,768s
sys     0m0,031s

Is hardware support enabled?

I checked grep aes /proc/cpuinfo, and also did a speed test, which suggests that hardware support is available.

msp@msppc:~/src/openssl$ util/shlib_wrap.sh apps/openssl speed -elapsed aes-128-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128 cbc for 3s on 16 size blocks: 26239530 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 64 size blocks: 7217622 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 256 size blocks: 1846145 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 1024 size blocks: 464811 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 8192 size blocks: 58329 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 16384 size blocks: 29176 aes-128 cbc's in 3.00s
OpenSSL 1.1.1-pre3-dev  xx XXX xxxx
built on: Sun Mar 11 22:26:48 2018 UTC
options:bn(64,64) rc4(16x,int) des(int) aes(partial) blowfish(ptr)
compiler: gcc -fPIC -pthread -m64 -DDEBUG_UNUSED -DPEDANTIC -pedantic -Wno-long-long -Wall -Wextra -Wno-unused-parameter -Wno-missing-field-initializers -Wswitch -Wsign-compare -Wmissing-prototypes -Wshadow -Wformat -Wtype-limits -Wundef -Werror -Wmisleading-indentation -Wall -O0 -g -DOPENSSL_USE_NODELETE -DL_ENDIAN -DDSO_DLFCN -DHAVE_DLFCN_H -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPADLOCK_ASM -DPOLY1305_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128 cbc     139944.16k   153975.94k   157537.71k   158655.49k   159277.06k   159339.86k

msp@msppc:~/src/openssl$ util/shlib_wrap.sh apps/openssl speed -elapsed -evp aes-128-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-cbc for 3s on 16 size blocks: 97827193 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 33430983 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 9190109 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 2348750 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 295717 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 16384 size blocks: 147945 aes-128-cbc's in 3.00s
OpenSSL 1.1.1-pre3-dev  xx XXX xxxx
built on: Sun Mar 11 22:26:48 2018 UTC
options:bn(64,64) rc4(16x,int) des(int) aes(partial) blowfish(ptr)
compiler: gcc -fPIC -pthread -m64 -DDEBUG_UNUSED -DPEDANTIC -pedantic -Wno-long-long -Wall -Wextra -Wno-unused-parameter -Wno-missing-field-initializers -Wswitch -Wsign-compare -Wmissing-prototypes -Wshadow -Wformat -Wtype-limits -Wundef -Werror -Wmisleading-indentation -Wall -O0 -g -DOPENSSL_USE_NODELETE -DL_ENDIAN -DDSO_DLFCN -DHAVE_DLFCN_H -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPADLOCK_ASM -DPOLY1305_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-cbc     521745.03k   713194.30k   784222.63k   801706.67k   807504.55k   807976.96k

kroeckx · 2018-03-11T23:48:51Z

Try using "apps/openssl" instead of "openssl"? My guess is that the installed version is not a 1.1.0 version, so it's not picking up your new shared library.

mspncp · 2018-03-11T23:59:56Z

Thanks! I new there must have been some stupid mistake. But I didn't see that I forgot the relative path. Now the result looks much better!

real    0m2,561s
user    0m2,532s
sys     0m0,033s

p-steuer · 2018-03-12T10:11:17Z

Heres some perf data i collected on my thinkpad ...

master (b38fa98)

$ LD_LIBRARY_PATH=. time apps/openssl rand 1000000000 > /dev/null
10.70user 0.18system 0:10.89elapsed 99%CPU

$ LD_LIBRARY_PATH=. apps/openssl speed rand
[...]
options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -fPIC -pthread -m64  -Wa,--noexecstack -Wall -O3 -DOPENSSL_USE_NODELETE -DL_ENDIAN -DDSO_DLFCN -DHAVE_DLFCN_H -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DNDEBUG
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
rand              4355.11k    15758.76k    42451.46k    76392.45k    97383.77k    98724.52k

master+this PR

$ LD_LIBRARY_PATH=. time apps/openssl rand 1000000000 > /dev/null
2.87user 0.17system 0:03.05elapsed 99%CPU

$ LD_LIBRARY_PATH=. apps/openssl speed rand
[...]
options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -fPIC -pthread -m64  -Wa,--noexecstack -Wall -O3 -DOPENSSL_USE_NODELETE -DL_ENDIAN -DDSO_DLFCN -DHAVE_DLFCN_H -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DNDEBUG
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
rand             10363.86k    39082.40k   120344.92k   258768.90k   380887.04k   394062.51k

Copied from my original PR

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
rand             85196.83k   337101.99k  1086201.86k  2433860.27k  3830319.79k  4126512.47k

So for x86 this PR is a pretty good improvement.

But it should be evaluated on other architectures as well ..

For s390 it shows a 20% / 35% performance degeneration for longer / shorter buffers. This is because AES_encrypt already has hardware support on s390, so this PR only adds EVP overhead:

master:

  96.55%  openssl  libcrypto.so.1.1    [.] AES_encrypt
   1.69%  openssl  libcrypto.so.1.1    [.] drbg_ctr_generate
   0.35%  openssl  libcrypto.so.1.1    [.] ctr_update

master+this PR:

  69.69%  openssl  libcrypto.so.1.1    [.] AES_encrypt
  11.88%  openssl  libcrypto.so.1.1    [.] EVP_EncryptUpdate
   6.90%  openssl  libcrypto.so.1.1    [.] aes_ecb_cipher

The main reason this PR's performance gain (on x86) stays much behind the original PR's is that the generate operation is still a loop, encrypting one AES block at a time.

If this PR is not planned to be included in coming 1.1.1 release, i would suggest to take the time to evaluate the pre-inc vs post-inc thing and possibly go for a ctr-mode-based generate operation.

p-steuer · 2018-03-12T16:59:53Z

It also had extra kat tests in that pull request by they fail both before and after my patch.

Commit a93ba40 breaks those kat tests. Didnt look into detail but its most likely due to DRBG instances reseeding automatically:

The kats check for correct 32-to-128-bit counter overflow during generate operation and thus need to perform a lot of "deterministic" generate ops i.e, w/o reseeding.

kroeckx · 2018-03-12T18:02:10Z

I can try that test again after disabling the reseeding.

kroeckx · 2018-03-12T18:05:18Z

We're moving the beta soon, but I guess this is not a feature and so could possible be merged after we've moved to beta, but I would like to have something before we release and this looks like the most simple way for now.

kroeckx · 2018-03-12T20:46:33Z

The extra kat test still fails after disabling reseeding

p-steuer · 2018-03-13T14:58:56Z

Didnt look into detail but its most likely due to DRBG instances reseeding automatically

My guess was wrong: The kats use callbacks returning all zero buffs to to provide a deterministic result.

The first commit breaking the kats is a93ba40 :
Before, a drbg instance's reseed_counter was initialized to 1. The patch renamed reseed_counter to generate_counter, which is now initialized to 0. I.e. automatic reseeding happens one generate operation later.

The second commit breaking the kats is 4917e91 :
Before, a drbg instance's reseed_interval was initialized to MAX_RESEED_INTERVAL (2^24) as default. This was changed to MASTER_RESEED_INTERVAL (2^8).

So the kat failure is not a bug, its just due to changes in the reseed behavior. The test cases could be adapted to reflect the new behavior, but since they only made sense in the ctr-mode based original implementation, they can just be ignored in the context of this PR.

p-steuer · 2018-03-14T09:39:52Z

@kroeckx

[...] but I would like to have something before we release and this looks like the most simple way for now.

Okay. Long-term i would welcome a CTR-mode based/optimized*** solution.

@mspncp :

There was a discussion with @p-steuer about pre-increment vs. post-increment in #4530 (comment). Did you have that in your mind? How was this problem addressed?

It was my first attempt to implement ctr_generate cleanly using EVP_aes_x_ctr. However, all other drbg functions turned out not to be a natural fit for the aes-ctr evp interfaces, best example is probably ctr_update. After all i found it cleaner to "export" the required functions from the e_aes.c module such that the subsequent changes in drbg_rand.c become trivial.

[...]

In general, using aes-ctr functions to implement ctr_drbg generate requires the drbg counter to be incremented after operations (instead of before operations, as in the nist standard's pseudo code). I put these changes in an extra patch.

Regarding the "pre-increment vs. post-increment" w.r.t. FIPS compliance discussion: The GCM specs (NIST SP 800-38D) feature a pre-inc pseudo code. However, libraries (including libcrypto) implement GCM using CTR-mode* based encryption, instead of using an "inc/ECB"-loop to match said pseudocode. My PR tried to do the same optimization for DRBG.

Regarding the "EVP-interface must be used" discussion: I dont agree that libcrypto submodules should be limited** by only being allowed to use exported interfaces.

*(which is usually implemented as post-inc, i.e. cipher's context structure keeps the counter updated for next operation)
**See discussion in #4530 for why i feel forcing EVP might not be the right choice in this case.
edit:
***Note that its not only a question of enabling hw support for "plain" aes algorithm but also of enabling dedicated aes-ctr hw support.

kroeckx · 2018-03-21T17:54:48Z

Is there something that still needs to happen? Or can someone review this?

richsalz

Yes, this gets back some big performance loss.

Reviewed-by: Rich Salz <[email protected]> GH: #5580

Signed-off-by: Patrick Steuer <[email protected]> Reviewed-by: Kurt Roeckx <[email protected]> Reviewed-by: Rich Salz <[email protected]> GH: #5580

kroeckx mentioned this pull request Mar 10, 2018

Lack of performances compared to 1.1.0 #5559

Closed

levitte requested changes Mar 10, 2018

View reviewed changes

kroeckx force-pushed the drbg_evp branch from 66fd529 to 63fa494 Compare March 10, 2018 16:37

kroeckx changed the title ~~WIP: DRBG: Use the EVP layer to do AES encryption~~ DRBG: Use the EVP layer to do AES encryption Mar 10, 2018

kroeckx force-pushed the drbg_evp branch 3 times, most recently from 6a4c2e4 to 568a2d1 Compare March 10, 2018 17:29

kroeckx force-pushed the drbg_evp branch from 568a2d1 to d37ac53 Compare March 10, 2018 18:35

kroeckx force-pushed the drbg_evp branch from d37ac53 to efe02f9 Compare March 10, 2018 21:40

levitte requested changes Mar 10, 2018

View reviewed changes

kroeckx force-pushed the drbg_evp branch 2 times, most recently from 289fda5 to a48b63f Compare March 10, 2018 22:53

mspncp reviewed Mar 10, 2018

View reviewed changes

kroeckx force-pushed the drbg_evp branch from a48b63f to 201a237 Compare March 10, 2018 23:49

kroeckx and others added 2 commits March 11, 2018 12:27

DRBG: Use the EVP layer to do AES encryption

e6f5902

aes ctr_drbg: add cavs tests

419a3af

Signed-off-by: Patrick Steuer <[email protected]>

kroeckx force-pushed the drbg_evp branch from 201a237 to 419a3af Compare March 11, 2018 11:28

mspncp mentioned this pull request Mar 12, 2018

rand -out /dev/null gives permission denied #5590

Closed

mattcaswell added this to the 1.1.1 milestone Mar 20, 2018

richsalz approved these changes Mar 21, 2018

View reviewed changes

kroeckx closed this Mar 21, 2018

levitte pushed a commit that referenced this pull request Mar 21, 2018

DRBG: Use the EVP layer to do AES encryption

dbdcc04

Reviewed-by: Rich Salz <[email protected]> GH: #5580

levitte pushed a commit that referenced this pull request Mar 21, 2018

aes ctr_drbg: add cavs tests

e613b1e

Signed-off-by: Patrick Steuer <[email protected]> Reviewed-by: Kurt Roeckx <[email protected]> Reviewed-by: Rich Salz <[email protected]> GH: #5580

mspncp mentioned this pull request Nov 15, 2019

WIP: AESNI based DRBG CTR #10456

Closed

Uh oh!

Conversation

kroeckx commented Mar 10, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

t-j-h commented Mar 10, 2018

Uh oh!

mspncp commented Mar 10, 2018

Uh oh!

kroeckx commented Mar 10, 2018

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kroeckx commented Mar 10, 2018

Uh oh!

kroeckx commented Mar 10, 2018

Uh oh!

kroeckx commented Mar 10, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mspncp commented Mar 10, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kroeckx commented Mar 10, 2018

Uh oh!

mspncp commented Mar 10, 2018

Uh oh!

kroeckx commented Mar 10, 2018

Uh oh!

kroeckx commented Mar 10, 2018

Uh oh!

levitte left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mspncp commented Mar 11, 2018

Performance test for 419a3af

Performance test for 78a50c7

Is hardware support enabled?

Uh oh!

kroeckx commented Mar 11, 2018 via email

Uh oh!

mspncp commented Mar 11, 2018

Uh oh!

p-steuer commented Mar 12, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

p-steuer commented Mar 12, 2018

Uh oh!

kroeckx commented Mar 12, 2018

Uh oh!

kroeckx commented Mar 10, 2018 •

edited

Loading

kroeckx commented Mar 10, 2018 •

edited

Loading

mspncp commented Mar 10, 2018 •

edited

Loading

Performance test for `419a3af`

Performance test for `78a50c7`

p-steuer commented Mar 12, 2018 •

edited

Loading

p-steuer commented Mar 13, 2018 •

edited

Loading

p-steuer commented Mar 14, 2018 •

edited

Loading