Skip to content

DRBG: Use the EVP layer to do AES encryption#5580

Closed
kroeckx wants to merge 2 commits intoopenssl:masterfrom
kroeckx:drbg_evp
Closed

DRBG: Use the EVP layer to do AES encryption#5580
kroeckx wants to merge 2 commits intoopenssl:masterfrom
kroeckx:drbg_evp

Conversation

@kroeckx
Copy link
Member

@kroeckx kroeckx commented Mar 10, 2018

This is WIP because there are still style issues (line too long) in it, and I want to properly check errors. But it passes the regression tests.

To generate 1000000000 bytes of random data with "openssl rand", I need this amount of time:
1.1.0: 20.233s
HEAD: 12.944s
This PR: 3.024s

@t-j-h
Copy link
Member

t-j-h commented Mar 10, 2018

Can you add in your equivalent timing for 1.0.2 - that would help.

@mspncp
Copy link
Contributor

mspncp commented Mar 10, 2018

Interesting test results! But why is master much faster than 1.1.0? How does this fit together with the regression reported in #5559?

@kroeckx
Copy link
Member Author

kroeckx commented Mar 10, 2018

Note that that was one of the regressions, he started at 24K, and that commit changed it from 19K to 15K, so there are clearly other regressions too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see you free this context anywhere, and that might explain why Travis thinks this leaks...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking I needed to free that and then forgot all about it. :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be fixed

@kroeckx
Copy link
Member Author

kroeckx commented Mar 10, 2018

On 1.0.2 it takes 41.48 s

@kroeckx
Copy link
Member Author

kroeckx commented Mar 10, 2018

On a 1.0.2 shared build it takes 26.862, the 1.1 numers where all for a shared build too

@kroeckx kroeckx changed the title WIP: DRBG: Use the EVP layer to do AES encryption DRBG: Use the EVP layer to do AES encryption Mar 10, 2018
@kroeckx kroeckx force-pushed the drbg_evp branch 3 times, most recently from 6a4c2e4 to 568a2d1 Compare March 10, 2018 17:29
@kroeckx
Copy link
Member Author

kroeckx commented Mar 10, 2018

The 1.1.0 numbers I've reported before were from the Debian version which is at least build with -O2 instead of -O3. All build with a "./config shared" currently gives me:
1.0.2: 26.86
1.1.0: 19.13
master: 13.1
This PR: 2.95

@mspncp
Copy link
Contributor

mspncp commented Mar 10, 2018

I have two questions:

There was a discussion with @p-steuer about pre-increment vs. post-increment in #4530 (comment). Did you have that in your mind? How was this problem addressed?

It was my first attempt to implement ctr_generate cleanly using EVP_aes_x_ctr. However, all other drbg functions turned out not to be a natural fit for the aes-ctr evp interfaces, best example is probably ctr_update. After all i found it cleaner to "export" the required functions from the e_aes.c module such that the subsequent changes in drbg_rand.c become trivial.

Looking at these things again, i noticed i could use CRYPTO_ctr128_encrypt_ctr32 from modes.h which already implements the 32-to-128 bit counter-overflow instead of re-implementing this. However, the functions from e_aes.c would still be needed.

In general, using aes-ctr functions to implement ctr_drbg generate requires the drbg counter to be incremented after operations (instead of before operations, as in the nist standard's pseudo code). I put these changes in an extra patch.

Patrick added the CAVS tests in 7aba2e4. Did you run them before and after your modifications? IMHO adding the CAVS tests to the testsuite and verifying them should be part of this pull request.

@kroeckx
Copy link
Member Author

kroeckx commented Mar 10, 2018

I used the ECB mode instead of a counter mode. Everything related to the counter is still in the DRBG, it didn't move to the EVP layer. I'll look into those CAVS tests.

@mspncp
Copy link
Contributor

mspncp commented Mar 10, 2018

Ah, I see!

@kroeckx
Copy link
Member Author

kroeckx commented Mar 10, 2018

I've added the CAVS tests. It also had extra kat tests in that pull request by they fail both before and after my patch.

@kroeckx
Copy link
Member Author

kroeckx commented Mar 10, 2018

Switching back to aes-128-ctr gives this:
master: 9.2
This PR: 2.3

Copy link
Member

@levitte levitte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Picking on style for a bit...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style: indent

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style: indent

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indent

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small indentation

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intent

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

intent? :) I at least got some right!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intent

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... you know the drill

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup...

@kroeckx kroeckx force-pushed the drbg_evp branch 2 times, most recently from 289fda5 to a48b63f Compare March 10, 2018 22:53
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What makes you think that this is an error?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea why that's there, fixed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now all execution branches return success. So you might want to change the return value of ctr_XOR() back to void again and remove the return value checks in ctr_update().

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@mspncp
Copy link
Contributor

mspncp commented Mar 11, 2018

I compared the performances with and without e6f5902 on my computer and strangely performance was exactly the same, about 25 sec. Does anyone see my mistake, or did I miss something?

Here's my current git log:

419a3af688 (HEAD -> pr-5580, github/pr-5580) aes ctr_drbg: add cavs tests
e6f5902d21 DRBG: Use the EVP layer to do AES encryption
78a50c7524 Configurations/10-main.conf: VC-<target> cleanups.

I tested 419a3af vs. 78a50c7, using the following commands:

make clean
perl ./Configure linux-x86_64 no-idea no-camellia threads shared --strict-warnings --prefix=/home/msp/install/openssl
make -j 8
time util/shlib_wrap.sh openssl rand  -out /dev/null  1000000000

Performance test for 419a3af

msp@msppc:~/src/openssl$ time util/shlib_wrap.sh openssl rand  -out /dev/null  1000000000

real    0m24,795s
user    0m24,759s
sys     0m0,037s

Performance test for 78a50c7

msp@msppc:~/src/openssl$ time util/shlib_wrap.sh openssl rand  -out /dev/null  1000000000

real    0m24,797s
user    0m24,768s
sys     0m0,031s

Is hardware support enabled?

I checked grep aes /proc/cpuinfo, and also did a speed test, which suggests that hardware support is available.

msp@msppc:~/src/openssl$ util/shlib_wrap.sh apps/openssl speed -elapsed aes-128-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128 cbc for 3s on 16 size blocks: 26239530 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 64 size blocks: 7217622 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 256 size blocks: 1846145 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 1024 size blocks: 464811 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 8192 size blocks: 58329 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 16384 size blocks: 29176 aes-128 cbc's in 3.00s
OpenSSL 1.1.1-pre3-dev  xx XXX xxxx
built on: Sun Mar 11 22:26:48 2018 UTC
options:bn(64,64) rc4(16x,int) des(int) aes(partial) blowfish(ptr)
compiler: gcc -fPIC -pthread -m64 -DDEBUG_UNUSED -DPEDANTIC -pedantic -Wno-long-long -Wall -Wextra -Wno-unused-parameter -Wno-missing-field-initializers -Wswitch -Wsign-compare -Wmissing-prototypes -Wshadow -Wformat -Wtype-limits -Wundef -Werror -Wmisleading-indentation -Wall -O0 -g -DOPENSSL_USE_NODELETE -DL_ENDIAN -DDSO_DLFCN -DHAVE_DLFCN_H -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPADLOCK_ASM -DPOLY1305_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128 cbc     139944.16k   153975.94k   157537.71k   158655.49k   159277.06k   159339.86k
msp@msppc:~/src/openssl$ util/shlib_wrap.sh apps/openssl speed -elapsed -evp aes-128-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-cbc for 3s on 16 size blocks: 97827193 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 33430983 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 9190109 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 2348750 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 295717 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 16384 size blocks: 147945 aes-128-cbc's in 3.00s
OpenSSL 1.1.1-pre3-dev  xx XXX xxxx
built on: Sun Mar 11 22:26:48 2018 UTC
options:bn(64,64) rc4(16x,int) des(int) aes(partial) blowfish(ptr)
compiler: gcc -fPIC -pthread -m64 -DDEBUG_UNUSED -DPEDANTIC -pedantic -Wno-long-long -Wall -Wextra -Wno-unused-parameter -Wno-missing-field-initializers -Wswitch -Wsign-compare -Wmissing-prototypes -Wshadow -Wformat -Wtype-limits -Wundef -Werror -Wmisleading-indentation -Wall -O0 -g -DOPENSSL_USE_NODELETE -DL_ENDIAN -DDSO_DLFCN -DHAVE_DLFCN_H -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPADLOCK_ASM -DPOLY1305_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-cbc     521745.03k   713194.30k   784222.63k   801706.67k   807504.55k   807976.96k

@kroeckx
Copy link
Member Author

kroeckx commented Mar 11, 2018 via email

@mspncp
Copy link
Contributor

mspncp commented Mar 11, 2018

Thanks! I new there must have been some stupid mistake. But I didn't see that I forgot the relative path. Now the result looks much better!

real    0m2,561s
user    0m2,532s
sys     0m0,033s

@p-steuer
Copy link
Member

p-steuer commented Mar 12, 2018

Heres some perf data i collected on my thinkpad ...

master (b38fa98)

$ LD_LIBRARY_PATH=. time apps/openssl rand 1000000000 > /dev/null
10.70user 0.18system 0:10.89elapsed 99%CPU

$ LD_LIBRARY_PATH=. apps/openssl speed rand
[...]
options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -fPIC -pthread -m64  -Wa,--noexecstack -Wall -O3 -DOPENSSL_USE_NODELETE -DL_ENDIAN -DDSO_DLFCN -DHAVE_DLFCN_H -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DNDEBUG
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
rand              4355.11k    15758.76k    42451.46k    76392.45k    97383.77k    98724.52k

master+this PR

$ LD_LIBRARY_PATH=. time apps/openssl rand 1000000000 > /dev/null
2.87user 0.17system 0:03.05elapsed 99%CPU

$ LD_LIBRARY_PATH=. apps/openssl speed rand
[...]
options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -fPIC -pthread -m64  -Wa,--noexecstack -Wall -O3 -DOPENSSL_USE_NODELETE -DL_ENDIAN -DDSO_DLFCN -DHAVE_DLFCN_H -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DNDEBUG
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
rand             10363.86k    39082.40k   120344.92k   258768.90k   380887.04k   394062.51k

Copied from my original PR

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
rand             85196.83k   337101.99k  1086201.86k  2433860.27k  3830319.79k  4126512.47k

So for x86 this PR is a pretty good improvement.

But it should be evaluated on other architectures as well ..

For s390 it shows a 20% / 35% performance degeneration for longer / shorter buffers. This is because AES_encrypt already has hardware support on s390, so this PR only adds EVP overhead:

master:

  96.55%  openssl  libcrypto.so.1.1    [.] AES_encrypt
   1.69%  openssl  libcrypto.so.1.1    [.] drbg_ctr_generate
   0.35%  openssl  libcrypto.so.1.1    [.] ctr_update

master+this PR:

  69.69%  openssl  libcrypto.so.1.1    [.] AES_encrypt
  11.88%  openssl  libcrypto.so.1.1    [.] EVP_EncryptUpdate
   6.90%  openssl  libcrypto.so.1.1    [.] aes_ecb_cipher

The main reason this PR's performance gain (on x86) stays much behind the original PR's is that the generate operation is still a loop, encrypting one AES block at a time.

If this PR is not planned to be included in coming 1.1.1 release, i would suggest to take the time to evaluate the pre-inc vs post-inc thing and possibly go for a ctr-mode-based generate operation.

@p-steuer
Copy link
Member

It also had extra kat tests in that pull request by they fail both before and after my patch.

Commit a93ba40 breaks those kat tests. Didnt look into detail but its most likely due to DRBG instances reseeding automatically:

The kats check for correct 32-to-128-bit counter overflow during generate operation and thus need to perform a lot of "deterministic" generate ops i.e, w/o reseeding.

@kroeckx
Copy link
Member Author

kroeckx commented Mar 12, 2018

I can try that test again after disabling the reseeding.

@kroeckx
Copy link
Member Author

kroeckx commented Mar 12, 2018

We're moving the beta soon, but I guess this is not a feature and so could possible be merged after we've moved to beta, but I would like to have something before we release and this looks like the most simple way for now.

@kroeckx
Copy link
Member Author

kroeckx commented Mar 12, 2018

The extra kat test still fails after disabling reseeding

@p-steuer
Copy link
Member

p-steuer commented Mar 13, 2018

Didnt look into detail but its most likely due to DRBG instances reseeding automatically

My guess was wrong: The kats use callbacks returning all zero buffs to to provide a deterministic result.

The first commit breaking the kats is a93ba40 :
Before, a drbg instance's reseed_counter was initialized to 1. The patch renamed reseed_counter to generate_counter, which is now initialized to 0. I.e. automatic reseeding happens one generate operation later.

The second commit breaking the kats is 4917e91 :
Before, a drbg instance's reseed_interval was initialized to MAX_RESEED_INTERVAL (2^24) as default. This was changed to MASTER_RESEED_INTERVAL (2^8).

So the kat failure is not a bug, its just due to changes in the reseed behavior. The test cases could be adapted to reflect the new behavior, but since they only made sense in the ctr-mode based original implementation, they can just be ignored in the context of this PR.

@p-steuer
Copy link
Member

p-steuer commented Mar 14, 2018

@kroeckx

[...] but I would like to have something before we release and this looks like the most simple way for now.

Okay. Long-term i would welcome a CTR-mode based/optimized*** solution.

@mspncp :

There was a discussion with @p-steuer about pre-increment vs. post-increment in #4530 (comment). Did you have that in your mind? How was this problem addressed?

It was my first attempt to implement ctr_generate cleanly using EVP_aes_x_ctr. However, all other drbg functions turned out not to be a natural fit for the aes-ctr evp interfaces, best example is probably ctr_update. After all i found it cleaner to "export" the required functions from the e_aes.c module such that the subsequent changes in drbg_rand.c become trivial.

[...]

In general, using aes-ctr functions to implement ctr_drbg generate requires the drbg counter to be incremented after operations (instead of before operations, as in the nist standard's pseudo code). I put these changes in an extra patch.

Regarding the "pre-increment vs. post-increment" w.r.t. FIPS compliance discussion: The GCM specs (NIST SP 800-38D) feature a pre-inc pseudo code. However, libraries (including libcrypto) implement GCM using CTR-mode* based encryption, instead of using an "inc/ECB"-loop to match said pseudocode. My PR tried to do the same optimization for DRBG.

Regarding the "EVP-interface must be used" discussion: I dont agree that libcrypto submodules should be limited** by only being allowed to use exported interfaces.

*(which is usually implemented as post-inc, i.e. cipher's context structure keeps the counter updated for next operation)
**See discussion in #4530 for why i feel forcing EVP might not be the right choice in this case.
edit:
***Note that its not only a question of enabling hw support for "plain" aes algorithm but also of enabling dedicated aes-ctr hw support.

@mattcaswell mattcaswell added this to the 1.1.1 milestone Mar 20, 2018
@kroeckx
Copy link
Member Author

kroeckx commented Mar 21, 2018

Is there something that still needs to happen? Or can someone review this?

Copy link
Contributor

@richsalz richsalz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this gets back some big performance loss.

@kroeckx kroeckx closed this Mar 21, 2018
levitte pushed a commit that referenced this pull request Mar 21, 2018
levitte pushed a commit that referenced this pull request Mar 21, 2018
Signed-off-by: Patrick Steuer <[email protected]>
Reviewed-by: Kurt Roeckx <[email protected]>
Reviewed-by: Rich Salz <[email protected]>
GH: #5580
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants