rsa/rsa_ossl.c: implement variant of "Smooth CRT-RSA."#6915
rsa/rsa_ossl.c: implement variant of "Smooth CRT-RSA."#6915dot-asm wants to merge 5 commits intoopenssl:masterfrom
Conversation
crypto/rsa/rsa_ossl.c
Outdated
There was a problem hiding this comment.
Parenthesis around & operation ?
|
Red cross from travis was unrelated, it was failure to pull a package. |
crypto/rsa/rsa_ossl.c
Outdated
There was a problem hiding this comment.
Just in case question is popped. Outputs from BN_mod_exp_mont_consttime are not fixed-length, but they go straight to length-invariant bn_mod_sub_fixed_top, so that [the] only variation would be bn_correct_top [itself] at the ends of BN_mod_exp_mont_consttime, Variation is very unlike and it's not unreasonable to assume that even them then it won't be measurable in the context.
There was a problem hiding this comment.
It occurs to me that, unlike the issue below with r0, this bn_correct_top is not obviously saved by blinding because the distribution involves p and q separately.
I would have considered the bn_correct_top one to clear out, but I also favor a more complete non-minimal BIGNUM story, which would allow BN_mod_exp_mont_consttime to just return non-minimal values. How much work to put into the interim fix, I don't have strong views on.
There was a problem hiding this comment.
I don't quite follow. (It's late here:-) You mean to introduce bn_mod_exp_fixed_length? It's not very much trouble...
crypto/rsa/rsa_ossl.c
Outdated
There was a problem hiding this comment.
Is this not just an equality check at this point? This code is checking that vrfy and I are equal mod rsa->n, but uses BN_mod_sub_quick which, as a precondition, requires that vrfy and I are fully reduced. That means that, if they were equal mod rsa->n, they're simply equal.
(Our version of this logic just does an equality check.)
There was a problem hiding this comment.
Commentary above suggests that I being larger than rsa->n is supposed to fly. As for question whether or not BN_mod_sub_quick does the job. Note that goal is not to produce real result modulo rsa->n, but to compare if verfy equals to I or I-rsa->n. Everything else, be it positive or negative, means that verification failed. I can add commentary...
There was a problem hiding this comment.
Hrm. If we allow I being larger than rsa->n, then the preconditions for the reduction in the smooth path don't hold and we have worse problems. It seems you'd want to fix up I earlier.
The callers of rsa_mod_exp within OpenSSL already check the input is fully-reduced (RSA_R_DATA_TOO_LARGE_FOR_MODULUS), so it is impossible there. But I suppose if I claim that RSA_meth_get_mod_exp is a concern above, I must also claim RSA_meth_get_mod_exp is a concern here. (Though it is slightly different. Saying you cannot use the result of RSA_meth_get_mod_exp at all is different from saying a mod-exp function had a previously unenforced requirement of reduced inputs.)
How much the project choses to care about RSA_meth_get_mod_exp I don't have opinions on. Personally I think it is a ridiculous function and should never have been included. :-)
crypto/bn/bn_mod.c
Outdated
There was a problem hiding this comment.
Nit: Perhaps s/width/bit width/, so be clear you're not talking about word width.
crypto/rsa/rsa_ossl.c
Outdated
There was a problem hiding this comment.
This is probably worth elaborating on in the comment since it's not obvious. While r1 is fully reduced mod p, m1 is only fully reduced mod q, which may be larger than p. This doesn't fit in the "obvious" modular subtraction precondition, but bn_mod_sub_fixed_top does explicitly allows for this.
(We instead canonicalized things so that p > q, which is differently complicated; we end up storing extra copies of several fields to deal with this and other input width issues. But we also removed RSA_FLAG_CACHE_PRIVATE and always cache stuff, so that option was easier for us. But it did allow a minor optimization: cache iqmp in Montgomery form to save a Montgomery reduction. I should note that the more general bn_mod_sub_fixed_top is then less efficient for the case when both are guaranteed to be reduced. Notably, the EC code needs a lot of modular subtractions, when it comes time to fix its timing issues.)
There was a problem hiding this comment.
This doesn't fit in the "obvious" modular subtraction precondition, but bn_mod_sub_fixed_top does explicitly allows for this.
Just in case, I did verify q>p to work. I'll add commentary [tomorrow]...
crypto/rsa/rsa_ossl.c
Outdated
There was a problem hiding this comment.
r0 here is fixed-top and yet leaks out of the public API, by way of RSA_meth_get_mod_exp or 1.0.x's public structs. Even separate from the public API, it leaks into BN_sub via the unblinded RSA_X931_PADDING path of rsa_ossl_private_encrypt. It leaks into BN_bn2binpad when unblinded which does not actually tolerate non-minimal things yet. (It calls BN_num_bytes and the like.) It also leaks into BN_BLINDING_invert_ex which currently uses plain BN_mod_mul (though we want Montgomery reduction). I think that tolerates it because BN_mul and BN_sqr calls bn_correct_top (though we don't want a bn_correct_top in there), but that's a tenuous set of requirements...
This is unfortunate since it is indeed one of the places where we don't want to force minimal-width BIGNUMs, but that may require a more complete solution to #6640.
There was a problem hiding this comment.
This is unfortunate since it is indeed one of the places where we don't want to force minimal-width BIGNUMs, but that may require a more complete solution to #6640.
(I guess, when blinding is on, we're vaguely okay with forcing minimal-width here. It's the blinding removal that's sensitive. Anyway, point still stands that we probably need to force minimal-width for now, but we don't want to long-term.)
There was a problem hiding this comment.
Looking into blinding is the tomorrow's exercise...
crypto/rsa/rsa_ossl.c
Outdated
There was a problem hiding this comment.
As with the similar reduction below, rely on the looser bounds for Montgomery reduction, that m1 is bounded by R*q, not q^2. m1 is bounded by p*q, so for the smaller prime, this only works because the two primes are the same size. Might be worth a comment to that effect.
crypto/rsa/rsa_ossl.c
Outdated
There was a problem hiding this comment.
This means that if the smooth codepath runs on a key that RSA_FLAG_CACHE_PRIVATE was cleared, we change behavior and the RSA now remembers bits of itself, which I believe is user-visible if the caller changes the key after use. Perhaps smooth needs to require RSA_FLAG_CACHE_PRIVATE, or perhaps the non-RSA_FLAG_CACHE_PRIVATE case needs to create local BN_MONT_CTX and pass that in.
This complexity, incidentally, is a big part of why I filed #5158.
(We took away RSA_FLAG_CACHE_PRIVATE and always act as if it is on.)
There was a problem hiding this comment.
Yeah, I was anticipating this question... I mean I know that it's kind of an edge and still am thinking about it...
crypto/bn/bn_mod.c
Outdated
There was a problem hiding this comment.
Mostly checking this out loud, though I wouldn't object if you put it in a comment:
a is reduced and b is the same bit width, so we know:
0 <= a < m
0 <= b < 2^w <= 2*m
Thus the first subtraction gives:
2*m < r = a - b < m - 0
Thus two conditional additions does the job.
94d2cdb to
ea49309
Compare
|
Ready. |
|
On general note. There are few bn_correct_top-s on the path. Generally speaking probability that top would actually be corrected is rather low. One has to keep in mind that attacks are customarily about gathering statistics. And since we are operating on blinded values that periodically change, question would be if attack can be actually mounted fulfill in that window. There is sensitive point outside blinded operation, and that is unblinding in private_decrypt when it deduces plaintext. Variations are bn_correct_top itself, and then possible variation in BN_bn2pinpad. It would take ability to track access pattern with limb granularity to capture relevant variations. Well, that's what cachebleed was doing (on specific processors), but in that case the window of opportunity was much larger, attacker had hundreds and hundreds of cycles to catch it the act. |
|
Another open question is what happens if |I| argument to rsa_ossl_mod_exp is not fully reduced. In normal case, even more specifically when called from libssl it just can't happen, because value will be rejected by [rsa_ossl_mod_exp's] caller. Otherwise it would take specially crafted program. Should we care? I mean about specially crafter program failing and resorting to I^d? On side note verify pass could use a cleanup, some time later... |
crypto/rsa/rsa_ossl.c
Outdated
There was a problem hiding this comment.
Wow! This costs 14% in performance. My thought that it would if(BN_FLG_CONSTTIME-here-n-there) would drive it to BN_mod_exp_mont_consttime, but apparently not. Calling BN_mod_exp_mont (which can handle fixed-top)...
|
BN_bn2pinpad conceals memory access pattern for corrected tops, so that only possible variations are bn_correct_top themselves. Unlike as they all are, there is only one actually sensitive, at exit from unblinding. |
|
And the verdict is... |
|
Red cross seems to be unrelated. It's not clear to me what's going on, 1st red cross at https://travis-ci.org/openssl/openssl/branches is for commit that I don't have in this request... |
|
Ping. |
|
P-p-p-i-n-n-n-g-g-g-g-g... |
|
P-i-i-i-i-i-i-i-i-i-i-n-g then? |
Add bn_{mul|sqr}_fixed_top, bn_from_mont_fixed_top, bn_mod_sub_fixed_top.
Switch to bn_{mul|sqr}_fixed_top in bn_mul_mont_fixed_top and remove
memset in bn_from_montgomery_word.
In [most common] case of p and q being of same width, it's possible to replace CRT modulo operations with Montgomery reductions. And those are even fixed-length Montgomery reductions...
[extended tests]
| if (!BN_copy(r, b->Ai)) | ||
| ret = 0; | ||
| } | ||
| if (r != NULL && (BN_copy(r, b->Ai) == NULL)) |
There was a problem hiding this comment.
Are the brackets around BN_copy() == NULL necessary?
There was a problem hiding this comment.
No, but in my mind it's more readable. I mean when one part of expression has parenthesis and something "complex" in parenthesis by itself. For example I [for one] wouldn't bother with extra parenthesis around func() == NULL, or even func(arg) == NULL. But in this case two arguments with one being member reference is "complex" enough to grant extra parenthesis...
crypto/bn/bn_mod.c
Outdated
| * | ||
| * -2*m < r = a - b < m | ||
| * | ||
| * Thus is takes up to two conditional additions to make |r| positive. |
There was a problem hiding this comment.
Fixed. Since it's commentary-only modification I consider that approval still stands [unless someone claims otherwise].
Add bn_{mul|sqr}_fixed_top, bn_from_mont_fixed_top, bn_mod_sub_fixed_top.
Switch to bn_{mul|sqr}_fixed_top in bn_mul_mont_fixed_top and remove
memset in bn_from_montgomery_word.
Reviewed-by: Paul Dale <[email protected]>
(Merged from openssl#6915)
In [most common] case of p and q being of same width, it's possible to replace CRT modulo operations with Montgomery reductions. And those are even fixed-length Montgomery reductions... Reviewed-by: Paul Dale <[email protected]> (Merged from openssl#6915)
Reviewed-by: Paul Dale <[email protected]> (Merged from openssl#6915)
Reviewed-by: Paul Dale <[email protected]> (Merged from openssl#6915)
In [most common] case of p and q being of same width, it's possible to
replace CRT modulo operations with Montgomery reductions. And those are
even fixed-length Montgomery reductions...
1.0.2 and 1.1.0 labels are speculative in sense that they indicated intention to back-port this [as it won't cherry-pick]...