Skip to content

rsa/rsa_ossl.c: implement variant of "Smooth CRT-RSA."#6915

Closed
dot-asm wants to merge 5 commits intoopenssl:masterfrom
dot-asm:smooth_rsa
Closed

rsa/rsa_ossl.c: implement variant of "Smooth CRT-RSA."#6915
dot-asm wants to merge 5 commits intoopenssl:masterfrom
dot-asm:smooth_rsa

Conversation

@dot-asm
Copy link
Contributor

@dot-asm dot-asm commented Aug 10, 2018

In [most common] case of p and q being of same width, it's possible to
replace CRT modulo operations with Montgomery reductions. And those are
even fixed-length Montgomery reductions...

1.0.2 and 1.1.0 labels are speculative in sense that they indicated intention to back-port this [as it won't cherry-pick]...

@dot-asm dot-asm added branch: master Applies to master branch branch: 1.0.2 Applies to OpenSSL_1_0_2-stable branch (EOL) 1.1.0 branch: 1.1.1 Applies to OpenSSL_1_1_1-stable branch (EOL) labels Aug 10, 2018
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parenthesis around & operation ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@dot-asm
Copy link
Contributor Author

dot-asm commented Aug 10, 2018

Red cross from travis was unrelated, it was failure to pull a package.

Copy link
Contributor Author

@dot-asm dot-asm Aug 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just in case question is popped. Outputs from BN_mod_exp_mont_consttime are not fixed-length, but they go straight to length-invariant bn_mod_sub_fixed_top, so that [the] only variation would be bn_correct_top [itself] at the ends of BN_mod_exp_mont_consttime, Variation is very unlike and it's not unreasonable to assume that even them then it won't be measurable in the context.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It occurs to me that, unlike the issue below with r0, this bn_correct_top is not obviously saved by blinding because the distribution involves p and q separately.

I would have considered the bn_correct_top one to clear out, but I also favor a more complete non-minimal BIGNUM story, which would allow BN_mod_exp_mont_consttime to just return non-minimal values. How much work to put into the interim fix, I don't have strong views on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite follow. (It's late here:-) You mean to introduce bn_mod_exp_fixed_length? It's not very much trouble...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this not just an equality check at this point? This code is checking that vrfy and I are equal mod rsa->n, but uses BN_mod_sub_quick which, as a precondition, requires that vrfy and I are fully reduced. That means that, if they were equal mod rsa->n, they're simply equal.

(Our version of this logic just does an equality check.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commentary above suggests that I being larger than rsa->n is supposed to fly. As for question whether or not BN_mod_sub_quick does the job. Note that goal is not to produce real result modulo rsa->n, but to compare if verfy equals to I or I-rsa->n. Everything else, be it positive or negative, means that verification failed. I can add commentary...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrm. If we allow I being larger than rsa->n, then the preconditions for the reduction in the smooth path don't hold and we have worse problems. It seems you'd want to fix up I earlier.

The callers of rsa_mod_exp within OpenSSL already check the input is fully-reduced (RSA_R_DATA_TOO_LARGE_FOR_MODULUS), so it is impossible there. But I suppose if I claim that RSA_meth_get_mod_exp is a concern above, I must also claim RSA_meth_get_mod_exp is a concern here. (Though it is slightly different. Saying you cannot use the result of RSA_meth_get_mod_exp at all is different from saying a mod-exp function had a previously unenforced requirement of reduced inputs.)

How much the project choses to care about RSA_meth_get_mod_exp I don't have opinions on. Personally I think it is a ridiculous function and should never have been included. :-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Perhaps s/width/bit width/, so be clear you're not talking about word width.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably worth elaborating on in the comment since it's not obvious. While r1 is fully reduced mod p, m1 is only fully reduced mod q, which may be larger than p. This doesn't fit in the "obvious" modular subtraction precondition, but bn_mod_sub_fixed_top does explicitly allows for this.

(We instead canonicalized things so that p > q, which is differently complicated; we end up storing extra copies of several fields to deal with this and other input width issues. But we also removed RSA_FLAG_CACHE_PRIVATE and always cache stuff, so that option was easier for us. But it did allow a minor optimization: cache iqmp in Montgomery form to save a Montgomery reduction. I should note that the more general bn_mod_sub_fixed_top is then less efficient for the case when both are guaranteed to be reduced. Notably, the EC code needs a lot of modular subtractions, when it comes time to fix its timing issues.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't fit in the "obvious" modular subtraction precondition, but bn_mod_sub_fixed_top does explicitly allows for this.

Just in case, I did verify q>p to work. I'll add commentary [tomorrow]...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r0 here is fixed-top and yet leaks out of the public API, by way of RSA_meth_get_mod_exp or 1.0.x's public structs. Even separate from the public API, it leaks into BN_sub via the unblinded RSA_X931_PADDING path of rsa_ossl_private_encrypt. It leaks into BN_bn2binpad when unblinded which does not actually tolerate non-minimal things yet. (It calls BN_num_bytes and the like.) It also leaks into BN_BLINDING_invert_ex which currently uses plain BN_mod_mul (though we want Montgomery reduction). I think that tolerates it because BN_mul and BN_sqr calls bn_correct_top (though we don't want a bn_correct_top in there), but that's a tenuous set of requirements...

This is unfortunate since it is indeed one of the places where we don't want to force minimal-width BIGNUMs, but that may require a more complete solution to #6640.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unfortunate since it is indeed one of the places where we don't want to force minimal-width BIGNUMs, but that may require a more complete solution to #6640.

(I guess, when blinding is on, we're vaguely okay with forcing minimal-width here. It's the blinding removal that's sensitive. Anyway, point still stands that we probably need to force minimal-width for now, but we don't want to long-term.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking into blinding is the tomorrow's exercise...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As with the similar reduction below, rely on the looser bounds for Montgomery reduction, that m1 is bounded by R*q, not q^2. m1 is bounded by p*q, so for the smaller prime, this only works because the two primes are the same size. Might be worth a comment to that effect.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means that if the smooth codepath runs on a key that RSA_FLAG_CACHE_PRIVATE was cleared, we change behavior and the RSA now remembers bits of itself, which I believe is user-visible if the caller changes the key after use. Perhaps smooth needs to require RSA_FLAG_CACHE_PRIVATE, or perhaps the non-RSA_FLAG_CACHE_PRIVATE case needs to create local BN_MONT_CTX and pass that in.

This complexity, incidentally, is a big part of why I filed #5158.

(We took away RSA_FLAG_CACHE_PRIVATE and always act as if it is on.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was anticipating this question... I mean I know that it's kind of an edge and still am thinking about it...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly checking this out loud, though I wouldn't object if you put it in a comment:

a is reduced and b is the same bit width, so we know:
0 <= a < m
0 <= b < 2^w <= 2*m

Thus the first subtraction gives:
2*m < r = a - b < m - 0

Thus two conditional additions does the job.

@dot-asm dot-asm force-pushed the smooth_rsa branch 4 times, most recently from 94d2cdb to ea49309 Compare August 13, 2018 15:25
@dot-asm
Copy link
Contributor Author

dot-asm commented Aug 13, 2018

Ready.

@dot-asm
Copy link
Contributor Author

dot-asm commented Aug 13, 2018

On general note. There are few bn_correct_top-s on the path. Generally speaking probability that top would actually be corrected is rather low. One has to keep in mind that attacks are customarily about gathering statistics. And since we are operating on blinded values that periodically change, question would be if attack can be actually mounted fulfill in that window. There is sensitive point outside blinded operation, and that is unblinding in private_decrypt when it deduces plaintext. Variations are bn_correct_top itself, and then possible variation in BN_bn2pinpad. It would take ability to track access pattern with limb granularity to capture relevant variations. Well, that's what cachebleed was doing (on specific processors), but in that case the window of opportunity was much larger, attacker had hundreds and hundreds of cycles to catch it the act.

@dot-asm
Copy link
Contributor Author

dot-asm commented Aug 13, 2018

Another open question is what happens if |I| argument to rsa_ossl_mod_exp is not fully reduced. In normal case, even more specifically when called from libssl it just can't happen, because value will be rejected by [rsa_ossl_mod_exp's] caller. Otherwise it would take specially crafted program. Should we care? I mean about specially crafter program failing and resorting to I^d? On side note verify pass could use a cleanup, some time later...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow! This costs 14% in performance. My thought that it would if(BN_FLG_CONSTTIME-here-n-there) would drive it to BN_mod_exp_mont_consttime, but apparently not. Calling BN_mod_exp_mont (which can handle fixed-top)...

@dot-asm
Copy link
Contributor Author

dot-asm commented Aug 15, 2018

BN_bn2pinpad conceals memory access pattern for corrected tops, so that only possible variations are bn_correct_top themselves. Unlike as they all are, there is only one actually sensitive, at exit from unblinding.

@dot-asm
Copy link
Contributor Author

dot-asm commented Aug 16, 2018

And the verdict is...

@dot-asm
Copy link
Contributor Author

dot-asm commented Aug 16, 2018

Red cross seems to be unrelated. It's not clear to me what's going on, 1st red cross at https://travis-ci.org/openssl/openssl/branches is for commit that I don't have in this request...

@dot-asm
Copy link
Contributor Author

dot-asm commented Aug 17, 2018

Ping.

@dot-asm
Copy link
Contributor Author

dot-asm commented Aug 20, 2018

P-p-p-i-n-n-n-g-g-g-g-g...

@mattcaswell mattcaswell added this to the Assessed milestone Aug 20, 2018
@dot-asm
Copy link
Contributor Author

dot-asm commented Aug 22, 2018

P-i-i-i-i-i-i-i-i-i-i-n-g then?

Andy Polyakov added 4 commits August 22, 2018 21:57
Add bn_{mul|sqr}_fixed_top, bn_from_mont_fixed_top, bn_mod_sub_fixed_top.
Switch to bn_{mul|sqr}_fixed_top in bn_mul_mont_fixed_top and remove
memset in bn_from_montgomery_word.
In [most common] case of p and q being of same width, it's possible to
replace CRT modulo operations with Montgomery reductions. And those are
even fixed-length Montgomery reductions...
Copy link
Contributor

@paulidale paulidale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two trivial nits.

if (!BN_copy(r, b->Ai))
ret = 0;
}
if (r != NULL && (BN_copy(r, b->Ai) == NULL))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the brackets around BN_copy() == NULL necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, but in my mind it's more readable. I mean when one part of expression has parenthesis and something "complex" in parenthesis by itself. For example I [for one] wouldn't bother with extra parenthesis around func() == NULL, or even func(arg) == NULL. But in this case two arguments with one being member reference is "complex" enough to grant extra parenthesis...

*
* -2*m < r = a - b < m
*
* Thus is takes up to two conditional additions to make |r| positive.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/is/it/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Since it's commentary-only modification I consider that approval still stands [unless someone claims otherwise].

@t-j-h t-j-h added the approval: done This pull request has the required number of approvals label Aug 23, 2018
dot-asm pushed a commit to dot-asm/openssl that referenced this pull request Aug 23, 2018
Add bn_{mul|sqr}_fixed_top, bn_from_mont_fixed_top, bn_mod_sub_fixed_top.
Switch to bn_{mul|sqr}_fixed_top in bn_mul_mont_fixed_top and remove
memset in bn_from_montgomery_word.

Reviewed-by: Paul Dale <[email protected]>
(Merged from openssl#6915)
dot-asm pushed a commit to dot-asm/openssl that referenced this pull request Aug 23, 2018
In [most common] case of p and q being of same width, it's possible to
replace CRT modulo operations with Montgomery reductions. And those are
even fixed-length Montgomery reductions...

Reviewed-by: Paul Dale <[email protected]>
(Merged from openssl#6915)
dot-asm pushed a commit to dot-asm/openssl that referenced this pull request Aug 23, 2018
dot-asm pushed a commit to dot-asm/openssl that referenced this pull request Aug 23, 2018
@dot-asm dot-asm closed this Aug 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approval: done This pull request has the required number of approvals branch: master Applies to master branch branch: 1.0.2 Applies to OpenSSL_1_0_2-stable branch (EOL) branch: 1.1.1 Applies to OpenSSL_1_1_1-stable branch (EOL)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants