PEM read-side refactor by kaduk · Pull Request #1700 · openssl/openssl

kaduk · 2016-10-12T16:19:58Z

WIP partially since we still have to make up our mind how strict we want the parser to be (and with flags, we even can offer multiple options).

See also RT4698

tlhackque · 2016-10-13T18:06:33Z

My comments on RT4698 still apply - I don't quite understand the state change to "rejected but issue tracked on (this) PR".

Whitespace (anywhere) and odd line lengths are common UI issues. Non-base64 chars have pluses and minuses. For the former, consider an e-mail that gets quoted with '>' when forwarded. The latter are described in the RFC.

As I mentioned in the RT, the OpenXPKI project has encountered this issue; see openxpki/openxpki/issues/437.

I maintain the Perl CPAN module Crypt::PKCS10, where I parse and output CSRs and public keys.

When parsing, I remove all whitespace, optionally remove non-base64 characters, and treat the result as one long string. That's a lot simpler than insisting on 64 character lines (except the last). And it accepts all valid input. Of course, that's Perl. But in C code it's easy to do the same thing by processing one character at a time.

Errors are best caught when parsing the asn.1 (and validating signatures), not by enforcing a strict punch-card era text format. I recommend this.

For output, of course, I produce strictly compliant PEM.

In other applications, it's a bit more involved to detect the encryption header - but once past that, I use the same approach.

Bottom line, I encourage the Unix "accept liberal, produce conservative". I'd settle for an option, but I don't think it's necessary - and OpenSSL has enough of those as is.

richsalz · 2016-10-13T18:16:24Z

First off, I never said there weren’t interop concerns. I am sorry if my quibbles/jokes over MUST sent the conversation down the wrong chute.

The ticket is closed because we’re moving to GitHub. Please raise your concerns here, or perhaps Ben has addressed them.

tlhackque · 2016-10-13T18:39:38Z

I understood the spirit of your comments, no problem. I literally didn't understand why 'rejected' was the chosen state for the RT - but that's why I commented here. 'rejected' is usually a strong dismissive statement from a maintainer.

The original purpose of this PR was addressing refactoring and choice of memory pools, not the issue I raised. I'm certainly OK with merging them for convenience; I'm grateful that Ben volunteered.

At first glance, Ben has re-worked the existing code, but not liberalized the parsing. His comment at the top of this thread says "WIP", "we still need to decide..."

The code seems to have a lot of legacy and complexity; I haven't done a detailed review but applaud the courage in touching it. If you take my advice, it could well get simpler.

I think my concerns are adequately listed here (incorporating by reference the RT thread).

I'll watch this thread and am willing to participate if that's useful.

richsalz · 2016-10-13T18:41:23Z

I suppose I will use "resolved" in closing the others. Thanks!

kaduk · 2016-10-13T18:43:54Z

It is WIP largely because we need to decide what we want the behavior to be.
Having flags gives us options; there could be a flag for bug-for-bug compatibility, for ignoring all whitespace, all non-b64 characters, for enforcing line lengths, etc.
I would like some feedback from the team about which features are desired and what the defaults should be.
Unfortunately, I injured my hand yesterday and cannot type out the full list of open questions as I see it.

tlhackque · 2016-10-13T19:19:44Z

Yes, flags provide options - but also complexity for users, maintenance and testing.

I settled on one option - whether to ignore non-b64. While I don't think using them as a covert channel is a serious risk in OpenSSL data, it's a simple accommodation and sometimes catches corruption. e.g. '=' in the middle of a string can happen when someone copies quoted printable...

Whitespace is harmless. So are random line lengths.

I don't have a case for bug-for-bug compatibility here. Currently, valid data aborts processing.

I think I saw that OpenSSL will sometimes accept PEM data without the END label. If someone relies on that, I'd have little sympathy. That's the only case that comes to mind of OpenSSL's PEM processing being too liberal.

One could easily implement a separate tool for 'is this strict PEM?'. Anyone who might want to validate that they're producing strict PEM would be better served by such a tool than by OpenSSL's idiosyncrasies... And I think that reducing the options in OpenSSL code would be a good thing.

Of course once you accept liberal input, openssl x -in foo -out bar will produce strictly compliant PEM.

Hope your injury heals soon.

richsalz · 2016-10-13T19:23:13Z

There is code in openssl that depends on some of the current behavior. I forget details. See the issue/PR from AGL that code a beer-mug in it.

FdaSilvaYY · 2016-10-13T19:34:50Z

It is #1209 "Rewrite base64 code."

tlhackque · 2016-10-13T20:08:35Z

Cute, but I know of no RFC that considers '-' as a comment character. That's the 'covert channel' argument from the RFC. Note that Google reports 'no ill effects' from its removal.

Most of the image could be retained with "ignore Base64" - only 7 distinct characters in the image are valid Base64. oOCQCG0 There are plenty of other graphics with which to replace them.

Why would asn1parse care about the PEM? It should get binary the same way that all PEM consumers do, and parse the binary. If it doesn't, it should - if for no other reason than that as a debug parser, it should see the same bits as everything else.

richsalz · 2016-10-13T20:25:18Z

Should doesn't always equal the way things work, and considerations of the deployed based.

tlhackque · 2016-10-13T20:44:25Z

By 'should', I meant that if necessary, simplifying asn1parse to use the new input routines becomes part of this work - or maybe it's a separate bug against asn1parse.

A diagnostic tool can be evil if it sees different bits from the real consumer.

Is the '-' comment line behavior documented? Is that really what allows the image - it looks as though it may be relying on the 'if the line is not 64 characters long, the PEM is over' logic?

Anyhow, my 3 cents is that if someone relies on this, it's not a hardship to deal with it externally
e.g. sed -e'/^-/d' file.pem | openssl foo

Compatibility is important - but it's always a trade-off.

I've said enough. I'll yield to your team's judgement - my primary concern is that OpenSSL accept RFC-compliant input.

richsalz · 2016-11-13T03:16:14Z

what's going on with this now?

kaduk · 2016-11-15T05:59:46Z

I timed out on waiting for feedback about what flags the dev team was interested in seeing, and was going to default to implementing what @tlhackque wanted and the secure memory flag and remove the WIP label, but then started blocking on time to actually do that work.
It seems somewhat unlikely that time to do that work will appear this week while I'm at the IETF.

richsalz · 2016-11-15T06:49:35Z

Might be worth trying to start discussion on the tls-dev list and perhaps close this.

levitte · 2016-12-06T20:55:06Z

crypto/pem/pem_lib.c

What about tabs? Maybe use an appropriate ctype function... isspace() or isblank()?

This part is for "bug-for-bug" compatibility with the historical code, and is in fact part of what inspired me to do the wholesale refactor.

Note also that <= ' ' in ASCII includes all control characters: NUL (000) thru US (0x1F). So it does include tab [HT]. (and FF, and VT and CR and LF ...) In EBCDIC, it also happens to do more or less the same thing, though it's a toss-up as to whether that was in the mind of the original author...

ctype would be the right thing for new (sane) code that wants to detect whitespace.

kaduk · 2016-12-07T16:52:48Z

Rebased to lateset master, with some updates: renamed flag from "weak eol" to "eay compatible", since that's what it's intended to be, and added a b64-only (i.e., "strict") option, with the default behavior being to treat control characters as whitespace (i.e., ignored) ... which means that ASCII non-base64 characters get passed through to EVP_Decode, for now.
This is all up for discussion, of course, but I removed the WIP label since I am actually proposing this change.

kaduk

A few questions for reviewers noted inline.

kaduk · 2016-12-07T16:54:27Z

crypto/pem/pem_lib.c

This TODO should be resolved before this gets merged ... thoughts?

You might want to have a look in crypto/evp/encode.c...

I wasn't super keen on exposing that stuff to the whole library and leaving it in encode.c, nor was I super enthusiastic about splitting it out into a more generic place. But, if that's how it has to be, I can do it...

Not sure why you would want a table; in all codesets those ranges are contiguous. The issues come when you assume the relative placements of upper vs lower case, or digits vs letters, or where other graphic (e.g. + and /) appear relative to other characters.

Not that this is particularly performance sensitive, but a table (256 bytes? bigger?) would require memory and data cache; the range tests are compact (minimum instruction and no data cache impact) and cover the cases. I don't think a table adds clarity to the code. The ranges capture it. This isn't crypto, so no reason to care about constant time. So, I'd stick with what you have.

You might return -1 for invalid, and the encoded value for OK (e.g. 0 for 'A' & 26 for 'a'); that would let the caller do something like if( (nibble = isb64(in)) < 0 ) return error (or ignore); val |= nibble << state; It's easy to do: if( c >= 'a' && c <= 'z' ) return (c - 'a') + 26; etc.

You might want a state machine somewhere.; that might work as a (small) table. (I haven't reviewed the whole commit, and don't plan to. But noticed the questions.)

The ranges look right to me, assuming you handle pad ('=') elsewhere. Ref: rfc4648, which has a table.

My inclination for default is to be lax; interpret B64 chars, ignore everything else. Pad only at the end, and only the permissible cases (0,1,2). There's a case for exception if non-B64 other than whitespace (which is sp,ht,lf,cr,ff, & often vt; sometimes NUL) or proper padding. The EAY line length-sensitive weirdness is at best a desperate fallback for compatibility; should not be a default. (Even though it is old behavior, it causes reasonable input to be rejected.)

I'm assuming that the apps / consumers will be updated as needed to resolve the external manifestations. Hopefully a consistent option: -pemfmt [obsolete|strict] with default = lax ?

Hope this helps. Thanks for working on it.

kaduk · 2016-12-07T16:55:11Z

crypto/pem/pem_lib.c

Is this a reasonable error reason?

kaduk · 2016-12-07T16:55:33Z

crypto/pem/pem_lib.c

Input requested on which BIO_get routine is better to use here.

Both do more or less the same thing, the crucial thing is how you want the data.

levitte · 2017-02-27T06:39:12Z

I've just had another look at this work (prompted by @kaduk's comment in #2746)... and I wonder, are there thoughts of having the resulting EVP_PKEY et al from PEM_read_bio_PrivateKey and friends in secure memory as well?

kaduk · 2017-02-27T17:01:43Z

are there thoughts of having the resulting EVP_PKEY et al from PEM_read_bio_PrivateKey and friends in secure memory as well?

Probably makes sense as a later addition ... there are several codepaths that could be dispatched to by that function and it's not clear that it would need to be done at the same time. Thanks for pointing it out!

levitte · 2017-02-27T23:28:55Z

Later addition is fine, I was only asking about the intention going forward.

kaduk · 2017-04-03T21:54:53Z

Rebased and fixed a couple needless behavior changes exposed by 04-test_pem.t.

levitte · 2017-04-12T19:19:29Z

crypto/pem/pem_lib.c

Internal functions should have lower case prefix

levitte · 2017-04-12T19:27:11Z

crypto/pem/pem_lib.c

This is repeated in all condition branches, so I's suggest replacing these two lines with len++;...

levitte · 2017-04-12T19:27:43Z

crypto/pem/pem_lib.c

These four lines with len = i;...

levitte · 2017-04-12T19:28:02Z

crypto/pem/pem_lib.c

These four lines with len = i;...

levitte · 2017-04-12T19:29:20Z

crypto/pem/pem_lib.c

And finally insert these two lines here:

linebuf[len++] = '\n'; linebuf[len] = '\0';

levitte · 2017-04-12T19:32:36Z

crypto/pem/pem_lib.c

Our C block comments should look like this:

/* * text * text */

levitte · 2017-04-12T19:33:22Z

crypto/pem/pem_lib.c

We'd like our block comments to look like this:

/* * text * text */

levitte · 2017-04-12T19:49:43Z

include/openssl/pem.h

Move this one down with PEM_bytes_read_bio

levitte · 2017-04-12T19:53:17Z

Apart from my remarks of the changes, you need to fix the documentation further, as noted by find-doc-nits:

(cd .; /usr/bin/perl util/find-doc-nits -n ) >doc-nits
if [ -s doc-nits ] ; then cat doc-nits; rm doc-nits ; exit 1; fi
doc/man3/PEM_bytes_read_bio.pod:1: missing copyright
doc/man3/PEM_read_bio_flags.pod:1: PEM_FLAG_SECURE missing from NAME section
doc/man3/PEM_read_bio_flags.pod:1: PEM_FLAG_EAY_COMPATIBLE missing from NAME section
doc/man3/PEM_read_bio_flags.pod:1: PEM_FLAG_ONLY_B64 missing from NAME section
doc/man3/PEM_read_bio_flags.pod:1: missing copyright
doc/man3/PEM_read_bio_flags.pod:1: period in NAME section
make: *** [doc-nits] Error 1

Also, you need to rebase on a fresher master

kaduk · 2017-04-12T22:03:23Z

Looking at the commits again caused me to feel uneasy about the lack of test coverage for PEM_FLAG_ONLY_B64, which turns out to have been justified.
As it stands, that flag is not usable with encrypted files, though it would take some thinking to determine if ther's a way to fix that that does not involve a second pass over things.

levitte · 2017-04-12T22:31:28Z

Looking at the commits again caused me to feel uneasy about the lack of test coverage for PEM_FLAG_ONLY_B64, which turns out to have been justified.
As it stands, that flag is not usable with encrypted files, though it would take some thinking to determine if ther's a way to fix that that does not involve a second pass over things.

Depends what you mean with "encrypted". If "encrypted by PEM headers", then yes, you're right, but if it's encrypted in a PKCS#8 container, then PEM_FLAG_ONLY_B64 should be possible to use, no?

richsalz · 2017-04-28T17:56:52Z

As for the elephant, remember when I tried to remove unused function codes and we decided we can't?

kaduk · 2017-04-28T17:57:50Z

Removing the function codes entirely would break people's build.
Just not returning them anymore is somewhat different.

richsalz

more feedback coming.

richsalz · 2017-04-28T18:10:14Z

doc/man3/PEM_bytes_read_bio.pod

RFC reference?

richsalz · 2017-04-28T18:11:42Z

doc/man3/PEM_read_bio_flags.pod

we tend to use _ex suffix for this kind of new API.

kaduk · 2017-04-28T21:17:33Z

(The many red 'x'es were due to a change in my local copy that didn't make it into a commit.)
Rebased, squashed, and attempted to take all feedback so far.

richsalz · 2017-05-04T15:11:00Z

crypto/pem/pem_lib.c

remove the curly and the return and just use if/else.

richsalz · 2017-05-04T15:11:10Z

crypto/pem/pem_lib.c

use the ?: operator

richsalz · 2017-05-04T15:11:36Z

crypto/pem/pem_lib.c

these are new functions? get rid of the _flag part from the name.

Done, both pem_malloc() and pem_free() now.

richsalz · 2017-05-04T15:12:26Z

crypto/pem/pem_lib.c

use the ctype functions and then this probably becomes a define:

/* NOTE |c| is evaluated multiple times! */ #define b64(c) (islanum(c) || c == '+' || c == '/' || c == '=')

Done.
Put parens around the bare 'c's out of habit; hopefully that's not problematic.

richsalz · 2017-05-04T15:15:42Z

crypto/pem/pem_lib.c

are you sure that shouldn't be "const char" ?

There is no semantic difference between "const char" and "char const", only stylistic.

good. then use 'const char' which we use everywhere but once:

; g grep 'char const' | grep -v constant_ crypto/evp/bio_md.c:static int md_write(BIO *h, char const *buf, int num);

Which, of course, gives new readers the wrong instincts when confronted with char * const *c, but that's not a discussion to have here.
(Done.)

richsalz · 2017-05-04T15:19:32Z

crypto/pem/pem_lib.c

move the || to the next line with the evp-decode call. and indent that call an extra shift-width (4)

richsalz · 2017-05-04T15:20:01Z

crypto/pem/pem_lib.c

use the ?: operator.

richsalz · 2017-05-04T15:20:52Z

crypto/pem/pem_lib.c

we almost never use 'cleanup' as a goto target. it's either end or err more than 90% of the time.

Okay. Went with 'end', as the normal/success control flow also goes through the code after the label, which makes it not exactly the error-handling as would be implied by 'err'.

richsalz · 2017-05-04T15:22:01Z

crypto/pem/pem_lib.c

memleak of ctx. use a goto. or move this test before the ctx assignment.

Done (goto).

richsalz · 2017-05-04T15:22:51Z

crypto/pem/pem_lib.c

how can this be true? taillen is negative?

If both the starting len and taillen are zero, i.e., there was no content in the PEM file, just the BEGIN/END lines.
Mostly, IIRC, it was intended to avoid malloc(0), but I originally wrote this quite some time ago and could be misremembering.

add a comment for that. assuming you do so, and since someone else will review, +1 from me.

The extended function includes a 'flags' argument to allow callers to specify different requested behaviors. In particular, callers can request that temporary storage buffers are allocated from the secure heap, which could be relevant when loading private key material. Refactor PEM_read_bio to use BIO_mems instead of BUFs directly, use some helper routines to reduce the overall function length, and make some of the checks more reasonable.

Split the PEM_bytes_read_bio() implementation out into a pem_bytes_read_bio_flags() helper, to allow it to pass PEM_FLAG_SECURE as needed. Adjust the cleanup to properly use OPENSSL_secure_free() when needed, and reimplement PEM_bytes_read() as a wrapper around the _flags helper. Add documentation for PEM_bytes_read_bio() and the new secmem variant.

We now have a version of PEM_read_bytes that can use temporary buffers allocated from the secure heap; use them to handle this sensitive information. Note that for PEM_read_PrivateKey, the i/o still goes through stdio since the input is a FILE pointer. Standard I/O performs additional buffering, which cannot be changed to use the OpenSSL secure heap for temporary storage. As such, it is recommended to use BIO_new_file() and PEM_read_bio_PrivateKey() instead.

Get some trivial test coverage that this flag does what it claims to. [extended tests]

[extended tests]

kaduk · 2017-05-04T16:18:22Z

And rebased again so the CI will actually take a look.

levitte · 2017-05-04T17:19:49Z

This should be taken out of WIP at this point, no?

kaduk · 2017-05-04T17:22:24Z

This should be taken out of WIP at this point, no?

I could have sworn that I already had done so; maybe it reverted back somehow (on rebase?).
Regardless, removed WIP from the title.

kaduk · 2017-05-04T17:22:58Z

Ah, the history feed has

 richsalz changed the title from PEM read-side refactor to WIP PEM read-side refactor 6 days ago

Shrug.

kaduk · 2017-05-08T16:52:35Z

@levitte it seems that Rich is happy now, and the WIP is gone; can you take another look?

levitte

Looks good to me

levitte · 2017-05-08T18:25:46Z

@richsalz, wanna merge or shall I?

richsalz · 2017-05-08T18:34:42Z

go ahead.

levitte · 2017-05-08T19:24:19Z

Merged. 204afd8 to fa3ed5b

The extended function includes a 'flags' argument to allow callers to specify different requested behaviors. In particular, callers can request that temporary storage buffers are allocated from the secure heap, which could be relevant when loading private key material. Refactor PEM_read_bio to use BIO_mems instead of BUFs directly, use some helper routines to reduce the overall function length, and make some of the checks more reasonable. Reviewed-by: Rich Salz <[email protected]> Reviewed-by: Richard Levitte <[email protected]> (Merged from #1700)

Split the PEM_bytes_read_bio() implementation out into a pem_bytes_read_bio_flags() helper, to allow it to pass PEM_FLAG_SECURE as needed. Adjust the cleanup to properly use OPENSSL_secure_free() when needed, and reimplement PEM_bytes_read() as a wrapper around the _flags helper. Add documentation for PEM_bytes_read_bio() and the new secmem variant. Reviewed-by: Rich Salz <[email protected]> Reviewed-by: Richard Levitte <[email protected]> (Merged from #1700)

We now have a version of PEM_read_bytes that can use temporary buffers allocated from the secure heap; use them to handle this sensitive information. Note that for PEM_read_PrivateKey, the i/o still goes through stdio since the input is a FILE pointer. Standard I/O performs additional buffering, which cannot be changed to use the OpenSSL secure heap for temporary storage. As such, it is recommended to use BIO_new_file() and PEM_read_bio_PrivateKey() instead. Reviewed-by: Rich Salz <[email protected]> Reviewed-by: Richard Levitte <[email protected]> (Merged from #1700)

Get some trivial test coverage that this flag does what it claims to. [extended tests] Reviewed-by: Rich Salz <[email protected]> Reviewed-by: Richard Levitte <[email protected]> (Merged from #1700)

levitte reviewed Dec 6, 2016

View reviewed changes

kaduk force-pushed the pem branch from a29cad5 to 367be29 Compare December 7, 2016 16:50

kaduk changed the title ~~WIP PEM read-side refactor~~ PEM read-side refactor Dec 7, 2016

kaduk commented Dec 7, 2016

View reviewed changes

kaduk mentioned this pull request Feb 27, 2017

Refactor PEM_read_bio_{PrivateKey,Parameters,DHparams} #2746

Closed

kaduk force-pushed the pem branch from 367be29 to eb854ce Compare April 3, 2017 21:53

levitte requested changes Apr 12, 2017

View reviewed changes

kaduk force-pushed the pem branch from eb854ce to 3c05440 Compare April 12, 2017 21:59

richsalz suggested changes Apr 28, 2017

View reviewed changes

kaduk force-pushed the pem branch from 0839deb to ad3fdbb Compare April 28, 2017 21:16

richsalz suggested changes May 4, 2017

View reviewed changes

kaduk added 5 commits May 4, 2017 11:17

Add unit test for PEM_FLAG_ONLY_B64

0cf036e

Get some trivial test coverage that this flag does what it claims to. [extended tests]

Adopt to review comments from Rich

1307674

[extended tests]

kaduk force-pushed the pem branch from bc75a80 to 1307674 Compare May 4, 2017 16:18

richsalz approved these changes May 4, 2017

View reviewed changes

Add comment explaining strange conditional

41e085d

kaduk changed the title ~~WIP PEM read-side refactor~~ PEM read-side refactor May 4, 2017

levitte approved these changes May 8, 2017

View reviewed changes

levitte closed this May 8, 2017

kaduk mentioned this pull request Nov 27, 2019

Parsing PEM Too Restrictive For RFC7468 #10529

Closed

Uh oh!

Conversation

kaduk commented Oct 12, 2016

Uh oh!

tlhackque commented Oct 13, 2016

Uh oh!

richsalz commented Oct 13, 2016

Uh oh!

tlhackque commented Oct 13, 2016

Uh oh!

richsalz commented Oct 13, 2016

Uh oh!

kaduk commented Oct 13, 2016

Uh oh!

tlhackque commented Oct 13, 2016

Uh oh!

richsalz commented Oct 13, 2016

Uh oh!

FdaSilvaYY commented Oct 13, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tlhackque commented Oct 13, 2016

Uh oh!

richsalz commented Oct 13, 2016

Uh oh!

tlhackque commented Oct 13, 2016

Uh oh!

richsalz commented Nov 13, 2016

Uh oh!

kaduk commented Nov 15, 2016

Uh oh!

richsalz commented Nov 15, 2016

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kaduk commented Dec 7, 2016

Uh oh!

kaduk left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tlhackque Dec 7, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

levitte commented Feb 27, 2017

Uh oh!

kaduk commented Feb 27, 2017

Uh oh!

levitte commented Feb 27, 2017

Uh oh!

kaduk commented Apr 3, 2017

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

FdaSilvaYY commented Oct 13, 2016 •

edited

Loading

tlhackque Dec 7, 2016 •

edited

Loading