Conversation
|
My comments on RT4698 still apply - I don't quite understand the state change to "rejected but issue tracked on (this) PR". Whitespace (anywhere) and odd line lengths are common UI issues. Non-base64 chars have pluses and minuses. For the former, consider an e-mail that gets quoted with '>' when forwarded. The latter are described in the RFC. As I mentioned in the RT, the OpenXPKI project has encountered this issue; see openxpki/openxpki/issues/437. I maintain the Perl CPAN module Crypt::PKCS10, where I parse and output CSRs and public keys. When parsing, I remove all whitespace, optionally remove non-base64 characters, and treat the result as one long string. That's a lot simpler than insisting on 64 character lines (except the last). And it accepts all valid input. Of course, that's Perl. But in C code it's easy to do the same thing by processing one character at a time. Errors are best caught when parsing the asn.1 (and validating signatures), not by enforcing a strict punch-card era text format. I recommend this. For output, of course, I produce strictly compliant PEM. In other applications, it's a bit more involved to detect the encryption header - but once past that, I use the same approach. Bottom line, I encourage the Unix "accept liberal, produce conservative". I'd settle for an option, but I don't think it's necessary - and OpenSSL has enough of those as is. |
|
First off, I never said there weren’t interop concerns. I am sorry if my quibbles/jokes over MUST sent the conversation down the wrong chute. The ticket is closed because we’re moving to GitHub. Please raise your concerns here, or perhaps Ben has addressed them. |
|
I understood the spirit of your comments, no problem. I literally didn't understand why 'rejected' was the chosen state for the RT - but that's why I commented here. 'rejected' is usually a strong dismissive statement from a maintainer. The original purpose of this PR was addressing refactoring and choice of memory pools, not the issue I raised. I'm certainly OK with merging them for convenience; I'm grateful that Ben volunteered. At first glance, Ben has re-worked the existing code, but not liberalized the parsing. His comment at the top of this thread says "WIP", "we still need to decide..." The code seems to have a lot of legacy and complexity; I haven't done a detailed review but applaud the courage in touching it. If you take my advice, it could well get simpler. I think my concerns are adequately listed here (incorporating by reference the RT thread). I'll watch this thread and am willing to participate if that's useful. |
|
I suppose I will use "resolved" in closing the others. Thanks! |
|
It is WIP largely because we need to decide what we want the behavior to be. |
|
Yes, flags provide options - but also complexity for users, maintenance and testing. I settled on one option - whether to ignore non-b64. While I don't think using them as a covert channel is a serious risk in OpenSSL data, it's a simple accommodation and sometimes catches corruption. e.g. '=' in the middle of a string can happen when someone copies quoted printable... Whitespace is harmless. So are random line lengths. I don't have a case for bug-for-bug compatibility here. Currently, valid data aborts processing. I think I saw that OpenSSL will sometimes accept PEM data without the END label. If someone relies on that, I'd have little sympathy. That's the only case that comes to mind of OpenSSL's PEM processing being too liberal. One could easily implement a separate tool for 'is this strict PEM?'. Anyone who might want to validate that they're producing strict PEM would be better served by such a tool than by OpenSSL's idiosyncrasies... And I think that reducing the options in OpenSSL code would be a good thing. Of course once you accept liberal input, Hope your injury heals soon. |
|
There is code in openssl that depends on some of the current behavior. I forget details. See the issue/PR from AGL that code a beer-mug in it. |
|
It is #1209 "Rewrite base64 code." |
|
Cute, but I know of no RFC that considers '-' as a comment character. That's the 'covert channel' argument from the RFC. Note that Google reports 'no ill effects' from its removal. Most of the image could be retained with "ignore Base64" - only 7 distinct characters in the image are valid Base64. oOCQCG0 There are plenty of other graphics with which to replace them. Why would asn1parse care about the PEM? It should get binary the same way that all PEM consumers do, and parse the binary. If it doesn't, it should - if for no other reason than that as a debug parser, it should see the same bits as everything else. |
|
Should doesn't always equal the way things work, and considerations of the deployed based. |
|
By 'should', I meant that if necessary, simplifying asn1parse to use the new input routines becomes part of this work - or maybe it's a separate bug against asn1parse. A diagnostic tool can be evil if it sees different bits from the real consumer. Is the '-' comment line behavior documented? Is that really what allows the image - it looks as though it may be relying on the 'if the line is not 64 characters long, the PEM is over' logic? Anyhow, my 3 cents is that if someone relies on this, it's not a hardship to deal with it externally Compatibility is important - but it's always a trade-off. I've said enough. I'll yield to your team's judgement - my primary concern is that OpenSSL accept RFC-compliant input. |
|
what's going on with this now? |
|
I timed out on waiting for feedback about what flags the dev team was interested in seeing, and was going to default to implementing what @tlhackque wanted and the secure memory flag and remove the WIP label, but then started blocking on time to actually do that work. |
|
Might be worth trying to start discussion on the tls-dev list and perhaps close this. |
crypto/pem/pem_lib.c
Outdated
There was a problem hiding this comment.
What about tabs? Maybe use an appropriate ctype function... isspace() or isblank()?
There was a problem hiding this comment.
This part is for "bug-for-bug" compatibility with the historical code, and is in fact part of what inspired me to do the wholesale refactor.
There was a problem hiding this comment.
Note also that <= ' ' in ASCII includes all control characters: NUL (000) thru US (0x1F). So it does include tab [HT]. (and FF, and VT and CR and LF ...) In EBCDIC, it also happens to do more or less the same thing, though it's a toss-up as to whether that was in the mind of the original author...
ctype would be the right thing for new (sane) code that wants to detect whitespace.
|
Rebased to lateset master, with some updates: renamed flag from "weak eol" to "eay compatible", since that's what it's intended to be, and added a b64-only (i.e., "strict") option, with the default behavior being to treat control characters as whitespace (i.e., ignored) ... which means that ASCII non-base64 characters get passed through to EVP_Decode, for now. |
kaduk
left a comment
There was a problem hiding this comment.
A few questions for reviewers noted inline.
crypto/pem/pem_lib.c
Outdated
There was a problem hiding this comment.
This TODO should be resolved before this gets merged ... thoughts?
There was a problem hiding this comment.
You might want to have a look in crypto/evp/encode.c...
There was a problem hiding this comment.
I wasn't super keen on exposing that stuff to the whole library and leaving it in encode.c, nor was I super enthusiastic about splitting it out into a more generic place. But, if that's how it has to be, I can do it...
There was a problem hiding this comment.
Not sure why you would want a table; in all codesets those ranges are contiguous. The issues come when you assume the relative placements of upper vs lower case, or digits vs letters, or where other graphic (e.g. + and /) appear relative to other characters.
Not that this is particularly performance sensitive, but a table (256 bytes? bigger?) would require memory and data cache; the range tests are compact (minimum instruction and no data cache impact) and cover the cases. I don't think a table adds clarity to the code. The ranges capture it. This isn't crypto, so no reason to care about constant time. So, I'd stick with what you have.
You might return -1 for invalid, and the encoded value for OK (e.g. 0 for 'A' & 26 for 'a'); that would let the caller do something like if( (nibble = isb64(in)) < 0 ) return error (or ignore); val |= nibble << state; It's easy to do: if( c >= 'a' && c <= 'z' ) return (c - 'a') + 26; etc.
You might want a state machine somewhere.; that might work as a (small) table. (I haven't reviewed the whole commit, and don't plan to. But noticed the questions.)
The ranges look right to me, assuming you handle pad ('=') elsewhere. Ref: rfc4648, which has a table.
My inclination for default is to be lax; interpret B64 chars, ignore everything else. Pad only at the end, and only the permissible cases (0,1,2). There's a case for exception if non-B64 other than whitespace (which is sp,ht,lf,cr,ff, & often vt; sometimes NUL) or proper padding. The EAY line length-sensitive weirdness is at best a desperate fallback for compatibility; should not be a default. (Even though it is old behavior, it causes reasonable input to be rejected.)
I'm assuming that the apps / consumers will be updated as needed to resolve the external manifestations. Hopefully a consistent option: -pemfmt [obsolete|strict] with default = lax ?
Hope this helps. Thanks for working on it.
crypto/pem/pem_lib.c
Outdated
There was a problem hiding this comment.
Is this a reasonable error reason?
crypto/pem/pem_lib.c
Outdated
There was a problem hiding this comment.
Input requested on which BIO_get routine is better to use here.
There was a problem hiding this comment.
Both do more or less the same thing, the crucial thing is how you want the data.
Probably makes sense as a later addition ... there are several codepaths that could be dispatched to by that function and it's not clear that it would need to be done at the same time. Thanks for pointing it out! |
|
Later addition is fine, I was only asking about the intention going forward. |
|
Rebased and fixed a couple needless behavior changes exposed by 04-test_pem.t. |
crypto/pem/pem_lib.c
Outdated
There was a problem hiding this comment.
Internal functions should have lower case prefix
crypto/pem/pem_lib.c
Outdated
There was a problem hiding this comment.
This is repeated in all condition branches, so I's suggest replacing these two lines with len++;...
crypto/pem/pem_lib.c
Outdated
There was a problem hiding this comment.
These four lines with len = i;...
crypto/pem/pem_lib.c
Outdated
There was a problem hiding this comment.
These four lines with len = i;...
crypto/pem/pem_lib.c
Outdated
There was a problem hiding this comment.
And finally insert these two lines here:
linebuf[len++] = '\n';
linebuf[len] = '\0';
crypto/pem/pem_lib.c
Outdated
There was a problem hiding this comment.
Our C block comments should look like this:
/*
* text
* text
*/
crypto/pem/pem_lib.c
Outdated
There was a problem hiding this comment.
We'd like our block comments to look like this:
/*
* text
* text
*/
include/openssl/pem.h
Outdated
There was a problem hiding this comment.
Move this one down with PEM_bytes_read_bio
|
Apart from my remarks of the changes, you need to fix the documentation further, as noted by find-doc-nits: Also, you need to rebase on a fresher master |
|
Looking at the commits again caused me to feel uneasy about the lack of test coverage for PEM_FLAG_ONLY_B64, which turns out to have been justified. |
Depends what you mean with "encrypted". If "encrypted by PEM headers", then yes, you're right, but if it's encrypted in a PKCS#8 container, then PEM_FLAG_ONLY_B64 should be possible to use, no? |
|
As for the elephant, remember when I tried to remove unused function codes and we decided we can't? |
|
Removing the function codes entirely would break people's build. |
doc/man3/PEM_bytes_read_bio.pod
Outdated
doc/man3/PEM_read_bio_flags.pod
Outdated
There was a problem hiding this comment.
we tend to use _ex suffix for this kind of new API.
|
(The many red 'x'es were due to a change in my local copy that didn't make it into a commit.) |
crypto/pem/pem_lib.c
Outdated
There was a problem hiding this comment.
remove the curly and the return and just use if/else.
crypto/pem/pem_lib.c
Outdated
crypto/pem/pem_lib.c
Outdated
There was a problem hiding this comment.
these are new functions? get rid of the _flag part from the name.
There was a problem hiding this comment.
Done, both pem_malloc() and pem_free() now.
crypto/pem/pem_lib.c
Outdated
There was a problem hiding this comment.
use the ctype functions and then this probably becomes a define:
/* NOTE |c| is evaluated multiple times! */
#define b64(c) (islanum(c) || c == '+' || c == '/' || c == '=')
There was a problem hiding this comment.
Done.
Put parens around the bare 'c's out of habit; hopefully that's not problematic.
crypto/pem/pem_lib.c
Outdated
There was a problem hiding this comment.
are you sure that shouldn't be "const char" ?
There was a problem hiding this comment.
There is no semantic difference between "const char" and "char const", only stylistic.
There was a problem hiding this comment.
good. then use 'const char' which we use everywhere but once:
; g grep 'char const' | grep -v constant_
crypto/evp/bio_md.c:static int md_write(BIO *h, char const *buf, int num);
There was a problem hiding this comment.
Which, of course, gives new readers the wrong instincts when confronted with char * const *c, but that's not a discussion to have here.
(Done.)
crypto/pem/pem_lib.c
Outdated
There was a problem hiding this comment.
move the || to the next line with the evp-decode call. and indent that call an extra shift-width (4)
crypto/pem/pem_lib.c
Outdated
crypto/pem/pem_lib.c
Outdated
There was a problem hiding this comment.
we almost never use 'cleanup' as a goto target. it's either end or err more than 90% of the time.
There was a problem hiding this comment.
Okay. Went with 'end', as the normal/success control flow also goes through the code after the label, which makes it not exactly the error-handling as would be implied by 'err'.
crypto/pem/pem_lib.c
Outdated
There was a problem hiding this comment.
memleak of ctx. use a goto. or move this test before the ctx assignment.
crypto/pem/pem_lib.c
Outdated
There was a problem hiding this comment.
how can this be true? taillen is negative?
There was a problem hiding this comment.
If both the starting len and taillen are zero, i.e., there was no content in the PEM file, just the BEGIN/END lines.
Mostly, IIRC, it was intended to avoid malloc(0), but I originally wrote this quite some time ago and could be misremembering.
There was a problem hiding this comment.
add a comment for that. assuming you do so, and since someone else will review, +1 from me.
The extended function includes a 'flags' argument to allow callers to specify different requested behaviors. In particular, callers can request that temporary storage buffers are allocated from the secure heap, which could be relevant when loading private key material. Refactor PEM_read_bio to use BIO_mems instead of BUFs directly, use some helper routines to reduce the overall function length, and make some of the checks more reasonable.
Split the PEM_bytes_read_bio() implementation out into a pem_bytes_read_bio_flags() helper, to allow it to pass PEM_FLAG_SECURE as needed. Adjust the cleanup to properly use OPENSSL_secure_free() when needed, and reimplement PEM_bytes_read() as a wrapper around the _flags helper. Add documentation for PEM_bytes_read_bio() and the new secmem variant.
We now have a version of PEM_read_bytes that can use temporary buffers allocated from the secure heap; use them to handle this sensitive information. Note that for PEM_read_PrivateKey, the i/o still goes through stdio since the input is a FILE pointer. Standard I/O performs additional buffering, which cannot be changed to use the OpenSSL secure heap for temporary storage. As such, it is recommended to use BIO_new_file() and PEM_read_bio_PrivateKey() instead.
Get some trivial test coverage that this flag does what it claims to. [extended tests]
[extended tests]
|
And rebased again so the CI will actually take a look. |
|
This should be taken out of WIP at this point, no? |
I could have sworn that I already had done so; maybe it reverted back somehow (on rebase?). |
|
Ah, the history feed has Shrug. |
|
@levitte it seems that Rich is happy now, and the WIP is gone; can you take another look? |
|
@richsalz, wanna merge or shall I? |
|
go ahead. |
The extended function includes a 'flags' argument to allow callers to specify different requested behaviors. In particular, callers can request that temporary storage buffers are allocated from the secure heap, which could be relevant when loading private key material. Refactor PEM_read_bio to use BIO_mems instead of BUFs directly, use some helper routines to reduce the overall function length, and make some of the checks more reasonable. Reviewed-by: Rich Salz <[email protected]> Reviewed-by: Richard Levitte <[email protected]> (Merged from #1700)
Split the PEM_bytes_read_bio() implementation out into a pem_bytes_read_bio_flags() helper, to allow it to pass PEM_FLAG_SECURE as needed. Adjust the cleanup to properly use OPENSSL_secure_free() when needed, and reimplement PEM_bytes_read() as a wrapper around the _flags helper. Add documentation for PEM_bytes_read_bio() and the new secmem variant. Reviewed-by: Rich Salz <[email protected]> Reviewed-by: Richard Levitte <[email protected]> (Merged from #1700)
We now have a version of PEM_read_bytes that can use temporary buffers allocated from the secure heap; use them to handle this sensitive information. Note that for PEM_read_PrivateKey, the i/o still goes through stdio since the input is a FILE pointer. Standard I/O performs additional buffering, which cannot be changed to use the OpenSSL secure heap for temporary storage. As such, it is recommended to use BIO_new_file() and PEM_read_bio_PrivateKey() instead. Reviewed-by: Rich Salz <[email protected]> Reviewed-by: Richard Levitte <[email protected]> (Merged from #1700)
Get some trivial test coverage that this flag does what it claims to. [extended tests] Reviewed-by: Rich Salz <[email protected]> Reviewed-by: Richard Levitte <[email protected]> (Merged from #1700)
WIP partially since we still have to make up our mind how strict we want the parser to be (and with flags, we even can offer multiple options).
See also RT4698