Skip to content

Comments

CORE & PROV: Add a generic callback structure and upcall#10390

Closed
levitte wants to merge 3 commits intoopenssl:masterfrom
levitte:300-callback
Closed

CORE & PROV: Add a generic callback structure and upcall#10390
levitte wants to merge 3 commits intoopenssl:masterfrom
levitte:300-callback

Conversation

@levitte
Copy link
Member

@levitte levitte commented Nov 8, 2019

The idea is that libcrypto code will be able to build up the structure
it needs for any callback, pass that to the provider where the provider
will accept it, and the provider passes that back along with an
OSSL_PARAM array with data from the provider to the core_generic_callback
upcall, which knows how to handle the passed structure.

The structure is very simple, and includes a function pointer to a
libcrypto callback, as well as a pointer to some libcrypto data. That
libcrypto callback needs to know what to do with that data to call the
application callback with the arguments it requires.

This method permits the provider to call application callbacks without
having any information of what signature they have, and caters to those
who may want to build a provider module into a secure environment where
all the calls simply can't be any random function call, but rather must
be in form of up- or downcalls in the libcrypto<->provider interface.

On the provider side, support is added, and is fairly minimal

It contains only one function, which should only get added to non-FIPS
providers.
The idea is that libcrypto code will be able to build up the structure
it needs for any callback, pass that to the provider where the provider
will accept it, and the provider passes that back along with an
OSSL_PARAM array with data from the provider to the core_generic_callback
upcall, which knows how to handle the passed structure.

The structure is very simple, and includes a function pointer to a
libcrypto callback, as well as a pointer to some libcrypto data.  That
libcrypto callback needs to know what to do with that data to call the
application callback with the arguments it requires.

This method permits the provider to call application callbacks without
having any information of what signature they have, and caters to those
who may want to build a provider module into a secure environment where
all the calls simply can't be any random function call, but rather must
be in form of up- or downcalls in the libcrypto<->provider interface.
This is a very simple change, just make sure to get the function,
and add general provider support for calling it.
@levitte
Copy link
Member Author

levitte commented Nov 8, 2019

This is an extract from #10289, done when I realised I have a few more PRs in line that use this functionality. There are some slight modifications compared to the commits in #10289, to make more like a provider side library call.

@levitte
Copy link
Member Author

levitte commented Nov 8, 2019

For the curious, the way this is structured is inspired from BN_GENCB, see the manual for BN_generate_prime_ex

@levitte levitte added the branch: master Applies to master branch label Nov 8, 2019
@levitte
Copy link
Member Author

levitte commented Nov 8, 2019

List of other PRs waiting on this: #10391, #10289

@richsalz
Copy link
Contributor

richsalz commented Nov 8, 2019

For what it's worth, I don't like this very much, for a couple of reasons. In no particular order

  • It requires non-provider-releated code to know about OSSL_PARAM
  • There is only one callback upcall, which means that function has to be able, somehow, to dispatch to the right function -- do git grep callback doc/man3 for example to see how many there could be.
  • I do not see how it's modelled after BN_GENCB, which is a specific call back type and not at all generic. My issue here might just be documentation.

I would prefer "upcall" to be added as a new primitive OSSL_PARAM type. The provider would know the calling sequence and return value of what the upcall returns, if anything. It would have to be documented. Anything other than a simple int return would need careful handling to avoid heap issues.

@levitte
Copy link
Member Author

levitte commented Nov 8, 2019

  • It requires non-provider-releated code to know about OSSL_PARAM

It depends, actually. The OSSL_CALLBACK structure is opaque, which means that libcrypto must somehow deal with the call. We may simply have it connect back to an application function, which must then know about OSSL_PARAM, but considering that's a public structure, along with the OSSL_PARAM API, the application should be able to deal with it. Besides, we are already exposing OSSL_PARAM to the application through other stuff.

  • There is only one callback upcall, which means that function has to be able, somehow, to dispatch to the right function -- do git grep callback doc/man3 for example to see how many there could be.

I you look, you'll see this in include/internal/core.h:

/* Typedef in openssl/core.h */
struct ossl_callback_st {
    int (*callback)(const OSSL_PARAM params[], void *arg);
    void *arg;
};

See the callback function pointer? That is how core_generic_callback knows what to call back to. Point is, that callback could be a direct application function, but it could also be an internal libcrypto function, with arg pointing to an internal structure with more internal info (such as the application callback with whatever signature we choose). As a matter of fact, if you stufy #10289, you will see that it does exactly the latter to connect back to the application-provided key generation feedback callback.

  • I do not see how it's modelled after BN_GENCB, which is a specific call back type and not at all generic. My issue here might just be documentation.

Er...... you really need to study that structure a little better.

I would prefer "upcall" to be added as a new primitive OSSL_PARAM type. The provider would know the calling sequence and return value of what the upcall returns, if anything. It would have to be documented. Anything other than a simple int return would need careful handling to avoid heap issues.

And therein lies the issue that I see, that the provider must know all those things that may be internal, or depend on the application (if we choose to pass functions directly), which also potentially means that applications must know the provider's exact expectations.

@levitte
Copy link
Member Author

levitte commented Nov 8, 2019

Another thing that we don't talk much about but that I factor in is that there are architectures out there that provide secure environment with a very sharp edge between the application environment and the secure environment, and basically need to have all calls between the two as dispatches along the boundary between the two (i.e. downcalls and upcalls). In such an environment, a call from the secure environment directly into a random spot in application space is absolutely forbidden.

So imagine someone wanting to place, say, the FIPS module in such a secure environment while keeping libcrypto in application space. If we allow passing random function pointers, and even worse, calling them across the application<->secure environment boundary with no chance of control on the boundary, then we're making that kind of job impossibly hard.

@richsalz
Copy link
Contributor

richsalz commented Nov 8, 2019

In such an environment, a call from the secure environment directly into a random spot in application space is absolutely forbidden.

Which is what this generic callback adds :) My #10390 (comment) would require documentation of the calling sequence (and not said, but implied) semantics of the upcalls.

BN_GENCB is not generic, it's called for keygen whenever big numbers are generated. It just so happens that this is important in crypto, and therefore called from RSA, etc. But like i said, that's just because BN is involved. How would this mechanism be used for BIO_set_callback or the DRBG callbacks, or SSL verify?

Having one single function which then dispatches (if it knows how to do so) is less secure. It requires the application to trust non-FIPS code. It is safer to put the "BN_GENCB callback" function in the FIPS module call that "generates a BN" and then have that validated code do an upcall. A FIPS application might, or might not, want to do that. Having a generic upcall receiver in the non-validated crypto code is a security issue.

A single upcall wrapping all possible upcalls is wrong. Invidiual provider operations that could make an upcall should specify the upcall (and a void*) at the time the function is invoked.

@levitte
Copy link
Member Author

levitte commented Nov 8, 2019

It requires the application to trust non-FIPS code.

Er...... the application calls libcrypto functions. That's non-FIPS code... so I don't quite understand how this particular bit of libcrypto code is suddenly less trustable than the rest...

@levitte
Copy link
Member Author

levitte commented Nov 8, 2019

Either way, documentation! You're quite correct, I do need to add that.

@levitte
Copy link
Member Author

levitte commented Nov 8, 2019

Another way to see this is simply that OSSL_CALLBACK is an indirect reference to the actual callback (sometimes via an internal libcrypto call that "translates" the received OSSL_PARAM array to whatever the actual callback wants)

@richsalz
Copy link
Contributor

richsalz commented Nov 8, 2019

Another way to see this ...

Yes, and I think that is the wrong approach.

Providers can know the exact parameters and make the call directly. They shouldn't have to wrap it up as PARAMS and then call a generic upcall-dispatch function to get to the right call in the application.

@levitte
Copy link
Member Author

levitte commented Nov 8, 2019

Providers can know the exact parameters and make the call directly. They shouldn't have to wrap it up as PARAMS and then call a generic upcall-dispatch function to get to the right call in the application.

Not doing this requires that both ends must share exact function signature knowledge, which may be quite internal. We wanted to avoid exactly that. Also, such direct calls is directly harmful in the secure environment scenario I talked about in #10390 (comment)


int ossl_prov_callback_from_dispatch(const OSSL_DISPATCH *fns)
{
for (; fns->function_id != 0; fns++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it kind of strange that you loop over all the dispatch functions twice. Once here, and once in defltprov.c/fipsprov.c/legacyprov.c

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah well... it's essentially to avoid copying the same set of cases all over the place (and potentially getting it wrong somewhere). It's quite possible that we should just have a central generic get-me-everything type of call and be done with it. I can certainly make such a thing, question is if that should happen in this PR or in a separate one... it's kind of off-topic for this one...

@richsalz
Copy link
Contributor

richsalz commented Nov 8, 2019

I do not see how a callback can be internal. More importantly, I do not see why a callback should be internal. This is NOT the same as "hide details of an RSA key" This is "I will call this function when generating a BIGNUM" Why does that last part need to have details obscured?

As for direct calls being harmful, I believe you are wrong. (This is a stronger statement than I disagree.) From a security perspective trusted->untrusted is the same as trusted->runtime->untrusted. Because the runtime is not inside the trust boundary. In fact, it is less secure than the one-step call because the runtime is now involved in EVERY CALLBACK which it doesn't have to be.

@levitte
Copy link
Member Author

levitte commented Nov 8, 2019

As for internal callbacks, see #10391 to see an example.

@levitte
Copy link
Member Author

levitte commented Nov 8, 2019

As for secure environments, you and I may not talk about the same thing.
SGX is an open (Intel) platform I've had a look at, which offers trusted environments (they call them enclaves if I remember correctly), where all interaction between the trusted and the untrusted sides must happen via well defined up- & downcalls. All these up- & downcalls are dispatched along the boundary between the trusted and the untrusted sided. A direct call from inside the trusted environment to some random function on the untrusted side is a big no-no (I think it's actually forbidden).
If we imagine a provider module sitting in a trusted enclave and libcrypto+application sitting in the untrusted environment, you might understand how a direct call just won't work...

@richsalz
Copy link
Contributor

richsalz commented Nov 8, 2019

If you have an SGX provider, you'll need some runtime to go with that, and that can do the dispatch as such if needed; I don't see a need to make every upcall go through a runtime dispatcher.

And if someone is trusting SGX to be secure, they should need to do more research :(

@levitte
Copy link
Member Author

levitte commented Nov 8, 2019

... and that can do the dispatch as such if needed

That means that every application that sets up a callback will have to do that.

I don't see a need to make every upcall go through a runtime dispatcher.

Do note that using OSSL_CALLBACK is participatory. If others choose to do it differently, I'll probably have opinions, but it's not like can ultimately force anyone to do my bidding. The strongest thing I can do is to block and provoke a project-wide discussion on how things should be done, and it'll be a stretch before I take things that far.

However, I will certainly use this mechanism in code to come, unless someone is absolutely adamant at forcing my hand.

@richsalz
Copy link
Contributor

richsalz commented Nov 8, 2019

That means that every application that sets up a callback will have to do that.

No. Pass in the function pointer when you set up the rest of the parameters. And then the provider does an upcall to your function. I just cannot understand why you think provider->runtime->application is better than just provider->application. I guess I'm just not seeing something that you do.

@levitte
Copy link
Member Author

levitte commented Nov 8, 2019

No.

Please remain in context, that I said was for an SGX environment, where your "No" is simply not correct.

@richsalz
Copy link
Contributor

richsalz commented Nov 8, 2019

No, my no is correct :) http://www.sgx101.com/portfolio/enclave_lifecycle/ an SGX ocall maps exactly to what I am proposing: a per-operation function callback without going through a generic runtime dispatcher.

@levitte
Copy link
Member Author

levitte commented Nov 8, 2019

[sigh]

From the page you linked to:

Your program does not call the ECALL and OCALL functions directly; it calls the proxy functions. When you make an ECALL, you call the untrusted proxy function for the ECALL, which in turn calls the trusted proxy function inside the enclave. That proxy then calls the “real” ECALL and the return value propagates back to the untrusted function. This sequence is shown in the figure above. When you make an OCALL, the sequence is reversed: you call the trusted proxy function for the OCALL, which calls an untrusted proxy function outside the enclave that, in turn, invokes the “real” OCALL.

Which part of the "calling by proxy function" tells you that you can make a direct call from inside the trusted environment into some random spot in the application?
So then, whatever callback will need to have a proxy function for the OCALL, and if libcrypto won't on behalf of the application, the application will somehow have to do that itself.

@richsalz
Copy link
Contributor

richsalz commented Nov 8, 2019

Yes the SXG lib does it. Not the openssl runtime.

@levitte
Copy link
Member Author

levitte commented Nov 8, 2019

Yes the SXG lib does it.

Not automagically. The application will have to actively set up that proxy function, and somehow, the module will have to know to call that (via its own proxy function).

@richsalz
Copy link
Contributor

richsalz commented Nov 8, 2019

It's automatic in the same way RPCGEN or IDL is automatic. It generates stubs that are ocall-specific and that are linked into your enclave code and your application. This is different from the single re-dispatch at runtime proposed here.

@paulidale
Copy link
Contributor

I'll need to think on this one after reading it more deeply. Call backs should know what call back that are -- I don't think this should be a param, it could be an argument (which leaves one big switch statement to decode the reason) or it could be multiple distinct call backs. I'm not obviously seeing which way is the intention here.

@levitte
Copy link
Member Author

levitte commented Nov 8, 2019

(which leaves one big switch statement to decode the reason)

Why would that be necessary?

@slontis
Copy link
Member

slontis commented Nov 11, 2019

I want a seperate optional callback that I can test if it is NULL in the fips provider. If it is null then this optional callback will not try to call back into the app. A single generic callback is not a good way of doing this.

@levitte
Copy link
Member Author

levitte commented Nov 11, 2019

@slontis... with this PR, the pointer to a OSSL_CALLBACK isn't something waiting to be fetched by the provider from libcrypto, it's a pointer that gets actively passed to the provider functions in specific calls. So the check of NULL is actually very easy to do, the provider function that cares simply checks if it was passed NULL.

But, do note that I'm no trying to force your hand. If you want your very own callback specifically for FIPS self test stuff, by all means do. There's also a difference between that stuff and the stuff I deal with... yours is a global kind of callback, that an application sets and then forgets about, which mine are callbacks that must be considered unique every time they are passed (such as the callback for EVP_PKEY_keygen)

@levitte
Copy link
Member Author

levitte commented Nov 11, 2019

So here's another thing I was thinking of this morning; there's an objective with the libcrypto<->provider interface that we don't mention very often and maybe should mention more, and that's compatibility, both upward and downward. In other words, providers should not have to become sensitive to the libcrypto version.

So then, let's say that we do allow direct calls from the provider into an application callback, using a callback signature that we have defined. EVP_PKEY_CTX_gen_cb is such a signature, defined like this:

typedef int EVP_PKEY_gen_cb(EVP_PKEY_CTX *ctx);

Obviously, libcrypto will have to pass that callback pointer to the appropriate provider side function, along with the pointer to the EVP_PKEY_CTX, which the provider doesn't need to know anything (it can be a void * on the provider side), just that it need to pass that pointer when calling the callback.

Along comes an update, a breaking change, where that callback gains another argument. That means that in the libcrypto<->provider interface, a new FUNC number and associated downcall will have to be added, and dealt with.
That suddenly gives libcrypto two downcalls to deal with, depending on the age of the provider, and it means that any updated provider will also have to deal with either of those two callbacks, depending on what libcrypto version has loaded it. OR, we suddenly get to deal with version checking pain (and that is a pain, we have that with engines, and we have grossly mishandled it).

The way this PR defines things, the exact function signature remains with libcrypto, and can be handled in such a way that of an older provider doesn't pass some data, libcrypto can apply a "I dunno" default, and if a provider made for a newer libcrypto passes some extra unknown param, it will simply be ignored.

So at the very least, I remain convinced of this way of doing things for stuff I do.

@richsalz
Copy link
Contributor

It was stated old providers work with new code, not that new providers work with old runtimes; is that now a requirement?

Either way, the way to handle it is simple: you treat the provider-initiated upcall/callback, the same as any other callback -- changing the argument list is a change in the ABI.

So at the very least, I remain convinced of this way of doing things for stuff I do.

Consistency across the code base is important.

@mattcaswell
Copy link
Member

The way this PR defines things, the exact function signature remains with libcrypto, and can be handled in such a way that of an older provider doesn't pass some data, libcrypto can apply a "I dunno" default, and if a provider made for a newer libcrypto passes some extra unknown param, it will simply be ignored.

This benefit comes from using OSSL_PARAMs to pass data between libcrypto and a provider - it doesn't come from having a single upcall between the provider and libcrypto. We could equally have a separate callback everytime we need one that uses OSSL_PARAMs to pass the data and it would have the same end result.

In fact we're going to need to pass OSSL_PARAM type data between libcrypto and providers in any case, where we have to pass complex data back and forth. For example if an end user callback expects a BIGNUM, then the providers will not understand that datatype. Therefore we're going to have to pass OSSL_PARAM data and there will have to be some translation callback in libcrypto which converts the OSSL_PARAM data understood by the providers into callbacks used by end applications.

@levitte
Copy link
Member Author

levitte commented Nov 11, 2019

Consistency across the code base is important.

Looking at the diversity of callback types we have, "consistency" isn't something that comes to my mind.

@levitte
Copy link
Member Author

levitte commented Nov 11, 2019

So what I hear from you, @mattcaswell, is that instead of passing a OSSL_CALLBACK pointer to providers and expect them to call core_generic_callback with it and a OSSL_PARAM array, you're asking for a generic callback function like this:

typedef int (*OSSL_CALLBACK_fn)(const OSSL_PARAM params[], void *arg);

and have the provider functions take that plus a void arg pointer... so something like this for key generation:

OSSL_CORE_MAKE_FUNC(void *, OP_keymgmt_genkey,
                    (void *genctx, void *domparams, OSSL_CALLBACK_fn *cb, void *cbarg))

@levitte
Copy link
Member Author

levitte commented Nov 11, 2019

It was stated old providers work with new code, not that new providers work with old runtimes; is that now a requirement?

It has been mentioned in passing more than made an actual requirement... but something we have said is that the providers shouldn't need to care about versions, they should just offer functionality, and use whatever upcalls they can get. By consequence, running a newer provider with an older libcrypto should be perfectly valid, as long as the necessary up- and down-calls are present.

@richsalz
Copy link
Contributor

Consistency across the code base is important.

Looking at the diversity of callback types we have, "consistency" isn't something that comes to my mind.

That is not what i meant. You said "I am going to do this" Even if the rest of the code goes another way, such as by using per-function upcalls.

@richsalz
Copy link
Contributor

I think if the project wants to make new providers work with old runtimes, they should vote and say that explicitly.

@levitte
Copy link
Member Author

levitte commented Nov 11, 2019

I've heard all argument, and am shifting to specify a "standard" callback signature, define like this:

typedef int (*OSSL_CALLBACK)(const OSSL_PARAM params[], void *arg);

With the expectation that provider side functions that want to use it take a pointer to that type as well as a void argument that's simply passed to the callback function.
The main reason for the OSSL_PARAM array of stuff to pass back is libcrypto<->provider ABI stability, even in the face of changes in libcrypto's public API, and also because it is our standard way to pass random data between libcrypto and providers.

@levitte
Copy link
Member Author

levitte commented Nov 11, 2019

(this will become a very small PR)

UPDATE: or I'll make a separate PR

@richsalz
Copy link
Contributor

Thanks for adapting part of the way.

Next question: why is a provider->app code upcall different form, say OpenSSL malloc upcalls?

@levitte
Copy link
Member Author

levitte commented Nov 11, 2019

libcrypto<->provider interface stabilty. The callbacks in the public API are ad-hoc in nature so far. There's no reason why the providers should be exposed to that when they can use a much more generic interface and leave it to libcrypto to translate as needed to the application callbacks.

I see it as entirely possible to expose the applications directly to this generic callback form, but that lies in the future.

@richsalz
Copy link
Contributor

The reason is compile-time type safety. This is C, not a duck-typing language like Python or perl or php. you refer to the current callbacks as ad hoc, I prefer to call them specific. This PR, while better, still treats C like a scripting language because of the misplaced concern that things will evolve.

Sigh. maybe someone else shares my view.

@levitte
Copy link
Member Author

levitte commented Nov 11, 2019

Side note: I've been reminded that the compatibility objective is a requirement, and has been voiced during earlier meetings.

@richsalz
Copy link
Contributor

Old providers work with new code, yes. New providers (using new callbacks) work with old code, was not discussed. That is either compatible or an API/ABI break, just like if someone changes the calling sequence (or semantics) of an existing callback, like the X509 verify callback.

@levitte
Copy link
Member Author

levitte commented Nov 12, 2019

misplaced concern that things will evolve

Dude, I take offence at that statement.

If you look at OpenSSL's history, from real old RSA_generate_key

DEPRECATEDIN_0_9_8(RSA *RSA_generate_key(int bits, unsigned long e,
                                         void (*callback) (int, int, void *),
                                         void *cb_arg))

... through an attempt to deal with evolution (the BN_GENCB structure, which deals with exactly that, but was also a first attempt at paving for something more generic):

int RSA_generate_key_ex(RSA *rsa, int bits, BIGNUM *e, BN_GENCB *cb);
struct bn_gencb_st {
    unsigned int ver;           /* To handle binary (in)compatibility */
    void *arg;                  /* callback-specific data */
    union {
        /* if (ver==1) - handles old style callbacks */
        void (*cb_1) (int, int, void *);
        /* if (ver==2) - new callback style */
        int (*cb_2) (int, int, BN_GENCB *);
    } cb;
};

... and all the way to the EVP_PKEY_CTX key generation callback, which is an attempt at parametrized callbacks (side note: which relies on the fact that the EVP_PKEY_CTX instance is shared between the EVP layer and the implementation):

typedef int EVP_PKEY_gen_cb(EVP_PKEY_CTX *ctx);

void EVP_PKEY_CTX_set_cb(EVP_PKEY_CTX *ctx, EVP_PKEY_gen_cb *cb);

So er, evolution has already happened... and a generic callback that takes an OSSL_PARAM array is a next step.

My concern is having the libcrypto<->provider getting stuck by the public API. We've had that problem before, and you might remember that making structures opaque, i.e. hiding the inner workings from the public eye was our way out of being stuck.
My concern is having shared intimate knowledge between application and provider (the exact callback function signature in this case), when we've already stated that we should avoid that, at least for things that may vary from one operation to another (considering how varied the different specific callbacks are, that is exactly the case here).
My concern is also with any provider author, who must deal with all these variations already. There's unfortunately no escaping that, but we've made a real effort to parametrise what we cannot know very specifically, so generally dealing with OSSL_PARAM arrays isn't much of a mystery. I don't see why provider authors must then also deal with varying callback signatures.

@levitte
Copy link
Member Author

levitte commented Nov 12, 2019

All that being said, are we still discussing the OSSL_CALLBACK structure defined here, or are we really discussing #10412? Quite frankly, I would rather not get stuck here. Can we move forward, please?

I plan on closing this PR fairly soon anyway.

@levitte
Copy link
Member Author

levitte commented Nov 12, 2019

Since #10412 just got approved, it's time to close this one.

@levitte levitte closed this Nov 12, 2019
@richsalz
Copy link
Contributor

Sorry for offending. My concern is that losing type-safety to handle evolution is a misplaced concern. Oh well. I lost.

@levitte levitte mentioned this pull request Jan 7, 2020
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

branch: master Applies to master branch

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants