Skip to content

Add TLS verify callback for Python and Ruby#12656

Closed
JackOfMostTrades wants to merge 8 commits intogrpc:masterfrom
JackOfMostTrades:verify-callback
Closed

Add TLS verify callback for Python and Ruby#12656
JackOfMostTrades wants to merge 8 commits intogrpc:masterfrom
JackOfMostTrades:verify-callback

Conversation

@JackOfMostTrades
Copy link
Copy Markdown
Contributor

Release 1.6.0 of grpc-java added support for using a custom TLS hostname verifier with clients. This feature parity doesn't exist for gRPC clients in other languages, which this PR tries to address (although an "override hostname" can be specified, support for an arbitrary callback is missing). In particular this adds support for a verification callback in JavaScript and Python. As an example, callbacks in JS look like:

grpc.credentials.createSsl(
    fs.readFileSync("truststore.pem"),
    fs.readFileSync("client.pem.key"),
    fs.readFileSync("clent.pem.crt"),
    {
      "insecureSkipHostnameVerify": true,
      "checkServerIdentity": function(host, cert) {
        authContext = newAuthContextFromPem(cert);
        if (authContext.name != expectedName || (authContext.environment != myEnvironment) {
          throw "Invalid server certificate presented";
        }
      }
    });

The "checkServerIdentity" idiom was chosen to match the same option available to TLS clients with the built-in node library. (InsecurySkipHostnameVerify was inspired by golang's InsecureSkipVerify option.)

In Python it looks like this:

def checkServerIdentity(host, cert):
     authContext = newAuthContextFromPem(cert)
     if authContext.name != expectedName or authContext.environment != myEnvironment:
        return False
     return True

grpc.ssl_channel_credentials(
    clientTrust.encode('utf-8'),
    open(clientKeystore[1]).read().encode('utf-8'),
    open(clientKeystore[0]).read().encode('utf-8'),
    {
      "insecureSkipHostnameVerify": True,
      "checkServerIdentity": checkServerIdentity
    })

For better context, the motivation for having callbacks like this stems from our internal PKI infrastructure where multidimensional identity information is added to various proprietary X.509 extensions. The policy on exactly which dimensions are relevant and should be checked varies from client to client so it's important that each client be able to supply verification logic.
In a less proprietary context, custom verifiers are often used as a supplemental check to do things like certificate or CA pinning.

We've been using an internal fork based on this PR for a few months now, so at least for JS and Python I have some practical confidence that this works as intended.

@grpc-kokoro
Copy link
Copy Markdown
Collaborator

Thanks for your pull request. The automated tests will run as soon as one of the admins verifies this change is ok for us to run on our infrastructure.

14 similar comments
@grpc-kokoro
Copy link
Copy Markdown
Collaborator

Thanks for your pull request. The automated tests will run as soon as one of the admins verifies this change is ok for us to run on our infrastructure.

@grpc-testing
Copy link
Copy Markdown

Thanks for your pull request. The automated tests will run as soon as one of the admins verifies this change is ok for us to run on our infrastructure.

@grpc-kokoro
Copy link
Copy Markdown
Collaborator

Thanks for your pull request. The automated tests will run as soon as one of the admins verifies this change is ok for us to run on our infrastructure.

@grpc-kokoro
Copy link
Copy Markdown
Collaborator

Thanks for your pull request. The automated tests will run as soon as one of the admins verifies this change is ok for us to run on our infrastructure.

@grpc-testing
Copy link
Copy Markdown

Thanks for your pull request. The automated tests will run as soon as one of the admins verifies this change is ok for us to run on our infrastructure.

@grpc-testing
Copy link
Copy Markdown

Thanks for your pull request. The automated tests will run as soon as one of the admins verifies this change is ok for us to run on our infrastructure.

@grpc-kokoro
Copy link
Copy Markdown
Collaborator

Thanks for your pull request. The automated tests will run as soon as one of the admins verifies this change is ok for us to run on our infrastructure.

@grpc-testing
Copy link
Copy Markdown

Thanks for your pull request. The automated tests will run as soon as one of the admins verifies this change is ok for us to run on our infrastructure.

@grpc-kokoro
Copy link
Copy Markdown
Collaborator

Thanks for your pull request. The automated tests will run as soon as one of the admins verifies this change is ok for us to run on our infrastructure.

@grpc-kokoro
Copy link
Copy Markdown
Collaborator

Thanks for your pull request. The automated tests will run as soon as one of the admins verifies this change is ok for us to run on our infrastructure.

@grpc-testing
Copy link
Copy Markdown

Thanks for your pull request. The automated tests will run as soon as one of the admins verifies this change is ok for us to run on our infrastructure.

@grpc-kokoro
Copy link
Copy Markdown
Collaborator

Thanks for your pull request. The automated tests will run as soon as one of the admins verifies this change is ok for us to run on our infrastructure.

@grpc-kokoro
Copy link
Copy Markdown
Collaborator

Thanks for your pull request. The automated tests will run as soon as one of the admins verifies this change is ok for us to run on our infrastructure.

@grpc-kokoro
Copy link
Copy Markdown
Collaborator

Thanks for your pull request. The automated tests will run as soon as one of the admins verifies this change is ok for us to run on our infrastructure.

@murgatroid99
Copy link
Copy Markdown
Member

This looks like a breaking change in the Node API, so we probably wouldn't accept it without a proposal for an API change to be included in the next major version bump. Also, development on the node library has been moved to https://github.com/grpc/grpc-node.

@JackOfMostTrades
Copy link
Copy Markdown
Contributor Author

Ah, good to know, I'll check out that repo. That said, I don't think it's a breaking change; it adds an optional 4th parameter (JS object), so if left undefined existing behavior would be unchanged.

@murgatroid99
Copy link
Copy Markdown
Member

You're right, it's not a breaking change. I mixed it up with the optional parameter at the end of the server credentials SSL factory function.

Copy link
Copy Markdown
Contributor

@jboeuf jboeuf Oct 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@justinburke FYI.
It doesn't look like this is consistent with what we discussed here: #10721
Am I missing something?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, you're right. First pass at implementing this we were aiming to drop in identical code logic that we had in Java where specifying a custom hostname verifier completely replaces the original hostname verification.

But I'll concede that's a separate (and possibly undesirable) concern. Let me take another pass at this PR with the skip option pulled out.

@thelinuxfoundation
Copy link
Copy Markdown

Thank you for your pull request. Before we can look at your contribution, we need to ensure all contributors are covered by a Contributor License Agreement.

After the following items are addressed, please respond with a new comment here, and the automated system will re-verify.

Regards,
The Linux Foundation CLA GitHub bot

1 similar comment
@thelinuxfoundation
Copy link
Copy Markdown

Thank you for your pull request. Before we can look at your contribution, we need to ensure all contributors are covered by a Contributor License Agreement.

After the following items are addressed, please respond with a new comment here, and the automated system will re-verify.

Regards,
The Linux Foundation CLA GitHub bot

@JackOfMostTrades
Copy link
Copy Markdown
Contributor Author

Ok, I finally got around to pulling out the skip_hostname_verify option, so what's left is just the checkServerIdentity callback which is strictly supplemental to hostname verification.

Copy link
Copy Markdown
Contributor

@mehrdada mehrdada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Review specific to Python/Cython files]
Thanks! Looks mostly good, with minor nits.

private_key=None,
certificate_chain=None):
certificate_chain=None,
verify_options=None):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments about verify_options:

  • Please add API documentation.
  • It seems there is only one option available at the moment, correct? What are the additional options that might be potentially added in the future. Should they go under verify_options or as an additional named argument? Should the singular option become the argument instead? (e.g. verify_callback=callback_fn)
  • I prefer validation_options or verification_options or just options if this is supposed to be more generic (based on the answer to the above question).

Copy link
Copy Markdown
Contributor Author

@JackOfMostTrades JackOfMostTrades Jul 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a bit of churn before the PR was opened that led to this being an object. At first there were more options (like assert fingerprint, skip hostname verification, ...) but all that got pulled out. Given that, I think it would be perfectly fine just to have a verify_callback=None argument that could be passed in directly. It could be helpful to forward-plan for additional options inside the verify_options, but I can't name any off the top of my head I think should be added that the callback couldn't implement. I also think it's reasonably pythonic if we wanted to support more options in the future to just to add those as additional optional named arguments.

So given that context, unless you think otherwise I'll just change this to a verify_callback parameter.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's go with verify_callback!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good! PR updates pushed.

@mehrdada mehrdada dismissed nathanielmanistaatgoogle’s stale review July 27, 2018 14:51

Changes requested have been made

@mehrdada
Copy link
Copy Markdown
Contributor

Python/Cython changes LGTM. Pinging @apolcyn for Ruby review & @jiangtaoli2016 for security review.

@grpc-testing
Copy link
Copy Markdown

Objective-C binary sizes
*****************STATIC******************
  New size                      Old size
 1,976,610      Total (=)      1,976,610

 No significant differences in binary sizes

***************FRAMEWORKS****************
  New size                      Old size
10,771,521      Total (>)     10,771,519

 No significant differences in binary sizes


@mehrdada
Copy link
Copy Markdown
Contributor

@JackOfMostTrades Please take a look at Sanity test failures. (pylint is whining in particular). You can run them locally too, with tools/run_tests/run_tests.py -lsanity --use_docker -j 10

@mehrdada mehrdada changed the title Add TLS verify callback for Python and JS Add TLS verify callback for Python and Ruby Jul 27, 2018
@JackOfMostTrades
Copy link
Copy Markdown
Contributor Author

Thanks for the pointer. Looks like pylint didn't like my use of nonlocal in the test, so I just refactored that a tiny bit. Apparently I also needed to run generate_projects, yapf, and clang_format. So hopefully it should be all fixed up properly now!

@grpc-testing
Copy link
Copy Markdown

Objective-C binary sizes
*****************STATIC******************
  New size                      Old size
 1,976,654      Total (=)      1,976,654

 No significant differences in binary sizes

***************FRAMEWORKS****************
  New size                      Old size
10,775,735      Total (<)     10,775,739

 No significant differences in binary sizes


@mehrdada
Copy link
Copy Markdown
Contributor

Python tests are now failing (sounds like an argument mismatch). Those tests are runnable locally too, with tools/run_tests/run_tests.py -lpython --use_docker -j 10 --newline_on_success.

params.servername = servername;
params.cert = cert;

return rb_thread_call_with_gvl(invoke_rb_verify_callback_with_gvl, &params) ==
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the way this callback is invoked with respect to concurrency needs to be changed.

The C-Core is completely non-blocking and asynchronous. Since some C-Core thread (which may or may not be a ruby thread), is directly acquiring the interpreter lock and calling in to this ruby application handler, this has the potential to indefinitely block a C-Core thread and prevent intended work from being done.

One fix for this could be to dedicate a ruby thread to the task of invoking these callbacks. Such a thread could poll for work on a queue of "ready-to-run" "veroify peer callbacks". The actual verify peer callback handler that the C-Core invokes would just add to that queue, signal conditional variable, etc.. We already have such a thread for the task is invoking ruby-defined "call credentials" callbacks (See https://github.com/grpc/grpc/blob/master/src/ruby/ext/grpc/rb_event_thread.c#L126), so I think that adding another would could be reasonable. An optimization could be to combine both of these thread in to one, but I'm not sure that's worth doing.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^ on second though, it might actually be simpler to combiner invocation of these handlers in to the call credentials callback thread, by using e.g. a "queue node tag type", but I'm not sure

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^^^ Oooh... good point!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, just to add, granted an indefinitely blocking application callback would also deadlock a dedicated callback thread and prevent more callbacks from being fired, but IMO overall the dedicated thread is simpler; it's difficult to reason about what happens when a c-core thread acquires a ruby lock and runs ruby code.

Copy link
Copy Markdown
Contributor Author

@JackOfMostTrades JackOfMostTrades Jul 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, what you're saying makes total sense, but the verify callback exposed by C-Core at present expects the callback to be synchronous so it doesn't pass in the on_peer_checked closure that would allow the callback to be asynchronous.
I don't think it would be a big deal to tweak the C-Core API so that callbacks are asynchronous. Which sounds like it might be important given interpreter locks. Is there anyone that should weight in before I go down the route of changing that core API?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @jiangtaoli2016 @markdroth for this core API change proposal (to allow the verify peer callback to be async)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, it's not okay to block in a callback from a C-core thread.

I'm not thrilled about adding yet another case where we need an asynchronous callback from C-core to the application. We already have a few of these, and they've caused a number of headaches over time; it turns out to be fairly tricky to get the synchronization and state machinery right when leaving and re-entering C-core.

That having been said, I don't have a better suggestion, and the C-core API is not a public API, so we do have the flexibility to change this later if we can come up with something better. So I'd say go ahead and add the callback to the API.

I suggest that you model the changes after one of the existing implementations. The best one to look at is probably the metadata credentials plugin, whose API is defined here:

https://github.com/grpc/grpc/blob/master/include/grpc/grpc_security.h#L304

The implementation is here:

https://github.com/grpc/grpc/blob/master/src/core/lib/security/credentials/plugin/plugin_credentials.cc
https://github.com/grpc/grpc/blob/master/src/core/lib/security/credentials/plugin/plugin_credentials.h

Note that this API allows both sync and async returns. The sync return is much simpler and more efficient in the case where no blocking is needed; the async return requires a callback and a thread hop, so it's more complex and less efficient.

One thing that is not handled correctly in the metadata credentials plugin API is cancellation. Unfortunately, this API didn't originally support cancellation, and all of the wrapped language APIs were added without cancellation support. When we later realized that we needed to support this, we were able to change the C-core API, but it was too late to fix the wrapped language APIs. So we did a poor man's approach: we have C-core mark the request as cancelled, and when the wrapped language eventually invokes the callback, nothing actually happens. But this means that we still wind up doing a lot of processing in the application that isn't actually necessary, because we have no way to inform it that the work is no longer needed. I highly recommend not making that mistake with this new API: let's make sure that we expose a way of propagating cancelations to the wrapped language, so that this can be handled as a first-class citizen.

Please let me know if you have any questions.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the pointers! I'll work on putting together an update to the core API and put that in a new PR. I'll come back to the python/ruby bindings in this PR once we get that ironed out. It'll probably be a little bit before I can get through that, so I'll flag you on it when it's ready.

@GuillaumeCisco
Copy link
Copy Markdown

GuillaumeCisco commented Feb 5, 2019

Hey there, thank you for creating this PR. I have an issue where I get a CERTIFICATE_VERIFY_FAILED.

I've tried this branch in my project, but it never run into the verify_callback method.
Is this branch working right now?

Thank you,

EDIT: In fact it works if there is no CERTIFICATE_VERIFY_FAILED, otherwise I cannot know why the certificate verify failed. :/

EDIT 2: I succeeded making my network works. It was a config error.

@sheenaqotj
Copy link
Copy Markdown
Contributor

Thanks for this change. It looks like it's in flux with a dependency on another PR. Feel free to create a new request or reopen if this needs more attention.

@sheenaqotj sheenaqotj closed this Feb 12, 2019
@lock lock bot locked as resolved and limited conversation to collaborators May 13, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.