-
Notifications
You must be signed in to change notification settings - Fork 173
feat(csharp/src/Drivers/Databricks): Databricks Proxy Configurator #2789
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
7218116 to
41fa3e5
Compare
8bb4bc0 to
d2e84db
Compare
| /// Whether to use a proxy for HTTP connections. | ||
| /// Default value is 0 (disabled) if not specified. | ||
| /// </summary> | ||
| public const string UseProxy = "UseProxy"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be a general option setting for all hive traffic. Conside something like adbc.http_options.proxy.use_proxy, added to here:
| public const string DisableServerCertificateValidation = "adbc.http_options.tls.disable_server_certificate_validation"; |
| /// <summary> | ||
| /// Factory for creating HTTP clients with proper configuration | ||
| /// </summary> | ||
| public class SparkHttpMessageHandlerFactory |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not need this, here is already returning a handler:
| return HiveServer2TlsImpl.NewHttpClientHandler(TlsOptions); |
We can just call the proxy configurator on top of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main concern is that if we don't have a factory, we will have to configure proxy every where http client is used (cloud-fetch, oauth)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think it's fine, there is only cloud fetch and oauth case.
Actually for oauth you probably can just pass the handler to the httpClient constructor:
| _httpClient = new HttpClient(); |
Try to see how bad it is, I think it's not that too bad.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. I think it's feasible either way, but yea maybe client factory is a bit heavy weight right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm taking a look at the implementation; other concern is the NewHttpClientHandler method is static, and is called by different httpconnections (SparkHttpConnection, ImpalaHttpConnection, HiveServer2HttpConnection). Since these classes all inherit from HiveServer2Connection (and we probably don't want to expose http logic at this layer), there is no obvious place to initialize http client with proxy logic for all child http connections. I think it actually makes some sense to use factory here, then
Edit, Discussed off-line: We could still put the logic in NewHttpClientHandler method (or new method), since that class is performing many different tls configurations, anyways.
@CurtHagenlocher
e9b9272 to
f5bb996
Compare
move to spark layer unit tests fixes message handler factory
tls refactor delete old files
352fa0a to
ce171fb
Compare
| handler.UseProxy = false; | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
http client bypass list in c# expects regex strings, hence why some handling is done to make hosts in regex format. I assume we don't want to expect users to pass in regex strings (though we still allow for wildcard pattern here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add this as comments?
70d916a to
72c54ca
Compare
72c54ca to
7b79fd5
Compare
| handler.UseProxy = false; | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add this as comments?
CurtHagenlocher
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I seconded the feedback about the "regex" comment. It would also be good to use "true"/"false" for enabling/disabling a feature instead of "0"/"1".
| /// <summary> | ||
| /// Interface for proxy configuration | ||
| /// </summary> | ||
| public interface IProxyConfigurator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this intended to be part of the public surface area of the driver or is it "just" public for testing? If the former, does it perhaps make sense to move many proxy-related definitions into the core ADBC assembly where they could be used by multiple drivers? And if the latter, then please make them internal and use the InternalsVisibleTo attribute to expose them to tests.
We could certainly choose to make them internal now and do something different later, but public API surface should be deliberate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see - no, it's not meant to be externally used. I'll set it to internal, then.
However, we could still move the proxy parameters to ADBC-level. Would that make sense (even if there's no implementation for proxy by the other drivers)? Otherwise, let's just do that work later? @CurtHagenlocher
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Later is fine. CC @davidhcoe because I know we're looking at adding proxy support to the BigQuery driver.
| public static HiveServer2ProxyConfigurator FromProperties(IReadOnlyDictionary<string, string> properties) | ||
| { | ||
| bool useProxy = properties.TryGetValue(HttpProxyOptions.UseProxy, out string? useProxyStr) && | ||
| useProxyStr == "1"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ADBC convention is to use "true" and "false" for boolean-valued options.
f5368dc to
1f6a45a
Compare
1f6a45a to
a89b058
Compare
CurtHagenlocher
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
First PR for Proxy support for Databricks Driver. Includes a SparkProxyConfigurator, HttpClientFactory that contains the Configurator, and necessary SparkParameters. Also includes some unit tests.
Tested integration with rest of driver in follow up PR.
Follow-up: Integrate with the rest of the driver. This will also include behavior for Oauth and CloudFetch
Other off-line discussions: