-
Notifications
You must be signed in to change notification settings - Fork 173
feat(csharp/src/Drivers/Databricks): Integrate ProxyConfigurations with Drivers #2794
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(csharp/src/Drivers/Databricks): Integrate ProxyConfigurations with Drivers #2794
Conversation
dae4d22 to
2647db0
Compare
move to spark layer unit tests fixes message handler factory refactor to hiveserver2 tls refactor delete old files revert http client factory fix proxy option names feedback fixes - internal access implement proxy remove proxy integration test
2647db0 to
0509832
Compare
| if (proxyPort == null) | ||
| throw new ArgumentNullException(nameof(proxyPort)); | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small improvement; forgot to check for useProxy here
2b7af16 to
5a2b044
Compare
| { | ||
| parameters.Add(HttpTlsOptions.TrustedCertificatePath, tlsOptions.TrustedCertificatePath!); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a seperate PR, Will take a look into creating some BaseTestEnvironment to reduce the code re-use here
5a2b044 to
def410c
Compare
def410c to
5f12763
Compare
5f12763 to
59c6175
Compare
| | `adbc.proxy_options.proxy_auth` | Whether to enable proxy authentication. Only feature-complete in Spark driver. One of `True`, `False` | `False` | | ||
| | `adbc.proxy_options.proxy_uid` | Username for proxy authentication. Only feature-complete in Spark driver. Required when proxy_auth is True | | | ||
| | `adbc.proxy_options.proxy_pwd` | Password for proxy authentication. Only feature-complete in Spark driver. Required when proxy_auth is True | | | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this will work out of the box for impala - but since I have not tested it, hence "Only feature-complete in Spark driver"
eric-wang-1990
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
CurtHagenlocher
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks; looks good overall. What do you want to do about the cloud fetch timeout?
| { | ||
| Timeout = TimeSpan.FromMinutes(timeoutMinutes) | ||
| }; | ||
| _httpClient = httpClient; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After this change, it appears that the CloudFetchTimeoutMinutes parameter isn't used any more. Should it be removed entirely? I don't have enough experience with HttpClient to know how (or whether) a single client can be used with different timeouts in different contexts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah that's a good point. I'm assuming we want CloudFetch timeout to stay uniquely configurable because it can take a long time to consume large results. Let me update the PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we change the timeout on the shared client, it will also affect the other users of the client -- including, I think, the polling to keep the session alive and any subsequent commands issued for this connection. Does that seem okay? Should it be reset once the cloud fetch is done?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think to support different timeouts on the same HttpClient they would need to use the cancellation token on GetAsync, but if both HttpClient.Timeout and a CancellationToken are used then whichever one triggers first will cancel the request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the CloudFetch Http Client and the other http clients are still different HttpClients - so this commit cd636cd should only affect the CloudFetch http timeout?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, okay, I missed that.
| using Xunit; | ||
| using Xunit.Abstractions; | ||
| using Apache.Arrow.Adbc.Tests.Drivers.Databricks; | ||
| using System.Net.Http; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: sort usings
7f7a345 to
cd636cd
Compare
CurtHagenlocher
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Integrates proxy logic with the rest of the Spark-Driver. Also applies to Cloud-fetch and OAuth2 fetch.
Other drivers, like Impala driver, should be compatible with the change. However, it is not tested - so I marked in the README that it's not fully complete
To test, add the following to
DATABRICKS_TEST_CONFIG_FILEenvironment variable:And inspect proxy traffic. Verified that it works in driver testing and Powerbi testing