Skip to content

Conversation

@toddmeng-db
Copy link
Contributor

@toddmeng-db toddmeng-db commented May 6, 2025

First PR for Proxy support for Databricks Driver. Includes a SparkProxyConfigurator, HttpClientFactory that contains the Configurator, and necessary SparkParameters. Also includes some unit tests.

Tested integration with rest of driver in follow up PR.

Follow-up: Integrate with the rest of the driver. This will also include behavior for Oauth and CloudFetch

Other off-line discussions:

  • Kerberos proxy login not currently supported in JDBC OSS, so we're not currently supporting here.

@toddmeng-db toddmeng-db force-pushed the toddmeng-db/proxy branch 5 times, most recently from 7218116 to 41fa3e5 Compare May 6, 2025 23:00
@toddmeng-db toddmeng-db changed the title databricks proxy configuration feat(csharp/src/Drivers/Databricks) Databricks Proxy Configurator May 6, 2025
@toddmeng-db toddmeng-db force-pushed the toddmeng-db/proxy branch 2 times, most recently from 8bb4bc0 to d2e84db Compare May 7, 2025 00:02
/// Whether to use a proxy for HTTP connections.
/// Default value is 0 (disabled) if not specified.
/// </summary>
public const string UseProxy = "UseProxy";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a general option setting for all hive traffic. Conside something like adbc.http_options.proxy.use_proxy, added to here:

public const string DisableServerCertificateValidation = "adbc.http_options.tls.disable_server_certificate_validation";

/// <summary>
/// Factory for creating HTTP clients with proper configuration
/// </summary>
public class SparkHttpMessageHandlerFactory
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not need this, here is already returning a handler:

return HiveServer2TlsImpl.NewHttpClientHandler(TlsOptions);
.
We can just call the proxy configurator on top of it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main concern is that if we don't have a factory, we will have to configure proxy every where http client is used (cloud-fetch, oauth)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think it's fine, there is only cloud fetch and oauth case.
Actually for oauth you probably can just pass the handler to the httpClient constructor:

Try to see how bad it is, I think it's not that too bad.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. I think it's feasible either way, but yea maybe client factory is a bit heavy weight right now.

Copy link
Contributor Author

@toddmeng-db toddmeng-db May 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm taking a look at the implementation; other concern is the NewHttpClientHandler method is static, and is called by different httpconnections (SparkHttpConnection, ImpalaHttpConnection, HiveServer2HttpConnection). Since these classes all inherit from HiveServer2Connection (and we probably don't want to expose http logic at this layer), there is no obvious place to initialize http client with proxy logic for all child http connections. I think it actually makes some sense to use factory here, then

Edit, Discussed off-line: We could still put the logic in NewHttpClientHandler method (or new method), since that class is performing many different tls configurations, anyways.
@CurtHagenlocher

@toddmeng-db toddmeng-db force-pushed the toddmeng-db/proxy branch 2 times, most recently from e9b9272 to f5bb996 Compare May 7, 2025 06:37
move to spark layer

unit tests

fixes

message handler factory
tls

refactor

delete old files
@toddmeng-db toddmeng-db force-pushed the toddmeng-db/proxy branch 4 times, most recently from 352fa0a to ce171fb Compare May 7, 2025 17:02
handler.UseProxy = false;
}
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

http client bypass list in c# expects regex strings, hence why some handling is done to make hosts in regex format. I assume we don't want to expect users to pass in regex strings (though we still allow for wildcard pattern here)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add this as comments?

@toddmeng-db toddmeng-db force-pushed the toddmeng-db/proxy branch 2 times, most recently from 70d916a to 72c54ca Compare May 7, 2025 17:38
@toddmeng-db toddmeng-db force-pushed the toddmeng-db/proxy branch from 72c54ca to 7b79fd5 Compare May 7, 2025 17:39
@toddmeng-db toddmeng-db changed the title feat(csharp/src/Drivers/Databricks) Databricks Proxy Configurator feat(csharp/src/Drivers/Databricks): Databricks Proxy Configurator May 7, 2025
@toddmeng-db toddmeng-db marked this pull request as ready for review May 7, 2025 17:52
@github-actions github-actions bot added this to the ADBC Libraries 19 milestone May 7, 2025
@toddmeng-db toddmeng-db requested a review from eric-wang-1990 May 8, 2025 18:16
handler.UseProxy = false;
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add this as comments?

Copy link
Contributor

@CurtHagenlocher CurtHagenlocher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I seconded the feedback about the "regex" comment. It would also be good to use "true"/"false" for enabling/disabling a feature instead of "0"/"1".

/// <summary>
/// Interface for proxy configuration
/// </summary>
public interface IProxyConfigurator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this intended to be part of the public surface area of the driver or is it "just" public for testing? If the former, does it perhaps make sense to move many proxy-related definitions into the core ADBC assembly where they could be used by multiple drivers? And if the latter, then please make them internal and use the InternalsVisibleTo attribute to expose them to tests.

We could certainly choose to make them internal now and do something different later, but public API surface should be deliberate.

Copy link
Contributor Author

@toddmeng-db toddmeng-db May 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see - no, it's not meant to be externally used. I'll set it to internal, then.

However, we could still move the proxy parameters to ADBC-level. Would that make sense (even if there's no implementation for proxy by the other drivers)? Otherwise, let's just do that work later? @CurtHagenlocher

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Later is fine. CC @davidhcoe because I know we're looking at adding proxy support to the BigQuery driver.

public static HiveServer2ProxyConfigurator FromProperties(IReadOnlyDictionary<string, string> properties)
{
bool useProxy = properties.TryGetValue(HttpProxyOptions.UseProxy, out string? useProxyStr) &&
useProxyStr == "1";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ADBC convention is to use "true" and "false" for boolean-valued options.

Copy link
Contributor

@CurtHagenlocher CurtHagenlocher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@CurtHagenlocher CurtHagenlocher merged commit 27c68c0 into apache:main May 14, 2025
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants