Skip to content

Conversation

@toddmeng-db
Copy link
Contributor

@toddmeng-db toddmeng-db commented May 8, 2025

Integrates proxy logic with the rest of the Spark-Driver. Also applies to Cloud-fetch and OAuth2 fetch.

Other drivers, like Impala driver, should be compatible with the change. However, it is not tested - so I marked in the README that it's not fully complete

To test, add the following to DATABRICKS_TEST_CONFIG_FILE environment variable:

  "http_options": {
    "proxy": {
      "use_proxy": "true",
      "proxy_host": "...",
      "proxy_port":..,
      "proxy_auth": "...", // if using basic auth
      "proxy_uid": "...",
      "proxy_pwd": "..."
    },
    "tls": { // you may need to enable depending on your proxy, or you need to trust the cert
      "enabled": ,
      "allow_self_signed": 
    }
  }

And inspect proxy traffic. Verified that it works in driver testing and Powerbi testing

@toddmeng-db toddmeng-db force-pushed the toddmeng-db/proxy-integrate branch from dae4d22 to 2647db0 Compare May 8, 2025 16:52
@toddmeng-db toddmeng-db reopened this May 14, 2025
move to spark layer

unit tests

fixes

message handler factory

refactor to hiveserver2

tls

refactor

delete old files

revert http client factory

fix proxy option names

feedback fixes - internal access

implement proxy

remove proxy integration test
@toddmeng-db toddmeng-db force-pushed the toddmeng-db/proxy-integrate branch from 2647db0 to 0509832 Compare May 14, 2025 21:14
if (proxyPort == null)
throw new ArgumentNullException(nameof(proxyPort));
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small improvement; forgot to check for useProxy here

@toddmeng-db toddmeng-db force-pushed the toddmeng-db/proxy-integrate branch 5 times, most recently from 2b7af16 to 5a2b044 Compare May 14, 2025 22:47
{
parameters.Add(HttpTlsOptions.TrustedCertificatePath, tlsOptions.TrustedCertificatePath!);
}
}
Copy link
Contributor Author

@toddmeng-db toddmeng-db May 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a seperate PR, Will take a look into creating some BaseTestEnvironment to reduce the code re-use here

@toddmeng-db toddmeng-db force-pushed the toddmeng-db/proxy-integrate branch from 5a2b044 to def410c Compare May 14, 2025 22:54
@toddmeng-db toddmeng-db changed the title Toddmeng db/proxy integrate feat(csharp/src/Drivers/Databricks): Integrate ProxyConfigurations with Drivers May 15, 2025
@toddmeng-db toddmeng-db force-pushed the toddmeng-db/proxy-integrate branch from def410c to 5f12763 Compare May 15, 2025 01:54
@toddmeng-db toddmeng-db force-pushed the toddmeng-db/proxy-integrate branch from 5f12763 to 59c6175 Compare May 15, 2025 02:05
@toddmeng-db toddmeng-db marked this pull request as ready for review May 15, 2025 02:13
@github-actions github-actions bot added this to the ADBC Libraries 19 milestone May 15, 2025
@toddmeng-db toddmeng-db marked this pull request as draft May 15, 2025 02:15
@toddmeng-db toddmeng-db marked this pull request as ready for review May 15, 2025 02:20
| `adbc.proxy_options.proxy_auth` | Whether to enable proxy authentication. Only feature-complete in Spark driver. One of `True`, `False` | `False` |
| `adbc.proxy_options.proxy_uid` | Username for proxy authentication. Only feature-complete in Spark driver. Required when proxy_auth is True | |
| `adbc.proxy_options.proxy_pwd` | Password for proxy authentication. Only feature-complete in Spark driver. Required when proxy_auth is True | |

Copy link
Contributor Author

@toddmeng-db toddmeng-db May 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will work out of the box for impala - but since I have not tested it, hence "Only feature-complete in Spark driver"

Copy link
Contributor

@eric-wang-1990 eric-wang-1990 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@CurtHagenlocher CurtHagenlocher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks; looks good overall. What do you want to do about the cloud fetch timeout?

{
Timeout = TimeSpan.FromMinutes(timeoutMinutes)
};
_httpClient = httpClient;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After this change, it appears that the CloudFetchTimeoutMinutes parameter isn't used any more. Should it be removed entirely? I don't have enough experience with HttpClient to know how (or whether) a single client can be used with different timeouts in different contexts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah that's a good point. I'm assuming we want CloudFetch timeout to stay uniquely configurable because it can take a long time to consume large results. Let me update the PR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we change the timeout on the shared client, it will also affect the other users of the client -- including, I think, the polling to keep the session alive and any subsequent commands issued for this connection. Does that seem okay? Should it be reset once the cloud fetch is done?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think to support different timeouts on the same HttpClient they would need to use the cancellation token on GetAsync, but if both HttpClient.Timeout and a CancellationToken are used then whichever one triggers first will cancel the request.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the CloudFetch Http Client and the other http clients are still different HttpClients - so this commit cd636cd should only affect the CloudFetch http timeout?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, okay, I missed that.

using Xunit;
using Xunit.Abstractions;
using Apache.Arrow.Adbc.Tests.Drivers.Databricks;
using System.Net.Http;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: sort usings

@toddmeng-db toddmeng-db force-pushed the toddmeng-db/proxy-integrate branch from 7f7a345 to cd636cd Compare May 19, 2025 17:14
Copy link
Contributor

@CurtHagenlocher CurtHagenlocher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@CurtHagenlocher CurtHagenlocher merged commit 148bc17 into apache:main May 19, 2025
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants