Resolve token_audience from host metadata for account hosts #1321
Merged
hectorcast-db merged 1 commit into `main` on Mar 12, 2026
Conversation
This was referenced Mar 11, 2026
Force-pushed ebf41bd to 5c26e24 (Compare)
Force-pushed 5c26e24 to d97bd75 (Compare)
github-merge-queue bot pushed a commit that referenced this pull request on Mar 11, 2026:
…e is_account_client docstrings (#1317)

## 🥞 Stacked PR
Use this [link](https://github.com/databricks/databricks-sdk-py/pull/1317/files) to review incremental changes.

- [**stack/use-test-environment-type-in-integration-tests**](#1317) [[Files changed](https://github.com/databricks/databricks-sdk-py/pull/1317/files)]
- [stack/resolve-host-metadata-on-init](#1318) [[Files changed](https://github.com/databricks/databricks-sdk-py/pull/1318/files/0eff155d0ca7fc796d878cf9ee332dac5149be6a..63156f63eb80992d0c25a7b341d0b19b77dc9527)]
- [stack/cloud-from-host-metadata](#1320) [[Files changed](https://github.com/databricks/databricks-sdk-py/pull/1320/files/63156f63eb80992d0c25a7b341d0b19b77dc9527..f9b0c3e2df72ddecb0fccb0d59af46f3a06846d2)]
- [stack/resolve-token-audience-from-host-metadata](#1321) [[Files changed](https://github.com/databricks/databricks-sdk-py/pull/1321/files/f9b0c3e2df72ddecb0fccb0d59af46f3a06846d2..d97bd75a5692faddc36d51d46208513e442e7c87)]
- [stack/google-auth-use-host-metadata](#1322) [[Files changed](https://github.com/databricks/databricks-sdk-py/pull/1322/files/d97bd75a5692faddc36d51d46208513e442e7c87..87b4c1a35e3fe1abfb125069ba1590eeaecf34e9)]

## Summary
Replace `is_account_client` heuristics and `DATABRICKS_ACCOUNT_ID` presence checks in integration test fixtures with explicit `TEST_ENVIRONMENT_TYPE`-based filtering. Also deprecate `is_account_client` in docstrings across `Config` and `ApiClient`.

## Why
Integration test fixtures in `tests/integration/conftest.py` used two different heuristics to decide whether a test should run:

1. `account_client.config.is_account_client` — inspects the host URL to infer whether the SDK is talking to an account-level API. This is unreliable now that unified hosts can serve both account and workspace APIs, and the property already raises `ValueError` when `experimental_is_unified_host` is set.
2. `"DATABRICKS_ACCOUNT_ID" in os.environ` — a proxy check used to skip workspace fixtures on account environments. This is fragile and doesn't express intent clearly.

The test orchestrator now sets `TEST_ENVIRONMENT_TYPE` explicitly to one of `ACCOUNT`, `UC_ACCOUNT`, `WORKSPACE`, or `UC_WORKSPACE`. Using this value directly makes fixture skip-guards authoritative and easy to reason about.

## What changed

### Interface changes
None. No public SDK API changes.

### Behavioral changes
None for SDK users. Integration test fixture skip conditions are now driven by `TEST_ENVIRONMENT_TYPE` instead of `is_account_client` and `DATABRICKS_ACCOUNT_ID`.

### Internal changes
- **`tests/integration/conftest.py`**: All four client fixtures (`a`, `ucacct`, `w`, `ucws`) now check `TEST_ENVIRONMENT_TYPE` against an explicit allowlist instead of using `is_account_client` or `DATABRICKS_ACCOUNT_ID`.
- **`databricks/sdk/config.py`**: Updated `is_account_client` docstring to note deprecation — clients can now support both workspace and account APIs from a single host.
- **`databricks/sdk/core.py`**: Added matching deprecation note to `ApiClient.is_account_client`.

## How is this tested?
The changed code is test infrastructure itself. Correctness will be validated when integration tests run against environments that set `TEST_ENVIRONMENT_TYPE`.

NO_CHANGELOG=true
Force-pushed d97bd75 to 51c62a9 (Compare)
github-merge-queue bot pushed a commit that referenced this pull request on Mar 11, 2026:
(#1318)

## 🥞 Stacked PR
Use this [link](https://github.com/databricks/databricks-sdk-py/pull/1318/files) to review incremental changes.

- [**stack/resolve-host-metadata-on-init**](#1318) [[Files changed](https://github.com/databricks/databricks-sdk-py/pull/1318/files)]
- [stack/cloud-from-host-metadata](#1320) [[Files changed](https://github.com/databricks/databricks-sdk-py/pull/1320/files/cd35488c886dc7a57ff3b1f79d70d1089a3ab6cf..e1872e347ea0b17197154389a80381d84d99a210)]
- [stack/resolve-token-audience-from-host-metadata](#1321) [[Files changed](https://github.com/databricks/databricks-sdk-py/pull/1321/files/e1872e347ea0b17197154389a80381d84d99a210..51c62a915c2a3d0bd3c5ad720c26c5c79647976e)]
- [stack/google-auth-use-host-metadata](#1322) [[Files changed](https://github.com/databricks/databricks-sdk-py/pull/1322/files/51c62a915c2a3d0bd3c5ad720c26c5c79647976e..9898fc156ef3d0cadc3c254e033cfaf20caa537d)]

## Summary
Call `_resolve_host_metadata()` automatically during `Config.__init__` so that `account_id`, `workspace_id`, `discovery_url`, and `cloud` are populated from the `/.well-known/databricks-config` endpoint before auth is initialized. Metadata fetch failures are non-fatal — a warning is logged and initialization continues with whatever the user has explicitly configured.

## Why
`_resolve_host_metadata()` already existed but had to be called explicitly by callers who wanted auto-discovery. This meant that `WorkspaceClient` and `AccountClient` did not benefit from it out of the box.

Beyond the missing call, the old implementation was also strictly fatal: any HTTP error or missing field (`account_id`, `oidc_endpoint`) raised a `ValueError`, making it unsafe to run automatically since not all hosts expose the endpoint.

This PR wires up the call during `Config.__init__` (between `_fix_host_if_needed` and `_validate`) and hardens the implementation so it degrades gracefully: HTTP failures produce a warning, and missing optional fields are simply left unset. The only remaining hard failure is when a `{account_id}` template in `oidc_endpoint` cannot be substituted because no `account_id` is available — this is a configuration error that should still surface.

Auto-discovery is currently gated to hosts with `experimental_is_unified_host=True` (a `# TODO: Enable this everywhere` marks the guard) to avoid unintended network calls on standard workspace and account deployments until the endpoint is broadly available.

## What changed

### Interface changes
None. `_resolve_host_metadata()` remains private.

### Behavioral changes
For hosts with `experimental_is_unified_host=True`: `Config.__init__` now makes an HTTP request to `{host}/.well-known/databricks-config` and silently back-fills `account_id`, `workspace_id`, `discovery_url`, and `cloud` from the response. Previously this did not happen.

All other hosts: no change.

### Internal changes
- **`databricks/sdk/config.py`**: Added `_resolve_host_metadata()` call to `Config.__init__`. Added `HostType.UNIFIED` guard. Made HTTP failures non-fatal (warning + return). Removed mandatory-field assertions (`account_id is required`, `discovery_url is required`) — missing fields are now silently skipped.
- **`tests/conftest.py`**: Added autouse `stub_host_metadata` fixture that mocks `get_host_metadata` for all unit tests, preventing unexpected network calls.
- **`tests/test_config.py`**: Replaced the old `requests_mock`-based tests (which called `_resolve_host_metadata()` directly) with `mocker.patch`-based tests that exercise the full `Config.__init__` path. Added `test_resolve_host_metadata_skipped_for_non_unified` to assert the guard works.

## How is this tested?
Unit tests in `tests/test_config.py` cover: successful resolution of all fields, no-overwrite of pre-configured values, HTTP failure tolerance, missing `oidc_endpoint`, and the guard that skips non-unified hosts.

NO_CHANGELOG=true

Co-authored-by: Claude Sonnet 4.6 <[email protected]>
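The degrade-gracefully behavior that PR describes can be illustrated with a standalone sketch. This is not the SDK's actual `_resolve_host_metadata`; `resolve_host_metadata` and `fetch_metadata` are stand-in names, and the config is modeled as a plain dict for brevity:

```python
import logging

logger = logging.getLogger("host_metadata_sketch")


def resolve_host_metadata(cfg, fetch_metadata):
    """Back-fill unset config fields from host metadata; failures are non-fatal."""
    try:
        # Stand-in for GET {host}/.well-known/databricks-config.
        meta = fetch_metadata(cfg["host"])
    except Exception as exc:
        # An HTTP failure produces a warning and leaves the config untouched.
        logger.warning("failed to resolve host metadata for %s: %s", cfg["host"], exc)
        return cfg
    # Missing optional fields are simply skipped; user-set values always win.
    for field in ("account_id", "workspace_id", "discovery_url", "cloud"):
        if not cfg.get(field) and meta.get(field):
            cfg[field] = meta[field]
    return cfg


# Metadata back-fills only the fields the user left unset:
resolved = resolve_host_metadata(
    {"host": "https://example.databricks.com", "account_id": "user-set"},
    lambda host: {"account_id": "from-metadata", "cloud": "AWS"},
)

# A host without the endpoint degrades gracefully instead of raising:
def unreachable(host):
    raise ConnectionError("endpoint not available")

untouched = resolve_host_metadata(
    {"host": "https://example.databricks.com", "account_id": "user-set"},
    unreachable,
)
```

The key design point from the PR is the early `return cfg` on failure: auto-discovery must never make `Config.__init__` fail on hosts that do not expose the endpoint.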
Force-pushed 51c62a9 to 56f3705 (Compare)
github-merge-queue bot pushed a commit that referenced this pull request on Mar 12, 2026:
## 🥞 Stacked PR
Use this [link](https://github.com/databricks/databricks-sdk-py/pull/1320/files) to review incremental changes.

- [**stack/cloud-from-host-metadata**](#1320) [[Files changed](https://github.com/databricks/databricks-sdk-py/pull/1320/files)]
- [stack/resolve-token-audience-from-host-metadata](#1321) [[Files changed](https://github.com/databricks/databricks-sdk-py/pull/1321/files/e846874ec446c75225cfb26864de20a8906b34fd..56f3705b9513f9b1fe43ea27ffacc7463865b1fe)]
- [stack/google-auth-use-host-metadata](#1322) [[Files changed](https://github.com/databricks/databricks-sdk-py/pull/1322/files/56f3705b9513f9b1fe43ea27ffacc7463865b1fe..cae02e48930199b9ebb3bc37fe34f9a849c6708d)]

## Summary
Port of [databricks-sdk-go#1512](databricks/databricks-sdk-go#1512). Adds an explicit `cloud` field to `Config` (and `HostMetadata`) that is populated from `/.well-known/databricks-config`. `is_aws`, `is_azure`, and `is_gcp` now prefer this field over DNS-based hostname inference.

## Why
`is_aws()`, `is_azure()`, and `is_gcp()` previously inferred the cloud provider by suffix-matching the workspace hostname against a hardcoded list of known DNS zones (e.g. `.azuredatabricks.net`, `.gcp.databricks.com`). This works for standard deployments but silently misclassifies non-standard hostnames — vanity domains, unified hosts, or any future topology where the hostname doesn't encode the cloud provider. In those cases the SDK fell back to `DefaultEnvironment` (AWS), which is wrong.

The `/.well-known/databricks-config` endpoint is already the authoritative source for host metadata and returns a `cloud` field, but the SDK was discarding it. Threading that value through to `Config` lets the SDK make correct cloud-detection decisions without relying on hostname patterns.

## What changed

### Interface changes
- **`Config.cloud: Cloud`** — new optional `ConfigAttribute` (env: `DATABRICKS_CLOUD`). Accepts `"AWS"`, `"AZURE"`, or `"GCP"` (case-insensitive). When set — either explicitly or auto-populated from metadata — `is_aws`, `is_azure`, and `is_gcp` use this value directly.
- **`Cloud.parse(value: str) -> Optional[Cloud]`** — new classmethod on the `Cloud` enum for case-insensitive parsing. Returns `None` for empty or unrecognized values (forward-compatible with future cloud values the API may introduce).
- **`HostMetadata.cloud: Optional[Cloud]`** — new field, populated from the `"cloud"` key in the discovery response.

### Behavioral changes
- `is_aws`, `is_azure`, `is_gcp` now check `Config.cloud` first. If unset they fall back to DNS-based detection — no change for existing deployments with standard hostnames.
- `azure_workspace_resource_id` continues to take precedence over `Config.cloud` for `is_azure`, preserving existing Azure MSI behavior.
- Auto-discovery note: the new endpoint has not been rolled out to all hosts yet, so `cloud` will remain unset for most deployments until broader rollout.

### Internal changes
- **`databricks/sdk/environments.py`**: Added `Cloud.parse()` classmethod.
- **`databricks/sdk/oauth.py`**: Added `cloud` field to `HostMetadata`; `from_dict` and `as_dict` updated accordingly.
- **`databricks/sdk/config.py`**: Added `_parse_cloud` helper, `cloud` `ConfigAttribute`, updated `is_aws/is_azure/is_gcp` to check `self.cloud` first, added `cloud` back-fill in `_resolve_host_metadata`.
- **`tests/test_config.py`**: 10 new tests covering case-insensitive parsing, explicit `cloud` overriding DNS detection for each provider, DNS fallback, `azure_workspace_resource_id` precedence, and metadata population/non-overwrite.

## How is this tested?
Unit tests in `tests/test_config.py` and `tests/test_oauth.py`. No integration tests — the endpoint is not yet available on all hosts.

NO_CHANGELOG=true

Co-authored-by: Claude Sonnet 4.6 <[email protected]>
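A self-contained sketch of the cloud-first detection described in that PR. The `Cloud` enum and `parse` semantics follow the PR description; the `is_azure` helper, its DNS suffix, and the free-function shape are simplifications (the real methods live on `Config`, and the `azure_workspace_resource_id` precedence is omitted here):

```python
from enum import Enum
from typing import Optional


class Cloud(Enum):
    AWS = "AWS"
    AZURE = "AZURE"
    GCP = "GCP"

    @classmethod
    def parse(cls, value: str) -> Optional["Cloud"]:
        """Case-insensitive parse; None for empty or unrecognized values,
        forward-compatible with cloud values the API may introduce later."""
        if not value:
            return None
        try:
            return cls(value.upper())
        except ValueError:
            return None


def is_azure(cloud: Optional[Cloud], host: str) -> bool:
    """Prefer the explicit cloud field; fall back to DNS suffix matching."""
    if cloud is not None:
        return cloud is Cloud.AZURE
    return host.endswith(".azuredatabricks.net")
```

With an explicit `cloud`, a vanity domain such as `my-workspace.example.com` is classified correctly even though its hostname encodes nothing about the provider; without it, the DNS fallback behaves exactly as before.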
Force-pushed 56f3705 to ea2976e (Compare)
hectorcast-db commented on Mar 12, 2026:
```python
# Audience for OIDC ID token source accepting an audience as a parameter.
# For example, the GitHub action ID token source.
token_audience: str = ConfigAttribute(env="DATABRICKS_TOKEN_AUDIENCE", auth="github-oidc")
```
Contributor, Author
This is not really specific to github-oidc, but the `auth` annotation causes validation to fail if `token_audience` is set alongside any other auth type.
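The validation problem behind that comment can be shown with a toy model of auth detection. This is hypothetical code, not the SDK's `_validate()`; it only mimics the idea that a validator collects the distinct `auth` tags of the fields a user has set and errors when there is more than one:

```python
def configured_auth_methods(auth_tags, set_fields):
    """Return the distinct auth tags among the fields the user actually set.
    A validator in this style would raise 'more than one authorization
    method configured' when the result has more than one entry."""
    return {auth_tags[f] for f in set_fields if auth_tags.get(f)}


# With auth="github-oidc" on token_audience, a PAT user who also sets
# token_audience appears to have configured two auth methods:
tagged = {"token": "pat", "token_audience": "github-oidc"}
before = configured_auth_methods(tagged, ["token", "token_audience"])

# With the tag removed, token_audience is just an OIDC parameter and
# only the PAT counts as an auth method:
untagged = {"token": "pat", "token_audience": None}
after = configured_auth_methods(untagged, ["token", "token_audience"])
```

This is why dropping the `auth="github-oidc"` tag fixes the spurious multi-auth error without changing how `token_audience` is consumed by OIDC flows.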
When `/.well-known/databricks-config` returns no `workspace_id`, the host is acting as an account-level endpoint. Set `token_audience` to `account_id` so OIDC auth flows use the correct audience without requiring explicit config. User-provided `token_audience` is never overwritten.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Force-pushed ea2976e to 191ec90 (Compare)
If integration tests don't run automatically, an authorized user can run them manually by following the instructions below.

Checks will be approved automatically on success.
github-merge-queue bot pushed a commit to databricks/databricks-sdk-go that referenced this pull request on Mar 17, 2026:
## 🥞 Stacked PR
Use this [link](https://github.com/databricks/databricks-sdk-go/pull/1543/files) to review incremental changes.

- [**stack/port/resolve-token-audience-from-metadata**](#1543) [[Files changed](https://github.com/databricks/databricks-sdk-go/pull/1543/files)]
- [stack/port/gcp-sa-token-non-blocking](#1544) [[Files changed](https://github.com/databricks/databricks-sdk-go/pull/1544/files/01d8ab7988cf049ece2bf295acc9218f6dd82e07..ab7e3d841888a136a21a2c95549392f137d6b523)]
- [stack/port/test-environment-type](#1545) [[Files changed](https://github.com/databricks/databricks-sdk-go/pull/1545/files/ab7e3d841888a136a21a2c95549392f137d6b523..0e20fea87196f5b177313bfb6c3ef5c8ec678bf0)]
- [stack/port/host-metadata-integration-test](#1546) [[Files changed](https://github.com/databricks/databricks-sdk-go/pull/1546/files/0e20fea87196f5b177313bfb6c3ef5c8ec678bf0..f9d041dbd73e531215a30dab6b0441d192f4bfb5)]
- [stack/port/remove-unified-flag](#1547) [[Files changed](https://github.com/databricks/databricks-sdk-go/pull/1547/files/f9d041dbd73e531215a30dab6b0441d192f4bfb5..086adf0511ffbff5cc9edb9dc5159b5dad9b7299)]
- [stack/port/gcp-sa-from-metadata](#1548) [[Files changed](https://github.com/databricks/databricks-sdk-go/pull/1548/files/086adf0511ffbff5cc9edb9dc5159b5dad9b7299..c8a54deafc36e70c86f869231f8b5d09b044f28f)]

## Summary
- Port of Python SDK PR databricks/databricks-sdk-py#1321
- When host metadata indicates an account host (no `workspace_id`) and `account_id` is present, automatically set `TokenAudience` to the `account_id` if not already configured

## Test plan
- `TestApplyHostMetadata_SetsTokenAudienceForAccountHost` — no `workspace_id`, has `account_id` → set
- `TestApplyHostMetadata_NoTokenAudienceForWorkspaceHost` — has `workspace_id` → not set
- `TestApplyHostMetadata_DoesNotOverrideExistingTokenAudience` — pre-set value preserved

NO_CHANGELOG=true

This pull request was AI-assisted by Isaac.
## 🥞 Stacked PR
Use this link to review incremental changes.

## Summary
When `/.well-known/databricks-config` returns no `workspace_id`, the host is acting as an account-level endpoint. `_resolve_host_metadata` now automatically sets `token_audience` to `account_id` in that case, so OIDC-based auth flows use the correct audience without requiring explicit user configuration.

## Why
Account-level endpoints require the OIDC token audience to be the `account_id` rather than the token endpoint URL. Currently `credentials_provider.py` handles this dynamically by checking `cfg.client_type == ClientType.ACCOUNT` at token-exchange time. However, this inference only works reliably after `account_id` and `workspace_id` are known — for unified hosts in account mode, where those values come from metadata, the audience needs to be determined from the same metadata response.

Making `_resolve_host_metadata` set `token_audience` explicitly when the host is in account mode (no `workspace_id` in the response) means the audience decision is made early, co-located with the rest of the host metadata resolution, and doesn't depend on `client_type` being correctly evaluated later in the init flow.

User-provided `token_audience` (via config file, env var, or constructor kwarg) is never overwritten.

Also fixes a latent issue: `token_audience` previously carried `auth="github-oidc"` on its `ConfigAttribute`, which caused `_validate()` to treat it as a standalone auth credential. This meant setting `token_audience` alongside any other auth method (e.g. PAT) would incorrectly trigger the "more than one authorization method configured" error. Since `token_audience` is a parameter to OIDC flows — not a credential by itself — the `auth` tag has been removed.

## What changed

### Interface changes
- `Config.token_audience`: Removed `auth="github-oidc"` from the `ConfigAttribute`. Setting `token_audience` no longer causes multi-auth validation errors when combined with other auth methods.

### Behavioral changes
- For unified hosts where `/.well-known/databricks-config` returns an `account_id` but no `workspace_id`: `token_audience` is now automatically set to `account_id` during `Config.__init__`. Previously this was left unset and inferred at token-exchange time.
- For all other hosts and configurations: no change.

### Internal changes
- `databricks/sdk/config.py`: Removed `auth="github-oidc"` from the `token_audience` `ConfigAttribute`. Added `token_audience` back-fill logic to `_resolve_host_metadata`.
- `tests/test_config.py`: 3 new tests covering: audience set for account host (no `workspace_id` in metadata), audience not set for workspace host (`workspace_id` present), and no overwrite of user-provided `token_audience`.

## How is this tested?
Unit tests in `tests/test_config.py`.

NO_CHANGELOG=true
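The back-fill rule this PR adds can be sketched as a pure function. The function name `backfill_token_audience` is hypothetical (the real logic lives inside `_resolve_host_metadata` in `databricks/sdk/config.py`), but the three cases mirror the PR's own test list:

```python
from typing import Optional


def backfill_token_audience(
    token_audience: Optional[str],
    account_id: Optional[str],
    workspace_id: Optional[str],
) -> Optional[str]:
    """No workspace_id in the metadata response means an account-level host,
    so use account_id as the OIDC audience. A user-provided token_audience
    is never overwritten."""
    if token_audience:
        return token_audience
    if workspace_id is None and account_id:
        return account_id
    return token_audience


# Account host (no workspace_id in metadata): audience becomes the account id.
acct = backfill_token_audience(None, "abc-123", None)
# Workspace host (workspace_id present): audience stays unset.
ws = backfill_token_audience(None, "abc-123", "4567")
# User-provided value always wins.
user = backfill_token_audience("my-audience", "abc-123", None)
```

Keeping this decision inside metadata resolution, rather than at token-exchange time, is what removes the dependency on `client_type` being evaluated correctly later in the init flow.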