Use host metadata to determine GCP SA token requirement#1322

Closed
hectorcast-db wants to merge 2 commits into main from hectorcast-db/stack/google-auth-use-host-metadata

Conversation

@hectorcast-db (Contributor) commented Mar 11, 2026

🥞 Stacked PR

Use this link to review incremental changes.


Summary

Replace cfg.client_type checks in google_credentials and google_id with
_requires_gcp_sa_access_token, which consults /.well-known/databricks-config
directly. This fixes SPOG hosts, which are account-level endpoints even when
the user config contains a workspace_id — cfg.client_type would return WORKSPACE
in that case, causing the X-Databricks-GCP-SA-Access-Token header to be omitted.

Falls back to bool(cfg.account_id) when the metadata endpoint is unavailable:
in the legacy deployment model workspace hosts were always reached without an
account_id, while account-console hosts required one.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>


@github-actions commented:

Please ensure that the NEXT_CHANGELOG.md file is updated with any relevant changes.
If this is not necessary for your PR, please include the following in your PR description:
NO_CHANGELOG=true
and rerun the job.

github-merge-queue bot pushed a commit that referenced this pull request Mar 11, 2026
…e is_account_client docstrings (#1317)

## 🥞 Stacked PR
Use this
[link](https://github.com/databricks/databricks-sdk-py/pull/1317/files)
to review incremental changes.
-
[**stack/use-test-environment-type-in-integration-tests**](#1317)
[[Files
changed](https://github.com/databricks/databricks-sdk-py/pull/1317/files)]
-
[stack/resolve-host-metadata-on-init](#1318)
[[Files
changed](https://github.com/databricks/databricks-sdk-py/pull/1318/files/0eff155d0ca7fc796d878cf9ee332dac5149be6a..63156f63eb80992d0c25a7b341d0b19b77dc9527)]
-
[stack/cloud-from-host-metadata](#1320)
[[Files
changed](https://github.com/databricks/databricks-sdk-py/pull/1320/files/63156f63eb80992d0c25a7b341d0b19b77dc9527..f9b0c3e2df72ddecb0fccb0d59af46f3a06846d2)]
-
[stack/resolve-token-audience-from-host-metadata](#1321)
[[Files
changed](https://github.com/databricks/databricks-sdk-py/pull/1321/files/f9b0c3e2df72ddecb0fccb0d59af46f3a06846d2..d97bd75a5692faddc36d51d46208513e442e7c87)]
-
[stack/google-auth-use-host-metadata](#1322)
[[Files
changed](https://github.com/databricks/databricks-sdk-py/pull/1322/files/d97bd75a5692faddc36d51d46208513e442e7c87..87b4c1a35e3fe1abfb125069ba1590eeaecf34e9)]

---------
## Summary

Replace `is_account_client` heuristics and `DATABRICKS_ACCOUNT_ID`
presence checks in integration test fixtures with explicit
`TEST_ENVIRONMENT_TYPE`-based filtering. Also deprecate
`is_account_client` in docstrings across `Config` and `ApiClient`.

## Why

Integration test fixtures in `tests/integration/conftest.py` used two
different heuristics to decide whether a test should run:

1. `account_client.config.is_account_client` — inspects the host URL to
infer whether the SDK is talking to an account-level API. This is
unreliable now that unified hosts can serve both account and workspace
APIs, and the property already raises `ValueError` when
`experimental_is_unified_host` is set.
2. `"DATABRICKS_ACCOUNT_ID" in os.environ` — a proxy check used to skip
workspace fixtures on account environments. This is fragile and doesn't
express intent clearly.

The test orchestrator now sets `TEST_ENVIRONMENT_TYPE` explicitly to one
of `ACCOUNT`, `UC_ACCOUNT`, `WORKSPACE`, or `UC_WORKSPACE`. Using this
value directly makes fixture skip-guards authoritative and easy to
reason about.
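The allowlist pattern can be sketched as a plain helper. This is illustrative only — the real guards live inside the pytest fixtures in `tests/integration/conftest.py`, and the per-fixture allowlists here are an assumption based on the fixture names:

```python
import os

# Hypothetical per-fixture allowlists; the real conftest.py may differ.
_ALLOWED = {
    "a": {"ACCOUNT", "UC_ACCOUNT"},
    "ucacct": {"UC_ACCOUNT"},
    "w": {"WORKSPACE", "UC_WORKSPACE"},
    "ucws": {"UC_WORKSPACE"},
}


def should_skip(fixture_name: str, env=os.environ) -> bool:
    """Skip a fixture unless TEST_ENVIRONMENT_TYPE is in its allowlist."""
    env_type = env.get("TEST_ENVIRONMENT_TYPE")
    return env_type not in _ALLOWED[fixture_name]


assert not should_skip("a", {"TEST_ENVIRONMENT_TYPE": "ACCOUNT"})
assert should_skip("w", {"TEST_ENVIRONMENT_TYPE": "ACCOUNT"})
```

Inside a fixture, a `True` result would translate to `pytest.skip(...)` with an explicit reason, which makes the skip condition visible in test reports.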

## What changed

### Interface changes

None. No public SDK API changes.

### Behavioral changes

None for SDK users. Integration test fixture skip conditions are now
driven by `TEST_ENVIRONMENT_TYPE` instead of `is_account_client` and
`DATABRICKS_ACCOUNT_ID`.

### Internal changes

- **`tests/integration/conftest.py`**: All four client fixtures (`a`,
`ucacct`, `w`, `ucws`) now check `TEST_ENVIRONMENT_TYPE` against an
explicit allowlist instead of using `is_account_client` or
`DATABRICKS_ACCOUNT_ID`.
- **`databricks/sdk/config.py`**: Updated `is_account_client` docstring
to note deprecation — clients can now support both workspace and account
APIs from a single host.
- **`databricks/sdk/core.py`**: Added matching deprecation note to
`ApiClient.is_account_client`.

## How is this tested?

The changed code is test infrastructure itself. Correctness will be
validated when integration tests run against environments that set
`TEST_ENVIRONMENT_TYPE`.

NO_CHANGELOG=true
@hectorcast-db force-pushed the hectorcast-db/stack/google-auth-use-host-metadata branch from 87b4c1a to 9898fc1 on March 11, 2026 at 12:02
github-merge-queue bot pushed a commit that referenced this pull request Mar 11, 2026
#1318)

## 🥞 Stacked PR
Use this
[link](https://github.com/databricks/databricks-sdk-py/pull/1318/files)
to review incremental changes.
-
[**stack/resolve-host-metadata-on-init**](#1318)
[[Files
changed](https://github.com/databricks/databricks-sdk-py/pull/1318/files)]
-
[stack/cloud-from-host-metadata](#1320)
[[Files
changed](https://github.com/databricks/databricks-sdk-py/pull/1320/files/cd35488c886dc7a57ff3b1f79d70d1089a3ab6cf..e1872e347ea0b17197154389a80381d84d99a210)]
-
[stack/resolve-token-audience-from-host-metadata](#1321)
[[Files
changed](https://github.com/databricks/databricks-sdk-py/pull/1321/files/e1872e347ea0b17197154389a80381d84d99a210..51c62a915c2a3d0bd3c5ad720c26c5c79647976e)]
-
[stack/google-auth-use-host-metadata](#1322)
[[Files
changed](https://github.com/databricks/databricks-sdk-py/pull/1322/files/51c62a915c2a3d0bd3c5ad720c26c5c79647976e..9898fc156ef3d0cadc3c254e033cfaf20caa537d)]

---------
## Summary

Call `_resolve_host_metadata()` automatically during `Config.__init__`
so that `account_id`, `workspace_id`, `discovery_url`, and `cloud` are
populated from the `/.well-known/databricks-config` endpoint before auth
is initialized. Metadata fetch failures are non-fatal — a warning is
logged and initialization continues with whatever the user has
explicitly configured.

## Why

`_resolve_host_metadata()` already existed but had to be called
explicitly by callers who wanted auto-discovery. This meant that
`WorkspaceClient` and `AccountClient` did not benefit from it out of the
box. Beyond the missing call, the old implementation was also strictly
fatal: any HTTP error or missing field (`account_id`, `oidc_endpoint`)
raised a `ValueError`, making it unsafe to run automatically since not
all hosts expose the endpoint.

This PR wires up the call during `Config.__init__` (between
`_fix_host_if_needed` and `_validate`) and hardens the implementation so
it degrades gracefully: HTTP failures produce a warning, and missing
optional fields are simply left unset. The only remaining hard failure
is when a `{account_id}` template in `oidc_endpoint` cannot be
substituted because no `account_id` is available — this is a
configuration error that should still surface.
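The degrade-gracefully behavior can be sketched with plain dicts standing in for `Config` and the metadata response; the function shape and field names here are illustrative, not the SDK's actual code:

```python
import logging

logger = logging.getLogger(__name__)


def resolve_host_metadata(cfg: dict, fetch_metadata) -> None:
    """Back-fill cfg from {host}/.well-known/databricks-config, non-fatally."""
    try:
        md = fetch_metadata(cfg["host"])
    except Exception as e:
        logger.warning("host metadata unavailable: %s", e)  # warn and continue
        return
    # Missing optional fields are simply left unset; never overwrite user values.
    for key in ("account_id", "workspace_id", "discovery_url", "cloud"):
        if cfg.get(key) is None and md.get(key) is not None:
            cfg[key] = md[key]
    # The one remaining hard failure: an {account_id} template in
    # oidc_endpoint with no account_id available to substitute.
    oidc = md.get("oidc_endpoint")
    if oidc and "{account_id}" in oidc:
        if not cfg.get("account_id"):
            raise ValueError("oidc_endpoint requires an account_id")
        cfg["oidc_endpoint"] = oidc.replace("{account_id}", cfg["account_id"])


cfg = {"host": "https://example.test"}
resolve_host_metadata(cfg, lambda h: {"account_id": "a1", "cloud": "GCP"})
assert cfg["account_id"] == "a1"  # back-filled from metadata
```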

Auto-discovery is currently gated to hosts with
`experimental_is_unified_host=True` (a `# TODO: Enable this everywhere`
marks the guard) to avoid unintended network calls on standard workspace
and account deployments until the endpoint is broadly available.

## What changed

### Interface changes

None. `_resolve_host_metadata()` remains private.

### Behavioral changes

For hosts with `experimental_is_unified_host=True`: `Config.__init__`
now makes an HTTP request to `{host}/.well-known/databricks-config` and
silently back-fills `account_id`, `workspace_id`, `discovery_url`, and
`cloud` from the response. Previously this did not happen.

All other hosts: no change.

### Internal changes

- **`databricks/sdk/config.py`**: Added `_resolve_host_metadata()` call
to `Config.__init__`. Added `HostType.UNIFIED` guard. Made HTTP failures
non-fatal (warning + return). Removed mandatory-field assertions
(`account_id is required`, `discovery_url is required`) — missing fields
are now silently skipped.
- **`tests/conftest.py`**: Added autouse `stub_host_metadata` fixture
that mocks `get_host_metadata` for all unit tests, preventing unexpected
network calls.
- **`tests/test_config.py`**: Replaced the old `requests_mock`-based
tests (which called `_resolve_host_metadata()` directly) with
`mocker.patch`-based tests that exercise the full `Config.__init__`
path. Added `test_resolve_host_metadata_skipped_for_non_unified` to
assert the guard works.

## How is this tested?

Unit tests in `tests/test_config.py` cover: successful resolution of all
fields, no-overwrite of pre-configured values, HTTP failure tolerance,
missing `oidc_endpoint`, and the guard that skips non-unified hosts.

NO_CHANGELOG=true

---------

Co-authored-by: Claude Sonnet 4.6 <[email protected]>
@hectorcast-db force-pushed the hectorcast-db/stack/google-auth-use-host-metadata branch from 9898fc1 to cae02e4 on March 11, 2026 at 15:33
github-merge-queue bot pushed a commit that referenced this pull request Mar 12, 2026
## 🥞 Stacked PR
Use this
[link](https://github.com/databricks/databricks-sdk-py/pull/1320/files)
to review incremental changes.
-
[**stack/cloud-from-host-metadata**](#1320)
[[Files
changed](https://github.com/databricks/databricks-sdk-py/pull/1320/files)]
-
[stack/resolve-token-audience-from-host-metadata](#1321)
[[Files
changed](https://github.com/databricks/databricks-sdk-py/pull/1321/files/e846874ec446c75225cfb26864de20a8906b34fd..56f3705b9513f9b1fe43ea27ffacc7463865b1fe)]
-
[stack/google-auth-use-host-metadata](#1322)
[[Files
changed](https://github.com/databricks/databricks-sdk-py/pull/1322/files/56f3705b9513f9b1fe43ea27ffacc7463865b1fe..cae02e48930199b9ebb3bc37fe34f9a849c6708d)]

---------
## Summary

Port of
[databricks-sdk-go#1512](databricks/databricks-sdk-go#1512).
Adds an explicit `cloud` field to `Config` (and `HostMetadata`) that is
populated from `/.well-known/databricks-config`. `is_aws`, `is_azure`,
and `is_gcp` now prefer this field over DNS-based hostname inference.

## Why

`is_aws()`, `is_azure()`, and `is_gcp()` previously inferred the cloud
provider by suffix-matching the workspace hostname against a hardcoded
list of known DNS zones (e.g. `.azuredatabricks.net`,
`.gcp.databricks.com`). This works for standard deployments but silently
misclassifies non-standard hostnames — vanity domains, unified hosts, or
any future topology where the hostname doesn't encode the cloud
provider. In those cases the SDK fell back to `DefaultEnvironment`
(AWS), which is wrong.

The `/.well-known/databricks-config` endpoint is already the
authoritative source for host metadata and returns a `cloud` field, but
the SDK was discarding it. Threading that value through to `Config` lets
the SDK make correct cloud-detection decisions without relying on
hostname patterns.

## What changed

### Interface changes

- **`Config.cloud: Cloud`** — new optional `ConfigAttribute` (env:
`DATABRICKS_CLOUD`). Accepts `"AWS"`, `"AZURE"`, or `"GCP"`
(case-insensitive). When set — either explicitly or auto-populated from
metadata — `is_aws`, `is_azure`, and `is_gcp` use this value directly.
- **`Cloud.parse(value: str) -> Optional[Cloud]`** — new classmethod on
the `Cloud` enum for case-insensitive parsing. Returns `None` for empty
or unrecognized values (forward-compatible with future cloud values the
API may introduce).
- **`HostMetadata.cloud: Optional[Cloud]`** — new field, populated from
the `"cloud"` key in the discovery response.
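A minimal sketch of the parsing behavior described above (the enum values mirror the description; the real class in `databricks/sdk/environments.py` may carry additional members or metadata):

```python
from enum import Enum
from typing import Optional


class Cloud(Enum):
    AWS = "AWS"
    AZURE = "AZURE"
    GCP = "GCP"

    @classmethod
    def parse(cls, value: Optional[str]) -> Optional["Cloud"]:
        """Case-insensitive parse; None for empty or unrecognized values."""
        if not value:
            return None
        try:
            return cls(value.upper())
        except ValueError:
            # Forward-compatible: unknown cloud values are ignored, not fatal.
            return None


assert Cloud.parse("azure") is Cloud.AZURE
assert Cloud.parse("oracle") is None
```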

### Behavioral changes

- `is_aws`, `is_azure`, `is_gcp` now check `Config.cloud` first. If
unset they fall back to DNS-based detection — no change for existing
deployments with standard hostnames.
- `azure_workspace_resource_id` continues to take precedence over
`Config.cloud` for `is_azure`, preserving existing Azure MSI behavior.
- Auto-discovery note: the new endpoint has not been rolled out to all
hosts yet, so `cloud` will remain unset for most deployments until
broader rollout.
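The precedence order can be sketched as a free function; the real logic lives as properties on `databricks.sdk.Config`, so the parameters here are illustrative:

```python
from typing import Optional


def is_azure(cloud: Optional[str],
             azure_workspace_resource_id: Optional[str],
             hostname: str) -> bool:
    if azure_workspace_resource_id:
        return True  # existing Azure MSI behavior keeps precedence
    if cloud is not None:
        return cloud == "AZURE"  # explicit or metadata-populated value wins
    # Fall back to DNS-based detection for standard hostnames.
    return hostname.endswith(".azuredatabricks.net")


assert is_azure(None, "/subscriptions/xyz", "adb-1.azuredatabricks.net")
assert is_azure("AZURE", None, "vanity.example.com")  # DNS would have missed this
assert not is_azure(None, None, "dbc-1.cloud.databricks.com")
```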

### Internal changes

- **`databricks/sdk/environments.py`**: Added `Cloud.parse()`
classmethod.
- **`databricks/sdk/oauth.py`**: Added `cloud` field to `HostMetadata`;
`from_dict` and `as_dict` updated accordingly.
- **`databricks/sdk/config.py`**: Added `_parse_cloud` helper, `cloud`
`ConfigAttribute`, updated `is_aws/is_azure/is_gcp` to check
`self.cloud` first, added `cloud` back-fill in `_resolve_host_metadata`.
- **`tests/test_config.py`**: 10 new tests covering case-insensitive
parsing, explicit `cloud` overriding DNS detection for each provider,
DNS fallback, `azure_workspace_resource_id` precedence, and metadata
population/non-overwrite.

## How is this tested?

Unit tests in `tests/test_config.py` and `tests/test_oauth.py`. No
integration tests — the endpoint is not yet available on all hosts.

NO_CHANGELOG=true

Co-authored-by: Claude Sonnet 4.6 <[email protected]>
@hectorcast-db force-pushed the hectorcast-db/stack/google-auth-use-host-metadata branch from cae02e4 to a46de38 on March 12, 2026 at 14:57
hectorcast-db and others added 2 commits on March 12, 2026 at 15:01
When /.well-known/databricks-config returns no workspace_id the host is
acting as an account-level endpoint. Set token_audience to account_id so
OIDC auth flows use the correct audience without requiring explicit config.
User-provided token_audience is never overwritten.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Replace cfg.client_type checks in google_credentials and google_id with
_requires_gcp_sa_access_token, which consults /.well-known/databricks-config
directly. This fixes SPOG hosts, which are account-level endpoints even when
the user config contains a workspace_id — cfg.client_type would return WORKSPACE
in that case, causing the X-Databricks-GCP-SA-Access-Token header to be omitted.

Falls back to bool(cfg.account_id) when the metadata endpoint is unavailable:
in the legacy deployment model workspace hosts were always reached without an
account_id, while account-console hosts required one.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
@hectorcast-db force-pushed the hectorcast-db/stack/google-auth-use-host-metadata branch from a46de38 to b28609c on March 12, 2026 at 15:02
@github-actions commented:

If integration tests don't run automatically, an authorized user can run them manually by following the instructions below:

Trigger:
go/deco-tests-run/sdk-py

Inputs:

  • PR number: 1322
  • Commit SHA: b28609cbc6e169076f64050451f45077dab4c44d

Checks will be approved automatically on success.

github-merge-queue bot pushed a commit that referenced this pull request Mar 12, 2026
## 🥞 Stacked PR
Use this
[link](https://github.com/databricks/databricks-sdk-py/pull/1321/files)
to review incremental changes.
-
[**stack/resolve-token-audience-from-host-metadata**](#1321)
[[Files
changed](https://github.com/databricks/databricks-sdk-py/pull/1321/files)]
-
[stack/google-auth-use-host-metadata](#1322)
[[Files
changed](https://github.com/databricks/databricks-sdk-py/pull/1322/files/191ec908b1218f45a64e4ab186c80dbaa64b9300..b28609cbc6e169076f64050451f45077dab4c44d)]

---------
## Summary

When `/.well-known/databricks-config` returns no `workspace_id`, the
host is acting as an account-level endpoint. `_resolve_host_metadata`
now automatically sets `token_audience` to `account_id` in that case, so
OIDC-based auth flows use the correct audience without requiring
explicit user configuration.

## Why

Account-level endpoints require the OIDC token audience to be the
`account_id` rather than the token endpoint URL. Currently
`credentials_provider.py` handles this dynamically by checking
`cfg.client_type == ClientType.ACCOUNT` at token-exchange time. However,
this inference only works reliably after `account_id` and `workspace_id`
are known — for unified hosts in account mode, where those values come
from metadata, the audience needs to be determined from the same
metadata response.

Making `_resolve_host_metadata` set `token_audience` explicitly when the
host is in account mode (no `workspace_id` in the response) means the
audience decision is made early, co-located with the rest of the host
metadata resolution, and doesn't depend on `client_type` being correctly
evaluated later in the init flow.

User-provided `token_audience` (via config file, env var, or constructor
kwarg) is never overwritten.
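The back-fill rule can be sketched with plain dicts in place of `Config` and the metadata response; the function is illustrative, with field names mirroring the PR description:

```python
def backfill_token_audience(cfg: dict, metadata: dict) -> None:
    """Set token_audience to account_id when the host is in account mode."""
    if cfg.get("token_audience"):
        return  # user-provided values are never overwritten
    # No workspace_id in the metadata response => account-level endpoint.
    if metadata.get("workspace_id") is None and metadata.get("account_id"):
        cfg["token_audience"] = metadata["account_id"]


cfg = {}
backfill_token_audience(cfg, {"account_id": "acc-1"})
assert cfg["token_audience"] == "acc-1"

cfg2 = {"token_audience": "custom"}
backfill_token_audience(cfg2, {"account_id": "acc-1"})
assert cfg2["token_audience"] == "custom"  # user value preserved
```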

Also fixes a latent issue: `token_audience` previously carried
`auth="github-oidc"` on its `ConfigAttribute`, which caused
`_validate()` to treat it as a standalone auth credential. This meant
setting `token_audience` alongside any other auth method (e.g. PAT)
would incorrectly trigger the "more than one authorization method
configured" error. Since `token_audience` is a parameter to OIDC flows —
not a credential by itself — the `auth` tag has been removed.

## What changed

### Interface changes

- **`Config.token_audience`**: Removed `auth="github-oidc"` from the
`ConfigAttribute`. Setting `token_audience` no longer causes multi-auth
validation errors when combined with other auth methods.

### Behavioral changes

For unified hosts where `/.well-known/databricks-config` returns an
`account_id` but no `workspace_id`: `token_audience` is now
automatically set to `account_id` during `Config.__init__`. Previously
this was left unset and inferred at token-exchange time.

For all other hosts and configurations: no change.

### Internal changes

- **`databricks/sdk/config.py`**: Removed `auth="github-oidc"` from
`token_audience` `ConfigAttribute`. Added `token_audience` back-fill
logic to `_resolve_host_metadata`.
- **`tests/test_config.py`**: 3 new tests covering: audience set for
account host (no `workspace_id` in metadata), audience not set for
workspace host (`workspace_id` present), and no overwrite of
user-provided `token_audience`.

## How is this tested?

Unit tests in `tests/test_config.py`.

NO_CHANGELOG=true

Co-authored-by: Claude Sonnet 4.6 <[email protected]>
