Add Cloud field to HostMetadata and Config#1320

Merged
hectorcast-db merged 1 commit into main from hectorcast-db/stack/cloud-from-host-metadata
Mar 12, 2026
Merged

Add Cloud field to HostMetadata and Config#1320
hectorcast-db merged 1 commit intomainfrom
hectorcast-db/stack/cloud-from-host-metadata

Conversation

Contributor

@hectorcast-db hectorcast-db commented Mar 11, 2026

🥞 Stacked PR

Use this link to review incremental changes.


Summary

Port of databricks-sdk-go#1512. Adds an explicit cloud field to Config (and HostMetadata) that is populated from /.well-known/databricks-config. is_aws, is_azure, and is_gcp now prefer this field over DNS-based hostname inference.

Why

is_aws(), is_azure(), and is_gcp() previously inferred the cloud provider by suffix-matching the workspace hostname against a hardcoded list of known DNS zones (e.g. .azuredatabricks.net, .gcp.databricks.com). This works for standard deployments but silently misclassifies non-standard hostnames — vanity domains, unified hosts, or any future topology where the hostname doesn't encode the cloud provider. In those cases the SDK fell back to DefaultEnvironment (AWS), which is wrong.

The /.well-known/databricks-config endpoint is already the authoritative source for host metadata and returns a cloud field, but the SDK was discarding it. Threading that value through to Config lets the SDK make correct cloud-detection decisions without relying on hostname patterns.
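As a rough illustration of what "threading that value through" means, a minimal sketch of extracting the cloud hint from a discovery response might look like the following. The helper name is hypothetical; the real SDK models the response as HostMetadata in databricks/sdk/oauth.py.

```python
# Hypothetical sketch: pull the cloud hint out of a
# /.well-known/databricks-config discovery response dict.
from typing import Optional


def cloud_from_discovery(response: dict) -> Optional[str]:
    """Return a normalized cloud name ("AWS"/"AZURE"/"GCP") or None."""
    raw = (response.get("cloud") or "").strip().upper()
    # Unknown or absent values are tolerated rather than treated as errors.
    return raw if raw in {"AWS", "AZURE", "GCP"} else None
```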

What changed

Interface changes

  • Config.cloud: Cloud — new optional ConfigAttribute (env: DATABRICKS_CLOUD). Accepts "AWS", "AZURE", or "GCP" (case-insensitive). When set — either explicitly or auto-populated from metadata — is_aws, is_azure, and is_gcp use this value directly.
  • Cloud.parse(value: str) -> Optional[Cloud] — new classmethod on the Cloud enum for case-insensitive parsing. Returns None for empty or unrecognized values (forward-compatible with future cloud values the API may introduce).
  • HostMetadata.cloud: Optional[Cloud] — new field, populated from the "cloud" key in the discovery response.
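The Cloud.parse contract described above can be sketched as a standalone enum (the real one lives in databricks/sdk/environments.py; this version only illustrates the case-insensitive, None-on-unknown semantics):

```python
from enum import Enum
from typing import Optional


class Cloud(Enum):
    AWS = "AWS"
    AZURE = "AZURE"
    GCP = "GCP"

    @classmethod
    def parse(cls, value: Optional[str]) -> Optional["Cloud"]:
        if not value:
            return None
        try:
            return cls(value.strip().upper())
        except ValueError:
            # Forward-compatible: unrecognized values (e.g. a future
            # cloud the API introduces) map to None, not an error.
            return None
```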

Behavioral changes

  • is_aws, is_azure, is_gcp now check Config.cloud first. If unset they fall back to DNS-based detection — no change for existing deployments with standard hostnames.
  • azure_workspace_resource_id continues to take precedence over Config.cloud for is_azure, preserving existing Azure MSI behavior.
  • Auto-discovery note: the new endpoint has not been rolled out to all hosts yet, so cloud will remain unset for most deployments until broader rollout.
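The detection order above can be sketched as a free function (illustrative only; the real properties live on databricks.sdk.config.Config, and the suffix list here is abbreviated):

```python
from typing import Optional


def is_azure(cloud: Optional[str],
             azure_workspace_resource_id: Optional[str],
             host: str) -> bool:
    if azure_workspace_resource_id:
        # Existing Azure MSI behavior: the resource ID wins outright.
        return True
    if cloud:
        # Explicit or metadata-populated value is used directly.
        return cloud == "AZURE"
    # Fallback: DNS-based inference on standard hostnames.
    return host.endswith(".azuredatabricks.net")
```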

Internal changes

  • databricks/sdk/environments.py: Added Cloud.parse() classmethod.
  • databricks/sdk/oauth.py: Added cloud field to HostMetadata; from_dict and as_dict updated accordingly.
  • databricks/sdk/config.py: Added _parse_cloud helper, cloud ConfigAttribute, updated is_aws/is_azure/is_gcp to check self.cloud first, added cloud back-fill in _resolve_host_metadata.
  • tests/test_config.py: 10 new tests covering case-insensitive parsing, explicit cloud overriding DNS detection for each provider, DNS fallback, azure_workspace_resource_id precedence, and metadata population/non-overwrite.

How is this tested?

Unit tests in tests/test_config.py and tests/test_oauth.py. No integration tests — the endpoint is not yet available on all hosts.

NO_CHANGELOG=true

@hectorcast-db hectorcast-db force-pushed the hectorcast-db/stack/cloud-from-host-metadata branch from cc5140d to d3a6a37 on March 11, 2026 09:42
@hectorcast-db hectorcast-db force-pushed the hectorcast-db/stack/cloud-from-host-metadata branch from d3a6a37 to f9b0c3e on March 11, 2026 09:44
@hectorcast-db hectorcast-db marked this pull request as ready for review March 11, 2026 09:46
github-merge-queue bot pushed a commit that referenced this pull request Mar 11, 2026
…e is_account_client docstrings (#1317)

## 🥞 Stacked PR
Use this
[link](https://github.com/databricks/databricks-sdk-py/pull/1317/files)
to review incremental changes.
-
[**stack/use-test-environment-type-in-integration-tests**](#1317)
[[Files
changed](https://github.com/databricks/databricks-sdk-py/pull/1317/files)]
-
[stack/resolve-host-metadata-on-init](#1318)
[[Files
changed](https://github.com/databricks/databricks-sdk-py/pull/1318/files/0eff155d0ca7fc796d878cf9ee332dac5149be6a..63156f63eb80992d0c25a7b341d0b19b77dc9527)]
-
[stack/cloud-from-host-metadata](#1320)
[[Files
changed](https://github.com/databricks/databricks-sdk-py/pull/1320/files/63156f63eb80992d0c25a7b341d0b19b77dc9527..f9b0c3e2df72ddecb0fccb0d59af46f3a06846d2)]
-
[stack/resolve-token-audience-from-host-metadata](#1321)
[[Files
changed](https://github.com/databricks/databricks-sdk-py/pull/1321/files/f9b0c3e2df72ddecb0fccb0d59af46f3a06846d2..d97bd75a5692faddc36d51d46208513e442e7c87)]
-
[stack/google-auth-use-host-metadata](#1322)
[[Files
changed](https://github.com/databricks/databricks-sdk-py/pull/1322/files/d97bd75a5692faddc36d51d46208513e442e7c87..87b4c1a35e3fe1abfb125069ba1590eeaecf34e9)]

---------
## Summary

Replace `is_account_client` heuristics and `DATABRICKS_ACCOUNT_ID`
presence checks in integration test fixtures with explicit
`TEST_ENVIRONMENT_TYPE`-based filtering. Also deprecate
`is_account_client` in docstrings across `Config` and `ApiClient`.

## Why

Integration test fixtures in `tests/integration/conftest.py` used two
different heuristics to decide whether a test should run:

1. `account_client.config.is_account_client` — inspects the host URL to
infer whether the SDK is talking to an account-level API. This is
unreliable now that unified hosts can serve both account and workspace
APIs, and the property already raises `ValueError` when
`experimental_is_unified_host` is set.
2. `"DATABRICKS_ACCOUNT_ID" in os.environ` — a proxy check used to skip
workspace fixtures on account environments. This is fragile and doesn't
express intent clearly.

The test orchestrator now sets `TEST_ENVIRONMENT_TYPE` explicitly to one
of `ACCOUNT`, `UC_ACCOUNT`, `WORKSPACE`, or `UC_WORKSPACE`. Using this
value directly makes fixture skip-guards authoritative and easy to
reason about.
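A minimal sketch of such an allowlist skip-guard (names here are illustrative, not the SDK's actual fixture code):

```python
import os

# Environment types on which account-level fixtures should run.
ACCOUNT_ENVS = {"ACCOUNT", "UC_ACCOUNT"}


def should_run_account_fixture(environ=os.environ) -> bool:
    """Run account-level fixtures only when the orchestrator says so."""
    return environ.get("TEST_ENVIRONMENT_TYPE") in ACCOUNT_ENVS
```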

## What changed

### Interface changes

None. No public SDK API changes.

### Behavioral changes

None for SDK users. Integration test fixture skip conditions are now
driven by `TEST_ENVIRONMENT_TYPE` instead of `is_account_client` and
`DATABRICKS_ACCOUNT_ID`.

### Internal changes

- **`tests/integration/conftest.py`**: All four client fixtures (`a`,
`ucacct`, `w`, `ucws`) now check `TEST_ENVIRONMENT_TYPE` against an
explicit allowlist instead of using `is_account_client` or
`DATABRICKS_ACCOUNT_ID`.
- **`databricks/sdk/config.py`**: Updated `is_account_client` docstring
to note deprecation — clients can now support both workspace and account
APIs from a single host.
- **`databricks/sdk/core.py`**: Added matching deprecation note to
`ApiClient.is_account_client`.

## How is this tested?

The changed code is test infrastructure itself. Correctness will be
validated when integration tests run against environments that set
`TEST_ENVIRONMENT_TYPE`.

NO_CHANGELOG=true
@hectorcast-db hectorcast-db force-pushed the hectorcast-db/stack/cloud-from-host-metadata branch from f9b0c3e to e1872e3 on March 11, 2026 12:02
github-merge-queue bot pushed a commit that referenced this pull request Mar 11, 2026
#1318)

## 🥞 Stacked PR
Use this
[link](https://github.com/databricks/databricks-sdk-py/pull/1318/files)
to review incremental changes.
-
[**stack/resolve-host-metadata-on-init**](#1318)
[[Files
changed](https://github.com/databricks/databricks-sdk-py/pull/1318/files)]
-
[stack/cloud-from-host-metadata](#1320)
[[Files
changed](https://github.com/databricks/databricks-sdk-py/pull/1320/files/cd35488c886dc7a57ff3b1f79d70d1089a3ab6cf..e1872e347ea0b17197154389a80381d84d99a210)]
-
[stack/resolve-token-audience-from-host-metadata](#1321)
[[Files
changed](https://github.com/databricks/databricks-sdk-py/pull/1321/files/e1872e347ea0b17197154389a80381d84d99a210..51c62a915c2a3d0bd3c5ad720c26c5c79647976e)]
-
[stack/google-auth-use-host-metadata](#1322)
[[Files
changed](https://github.com/databricks/databricks-sdk-py/pull/1322/files/51c62a915c2a3d0bd3c5ad720c26c5c79647976e..9898fc156ef3d0cadc3c254e033cfaf20caa537d)]

---------
## Summary

Call `_resolve_host_metadata()` automatically during `Config.__init__`
so that `account_id`, `workspace_id`, `discovery_url`, and `cloud` are
populated from the `/.well-known/databricks-config` endpoint before auth
is initialized. Metadata fetch failures are non-fatal — a warning is
logged and initialization continues with whatever the user has
explicitly configured.

## Why

`_resolve_host_metadata()` already existed but had to be called
explicitly by callers who wanted auto-discovery. This meant that
`WorkspaceClient` and `AccountClient` did not benefit from it out of the
box. Beyond the missing call, the old implementation was also strictly
fatal: any HTTP error or missing field (`account_id`, `oidc_endpoint`)
raised a `ValueError`, making it unsafe to run automatically since not
all hosts expose the endpoint.

This PR wires up the call during `Config.__init__` (between
`_fix_host_if_needed` and `_validate`) and hardens the implementation so
it degrades gracefully: HTTP failures produce a warning, and missing
optional fields are simply left unset. The only remaining hard failure
is when a `{account_id}` template in `oidc_endpoint` cannot be
substituted because no `account_id` is available — this is a
configuration error that should still surface.

Auto-discovery is currently gated to hosts with
`experimental_is_unified_host=True` (a `# TODO: Enable this everywhere`
marks the guard) to avoid unintended network calls on standard workspace
and account deployments until the endpoint is broadly available.
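The degrade-gracefully flow might be sketched as follows (illustrative only; `fetch_metadata` is a stand-in for the real HTTP call, and the field names mirror those listed in this PR):

```python
import logging

logger = logging.getLogger(__name__)


def resolve_host_metadata(config, fetch_metadata) -> None:
    try:
        md = fetch_metadata(config.host)
    except Exception as e:
        # HTTP failures are non-fatal: warn and continue with whatever
        # the user explicitly configured.
        logger.warning("failed to resolve host metadata: %s", e)
        return
    # Back-fill only fields the user left unset; missing optional
    # fields in the response are simply skipped.
    for field in ("account_id", "workspace_id", "discovery_url", "cloud"):
        if getattr(config, field, None) is None and md.get(field) is not None:
            setattr(config, field, md[field])
```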

## What changed

### Interface changes

None. `_resolve_host_metadata()` remains private.

### Behavioral changes

For hosts with `experimental_is_unified_host=True`: `Config.__init__`
now makes an HTTP request to `{host}/.well-known/databricks-config` and
silently back-fills `account_id`, `workspace_id`, `discovery_url`, and
`cloud` from the response. Previously this did not happen.

All other hosts: no change.

### Internal changes

- **`databricks/sdk/config.py`**: Added `_resolve_host_metadata()` call
to `Config.__init__`. Added `HostType.UNIFIED` guard. Made HTTP failures
non-fatal (warning + return). Removed mandatory-field assertions
(`account_id is required`, `discovery_url is required`) — missing fields
are now silently skipped.
- **`tests/conftest.py`**: Added autouse `stub_host_metadata` fixture
that mocks `get_host_metadata` for all unit tests, preventing unexpected
network calls.
- **`tests/test_config.py`**: Replaced the old `requests_mock`-based
tests (which called `_resolve_host_metadata()` directly) with
`mocker.patch`-based tests that exercise the full `Config.__init__`
path. Added `test_resolve_host_metadata_skipped_for_non_unified` to
assert the guard works.

## How is this tested?

Unit tests in `tests/test_config.py` cover: successful resolution of all
fields, no-overwrite of pre-configured values, HTTP failure tolerance,
missing `oidc_endpoint`, and the guard that skips non-unified hosts.

NO_CHANGELOG=true

---------

Co-authored-by: Claude Sonnet 4.6 <[email protected]>
Port of databricks/databricks-sdk-go#1512. Adds an explicit `cloud`
field to `Config` populated from /.well-known/databricks-config (or
DATABRICKS_CLOUD env var). is_aws/is_azure/is_gcp now check this field
first, falling back to DNS-based hostname inference for unset cases.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
@hectorcast-db hectorcast-db force-pushed the hectorcast-db/stack/cloud-from-host-metadata branch from e1872e3 to e846874 on March 11, 2026 15:33
@github-actions

If integration tests don't run automatically, an authorized user can run them manually by following the instructions below:

Trigger:
go/deco-tests-run/sdk-py

Inputs:

  • PR number: 1320
  • Commit SHA: e846874ec446c75225cfb26864de20a8906b34fd

Checks will be approved automatically on success.

Contributor

@tejaskochar-db tejaskochar-db left a comment

lgtm!

@hectorcast-db hectorcast-db added this pull request to the merge queue Mar 12, 2026
Merged via the queue into main with commit 89ae980 Mar 12, 2026
17 checks passed
@hectorcast-db hectorcast-db deleted the hectorcast-db/stack/cloud-from-host-metadata branch March 12, 2026 14:46