
Add Cloud to HostMetadata #1512

Merged
hectorcast-db merged 1 commit into main from cloud-in-metadata on Mar 3, 2026

Conversation

Contributor

@hectorcast-db hectorcast-db commented Mar 3, 2026


Summary

Adds an explicit Cloud field to Config that is populated from the
/.well-known/databricks-config discovery endpoint (or overridden via config
file / DATABRICKS_CLOUD env var).

Why

Today, IsAws(), IsAzure(), and IsGcp() all delegate to c.Environment().Cloud,
which infers cloud type by suffix-matching the workspace hostname against a
hardcoded list of known DNS zones. This works for standard deployments but
fails silently for non-standard hostnames — for example, custom vanity
domains, unified hosts, or any future topology where the hostname doesn't
encode the cloud provider. In those cases the SDK falls back to
DefaultEnvironment() (AWS), potentially misclassifying the deployment.

The /.well-known/databricks-config discovery endpoint is the authoritative
source for host metadata. It already returns a cloud field, but the SDK was
discarding it. This PR threads that value through: the metadata response is
parsed into hostMetadata.Cloud, then back-filled into Config.Cloud if not
already set. DNS-based detection is retained as a fallback when the endpoint
doesn't return a cloud field.
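The back-fill described above can be sketched as a minimal, self-contained Go program. The type and function names below (`backfillCloud`, simplified `Config` and `hostMetadata` structs) are illustrative stand-ins for the SDK's internals, not its actual API:

```go
package main

import "fmt"

// Cloud is a simplified stand-in for the SDK's environment.Cloud string type.
type Cloud string

// CloudUnknown is the sentinel for an unset/empty cloud value.
const CloudUnknown Cloud = ""

// hostMetadata mirrors the parsed /.well-known/databricks-config response.
type hostMetadata struct {
	Cloud Cloud // parsed from the "cloud" key, if present
}

// Config is a simplified stand-in for the SDK's Config.
type Config struct {
	Cloud Cloud
}

// backfillCloud copies the discovered cloud into Config.Cloud only when the
// field was not already set explicitly; when both are empty, callers fall
// back to DNS-based detection.
func backfillCloud(cfg *Config, md hostMetadata) {
	if cfg.Cloud == CloudUnknown && md.Cloud != CloudUnknown {
		cfg.Cloud = md.Cloud
	}
}

func main() {
	cfg := &Config{}
	backfillCloud(cfg, hostMetadata{Cloud: "GCP"})
	fmt.Println(cfg.Cloud) // prints "GCP"
}
```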

The Cloud field can also be set directly in configuration or via
DATABRICKS_CLOUD, which is useful for testing and for environments where the
discovery endpoint is unreachable.
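For example, a test environment could pin the cloud via the environment variable (the `DATABRICKS_CLOUD` name comes from the PR; whether the env-var value is normalized case-insensitively like the JSON path is an assumption here):

```shell
# Pin the cloud explicitly, bypassing both discovery and DNS inference.
export DATABRICKS_CLOUD=aws
```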

NOTE: Auto-discovery is not yet enabled, since the new endpoint has not been rolled out to all hosts.

What changed

Interface changes

  • Config.Cloud environment.Cloud — new experimental field (name:"cloud",
    env:"DATABRICKS_CLOUD"). Takes precedence over DNS-based detection in IsAws,
    IsAzure, IsGcp.
  • environment.CloudUnknown Cloud = "" — new sentinel for an unset/empty cloud
    value.
  • Cloud.UnmarshalJSON — case-insensitive JSON deserialization ("aws", "AWS",
    "Azure", "AZURE" all normalize). Unknown values are passed through as-is for
    forward compatibility.
  • hostMetadata.Cloud environment.Cloud — new field parsed from the cloud key
    in the /.well-known/databricks-config response.
  • Config.Environment() — marked deprecated; use Config.Cloud + cloud-specific
    helpers instead.
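The case-insensitive deserialization with forward-compatible passthrough can be sketched as follows. This is a simplified, self-contained version; the constant values and method body are modeled on the description above, not copied from the SDK:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Cloud mirrors the SDK's environment.Cloud string type (simplified).
type Cloud string

const (
	CloudUnknown Cloud = ""
	CloudAWS     Cloud = "AWS"
	CloudAzure   Cloud = "Azure"
	CloudGCP     Cloud = "GCP"
)

// UnmarshalJSON normalizes known cloud names case-insensitively and passes
// unknown values through unchanged for forward compatibility.
func (c *Cloud) UnmarshalJSON(b []byte) error {
	var s string
	if err := json.Unmarshal(b, &s); err != nil {
		return err
	}
	switch strings.ToLower(s) {
	case "aws":
		*c = CloudAWS
	case "azure":
		*c = CloudAzure
	case "gcp":
		*c = CloudGCP
	default:
		*c = Cloud(s) // forward-compatible passthrough
	}
	return nil
}

func main() {
	var c Cloud
	_ = json.Unmarshal([]byte(`"AZURE"`), &c)
	fmt.Println(c) // prints "Azure"
}
```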

Behavioral changes

  • IsAzure(), IsGcp(), IsAws() now check Config.Cloud first; DNS-based
    inference is used only when Cloud is unset.
  • resolveHostMetadata now populates Config.Cloud from the discovery endpoint,
    with a DNS-based fallback. Previously the cloud field in the response was
    silently ignored.
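The new precedence order — explicit Cloud field first, DNS suffix inference only as a fallback — can be illustrated with a minimal sketch. The `dnsCloud` helper below is a hypothetical stand-in for the SDK's real DNS-zone matching, reduced to two suffixes for brevity:

```go
package main

import (
	"fmt"
	"strings"
)

type Cloud string

const (
	CloudUnknown Cloud = ""
	CloudAWS     Cloud = "AWS"
	CloudAzure   Cloud = "Azure"
	CloudGCP     Cloud = "GCP"
)

// Config is a simplified stand-in for the SDK's Config.
type Config struct {
	Host  string
	Cloud Cloud
}

// dnsCloud illustrates suffix-based inference; the real SDK matches the host
// against its full list of known DNS zones and defaults to AWS.
func dnsCloud(host string) Cloud {
	switch {
	case strings.HasSuffix(host, ".azuredatabricks.net"):
		return CloudAzure
	case strings.HasSuffix(host, ".gcp.databricks.com"):
		return CloudGCP
	default:
		return CloudAWS // historical default
	}
}

// IsAzure prefers the explicit Cloud field; DNS inference is used only when
// Cloud is unset.
func (c *Config) IsAzure() bool {
	if c.Cloud != CloudUnknown {
		return c.Cloud == CloudAzure
	}
	return dnsCloud(c.Host) == CloudAzure
}

func main() {
	// A vanity domain whose hostname encodes nothing about the cloud:
	vanity := &Config{Host: "db.example.com", Cloud: CloudAzure}
	fmt.Println(vanity.IsAzure()) // prints "true": explicit field wins
}
```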

How is this tested?

Unit tests cover:

  • TestConfig_ResolveHostMetadata_PopulatesCloudFromAPI — Cloud is set from
    the API response.
  • TestConfig_ResolveHostMetadata_CloudFallbackToDNS — falls back to DNS when
    the response omits cloud.
  • TestConfig_ResolveHostMetadata_DoesNotOverwriteExistingCloud — explicit
    Config.Cloud is not overwritten.
  • TestConfig_ResolveHostMetadata_Clouds — table-driven tests covering all
    case variants ("aws", "AZURE", "gcp", forward-compat unknown, empty).
  • TestCloudField_* — IsAws/IsAzure/IsGcp respect the explicit Cloud field and
    fall back correctly when unset.
  • TestGetHostMetadata_WithCloudField — hostMetadata deserialization for all
    three cloud providers and missing field.

NO_CHANGELOG=true

@hectorcast-db hectorcast-db requested a review from tanmay-db March 3, 2026 09:00
@hectorcast-db hectorcast-db changed the title from "WIP Cloud in metadata" to "Add Cloud to HostMetadata" Mar 3, 2026

github-actions bot commented Mar 3, 2026

If integration tests don't run automatically, an authorized user can run them manually by following the instructions below:

Trigger:
go/deco-tests-run/sdk-go

Inputs:

  • PR number: 1512
  • Commit SHA: 40a08c3055df89417e8ac64dcd36173ace34a832

Checks will be approved automatically on success.

@hectorcast-db hectorcast-db added this pull request to the merge queue Mar 3, 2026
Merged via the queue into main with commit 55e79c5 Mar 3, 2026
15 checks passed
@hectorcast-db hectorcast-db deleted the cloud-in-metadata branch March 3, 2026 11:26
hectorcast-db added a commit to databricks/databricks-sdk-py that referenced this pull request Mar 11, 2026
Port of databricks/databricks-sdk-go#1512. Adds an explicit `cloud`
field to `Config` populated from /.well-known/databricks-config (or
DATABRICKS_CLOUD env var). is_aws/is_azure/is_gcp now check this field
first, falling back to DNS-based hostname inference for unset cases.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
github-merge-queue bot pushed a commit to databricks/databricks-sdk-py that referenced this pull request Mar 12, 2026
## 🥞 Stacked PR
Use this [link](https://github.com/databricks/databricks-sdk-py/pull/1320/files) to review incremental changes.
- [**stack/cloud-from-host-metadata**](#1320) [[Files changed](https://github.com/databricks/databricks-sdk-py/pull/1320/files)]
- [stack/resolve-token-audience-from-host-metadata](#1321) [[Files changed](https://github.com/databricks/databricks-sdk-py/pull/1321/files/e846874ec446c75225cfb26864de20a8906b34fd..56f3705b9513f9b1fe43ea27ffacc7463865b1fe)]
- [stack/google-auth-use-host-metadata](#1322) [[Files changed](https://github.com/databricks/databricks-sdk-py/pull/1322/files/56f3705b9513f9b1fe43ea27ffacc7463865b1fe..cae02e48930199b9ebb3bc37fe34f9a849c6708d)]

---------
## Summary

Port of
[databricks-sdk-go#1512](databricks/databricks-sdk-go#1512).
Adds an explicit `cloud` field to `Config` (and `HostMetadata`) that is
populated from `/.well-known/databricks-config`. `is_aws`, `is_azure`,
and `is_gcp` now prefer this field over DNS-based hostname inference.

## Why

`is_aws()`, `is_azure()`, and `is_gcp()` previously inferred the cloud
provider by suffix-matching the workspace hostname against a hardcoded
list of known DNS zones (e.g. `.azuredatabricks.net`,
`.gcp.databricks.com`). This works for standard deployments but silently
misclassifies non-standard hostnames — vanity domains, unified hosts, or
any future topology where the hostname doesn't encode the cloud
provider. In those cases the SDK fell back to `DefaultEnvironment`
(AWS), which is wrong.

The `/.well-known/databricks-config` endpoint is already the
authoritative source for host metadata and returns a `cloud` field, but
the SDK was discarding it. Threading that value through to `Config` lets
the SDK make correct cloud-detection decisions without relying on
hostname patterns.

## What changed

### Interface changes

- **`Config.cloud: Cloud`** — new optional `ConfigAttribute` (env:
`DATABRICKS_CLOUD`). Accepts `"AWS"`, `"AZURE"`, or `"GCP"`
(case-insensitive). When set — either explicitly or auto-populated from
metadata — `is_aws`, `is_azure`, and `is_gcp` use this value directly.
- **`Cloud.parse(value: str) -> Optional[Cloud]`** — new classmethod on
the `Cloud` enum for case-insensitive parsing. Returns `None` for empty
or unrecognized values (forward-compatible with future cloud values the
API may introduce).
- **`HostMetadata.cloud: Optional[Cloud]`** — new field, populated from
the `"cloud"` key in the discovery response.

### Behavioral changes

- `is_aws`, `is_azure`, `is_gcp` now check `Config.cloud` first. If
unset they fall back to DNS-based detection — no change for existing
deployments with standard hostnames.
- `azure_workspace_resource_id` continues to take precedence over
`Config.cloud` for `is_azure`, preserving existing Azure MSI behavior.
- Auto-discovery note: the new endpoint has not been rolled out to all
hosts yet, so `cloud` will remain unset for most deployments until
broader rollout.

### Internal changes

- **`databricks/sdk/environments.py`**: Added `Cloud.parse()`
classmethod.
- **`databricks/sdk/oauth.py`**: Added `cloud` field to `HostMetadata`;
`from_dict` and `as_dict` updated accordingly.
- **`databricks/sdk/config.py`**: Added `_parse_cloud` helper, `cloud`
`ConfigAttribute`, updated `is_aws/is_azure/is_gcp` to check
`self.cloud` first, added `cloud` back-fill in `_resolve_host_metadata`.
- **`tests/test_config.py`**: 10 new tests covering case-insensitive
parsing, explicit `cloud` overriding DNS detection for each provider,
DNS fallback, `azure_workspace_resource_id` precedence, and metadata
population/non-overwrite.

## How is this tested?

Unit tests in `tests/test_config.py` and `tests/test_oauth.py`. No
integration tests — the endpoint is not yet available on all hosts.

NO_CHANGELOG=true

Co-authored-by: Claude Sonnet 4.6 <[email protected]>
github-merge-queue bot pushed a commit to databricks/databricks-sdk-java that referenced this pull request Mar 19, 2026
## 🥞 Stacked PR

- [**#710 Add cloud field to HostMetadata**](#710) [[Files](https://github.com/databricks/databricks-sdk-java/pull/710/files)]
- [#711 Fix GetWorkspaceClient for unified account hosts](#711) [[Files](https://github.com/databricks/databricks-sdk-java/pull/711/files)]
- [#712 Add test for GetWorkspaceClient with SPOG host](#712) [[Files](https://github.com/databricks/databricks-sdk-java/pull/712/files)]
- [#713 Call resolveHostMetadata on Config init](#713) [[Files](https://github.com/databricks/databricks-sdk-java/pull/713/files)]
- [#714 Resolve TokenAudience from host metadata for account hosts](#714) [[Files](https://github.com/databricks/databricks-sdk-java/pull/714/files)]
- [#718 Make GCP SA token refresh non-blocking](#718) [[Files](https://github.com/databricks/databricks-sdk-java/pull/718/files)]
- [#719 Add integration test for host metadata resolution](#719) [[Files](https://github.com/databricks/databricks-sdk-java/pull/719/files)]
- [#720 Remove unified flag usage, rely on host metadata](#720) [[Files](https://github.com/databricks/databricks-sdk-java/pull/720/files)]

---------
## Summary

Port of Go SDK
[#1512](databricks/databricks-sdk-go#1512).

Adds a `cloud` field to `HostMetadata` that is populated from the
`/.well-known/databricks-config` discovery endpoint.

**Why:** Today, `isAws()`, `isAzure()`, and `isGcp()` infer cloud type
by suffix-matching the workspace hostname against a hardcoded list of
known DNS zones. This works for standard deployments but fails for
non-standard hostnames (custom vanity domains, unified hosts, etc.). The
discovery endpoint is the authoritative source and already returns a
`cloud` field, but the SDK was discarding it.

**Changes:**
- `HostMetadata`: new `cloud` field (`@JsonProperty("cloud")`), getter,
and 4-arg constructor
- `HostMetadataTest`: deserialization with/without cloud, constructor
tests

`NO_CHANGELOG=true`

## Test plan
- [x] `HostMetadataTest`: 4 tests for cloud field deserialization and
constructors
