Infer azure tenant ID.#482
Conversation
Signed-off-by: Sreekanth Vadigi <[email protected]>
Signed-off-by: Sreekanth Vadigi <[email protected]>
0426acf to
56859bf
Compare
Signed-off-by: Sreekanth Vadigi <[email protected]>
databricks-sdk-java/src/main/java/com/databricks/sdk/core/DatabricksConfig.java
Outdated
Show resolved
Hide resolved
databricks-sdk-java/src/main/java/com/databricks/sdk/core/DatabricksConfig.java
Outdated
Show resolved
Hide resolved
| } | ||
| AzureUtils.ensureHostPresent( | ||
| config, mapper, AzureServicePrincipalCredentialsProvider::tokenSourceFor); | ||
| config.loadAzureTenantId(); |
There was a problem hiding this comment.
can you return null if azure tenant id fails? to be consistent with line 26?
Also, can we keep the azureSPCredentials specific helpers for extracting tenant id here in credentialsProvider itself?
There was a problem hiding this comment.
Have updated the code to return null if loading tenant id fails.
I think we should keep loadAzureTenantId() in DatabricksConfig for reusability. Looking at the Python SDK's implementation, load_azure_tenant_id() is called from multiple Azure credential providers (azure_service_principal and azure_cli), which demonstrates it's designed as a shared utility in the main Config class. This approach ensures consistency across our SDKs and allows for future reuse by other Azure authentication methods.
|
|
||
| public DatabricksConfig clone() { | ||
| return clone(new HashSet<>()); | ||
| return clone(new HashSet<>(Collections.singletonList("logger"))); |
There was a problem hiding this comment.
The logger field is excluded from cloning because it's a static final field. The clone(Set<String> fieldsToSkip) method uses reflection to copy field values with f.set(newConfig, f.get(this)).
Attempting to clone a static field causes an IllegalAccessException with the message: "Can not set static final org.slf4j.Logger field com.databricks.sdk.core.DatabricksConfig.logger to org.slf4j.reload4j.Reload4jLoggerAdapter"
databricks-sdk-java/src/main/java/com/databricks/sdk/core/DatabricksConfig.java
Outdated
Show resolved
Hide resolved
databricks-sdk-java/src/main/java/com/databricks/sdk/core/DatabricksConfig.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Sreekanth Vadigi <[email protected]>
| @@ -726,7 +735,7 @@ private DatabricksConfig clone(Set<String> fieldsToSkip) { | |||
| } | |||
|
|
|||
| public DatabricksConfig clone() { | |||
There was a problem hiding this comment.
Do we have to clone this method for all our config classes?
There was a problem hiding this comment.
Could you elaborate on this comment? Are you asking whether we need to implement similar clone() methods in other configuration classes, or are you suggesting a different architectural approach for handling object cloning?
| return true; // Tenant ID already available - success | ||
| } | ||
|
|
||
| if (!isAzure() || host == null) { |
There was a problem hiding this comment.
Should not the isAzure check be the first check in this method & return false even if azureTenantId is having some value?
There was a problem hiding this comment.
The purpose of this method is to load the tenant ID, so if it's already set (explicitly by the user), we should respect that and return true. The isAzure() check is only needed when we need to discover the tenant ID from the workspace host. If the tenant ID is set and the host is not Azure, it should get caught at appropriate layers - this method should not be responsible for returning false and blocking the flow.
| * | ||
| * @return true if tenant ID is available (either was already set or successfully loaded), false otherwise | ||
| */ | ||
| public boolean loadAzureTenantId() { |
There was a problem hiding this comment.
I understand that there's a lot of exceptions in the current code but I would recommend treating the config as an immutable object. That is, do not change the tenantId in the config and rather have a mutable copy in the AzureServicePrincipalCredentialsProvider.
There was a problem hiding this comment.
Thanks for the suggestion, I have updated the code to not make any changes to the state of DatabricksConfig
| boolean tenantIdLoaded = config.loadAzureTenantId(); | ||
| if (!tenantIdLoaded) { | ||
| return null; | ||
| } |
There was a problem hiding this comment.
As mentioned in another comment, I would recommend implementing the loadAzureTenantId as an immutable function that would return the tenantId or fail. The tenantId can be stored in this class.
Something like this:
try {
this.tenantId = inferTenantId(config)
} catch (Exception e) {
logger.debug("Failed to extract tenant ID: {}", e.getMessage())
}There was a problem hiding this comment.
Updated the code as per your suggestions:
- Not changing the state of
DatabricksConfig - Maintaining local variable for
tenantIdinAzureServicePrincipalCredentialsProvider - Moved the tenant id inferring logic to
AzureUtilsfor reusability
Signed-off-by: Sreekanth Vadigi <[email protected]>
Signed-off-by: Sreekanth Vadigi <[email protected]>
| logger.warn("Failed to get redirect location from Azure auth endpoint: {}", loginUrl); | ||
| return null; |
There was a problem hiding this comment.
Let's throw an exception instead, the caller should be the one deciding whether these should be logged or not. Same in other places.
There was a problem hiding this comment.
Updated the code in AzureUtils to throw exception, caller will handle it now and logs the message
Signed-off-by: Sreekanth Vadigi <[email protected]>
| try { | ||
| String redirectLocation = getRedirectLocation(config, loginUrl); | ||
| String extractedTenantId = extractTenantIdFromUrl(redirectLocation); | ||
| logger.info("Successfully discovered Azure tenant ID: {}", extractedTenantId); |
There was a problem hiding this comment.
Also remove private static final Logger logger = LoggerFactory.getLogger(AzureUtils.class);
| logger.info("Successfully discovered Azure tenant ID: {}", extractedTenantId); |
Signed-off-by: Sreekanth Vadigi <[email protected]>
|
If integration tests don't run automatically, an authorized user can run them manually by following the instructions below: Trigger: Inputs:
Checks will be approved automatically on success. |
What changes are proposed in this pull request?
This PR modifies Azure service principal credential provider to attempt to load the tenant ID of the workspace if not provided before authenticating. Tenant ID is indirectly exposed via the redirect URL used when logging into a workspace. In this PR, we fetch the tenant ID from this endpoint and configure it if not already set.
Reference PR: databricks/databricks-sdk-py#638
Key changes:
inferTenantId()method inAzureUtilsthat makes an HTTP request to{host}/aad/authand extracts the tenant ID from the redirect URLAzureServicePrincipalCredentialsProviderto remove the explicitazureTenantIdrequirement and automatically call tenant ID discoveryTechnical implementation:
https://<workspace-host>/aad/authendpointhttps://login.microsoftonline.com/{tenant-id}/oauth2/authorizeWhy are these changes needed?
Currently, Azure Databricks users must manually specify the tenant-id when using Service Principal authentication. With this feature, users don't need to manually specify tenant ID, thus improving the user experience.
How is this tested?
Unit Tests
Comprehensive unit tests in
AzureUtilsTestcovering:Manual Testing
azure_tenant_id