
Commit 7735a71

parent c866d42
author Vuong <[email protected]> 1709737244 +0000
committer Vuong <[email protected]> 1709739422 +0000

add trust relationship update

Fix integration tests on AWS (#978)

Update groups permissions validation to use Table ACL cluster (#979)

Renamed columns in assessment SQL queries to use actual names, not aliases (#983)

Aliases are usually not allowed in projections, as they are replaced in later query execution phases. While DBSQL is often smart enough to resolve references via aliases, on some setups this results in an error. Changing column references to use actual names fixes this.

Resolves #980

- [x] manually tested

Fixed `config.yml` upgrade from very old versions (#984)

Added `upgraded_from_workspace_id` property to migrated tables to indicate the source workspace (#987)

Added table parameter `upgraded_from_ws` to migrated tables. The parameter contains the source workspace id.

Resolves #899

- [x] manually tested
- [x] added unit tests
- [x] added integration tests
- [x] verified on staging environment (screenshot attached)

Added group members difference to the output of `validate-groups-membership` cli command (#995)

The `validate-groups-membership` command has been updated to include a comparison of group memberships at both the account and workspace levels, displaying the difference in members between the two levels in a new column. This enhancement allows for a more detailed analysis of group memberships, with the added functionality implemented in the `validate_group_membership` function in `databricks/labs/ucx/workspace_access/groups.py`. A new output field, `group_members_difference`, represents the difference in the number of members between a workspace group and its associated account group. The corresponding unit test file, `test_groups.py`, has been updated with a new test case that verifies the calculation of the `group_members_difference` value.
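The `group_members_difference` calculation described above can be sketched as follows; the dataclass and function names here are illustrative, not the actual ucx implementation:

```python
from dataclasses import dataclass


@dataclass
class GroupComparison:
    # Hypothetical container for one workspace/account group pair
    display_name: str
    workspace_members: set[str]
    account_members: set[str]


def group_members_difference(pair: GroupComparison) -> int:
    """Positive: the workspace group has more members than the account group;
    negative: the account group has more; zero: equal member counts."""
    return len(pair.workspace_members) - len(pair.account_members)
```

For example, a workspace group with three members paired with an account group of two yields `1`, flagging a discrepancy worth investigating.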
This change provides users with a more comprehensive view of their group memberships and allows them to easily identify any discrepancies between the account and workspace levels. The functionality of the other commands remains unchanged.

Improved installation integration test flakiness (#998)

- improved `_infer_error_from_job_run` and `_infer_error_from_task_run` to also catch `KeyError` and `ValueError`
- removed retries for `Unknown` errors for installation tests

Expanded end-user documentation with detailed descriptions for workflows and commands (#999)

The Databricks Labs UCX project has been updated with several new features to assist in upgrading to Unity Catalog. These include various workflows and command-line utilities, such as an assessment workflow that generates a detailed compatibility report for workspace entities and a group migration workflow to upgrade all Databricks workspace assets. Additionally, new utility commands have been added for managing cross-workspace installations, and users can now view deployed workflows' status and repair failed workflows. New end-user documentation has also been introduced, featuring comprehensive descriptions of workflows, commands, and an assessment report image. The assessment report, generated from UCX tools, now includes a more detailed summary of the assessment findings, table counts, database summaries, and external locations. Improved documentation for external Hive Metastore integration and a new debugging notebook are also included in this release. Lastly, the workspace group migration feature has been expanded to handle potential conflicts when migrating multiple workspaces with locally scoped group names.

Release v0.14.0 (#1000)

* Added `upgraded_from_workspace_id` property to migrated tables to indicate the source workspace ([#987](#987)).
In this release, updates have been made to the `_migrate_external_table`, `_migrate_dbfs_root_table`, and `_migrate_view` methods in the `table_migrate.py` file to include a new parameter `upgraded_from_ws` in the SQL commands used to alter tables, views, or managed tables. This parameter stores the source workspace ID in the migrated tables, indicating the migration origin. A new utility method `sql_alter_from` has been added to the `Table` class in `tables.py` to generate the SQL command with the new parameter, and a new class-level attribute `UPGRADED_FROM_WS_PARAM` has been added to the same class to indicate the source workspace. These changes resolve issue [#899](#899) and are covered by manual testing, unit tests, and integration tests. No new CLI commands, workflows, or tables have been added or modified, and there are no changes to user documentation.

* Added a command to create account level groups if they do not exist ([#763](#763)). This commit introduces a new feature that enables the creation of account-level groups if they do not already exist in the account. A new command, `create-account-groups`, has been added to the `databricks labs ucx` tool, which crawls all workspaces in the account and creates account-level groups if a corresponding workspace-local group is not found. The feature supports various scenarios, including creating account-level groups that exist in some workspaces but not in others, and creating multiple account-level groups with the same name but different members. Several new methods have been added to the `account.py` file to support the new feature, and the `test_account.py` file has been updated with new tests to ensure the correct behavior of the `create_account_level_groups` method.
Additionally, the `cli.py` file has been updated to include the new `create-account-groups` command. With these changes, users can easily manage account-level groups and ensure that they are consistent across all workspaces in the account, improving the overall user experience.

* Added assessment for the incompatible `RunSubmit` API usages ([#849](#849)). In this release, the assessment functionality for incompatible `RunSubmit` API usages has been significantly enhanced through various changes. The `clusters.py` file has seen improvements in clarity and consistency with the renaming of private methods `check_spark_conf` to `_check_spark_conf` and `check_cluster_failures` to `_check_cluster_failures`. The `_assess_clusters` method has been updated to call the renamed `_check_cluster_failures` method for thorough checks of cluster configurations, resulting in better assessment functionality. A new `SubmitRunsCrawler` class has been added to the `databricks.labs.ucx.assessment.jobs` module, implementing the `CrawlerBase`, `JobsMixin`, and `CheckClusterMixin` classes. This class crawls and assesses job runs based on their submitted runs, ensuring compatibility and identifying failure issues. Additionally, a new configuration attribute, `num_days_submit_runs_history`, has been introduced in the `WorkspaceConfig` class of the `config.py` module, controlling the number of days for which submission history of `RunSubmit` API calls is retained. Lastly, various new JSON files have been added for unit testing, assessing the `RunSubmit` API usages related to different scenarios like dbt task runs, Git source-based job runs, JAR file runs, and more. These tests will aid in identifying and addressing potential compatibility issues with the `RunSubmit` API.

* Added group members difference to the output of `validate-groups-membership` cli command ([#995](#995)).
The `validate-groups-membership` command has been updated to include a comparison of group memberships at both the account and workspace levels. This enhancement is implemented through the `validate_group_membership` function, which has been updated to calculate the difference in members between the two levels and display it in a new `group_members_difference` column. This allows for a more detailed analysis of group memberships and easily identifies any discrepancies between the account and workspace levels. The corresponding unit test file, `test_groups.py`, has been updated to include a new test case that verifies the calculation of the `group_members_difference` value. The functionality of the other commands remains unchanged. The new `group_members_difference` value is calculated as the difference in the number of members in the workspace group and the account group, with a positive value indicating more members in the workspace group and a negative value indicating more members in the account group. The table template in the labs.yml file has also been updated to include the new column for the group membership difference.

* Added handling for empty `directory_id` if managed identity encountered during the crawling of StoragePermissionMapping ([#986](#986)). This PR adds a `type` field to the `StoragePermissionMapping` and `Principal` dataclasses to differentiate between service principals and managed identities, allowing `None` for the `directory_id` field if the principal is not a service principal. During the migration to UC storage credentials, managed identities are currently ignored. These changes improve handling of managed identities during the crawling of `StoragePermissionMapping`, prevent errors when creating storage credentials with managed identities, and address issue [#339](#339).
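A minimal sketch of the optional `directory_id` handling described above; the field set below is an assumption based on the PR summary, not the exact ucx dataclass:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Principal:
    client_id: str
    display_name: str
    # "Application" for a service principal, "ManagedIdentity" otherwise
    type: str
    # Managed identities have no directory (tenant) id, so this may be None
    directory_id: Optional[str] = None


def can_migrate_to_storage_credential(principal: Principal) -> bool:
    # Managed identities are currently skipped during the migration to
    # UC storage credentials; only service principals with a directory id
    # can be migrated, so a missing directory_id is not an error here.
    return principal.type == "Application" and principal.directory_id is not None
```

Making `directory_id` optional at the type level means the crawler can record managed identities without raising, while the migration step filters them out explicitly.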
The changes are tested through unit tests, manual testing, and integration tests, and only affect the `StoragePermissionMapping` class and related methods, without introducing new commands, workflows, or tables.

* Added migration for Azure Service Principals with secrets stored in Databricks Secret to UC Storage Credentials ([#874](#874)). In this release, we have made significant updates to migrate Azure Service Principals with their secrets stored in Databricks Secret to UC Storage Credentials, enhancing security and management of storage access. The changes include:

  - a new `migrate_credentials` command in the `labs.yml` file to migrate credentials for storage access to UC storage credentials;
  - modification of `secrets.py` to handle the case where a secret has been removed from the backend and to log warning messages for secrets with invalid Base64 bytes;
  - introduction of the `StorageCredentialManager` and `ServicePrincipalMigration` classes in `credentials.py` to manage Azure Service Principals and their associated client secrets, and to migrate them to UC Storage Credentials;
  - a new `directory_id` attribute in the `Principal` class and its associated dataclass in `resources.py` to store the directory ID for creating UC storage credentials using a service principal;
  - a new pytest fixture, `make_storage_credential_spn`, in `fixtures.py` to simplify writing tests requiring Databricks Storage Credentials with Azure Service Principal auth;
  - a new test file for the Azure integration of the project, including new classes, methods, and test cases for testing the migration of Azure Service Principals to UC Storage Credentials.

  These improvements ensure better security and management of storage access using Azure Service Principals, while providing more efficient and robust testing capabilities.

* Added permission migration support for feature tables and the root permissions for models and feature tables ([#997](#997)).
This commit introduces support for migration of permissions related to feature tables and sets root permissions for models and feature tables. New functions such as `feature_store_listing`, `feature_tables_root_page`, `models_root_page`, and `tokens_and_passwords` have been added to facilitate population of a workspace access page with necessary permissions information. The `factory` function in `manager.py` has been updated to include new listings for models' root page, feature tables' root page, and the feature store for enhanced management and access control of models and feature tables. New classes and methods have been implemented to handle permissions for these resources, utilizing the `GenericPermissionsSupport`, `AccessControlRequest`, and `MigratedGroup` classes. Additionally, new test methods have been included to verify feature tables listing functionality and root page listing functionality for feature tables and registered models. The test manager method has been updated to include `feature-tables` in the list of items to be checked for permissions, ensuring comprehensive testing of permission functionality related to these new feature tables.

* Added support for serving endpoints ([#990](#990)). In this release, we have made significant enhancements to support serving endpoints in our open-source library. The `fixtures.py` file in the `databricks.labs.ucx.mixins` module has been updated with new classes and functions to create and manage serving endpoints, accompanied by integration tests to verify their functionality. We have added a new listing for serving endpoints in the assessment's permissions crawling, using the `ws.serving_endpoints.list` function and the `serving-endpoints` category. A new integration test, `test_endpoints`, has been added to verify that assessments now crawl permissions for serving endpoints. This test demonstrates the ability to migrate permissions from one group to another.
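A hedged sketch of what registering such a listing might look like; the `Listing` dataclass and factory wiring below are simplified stand-ins for the actual `GenericPermissionsSupport` machinery, not the ucx API:

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable


@dataclass
class Listing:
    # Callable that enumerates objects, e.g. ws.serving_endpoints.list
    func: Callable[[], Iterable[Any]]
    id_attribute: str   # attribute or key holding the object id
    object_type: str    # permissions category, e.g. "serving-endpoints"


def serving_endpoints_listing(ws) -> Listing:
    # ws is assumed to be a databricks.sdk WorkspaceClient; list() yields
    # the workspace's serving endpoints for permissions crawling
    return Listing(ws.serving_endpoints.list, "id", "serving-endpoints")
```

Keeping the enumeration callable and the permissions category together lets the crawler treat serving endpoints like any other generic-permissions resource.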
The test suite has been updated to ensure the proper functioning of the new feature and improve the assessment of permissions for serving endpoints, ensuring compatibility with the updated `test_manager.py` file.

* Expanded end-user documentation with detailed descriptions for workflows and commands ([#999](#999)). The Databricks Labs UCX project has been updated with several new features to assist in upgrading to Unity Catalog, including an assessment workflow that generates a detailed compatibility report for workspace entities, a group migration workflow for upgrading all Databricks workspace assets, and utility commands for managing cross-workspace installations. The assessment report now includes a more detailed summary of the assessment findings, table counts, database summaries, and external locations. Additional improvements include expanded workspace group migration to handle potential conflicts with locally scoped group names, enhanced documentation for external Hive Metastore integration, a new debugging notebook, and detailed descriptions of table upgrade considerations, data access permissions, external storage, and table crawler.

* Fixed `config.yml` upgrade from very old versions ([#984](#984)). In this release, we've introduced enhancements to the configuration upgrading process for `config.yml` in our open-source library. We've replaced the previous `v1_migrate` class method with a new implementation that specifically handles migration from version 1. The new method retrieves the `groups` field, extracts the `selected` value, and assigns it to the `include_group_names` key in the configuration. The `backup_group_prefix` value from the `groups` field is assigned to the `renamed_group_prefix` key, and the `groups` field is removed, with the version number updated to 2. These changes simplify the code and improve readability, enabling users to upgrade smoothly from version 1 of the configuration.
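The v1 migration steps summarized above reduce to a small dictionary transformation. This sketch uses the key names from the description; the surrounding `WorkspaceConfig`/installation machinery is assumed and omitted:

```python
def migrate_config_v1_to_v2(raw: dict) -> dict:
    """Upgrade a version-1 config.yml payload, as described above."""
    groups = raw.pop("groups", {})
    if "selected" in groups:
        # the selected group names become the explicit include list
        raw["include_group_names"] = groups["selected"]
    if "backup_group_prefix" in groups:
        # the backup prefix is carried over under its new name
        raw["renamed_group_prefix"] = groups["backup_group_prefix"]
    raw["version"] = 2
    return raw
```

The `groups` field is consumed entirely, so a migrated config never carries both the old and new key layouts.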
Furthermore, we've added new unit tests to the `test_config.py` file to ensure backward compatibility. Two new tests, `test_v1_migrate_zeroconf` and `test_v1_migrate_some_conf`, have been added, utilizing the `MockInstallation` class and loading the configuration using `WorkspaceConfig`. These tests enhance the robustness and reliability of the migration process for `config.yml`.

* Renamed columns in assessment SQL queries to use actual names, not aliases ([#983](#983)). In this update, we have resolved an issue where aliases used for column references in SQL queries caused errors in certain setups by renaming them to use actual names. Specifically, for assessment SQL queries, we have modified the definition of the `is_delta` column to use the actual `table_format` name instead of the alias `format`. This change improves compatibility and enhances the reliability of query execution, ensuring consistent interpretation of column references across various setups and avoiding potential errors caused by aliases. It does not introduce any new methods, but instead modifies existing functionality to use actual column names, yielding a more reliable and consistent SQL query for the `05_0_all_tables` assessment.

* Updated groups permissions validation to use Table ACL cluster ([#979](#979)). In this update, the `validate_groups_permissions` task has been modified to utilize the Table ACL cluster, as indicated by the inclusion of `job_cluster="tacl"`. This task is responsible for ensuring that all crawled permissions are accurately applied to the destination groups by calling the `permission_manager.apply_group_permissions` method during the migration state. This modification enhances the validation of group permissions by performing it on the Table ACL cluster, potentially improving performance or functionality.
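To illustrate the alias problem fixed in [#983](#983): the simplified, hypothetical before/after queries below show the `is_delta` projection referencing the aliased column versus the actual `table_format` column. The real assessment query and table names differ; this is only the shape of the fix:

```python
# Before: `is_delta` references the alias `format`. Some SQL engines reject
# this because aliases are not yet defined while projections are evaluated.
QUERY_WITH_ALIAS_REFERENCE = """
SELECT table_format AS format,
       format = 'DELTA' AS is_delta
FROM inventory.tables
"""

# After: `is_delta` references the actual column name, which is always valid
# regardless of how the engine orders projection evaluation.
QUERY_WITH_ACTUAL_NAME = """
SELECT table_format AS format,
       table_format = 'DELTA' AS is_delta
FROM inventory.tables
"""
```

The alias `format` can still be kept for display purposes; only the *references* to it inside the same projection list need to use the underlying column name.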
If you rely on this validation step, review how running it on the Table ACL cluster affects your permissions validation workflow and adjust accordingly.

Update databricks-labs-blueprint requirement from ~=0.2.4 to ~=0.3.0 (#1001)

Updates the requirements on [databricks-labs-blueprint](https://github.com/databrickslabs/blueprint) to permit the latest version.

<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/databrickslabs/blueprint/releases">databricks-labs-blueprint's releases</a>.</em></p>
<blockquote>
<h2>v0.3.0</h2>
<ul>
<li>Added automated upgrade framework (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/50">#50</a>). This update introduces an automated upgrade framework for managing and applying upgrades to the product, with a new <code>upgrades.py</code> file that includes a <code>ProductInfo</code> class having methods for version handling, wheel building, and exception handling. The test code organization has been improved, and new test cases, functions, and a directory structure for fixtures and unit tests have been added for the upgrades functionality. The <code>test_wheels.py</code> file now checks the version of the Databricks SDK and handles cases where the version marker is missing or does not contain the <code>__version__</code> variable. Additionally, a new <code>Application State Migrations</code> section has been added to the README, explaining the process of seamless upgrades from version X to version Z through version Y, addressing the need for configuration or database state migrations as the application evolves. Users can apply these upgrades by following an idiomatic usage pattern involving several classes and functions.
Furthermore, improvements have been made to the <code>_trim_leading_whitespace</code> function in the <code>commands.py</code> file of the <code>databricks.labs.blueprint</code> module, ensuring accurate and consistent removal of leading whitespace for each line in the command string, leading to better overall functionality and maintainability.</li> <li>Added brute-forcing <code>SerdeError</code> with <code>as_dict()</code> and <code>from_dict()</code> (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/58">#58</a>). This commit introduces a brute-forcing approach for handling <code>SerdeError</code> using <code>as_dict()</code> and <code>from_dict()</code> methods in an open-source library. The new <code>SomePolicy</code> class demonstrates the usage of these methods for manual serialization and deserialization of custom classes. The <code>as_dict()</code> method returns a dictionary representation of the class instance, and the <code>from_dict()</code> method, decorated with <code>@classmethod</code>, creates a new instance from the provided dictionary. Additionally, the GitHub Actions workflow for acceptance tests has been updated to include the <code>ready_for_review</code> event type, ensuring that tests run not only for opened and synchronized pull requests but also when marked as &quot;ready for review.&quot; These changes provide developers with more control over the deserialization process and facilitate debugging in cases where default deserialization fails, but should be used judiciously to avoid brittle code.</li> <li>Fixed nightly integration tests run as service principals (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/52">#52</a>). In this release, we have enhanced the compatibility of our codebase with service principals, particularly in the context of nightly integration tests. 
The <code>Installation</code> class in the <code>databricks.labs.blueprint.installation</code> module has been refactored, deprecating the <code>current</code> method and introducing two new methods: <code>assume_global</code> and <code>assume_user_home</code>. These methods enable users to install and manage <code>blueprint</code> as either a global or user-specific installation. Additionally, the <code>existing</code> method has been updated to work with the new <code>Installation</code> methods. In the test suite, the <code>test_installation.py</code> file has been updated to correctly detect global and user-specific installations when running as a service principal. These changes improve the testability and functionality of our software, ensuring seamless operation with service principals during nightly integration tests.</li> <li>Made <code>test_existing_installations_are_detected</code> more resilient (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/51">#51</a>). In this release, we have added a new test function <code>test_existing_installations_are_detected</code> that checks if existing installations are correctly detected and retries the test for up to 15 seconds if they are not. This improves the reliability of the test by making it more resilient to potential intermittent failures. We have also added an import from <code>databricks.sdk.retries</code> named <code>retried</code> which is used to retry the test function in case of an <code>AssertionError</code>. Additionally, the test function <code>test_existing</code> has been renamed to <code>test_existing_installations_are_detected</code> and the <code>xfail</code> marker has been removed. We have also renamed the test function <code>test_dataclass</code> to <code>test_loading_dataclass_from_installation</code> for better clarity. 
This change will help ensure that the library is correctly detecting existing installations and improve the overall quality of the codebase.</li>
</ul>
<p>Contributors: <a href="https://github.com/nfx"><code>@​nfx</code></a></p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a href="https://github.com/databrickslabs/blueprint/blob/main/CHANGELOG.md">databricks-labs-blueprint's changelog</a>.</em></p>
<blockquote>
<h2>0.3.0</h2>
<ul>
<li>Added automated upgrade framework (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/50">#50</a>).</li>
<li>Added brute-forcing <code>SerdeError</code> with <code>as_dict()</code> and <code>from_dict()</code> (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/58">#58</a>).</li>
<li>Fixed nightly integration tests run as service principals (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/52">#52</a>).</li>
<li>Made <code>test_existing_installations_are_detected</code> more resilient (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/51">#51</a>).</li>
</ul>
<h2>0.2.5</h2>
<ul>
<li>Automatically enable workspace filesystem if the feature is disabled (<a href="https://redirect.github.com/databrickslabs/blueprint/pull/42">#42</a>).</li>
</ul>
<h2>0.2.4</h2>
<ul>
<li>Added more integration tests for <code>Installation</code> (<a href="https://redirect.github.com/databrickslabs/blueprint/pull/39">#39</a>).</li>
<li>Fixed <code>yaml</code> optional import error (<a href="https://redirect.github.com/databrickslabs/blueprint/pull/38">#38</a>).</li>
</ul>
<h2>0.2.3</h2>
<ul>
<li>Added special handling for notebooks in <code>Installation.upload(...)</code> (<a href="https://redirect.github.com/databrickslabs/blueprint/pull/36">#36</a>).</li>
</ul>
<h2>0.2.2</h2>
<ul>
<li>Fixed issues with uploading wheels to DBFS and loading a non-existing install state (<a href="https://redirect.github.com/databrickslabs/blueprint/pull/34">#34</a>).</li>
</ul>
<h2>0.2.1</h2>
<ul>
<li>Aligned <code>Installation</code> framework with UCX project (<a href="https://redirect.github.com/databrickslabs/blueprint/pull/32">#32</a>).</li>
</ul>
<h2>0.2.0</h2>
<ul>
<li>Added common install state primitives with strong typing (<a href="https://redirect.github.com/databrickslabs/blueprint/pull/27">#27</a>).</li>
<li>Added documentation for Invoking Databricks Connect (<a
href="https://redirect.github.com/databrickslabs/blueprint/pull/28">#28</a>).</li> <li>Added more documentation for Databricks CLI command router (<a href="https://redirect.github.com/databrickslabs/blueprint/pull/30">#30</a>).</li> <li>Enforced <code>pylint</code> standards (<a href="https://redirect.github.com/databrickslabs/blueprint/pull/29">#29</a>).</li> </ul> <h2>0.1.0</h2> <ul> <li>Changed python requirement from 3.10.6 to 3.10 (<a href="https://redirect.github.com/databrickslabs/blueprint/pull/25">#25</a>).</li> </ul> <h2>0.0.6</h2> <ul> <li>Make <code>find_project_root</code> more deterministic (<a href="https://redirect.github.com/databrickslabs/blueprint/pull/23">#23</a>).</li> </ul> <!-- raw HTML omitted --> </blockquote> <p>... 
… (truncated)

Commits:
- [`905e5ff`](https://github.com/databrickslabs/blueprint/commit/905e5ff5303a005d48bc98d101a613afeda15d51) Release v0.3.0 ([#59](https://redirect.github.com/databrickslabs/blueprint/issues/59))
- [`a029f6b`](https://github.com/databrickslabs/blueprint/commit/a029f6bb1ecf807017754e298ea685326dbedf72) Added brute-forcing `SerdeError` with `as_dict()` and `from_dict()` ([#58](https://redirect.github.com/databrickslabs/blueprint/issues/58))
- [`c8a74f4`](https://github.com/databrickslabs/blueprint/commit/c8a74f4129b4592d365aac9670eb86069f3517f7) Added automated upgrade framework ([#50](https://redirect.github.com/databrickslabs/blueprint/issues/50))
- [`24e62ef`](https://github.com/databrickslabs/blueprint/commit/24e62ef4f060e43e02c92a7d082d95e8bc164317) Don't run integration tests on draft pull requests ([#55](https://redirect.github.com/databrickslabs/blueprint/issues/55))
- [`b4dd5ab`](https://github.com/databrickslabs/blueprint/commit/b4dd5abf4eaf8d022ae0b6ec7e659296ec3d2f37) Added tokei.rs badge ([#54](https://redirect.github.com/databrickslabs/blueprint/issues/54))
- [`01d9467`](https://github.com/databrickslabs/blueprint/commit/01d9467f425763ab08035001270593253bce11f0) Fixed nightly integration tests run as service principals ([#52](https://redirect.github.com/databrickslabs/blueprint/issues/52))
- [`aa57141`](https://github.com/databrickslabs/blueprint/commit/aa5714179c65be8e13f54601e1d1fcd70548342d) Made `test_existing_installations_are_detected` more resilient ([#51](https://redirect.github.com/databrickslabs/blueprint/issues/51))
- [`9cbc6f8`](https://github.com/databrickslabs/blueprint/commit/9cbc6f863d3ea06659f37939cf1b97115dd873bd) Bump `databrickslabs/sandbox/acceptance` to v0.1.0 ([#48](https://redirect.github.com/databrickslabs/blueprint/issues/48))
- [`22fc1a8`](https://github.com/databrickslabs/blueprint/commit/22fc1a8787b8e98de03048595202f88b7ddb9b94) Use `databrickslabs/sandbox/acceptance` action ([#45](https://redirect.github.com/databrickslabs/blueprint/issues/45))
- [`c7e47ab`](https://github.com/databrickslabs/blueprint/commit/c7e47abd82b2f04e95b1d91f346cc1ea6df43961) Release v0.2.5 ([#44](https://redirect.github.com/databrickslabs/blueprint/issues/44))
- Additional commits viewable in the [compare view](https://github.com/databrickslabs/blueprint/compare/v0.2.4...v0.3.0)
Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

Run integration tests only for pull requests ready for review (#1002). Tested on https://github.com/databrickslabs/blueprint.

Reducing flakiness of create account groups (#1003).

Prompt user if Terraform is utilised for deploying infrastructure (#1004). Added the `is_terraform_used` prompt and stored the answer in the `WorkspaceInstaller` config. Resolves #393. Co-authored-by: Serge Smertin <[email protected]>

Update CONTRIBUTING.md (#1005). Closes #850.

Added `databricks labs ucx create-uber-principal` command to create an Azure service principal for migration (#976):
- Added the new `create-uber-principal` CLI command in labs.yml and cli.py
- Added a separate `AzureAPIClient` class to isolate Azure API calls
- Added logic to create the SPN, its secret, and the role assignment in resources, and to update the workspace config with the SPN client_id
- Added logic to grant that SPN access to every storage account used by tables and to update the UCX cluster policy with the SPN secret for each storage account
- Added unit and integration test cases

Resolves #881. Related issues: #993, #693. Manually tested; unit and integration tests added.

Fix gitguardian warning caused by "hello world" secret used in unit test (#1010). Replaced the plain encoded string with `base64.b64encode` to mitigate the GitGuardian warning.

Create UC external locations in Azure based on migrated storage credentials (#992).

Handle widget delete on upgrade platform bug (#1011).
1 parent c866d42 · commit 7735a71

29 files changed: +1851 −208 lines

labs.yml

Lines changed: 7 additions & 0 deletions

```diff
@@ -108,6 +108,13 @@ commands:
   - name: aws-profile
     description: AWS Profile to use for authentication
+  - name: create-uber-principal
+    description: For azure cloud, creates a service principal and gives STORAGE BLOB READER access on all the storage account
+      used by tables in the workspace and stores the spn info in the UCX cluster policy.
+    flags:
+      - name: subscription-id
+        description: Subscription to scan storage account in
   - name: validate-groups-membership
     description: Validate groups to check if the groups at account level and workspace level have different memberships
   table_template: |-
```

pyproject.toml

Lines changed: 1 addition & 1 deletion

```diff
@@ -27,7 +27,7 @@ classifiers = [
     "Programming Language :: Python :: Implementation :: CPython",
 ]
 dependencies = ["databricks-sdk~=0.20.0",
-                "databricks-labs-blueprint~=0.3.0",
+                "databricks-labs-blueprint~=0.3.1",
                 "PyYAML>=6.0.0,<7.0.0"]

 [project.entry-points.databricks]
```
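The bump from `~=0.3.0` to `~=0.3.1` uses PEP 440 "compatible release" matching: `~=0.3.1` accepts any `0.3.x` at or above `0.3.1`, but not `0.4.0`. A minimal sketch of that rule for plain final-release versions (the `packaging` library implements the full specifier grammar; this helper is only illustrative):

```python
# Sketch of PEP 440 compatible-release (~=) matching for final releases only,
# e.g. "databricks-labs-blueprint~=0.3.1" accepts >=0.3.1 and <0.4.
def compatible(version: str, spec: str) -> bool:
    v = tuple(int(part) for part in version.split("."))
    s = tuple(int(part) for part in spec.split("."))
    # lower bound: version must be at least the spec
    if v < s:
        return False
    # upper bound: everything except the last spec component must match exactly
    return v[: len(s) - 1] == s[: len(s) - 1]

print(compatible("0.3.2", "0.3.1"))  # patch bump within 0.3 is accepted
print(compatible("0.4.0", "0.3.1"))  # next minor is rejected
```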

src/databricks/labs/ucx/assessment/aws.py

Lines changed: 20 additions & 4 deletions

```diff
@@ -223,8 +223,8 @@ def _s3_actions(self, actions):
             s3_actions = [actions]
         return s3_actions

-    def add_uc_role(self, role_name):
-        aws_role_trust_doc = {
+    def _aws_role_trust_doc(self, external_id="0000"):
+        return {
             "Version": "2012-10-17",
             "Statement": [
                 {
@@ -233,20 +233,33 @@ def add_uc_role(self, role_name):
                         "AWS": "arn:aws:iam::414351767826:role/unity-catalog-prod-UCMasterRole-14S5ZJVKOTYTL"
                     },
                     "Action": "sts:AssumeRole",
-                    "Condition": {"StringEquals": {"sts:ExternalId": "0000"}},
+                    "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
                 }
             ],
         }
+
+    def add_uc_role(self, role_name):
         # the AssumeRole condition will be modified with the external ID captured from the UC credential.
         # https://docs.databricks.com/en/connect/unity-catalog/storage-credentials.html
-        assume_role_json = self._get_json_for_cli(aws_role_trust_doc)
+        assume_role_json = self._get_json_for_cli(self._aws_role_trust_doc())
         add_role = self._run_json_command(
             f"iam create-role --role-name {role_name} --assume-role-policy-document {assume_role_json}"
         )
         if not add_role:
             return False
         return True

+    def update_uc_trust_role(self, role_name, external_id="0000"):
+        # Modify the AssumeRole condition with the external ID captured from the UC credential.
+        # https://docs.databricks.com/en/connect/unity-catalog/storage-credentials.html
+        assume_role_json = self._get_json_for_cli(self._aws_role_trust_doc(external_id))
+        update_role = self._run_json_command(
+            f"iam update-assume-role-policy --role-name {role_name} --policy-document {assume_role_json}"
+        )
+        if not update_role:
+            return False
+        return True
+
     def add_uc_role_policy(self, role_name, policy_name, s3_prefixes: set[str], account_id: str, kms_key=None):
         s3_prefixes_enriched = sorted([self.S3_PREFIX + s3_prefix for s3_prefix in s3_prefixes])
         statement = [
@@ -374,6 +387,9 @@ def create_uc_roles_cli(self, *, single_role=True, role_name="UC_ROLE", policy_n
         )
         role_id += 1

+    def update_uc_role_trust_policy(self, role_name, external_id="0000"):
+        return self._aws_resources.update_uc_trust_role(role_name, external_id)
+
     def save_uc_compatible_roles(self):
         uc_role_access = list(self._get_role_access())
         if len(uc_role_access) == 0:
```
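The two hunks above split one hard-coded trust document into a parameterized builder: the role is first created with the placeholder external ID `"0000"`, then re-rendered with the real external ID reported by the UC storage credential. A standalone sketch of that flow; the `Effect`/`Principal` keys follow the standard AWS trust-policy shape (the diff hunk elides those exact lines), and the external ID value below is an example:

```python
# Sketch of the parameterized trust-policy document from aws.py.
def aws_role_trust_doc(external_id: str = "0000") -> dict:
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "AWS": "arn:aws:iam::414351767826:role/unity-catalog-prod-UCMasterRole-14S5ZJVKOTYTL"
                },
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }

# first pass (create-role): placeholder external ID
initial_doc = aws_role_trust_doc()
# second pass (update-assume-role-policy): ID captured from the UC credential
updated_doc = aws_role_trust_doc("12ab34cd-example")
```

Rendering the same document twice keeps create and update guaranteed to agree on everything except the external ID.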

src/databricks/labs/ucx/aws/credentials.py

Lines changed: 18 additions & 10 deletions

```diff
@@ -61,7 +61,7 @@ def create(self, role_action: AWSRoleAction) -> StorageCredentialInfo:
         return self._ws.storage_credentials.create(
             role_action.role_name,
             aws_iam_role=AwsIamRole(role_action.role_arn),
-            comment=f"Created by UCX during migration to UC using AWS instance profile: {role_action.role_name}",
+            comment=f"Created by UCX during migration to UC using AWS IAM Role: {role_action.role_name}",
         )

     def validate(self, role_action: AWSRoleAction) -> AWSStorageCredentialValidationResult:
@@ -74,7 +74,7 @@ def validate(self, role_action: AWSRoleAction) -> AWSStorageCredentialValidation
         except InvalidParameterValue:
             logger.warning(
                 "There is an existing external location overlaps with the prefix that is mapped to "
-                "the instance profile and used for validating the migrated storage credential. "
+                "the IAM Role and used for validating the migrated storage credential. "
                 "Skip the validation"
             )
             return AWSStorageCredentialValidationResult(
@@ -112,7 +112,7 @@ def validate(self, role_action: AWSRoleAction) -> AWSStorageCredentialValidation
         )


-class InstanceProfileMigration:
+class IamRoleMigration:

     def __init__(
         self,
@@ -121,7 +121,7 @@ def __init__(
         resource_permissions: AWSResourcePermissions,
         storage_credential_manager: AWSStorageCredentialManager,
     ):
-        self._output_file = "aws_instance_profile_migration_result.csv"
+        self._output_file = "aws_iam_role_migration_result.csv"
         self._installation = installation
         self._ws = ws
         self._resource_permissions = resource_permissions
@@ -135,7 +135,7 @@ def for_cli(cls, ws: WorkspaceClient, installation: Installation, aws_profile: s

         msg = (
             f"Have you reviewed the {AWSResourcePermissions.UC_ROLES_FILE_NAMES} "
-            "and confirm listed instance profiles to be migrated migration?"
+            "and confirm listed IAM roles to be migrated?"
         )
         if not prompts.confirm(msg):
             raise SystemExit()
@@ -162,7 +162,7 @@ def _generate_migration_list(self, include_names: set[str] | None = None) -> lis
         """
         Create the list of IAM roles that need to be migrated, output an action plan as a csv file for users to confirm
         """
-        # load instance profile list from aws_instance_profile_info.csv
+        # load IAM role list
         iam_list = self._resource_permissions.load_uc_compatible_roles()
         # list existing storage credentials
         sc_set = self._storage_credential_manager.list(include_names)
@@ -184,22 +184,30 @@ def run(
         iam_list = self._generate_migration_list(include_names)

         plan_confirmed = prompts.confirm(
-            "Above Instance Profiles will be migrated to UC storage credentials, please review and confirm."
+            "Above IAM roles will be migrated to UC storage credentials, please review and confirm."
         )
         if plan_confirmed is not True:
             return []

         execution_result = []
         for iam in iam_list:
-            self._storage_credential_manager.create(iam)
+            storage_credential = self._storage_credential_manager.create(iam)
+            if storage_credential.aws_iam_role is None:
+                logger.error(f"Failed to create storage credential for IAM role: {iam.role_arn}")
+                continue
+
+            self._resource_permissions.update_uc_role_trust_policy(
+                iam.role_arn, storage_credential.aws_iam_role.external_id
+            )
+
             execution_result.append(self._storage_credential_manager.validate(iam))

         if execution_result:
             results_file = self.save(execution_result)
             logger.info(
-                f"Completed migration from Instance Profile to UC Storage credentials"
+                f"Completed migration from IAM Role to UC Storage credentials"
                 f"Please check {results_file} for validation results"
             )
         else:
-            logger.info("No Instance Profile migrated to UC Storage credentials")
+            logger.info("No IAM Role migrated to UC Storage credentials")
         return execution_result
```

src/databricks/labs/ucx/azure/access.py

Lines changed: 131 additions & 2 deletions

```diff
@@ -1,11 +1,20 @@
+import json
+import uuid
 from dataclasses import dataclass

 from databricks.labs.blueprint.installation import Installation
+from databricks.labs.blueprint.tui import Prompts
 from databricks.sdk import WorkspaceClient
+from databricks.sdk.errors import NotFound, ResourceAlreadyExists
 from databricks.sdk.service.catalog import Privilege

 from databricks.labs.ucx.assessment.crawlers import logger
-from databricks.labs.ucx.azure.resources import AzureResource, AzureResources
+from databricks.labs.ucx.azure.resources import (
+    AzureAPIClient,
+    AzureResource,
+    AzureResources,
+    PrincipalSecret,
+)
 from databricks.labs.ucx.config import WorkspaceConfig
 from databricks.labs.ucx.framework.crawlers import StatementExecutionBackend
 from databricks.labs.ucx.hive_metastore.locations import ExternalLocations
@@ -46,7 +55,12 @@ def for_cli(cls, ws: WorkspaceClient, product='ucx', include_subscriptions=None)
         installation = Installation.current(ws, product)
         config = installation.load(WorkspaceConfig)
         sql_backend = StatementExecutionBackend(ws, config.warehouse_id)
-        azurerm = AzureResources(ws, include_subscriptions=include_subscriptions)
+        azure_mgmt_client = AzureAPIClient(
+            ws.config.arm_environment.resource_manager_endpoint,
+            ws.config.arm_environment.service_management_endpoint,
+        )
+        graph_client = AzureAPIClient("https://graph.microsoft.com", "https://graph.microsoft.com")
+        azurerm = AzureResources(azure_mgmt_client, graph_client, include_subscriptions)
         locations = ExternalLocations(ws, sql_backend, config.inventory_database)
         return cls(installation, ws, azurerm, locations)

@@ -91,6 +105,121 @@ def save_spn_permissions(self) -> str | None:
             return None
         return self._installation.save(storage_account_infos, filename=self._filename)

+    def _update_cluster_policy_definition(
+        self,
+        policy_definition: str,
+        storage_accounts: list[AzureResource],
+        uber_principal: PrincipalSecret,
+        inventory_database: str,
+    ) -> str:
+        policy_dict = json.loads(policy_definition)
+        tenant_id = self._azurerm.tenant_id()
+        endpoint = f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"
+        for storage in storage_accounts:
+            policy_dict[
+                f"spark_conf.fs.azure.account.oauth2.client.id.{storage.storage_account}.dfs.core.windows.net"
+            ] = self._policy_config(uber_principal.client.client_id)
+            policy_dict[
+                f"spark_conf.fs.azure.account.oauth.provider.type.{storage.storage_account}.dfs.core.windows.net"
+            ] = self._policy_config("org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
+            policy_dict[
+                f"spark_conf.fs.azure.account.oauth2.client.endpoint.{storage.storage_account}.dfs.core.windows.net"
+            ] = self._policy_config(endpoint)
+            policy_dict[f"spark_conf.fs.azure.account.auth.type.{storage.storage_account}.dfs.core.windows.net"] = (
+                self._policy_config("OAuth")
+            )
+            policy_dict[
+                f"spark_conf.fs.azure.account.oauth2.client.secret.{storage.storage_account}.dfs.core.windows.net"
+            ] = self._policy_config(f"{{secrets/{inventory_database}/uber_principal_secret}}")
+        return json.dumps(policy_dict)
+
+    @staticmethod
+    def _policy_config(value: str):
+        return {"type": "fixed", "value": value}
+
+    def _update_cluster_policy_with_spn(
+        self,
+        policy_id: str,
+        storage_accounts: list[AzureResource],
+        uber_principal: PrincipalSecret,
+        inventory_database: str,
+    ):
+        try:
+            policy_definition = ""
+            cluster_policy = self._ws.cluster_policies.get(policy_id)
+
+            self._installation.save(cluster_policy, filename="policy-backup.json")
+
+            if cluster_policy.definition is not None:
+                policy_definition = self._update_cluster_policy_definition(
+                    cluster_policy.definition, storage_accounts, uber_principal, inventory_database
+                )
+            if cluster_policy.name is not None:
+                self._ws.cluster_policies.edit(policy_id, cluster_policy.name, definition=policy_definition)
+        except NotFound:
+            msg = f"cluster policy {policy_id} not found, please run UCX installation to create UCX cluster policy"
+            raise NotFound(msg) from None
+
+    def create_uber_principal(self, prompts: Prompts):
+        config = self._installation.load(WorkspaceConfig)
+        inventory_database = config.inventory_database
+        display_name = f"unity-catalog-migration-{inventory_database}-{self._ws.get_workspace_id()}"
+        uber_principal_name = prompts.question(
+            "Enter a name for the uber service principal to be created", default=display_name
+        )
+        policy_id = config.policy_id
+        if policy_id is None:
+            msg = "UCX cluster policy not found in config. Please run latest UCX installation to set cluster policy"
+            logger.error(msg)
+            raise ValueError(msg) from None
+        if config.uber_spn_id is not None:
+            logger.warning("Uber service principal already created for this workspace.")
+            return
+        used_storage_accounts = self._get_storage_accounts()
+        if len(used_storage_accounts) == 0:
+            logger.warning(
+                "There are no external table present with azure storage account. "
+                "Please check if assessment job is run"
+            )
+            return
+        storage_account_info = []
+        for storage in self._azurerm.storage_accounts():
+            if storage.storage_account in used_storage_accounts:
+                storage_account_info.append(storage)
+        logger.info("Creating service principal")
+        uber_principal = self._azurerm.create_service_principal(uber_principal_name)
+        self._create_scope(uber_principal, inventory_database)
+        config.uber_spn_id = uber_principal.client.client_id
+        logger.info(
+            f"Created service principal of client_id {config.uber_spn_id}. Applying permission on storage accounts"
+        )
+        try:
+            self._apply_storage_permission(storage_account_info, uber_principal)
+            self._installation.save(config)
+            self._update_cluster_policy_with_spn(policy_id, storage_account_info, uber_principal, inventory_database)
+        except PermissionError:
+            self._azurerm.delete_service_principal(uber_principal.client.object_id)
+        logger.info(f"Update UCX cluster policy {policy_id} with spn connection details for storage accounts")
+
+    def _apply_storage_permission(self, storage_account_info: list[AzureResource], uber_principal: PrincipalSecret):
+        for storage in storage_account_info:
+            role_name = str(uuid.uuid4())
+            self._azurerm.apply_storage_permission(
+                uber_principal.client.object_id, storage, "STORAGE_BLOB_DATA_READER", role_name
+            )
+            logger.debug(
+                f"Storage Data Blob Reader permission applied for spn {uber_principal.client.client_id} "
+                f"to storage account {storage.storage_account}"
+            )
+
+    def _create_scope(self, uber_principal: PrincipalSecret, inventory_database: str):
+        logger.info(f"Creating secret scope {inventory_database}.")
+        try:
+            self._ws.secrets.create_scope(inventory_database)
+        except ResourceAlreadyExists:
+            logger.warning(f"Secret scope {inventory_database} already exists, using the same")
+        self._ws.secrets.put_secret(inventory_database, "uber_principal_secret", string_value=uber_principal.secret)
+
     def load(self):
         return self._installation.load(list[StoragePermissionMapping], filename=self._filename)
```
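The cluster-policy rewrite can be exercised in isolation. A minimal sketch of the per-storage-account Spark conf entries that `_update_cluster_policy_definition` injects; the storage account name, client ID, tenant ID, and inventory database below are all example values, not UCX defaults:

```python
# Sketch of the five spark_conf keys added to the UCX cluster policy per
# Azure storage account, mirroring access.py above.
def policy_config(value: str) -> dict:
    return {"type": "fixed", "value": value}

def add_oauth_conf(policy: dict, storage_account: str, client_id: str,
                   tenant_id: str, inventory_database: str) -> dict:
    prefix = "spark_conf.fs.azure.account"
    host = f"{storage_account}.dfs.core.windows.net"
    endpoint = f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"
    policy[f"{prefix}.auth.type.{host}"] = policy_config("OAuth")
    policy[f"{prefix}.oauth.provider.type.{host}"] = policy_config(
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider"
    )
    policy[f"{prefix}.oauth2.client.id.{host}"] = policy_config(client_id)
    policy[f"{prefix}.oauth2.client.endpoint.{host}"] = policy_config(endpoint)
    # the client secret stays in a Databricks secret scope; the policy only references it
    policy[f"{prefix}.oauth2.client.secret.{host}"] = policy_config(
        f"{{secrets/{inventory_database}/uber_principal_secret}}"
    )
    return policy

example_policy = add_oauth_conf({}, "ucxstore", "example-client-id", "example-tenant", "ucx")
```

Pinning every value as `{"type": "fixed", ...}` means clusters created from the policy cannot override the uber principal's OAuth settings.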

src/databricks/labs/ucx/azure/credentials.py

Lines changed: 7 additions & 2 deletions

```diff
@@ -18,7 +18,7 @@
     AzureResourcePermissions,
     StoragePermissionMapping,
 )
-from databricks.labs.ucx.azure.resources import AzureResources
+from databricks.labs.ucx.azure.resources import AzureAPIClient, AzureResources
 from databricks.labs.ucx.config import WorkspaceConfig
 from databricks.labs.ucx.framework.crawlers import StatementExecutionBackend
 from databricks.labs.ucx.hive_metastore.locations import ExternalLocations
@@ -171,7 +171,12 @@ def for_cli(cls, ws: WorkspaceClient, installation: Installation, prompts: Promp

         config = installation.load(WorkspaceConfig)
         sql_backend = StatementExecutionBackend(ws, config.warehouse_id)
-        azurerm = AzureResources(ws)
+        azure_mgmt_client = AzureAPIClient(
+            ws.config.arm_environment.resource_manager_endpoint,
+            ws.config.arm_environment.service_management_endpoint,
+        )
+        graph_client = AzureAPIClient("https://graph.microsoft.com", "https://graph.microsoft.com")
+        azurerm = AzureResources(azure_mgmt_client, graph_client)
         locations = ExternalLocations(ws, sql_backend, config.inventory_database)

         resource_permissions = AzureResourcePermissions(installation, ws, azurerm, locations)
```
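Both `for_cli` factories now construct two thin API clients instead of handing the whole `WorkspaceClient` to `AzureResources`: one scoped to Azure Resource Manager and one to Microsoft Graph. A stand-in sketch of that wiring; the ARM endpoints shown are the public-cloud defaults, whereas the real code reads them from `ws.config.arm_environment` so national clouds resolve correctly:

```python
# Stand-in for the AzureAPIClient split: each client keeps its own API host
# and the token audience used when requesting an access token for it.
class AzureAPIClient:
    def __init__(self, host_endpoint: str, token_service_endpoint: str):
        self.host = host_endpoint.rstrip("/")
        self.token_audience = token_service_endpoint

# management-plane client (example public-cloud ARM endpoints)
mgmt_client = AzureAPIClient(
    "https://management.azure.com",
    "https://management.core.windows.net/",
)
# Microsoft Graph client: host and token audience coincide
graph_client = AzureAPIClient("https://graph.microsoft.com", "https://graph.microsoft.com")
```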
