Added AWS IAM role support to databricks labs ucx create-uber-principal command#993
Added AWS IAM role support to databricks labs ucx create-uber-principal command#993
databricks labs ucx create-uber-principal command#993Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #993 +/- ##
==========================================
- Coverage 88.98% 88.64% -0.35%
==========================================
Files 51 52 +1
Lines 6501 6655 +154
Branches 1169 1194 +25
==========================================
+ Hits 5785 5899 +114
- Misses 466 500 +34
- Partials 250 256 +6 ☔ View full report in Codecov by Sentry. |
|
✅ 109/109 passed, 19 skipped, 1h0m38s total Running from acceptance #1559 |
src/databricks/labs/ucx/cli.py
Outdated
|
|
||
| iam_policy_name = f"UCX_MIGRATION_POLICY_{config.inventory_database}" | ||
| prompts = Prompts() | ||
| if iam_role_name_in_cluster_policy and aws_permissions.is_role_exists(iam_role_name_in_cluster_policy): |
There was a problem hiding this comment.
this is too much logic for cli.py. move this into AWSPermissions method and pass it Prompts instance, so that it's unit testable. See service_principal_migration.run(prompts) as the example
src/databricks/labs/ucx/cli.py
Outdated
| logger.info(f"Cluster policy \"{cluster_policy.name}\" updated successfully") | ||
|
|
||
|
|
||
| def _get_cluster_policy(w: WorkspaceClient, policy_id: str) -> Policy: |
There was a problem hiding this comment.
should be a private method in AWSResourcePermissions
src/databricks/labs/ucx/cli.py
Outdated
| return w.cluster_policies.get(policy_id=policy_id) | ||
| except NotFound as err: | ||
| msg = f"UCX Policy {policy_id} not found, please reinstall UCX" | ||
| logger.error(msg) |
There was a problem hiding this comment.
no need to log error - it's caught up the stack
src/databricks/labs/ucx/cli.py
Outdated
| return None | ||
|
|
||
|
|
||
| def _update_cluster_policy_with_aws_instance_profile( |
There was a problem hiding this comment.
private method in AWSResourcePermissions
databricks labs ucx create-uber-principal command
…zure Service Principal for migration (#976) ## Changes - Added new cli cmd for create-master-principal in labs.yml, cli.py - Added separate class for AzureApiClient to separate out azure API calls - Added logic to create SPN, secret, roleassignment in resources and update workspace config with spn client_id - added logic to call create spn, update rbac of all storage account to that spn, update ucx cluster policy with spn secret for each storage account - test unit and int test cases Resolves #881 Related issues: - #993 - #693 ### Functionality - [ ] added relevant user documentation - [X] added new CLI command - [ ] modified existing command: `databricks labs ucx ...` - [ ] added a new workflow - [ ] modified existing workflow: `...` - [ ] added a new table - [ ] modified existing table: `...` ### Tests - [X] manually tested - [X] added unit tests - [X] added integration tests - [ ] verified on staging environment (screenshot attached)
author Vuong <[email protected]> 1709737244 +0000 committer Vuong <[email protected]> 1709739422 +0000 parent c866d42 author Vuong <[email protected]> 1709737244 +0000 committer Vuong <[email protected]> 1709739396 +0000 parent c866d42 author Vuong <[email protected]> 1709737244 +0000 committer Vuong <[email protected]> 1709739377 +0000 parent c866d42 author Vuong <[email protected]> 1709737244 +0000 committer Vuong <[email protected]> 1709739250 +0000 add trust relationship update Fix integration tests on AWS (#978) Update groups permissions validation to use Table ACL cluster (#979) Renamed columns in assessment SQL queries to use actual names, not aliases (#983) <!-- Summary of your changes that are easy to understand. Add screenshots when necessary --> Aliases are usually not allowed in projections (as they are replaced later in the query execution phases). While the DBSQL was smart enough to handle the references via aliases, for some setups this results in an error. Changing column references to use actual names fixes this. <!-- DOC: Link issue with a keyword: close, closes, closed, fix, fixes, fixed, resolve, resolves, resolved. See https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword --> Resolves #980 - [ ] added relevant user documentation - [ ] added new CLI command - [ ] modified existing command: `databricks labs ucx ...` - [ ] added a new workflow - [ ] modified existing workflow: `...` - [ ] added a new table - [ ] modified existing table: `...` <!-- How is this tested? Please see the checklist below and also describe any other relevant tests --> - [x] manually tested - [ ] added unit tests - [ ] added integration tests - [ ] verified on staging environment (screenshot attached) Fixed `config.yml` upgrade from very old versions (#984) Added `upgraded_from_workspace_id` property to migrated tables to indicated the source workspace. (#987) Added table parameter `upgraded_from_ws` to migrated tables. The parameters contains the sources workspace id. Resolves #899 - [ ] added relevant user documentation - [ ] added new CLI command - [ ] modified existing command: `databricks labs ucx ...` - [ ] added a new workflow - [ ] modified existing workflow: `...` - [ ] added a new table - [ ] modified existing table: `...` <!-- How is this tested? Please see the checklist below and also describe any other relevant tests --> - [x] manually tested - [x] added unit tests - [x] added integration tests - [x] verified on staging environment (screenshot attached) Added group members difference to the output of `validate-groups-membership` cli command (#995) The `validate-groups-membership` command has been updated to include a comparison of group memberships at both the account and workspace levels, displaying the difference in members between the two levels in a new column. This enhancement allows for a more detailed analysis of group memberships, with the added functionality implemented in the `validate_group_membership` function in the `groups.py` file located in the `databricks/labs/ucx/workspace_access` directory. A new output field, "group\_members\_difference," has been added to represent the difference in the number of members between a workspace group and an associated account group. The corresponding unit test file, "test\_groups.py," has been updated to include a new test case that verifies the calculation of the "group\_members\_difference" value. This change provides users with a more comprehensive view of their group memberships and allows them to easily identify any discrepancies between the account and workspace levels. The functionality of the other commands remains unchanged. Improved installation integration test flakiness (#998) - improved `_infer_error_from_job_run` and `_infer_error_from_task_run` to also catch `KeyError` and `ValueError` - removed retries for `Unknown` errors for installation tests Expanded end-user documentation with detailed descriptions for workflows and commands (#999) The Databricks Labs UCX project has been updated with several new features to assist in upgrading to Unity Catalog. These include various workflows and command-line utilities, such as an assessment workflow that generates a detailed compatibility report for workspace entities and a group migration workflow to upgrade all Databricks workspace assets. Additionally, new utility commands have been added for managing cross-workspace installations, and users can now view deployed workflows' status and repair failed workflows. A new end-user documentation has also been introduced, featuring comprehensive descriptions of workflows, commands, and an assessment report image. The Assessment Report, generated from UCX tools, now includes a more detailed summary of the assessment findings, table counts, database summaries, and external locations. Improved documentation for external Hive Metastore integration and a new debugging notebook are also included in this release. Lastly, the workspace group migration feature has been expanded to handle potential conflicts when migrating multiple workspaces with locally scoped group names. Release v0.14.0 (#1000) * Added `upgraded_from_workspace_id` property to migrated tables to indicated the source workspace ([#987](#987)). In this release, updates have been made to the `_migrate_external_table`, `_migrate_dbfs_root_table`, and `_migrate_view` methods in the `table_migrate.py` file to include a new parameter `upgraded_from_ws` in the SQL commands used to alter tables, views, or managed tables. This parameter is used to store the source workspace ID in the migrated tables, indicating the migration origin. A new utility method `sql_alter_from` has been added to the `Table` class in `tables.py` to generate the SQL command with the new parameter. Additionally, a new class-level attribute `UPGRADED_FROM_WS_PARAM` has been added to the `Table` class in `tables.py` to indicate the source workspace. A new property `upgraded_from_workspace_id` has been added to migrated tables to store the source workspace ID. These changes resolve issue [#899](#899) and are tested through manual testing, unit tests, and integration tests. No new CLI commands, workflows, or tables have been added or modified, and there are no changes to user documentation. * Added a command to create account level groups if they do not exist ([#763](#763)). This commit introduces a new feature that enables the creation of account-level groups if they do not already exist in the account. A new command, `create-account-groups`, has been added to the `databricks labs ucx` tool, which crawls all workspaces in the account and creates account-level groups if a corresponding workspace-local group is not found. The feature supports various scenarios, including creating account-level groups that exist in some workspaces but not in others, and creating multiple account-level groups with the same name but different members. Several new methods have been added to the `account.py` file to support the new feature, and the `test_account.py` file has been updated with new tests to ensure the correct behavior of the `create_account_level_groups` method. Additionally, the `cli.py` file has been updated to include the new `create-account-groups` command. With these changes, users can easily manage account-level groups and ensure that they are consistent across all workspaces in the account, improving the overall user experience. * Added assessment for the incompatible `RunSubmit` API usages ([#849](#849)). In this release, the assessment functionality for incompatible `RunSubmit` API usages has been significantly enhanced through various changes. The 'clusters.py' file has seen improvements in clarity and consistency with the renaming of private methods `check_spark_conf` to `_check_spark_conf` and `check_cluster_failures` to `_check_cluster_failures`. The `_assess_clusters` method has been updated to call the renamed `_check_cluster_failures` method for thorough checks of cluster configurations, resulting in better assessment functionality. A new `SubmitRunsCrawler` class has been added to the `databricks.labs.ucx.assessment.jobs` module, implementing `CrawlerBase`, `JobsMixin`, and `CheckClusterMixin` classes. This class crawls and assesses job runs based on their submitted runs, ensuring compatibility and identifying failure issues. Additionally, a new configuration attribute, `num_days_submit_runs_history`, has been introduced in the `WorkspaceConfig` class of the `config.py` module, controlling the number of days for which submission history of `RunSubmit` API calls is retained. Lastly, various new JSON files have been added for unit testing, assessing the `RunSubmit` API usages related to different scenarios like dbt task runs, Git source-based job runs, JAR file runs, and more. These tests will aid in identifying and addressing potential compatibility issues with the `RunSubmit` API. * Added group members difference to the output of `validate-groups-membership` cli command ([#995](#995)). The `validate-groups-membership` command has been updated to include a comparison of group memberships at both the account and workspace levels. This enhancement is implemented through the `validate_group_membership` function, which has been updated to calculate the difference in members between the two levels and display it in a new `group_members_difference` column. This allows for a more detailed analysis of group memberships and easily identifies any discrepancies between the account and workspace levels. The corresponding unit test file, "test_groups.py," has been updated to include a new test case that verifies the calculation of the `group_members_difference` value. The functionality of the other commands remains unchanged. The new `group_members_difference` value is calculated as the difference in the number of members in the workspace group and the account group, with a positive value indicating more members in the workspace group and a negative value indicating more members in the account group. The table template in the labs.yml file has also been updated to include the new column for the group membership difference. * Added handling for empty `directory_id` if managed identity encountered during the crawling of StoragePermissionMapping ([#986](#986)). This PR adds a `type` field to the `StoragePermissionMapping` and `Principal` dataclasses to differentiate between service principals and managed identities, allowing `None` for the `directory_id` field if the principal is not a service principal. During the migration to UC storage credentials, managed identities are currently ignored. These changes improve handling of managed identities during the crawling of `StoragePermissionMapping`, prevent errors when creating storage credentials with managed identities, and address issue [#339](#339). The changes are tested through unit tests, manual testing, and integration tests, and only affect the `StoragePermissionMapping` class and related methods, without introducing new commands, workflows, or tables. * Added migration for Azure Service Principals with secrets stored in Databricks Secret to UC Storage Credentials ([#874](#874)). In this release, we have made significant updates to migrate Azure Service Principals with their secrets stored in Databricks Secret to UC Storage Credentials, enhancing security and management of storage access. The changes include: Addition of a new `migrate_credentials` command in the `labs.yml` file to migrate credentials for storage access to UC storage credential. Modification of `secrets.py` to handle the case where a secret has been removed from the backend and to log warning messages for secrets with invalid Base64 bytes. Introduction of the `StorageCredentialManager` and `ServicePrincipalMigration` classes in `credentials.py` to manage Azure Service Principals and their associated client secrets, and to migrate them to UC Storage Credentials. Addition of a new `directory_id` attribute in the `Principal` class and its associated dataclass in `resources.py` to store the directory ID for creating UC storage credentials using a service principal. Creation of a new pytest fixture, `make_storage_credential_spn`, in `fixtures.py` to simplify writing tests requiring Databricks Storage Credentials with Azure Service Principal auth. Addition of a new test file for the Azure integration of the project, including new classes, methods, and test cases for testing the migration of Azure Service Principals to UC Storage Credentials. These improvements will ensure better security and management of storage access using Azure Service Principals, while providing more efficient and robust testing capabilities. * Added permission migration support for feature tables and the root permissions for models and feature tables ([#997](#997)). This commit introduces support for migration of permissions related to feature tables and sets root permissions for models and feature tables. New functions such as `feature_store_listing`, `feature_tables_root_page`, `models_root_page`, and `tokens_and_passwords` have been added to facilitate population of a workspace access page with necessary permissions information. The `factory` function in `manager.py` has been updated to include new listings for models' root page, feature tables' root page, and the feature store for enhanced management and access control of models and feature tables. New classes and methods have been implemented to handle permissions for these resources, utilizing `GenericPermissionsSupport`, `AccessControlRequest`, and `MigratedGroup` classes. Additionally, new test methods have been included to verify feature tables listing functionality and root page listing functionality for feature tables and registered models. The test manager method has been updated to include `feature-tables` in the list of items to be checked for permissions, ensuring comprehensive testing of permission functionality related to these new feature tables. * Added support for serving endpoints ([#990](#990)). In this release, we have made significant enhancements to support serving endpoints in our open-source library. The `fixtures.py` file in the `databricks.labs.ucx.mixins` module has been updated with new classes and functions to create and manage serving endpoints, accompanied by integration tests to verify their functionality. We have added a new listing for serving endpoints in the assessment's permissions crawling, using the `ws.serving_endpoints.list` function and the `serving-endpoints` category. A new integration test, "test_endpoints," has been added to verify that assessments now crawl permissions for serving endpoints. This test demonstrates the ability to migrate permissions from one group to another. The test suite has been updated to ensure the proper functioning of the new feature and improve the assessment of permissions for serving endpoints, ensuring compatibility with the updated `test_manager.py` file. * Expanded end-user documentation with detailed descriptions for workflows and commands ([#999](#999)). The Databricks Labs UCX project has been updated with several new features to assist in upgrading to Unity Catalog, including an assessment workflow that generates a detailed compatibility report for workspace entities, a group migration workflow for upgrading all Databricks workspace assets, and utility commands for managing cross-workspace installations. The Assessment Report now includes a more detailed summary of the assessment findings, table counts, database summaries, and external locations. Additional improvements include expanded workspace group migration to handle potential conflicts with locally scoped group names, enhanced documentation for external Hive Metastore integration, a new debugging notebook, and detailed descriptions of table upgrade considerations, data access permissions, external storage, and table crawler. * Fixed `config.yml` upgrade from very old versions ([#984](#984)). In this release, we've introduced enhancements to the configuration upgrading process for `config.yml` in our open-source library. We've replaced the previous `v1_migrate` class method with a new implementation that specifically handles migration from version 1. The new method retrieves the `groups` field, extracts the `selected` value, and assigns it to the `include_group_names` key in the configuration. The `backup_group_prefix` value from the `groups` field is assigned to the `renamed_group_prefix` key, and the `groups` field is removed, with the version number updated to 2. These changes simplify the code and improve readability, enabling users to upgrade smoothly from version 1 of the configuration. Furthermore, we've added new unit tests to the `test_config.py` file to ensure backward compatibility. Two new tests, `test_v1_migrate_zeroconf` and `test_v1_migrate_some_conf`, have been added, utilizing the `MockInstallation` class and loading the configuration using `WorkspaceConfig`. These tests enhance the robustness and reliability of the migration process for `config.yml`. * Renamed columns in assessment SQL queries to use actual names, not aliases ([#983](#983)). In this update, we have resolved an issue where aliases used for column references in SQL queries caused errors in certain setups by renaming them to use actual names. Specifically, for assessment SQL queries, we have modified the definition of the `is_delta` column to use the actual `table_format` name instead of the alias `format`. This change improves compatibility and enhances the reliability of query execution. As a software engineer, you will appreciate that this modification ensures consistent interpretation of column references across various setups, thereby avoiding potential errors caused by aliases. This change does not introduce any new methods, but instead modifies existing functionality to use actual column names, ensuring a more reliable and consistent SQL query for the `05_0_all_tables` assessment. * Updated groups permissions validation to use Table ACL cluster ([#979](#979)). In this update, the `validate_groups_permissions` task has been modified to utilize the Table ACL cluster, as indicated by the inclusion of `job_cluster="tacl"`. This task is responsible for ensuring that all crawled permissions are accurately applied to the destination groups by calling the `permission_manager.apply_group_permissions` method during the migration state. This modification enhances the validation of group permissions by performing it on the Table ACL cluster, potentially improving performance or functionality. If you are implementing this project, it is crucial to comprehend the consequences of this change on your permissions validation process and adjust your workflows appropriately. Update databricks-labs-blueprint requirement from ~=0.2.4 to ~=0.3.0 (#1001) Updates the requirements on [databricks-labs-blueprint](https://github.com/databrickslabs/blueprint) to permit the latest version. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/databrickslabs/blueprint/releases">databricks-labs-blueprint's releases</a>.</em></p> <blockquote> <h2>v0.3.0</h2> <ul> <li>Added automated upgrade framework (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/50">#50</a>). This update introduces an automated upgrade framework for managing and applying upgrades to the product, with a new <code>upgrades.py</code> file that includes a <code>ProductInfo</code> class having methods for version handling, wheel building, and exception handling. The test code organization has been improved, and new test cases, functions, and a directory structure for fixtures and unit tests have been added for the upgrades functionality. The <code>test_wheels.py</code> file now checks the version of the Databricks SDK and handles cases where the version marker is missing or does not contain the <code>__version__</code> variable. Additionally, a new <code>Application State Migrations</code> section has been added to the README, explaining the process of seamless upgrades from version X to version Z through version Y, addressing the need for configuration or database state migrations as the application evolves. Users can apply these upgrades by following an idiomatic usage pattern involving several classes and functions. Furthermore, improvements have been made to the <code>_trim_leading_whitespace</code> function in the <code>commands.py</code> file of the <code>databricks.labs.blueprint</code> module, ensuring accurate and consistent removal of leading whitespace for each line in the command string, leading to better overall functionality and maintainability.</li> <li>Added brute-forcing <code>SerdeError</code> with <code>as_dict()</code> and <code>from_dict()</code> (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/58">#58</a>). This commit introduces a brute-forcing approach for handling <code>SerdeError</code> using <code>as_dict()</code> and <code>from_dict()</code> methods in an open-source library. The new <code>SomePolicy</code> class demonstrates the usage of these methods for manual serialization and deserialization of custom classes. The <code>as_dict()</code> method returns a dictionary representation of the class instance, and the <code>from_dict()</code> method, decorated with <code>@classmethod</code>, creates a new instance from the provided dictionary. Additionally, the GitHub Actions workflow for acceptance tests has been updated to include the <code>ready_for_review</code> event type, ensuring that tests run not only for opened and synchronized pull requests but also when marked as "ready for review." These changes provide developers with more control over the deserialization process and facilitate debugging in cases where default deserialization fails, but should be used judiciously to avoid brittle code.</li> <li>Fixed nightly integration tests run as service principals (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/52">#52</a>). In this release, we have enhanced the compatibility of our codebase with service principals, particularly in the context of nightly integration tests. The <code>Installation</code> class in the <code>databricks.labs.blueprint.installation</code> module has been refactored, deprecating the <code>current</code> method and introducing two new methods: <code>assume_global</code> and <code>assume_user_home</code>. These methods enable users to install and manage <code>blueprint</code> as either a global or user-specific installation. Additionally, the <code>existing</code> method has been updated to work with the new <code>Installation</code> methods. In the test suite, the <code>test_installation.py</code> file has been updated to correctly detect global and user-specific installations when running as a service principal. These changes improve the testability and functionality of our software, ensuring seamless operation with service principals during nightly integration tests.</li> <li>Made <code>test_existing_installations_are_detected</code> more resilient (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/51">#51</a>). In this release, we have added a new test function <code>test_existing_installations_are_detected</code> that checks if existing installations are correctly detected and retries the test for up to 15 seconds if they are not. This improves the reliability of the test by making it more resilient to potential intermittent failures. We have also added an import from <code>databricks.sdk.retries</code> named <code>retried</code> which is used to retry the test function in case of an <code>AssertionError</code>. Additionally, the test function <code>test_existing</code> has been renamed to <code>test_existing_installations_are_detected</code> and the <code>xfail</code> marker has been removed. We have also renamed the test function <code>test_dataclass</code> to <code>test_loading_dataclass_from_installation</code> for better clarity. This change will help ensure that the library is correctly detecting existing installations and improve the overall quality of the codebase.</li> </ul> <p>Contributors: <a href="https://github.com/nfx"><code>@nfx</code></a></p> </blockquote> </details> <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/databrickslabs/blueprint/blob/main/CHANGELOG.md">databricks-labs-blueprint's changelog</a>.</em></p> <blockquote> <h2>0.3.0</h2> <ul> <li>Added automated upgrade framework (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/50">#50</a>). This update introduces an automated upgrade framework for managing and applying upgrades to the product, with a new <code>upgrades.py</code> file that includes a <code>ProductInfo</code> class having methods for version handling, wheel building, and exception handling. The test code organization has been improved, and new test cases, functions, and a directory structure for fixtures and unit tests have been added for the upgrades functionality. The <code>test_wheels.py</code> file now checks the version of the Databricks SDK and handles cases where the version marker is missing or does not contain the <code>__version__</code> variable. Additionally, a new <code>Application State Migrations</code> section has been added to the README, explaining the process of seamless upgrades from version X to version Z through version Y, addressing the need for configuration or database state migrations as the application evolves. Users can apply these upgrades by following an idiomatic usage pattern involving several classes and functions. Furthermore, improvements have been made to the <code>_trim_leading_whitespace</code> function in the <code>commands.py</code> file of the <code>databricks.labs.blueprint</code> module, ensuring accurate and consistent removal of leading whitespace for each line in the command string, leading to better overall functionality and maintainability.</li> <li>Added brute-forcing <code>SerdeError</code> with <code>as_dict()</code> and <code>from_dict()</code> (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/58">#58</a>). This commit introduces a brute-forcing approach for handling <code>SerdeError</code> using <code>as_dict()</code> and <code>from_dict()</code> methods in an open-source library. The new <code>SomePolicy</code> class demonstrates the usage of these methods for manual serialization and deserialization of custom classes. The <code>as_dict()</code> method returns a dictionary representation of the class instance, and the <code>from_dict()</code> method, decorated with <code>@classmethod</code>, creates a new instance from the provided dictionary. Additionally, the GitHub Actions workflow for acceptance tests has been updated to include the <code>ready_for_review</code> event type, ensuring that tests run not only for opened and synchronized pull requests but also when marked as "ready for review." These changes provide developers with more control over the deserialization process and facilitate debugging in cases where default deserialization fails, but should be used judiciously to avoid brittle code.</li> <li>Fixed nightly integration tests run as service principals (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/52">#52</a>). In this release, we have enhanced the compatibility of our codebase with service principals, particularly in the context of nightly integration tests. The <code>Installation</code> class in the <code>databricks.labs.blueprint.installation</code> module has been refactored, deprecating the <code>current</code> method and introducing two new methods: <code>assume_global</code> and <code>assume_user_home</code>. These methods enable users to install and manage <code>blueprint</code> as either a global or user-specific installation. Additionally, the <code>existing</code> method has been updated to work with the new <code>Installation</code> methods. In the test suite, the <code>test_installation.py</code> file has been updated to correctly detect global and user-specific installations when running as a service principal. These changes improve the testability and functionality of our software, ensuring seamless operation with service principals during nightly integration tests.</li> <li>Made <code>test_existing_installations_are_detected</code> more resilient (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/51">#51</a>). In this release, we have added a new test function <code>test_existing_installations_are_detected</code> that checks if existing installations are correctly detected and retries the test for up to 15 seconds if they are not. This improves the reliability of the test by making it more resilient to potential intermittent failures. We have also added an import from <code>databricks.sdk.retries</code> named <code>retried</code> which is used to retry the test function in case of an <code>AssertionError</code>. Additionally, the test function <code>test_existing</code> has been renamed to <code>test_existing_installations_are_detected</code> and the <code>xfail</code> marker has been removed. We have also renamed the test function <code>test_dataclass</code> to <code>test_loading_dataclass_from_installation</code> for better clarity. This change will help ensure that the library is correctly detecting existing installations and improve the overall quality of the codebase.</li> </ul> <h2>0.2.5</h2> <ul> <li>Automatically enable workspace filesystem if the feature is disabled (<a href="https://redirect.github.com/databrickslabs/blueprint/pull/42">#42</a>).</li> </ul> <h2>0.2.4</h2> <ul> <li>Added more integration tests for <code>Installation</code> (<a href="https://redirect.github.com/databrickslabs/blueprint/pull/39">#39</a>).</li> <li>Fixed <code>yaml</code> optional import error (<a href="https://redirect.github.com/databrickslabs/blueprint/pull/38">#38</a>).</li> </ul> <h2>0.2.3</h2> <ul> <li>Added special handling for notebooks in <code>Installation.upload(...)</code> (<a href="https://redirect.github.com/databrickslabs/blueprint/pull/36">#36</a>).</li> </ul> <h2>0.2.2</h2> <ul> <li>Fixed issues with uploading wheels to DBFS and loading a non-existing install state (<a href="https://redirect.github.com/databrickslabs/blueprint/pull/34">#34</a>).</li> </ul> <h2>0.2.1</h2> <ul> <li>Aligned <code>Installation</code> framework with UCX project (<a href="https://redirect.github.com/databrickslabs/blueprint/pull/32">#32</a>).</li> </ul> <h2>0.2.0</h2> <ul> <li>Added common install state primitives with strong typing (<a href="https://redirect.github.com/databrickslabs/blueprint/pull/27">#27</a>).</li> <li>Added documentation for Invoking Databricks Connect (<a href="https://redirect.github.com/databrickslabs/blueprint/pull/28">#28</a>).</li> <li>Added more documentation for Databricks CLI command router (<a href="https://redirect.github.com/databrickslabs/blueprint/pull/30">#30</a>).</li> <li>Enforced <code>pylint</code> standards (<a href="https://redirect.github.com/databrickslabs/blueprint/pull/29">#29</a>).</li> </ul> <h2>0.1.0</h2> <ul> <li>Changed python requirement from 3.10.6 to 3.10 (<a href="https://redirect.github.com/databrickslabs/blueprint/pull/25">#25</a>).</li> </ul> <h2>0.0.6</h2> <ul> <li>Make <code>find_project_root</code> more deterministic (<a href="https://redirect.github.com/databrickslabs/blueprint/pull/23">#23</a>).</li> </ul> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/databrickslabs/blueprint/commit/905e5ff5303a005d48bc98d101a613afeda15d51"><code>905e5ff</code></a> Release v0.3.0 (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/59">#59</a>)</li> <li><a href="https://github.com/databrickslabs/blueprint/commit/a029f6bb1ecf807017754e298ea685326dbedf72"><code>a029f6b</code></a> Added brute-forcing <code>SerdeError</code> with <code>as_dict()</code> and <code>from_dict()</code> (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/58">#58</a>)</li> <li><a href="https://github.com/databrickslabs/blueprint/commit/c8a74f4129b4592d365aac9670eb86069f3517f7"><code>c8a74f4</code></a> Added automated upgrade framework (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/50">#50</a>)</li> <li><a href="https://github.com/databrickslabs/blueprint/commit/24e62ef4f060e43e02c92a7d082d95e8bc164317"><code>24e62ef</code></a> Don't run integration tests on draft pull requests (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/55">#55</a>)</li> <li><a href="https://github.com/databrickslabs/blueprint/commit/b4dd5abf4eaf8d022ae0b6ec7e659296ec3d2f37"><code>b4dd5ab</code></a> Added tokei.rs badge (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/54">#54</a>)</li> <li><a href="https://github.com/databrickslabs/blueprint/commit/01d9467f425763ab08035001270593253bce11f0"><code>01d9467</code></a> Fixed nightly integration tests run as service principals (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/52">#52</a>)</li> <li><a href="https://github.com/databrickslabs/blueprint/commit/aa5714179c65be8e13f54601e1d1fcd70548342d"><code>aa57141</code></a> Made <code>test_existing_installations_are_detected</code> more resilient (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/51">#51</a>)</li> <li><a href="https://github.com/databrickslabs/blueprint/commit/9cbc6f863d3ea06659f37939cf1b97115dd873bd"><code>9cbc6f8</code></a> Bump <code>databrickslabs/sandbox/acceptance</code> to v0.1.0 (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/48">#48</a>)</li> <li><a href="https://github.com/databrickslabs/blueprint/commit/22fc1a8787b8e98de03048595202f88b7ddb9b94"><code>22fc1a8</code></a> Use <code>databrickslabs/sandbox/acceptance</code> action (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/45">#45</a>)</li> <li><a href="https://github.com/databrickslabs/blueprint/commit/c7e47abd82b2f04e95b1d91f346cc1ea6df43961"><code>c7e47ab</code></a> Release v0.2.5 (<a href="https://redirect.github.com/databrickslabs/blueprint/issues/44">#44</a>)</li> <li>Additional commits viewable in <a href="https://github.com/databrickslabs/blueprint/compare/v0.2.4...v0.3.0">compare view</a></li> </ul> </details> <br /> Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) </details> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Run integration tests only for pull requests ready for review (#1002) Tested on https://github.com/databrickslabs/blueprint Reducing flakiness of create account groups (#1003) Prompt user if Terraform utilised for deploying infrastructure (#1004) Added prompt is_terraform_used and updated the same in the config of WorkspaceInstaller Resolves #393 --------- Co-authored-by: Serge Smertin <[email protected]> Update CONTRIBUTING.md (#1005) Closes #850 Added `databricks labs ucx create-uber-principal` command to create Azure Service Principal for migration (#976) - Added new cli cmd for create-master-principal in labs.yml, cli.py - Added separate class for AzureApiClient to separate out azure API calls - Added logic to create SPN, secret, roleassignment in resources and update workspace config with spn client_id - added logic to call create spn, update rbac of all storage account to that spn, update ucx cluster policy with spn secret for each storage account - test unit and int test cases Resolves #881 Related issues: - #993 - #693 - [ ] added relevant user documentation - [X] added new CLI command - [ ] modified existing command: `databricks labs ucx ...` - [ ] added a new workflow - [ ] modified existing workflow: `...` - [ ] added a new table - [ ] modified existing table: `...` - [X] manually tested - [X] added unit tests - [X] added integration tests - [ ] verified on staging environment (screenshot attached) Fix gitguardian warning caused by "hello world" secret used in unit test (#1010) Replace the plain encoded string by base64.b64encode to mitigate the gitguardian warning. <!-- DOC: Link issue with a keyword: close, closes, closed, fix, fixes, fixed, resolve, resolves, resolved. See https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword --> Resolves #.. - [ ] added relevant user documentation - [ ] added new CLI command - [ ] modified existing command: `databricks labs ucx ...` - [ ] added a new workflow - [ ] modified existing workflow: `...` - [ ] added a new table - [ ] modified existing table: `...` <!-- How is this tested? Please see the checklist below and also describe any other relevant tests --> - [ ] manually tested - [ ] added unit tests - [ ] added integration tests - [ ] verified on staging environment (screenshot attached) Create UC external locations in Azure based on migrated storage credentials (#992) Handle widget delete on upgrade platform bug (#1011)
# Conflicts: # labs.yml # src/databricks/labs/ucx/assessment/aws.py # src/databricks/labs/ucx/cli.py
* Added AWS IAM role support to `databricks labs ucx create-uber-principal` command ([#993](#993)). The `databricks labs ucx create-uber-principal` command now supports AWS Identity and Access Management (IAM) roles for external table migration. This new feature introduces a CLI command to create an `uber-IAM` profile, which checks for the UCX migration cluster policy and updates or adds the migration policy to provide access to the relevant table locations. If no IAM instance profile or role is specified in the cluster policy, a new one is created and the new migration policy is added. This change includes new methods and functions to handle AWS IAM roles, instance profiles, and related trust policies. Additionally, new unit and integration tests have been added and verified on the staging environment. The implementation also identifies all S3 buckets used by the Instance Profiles configured in the workspace. * Added Dashboard widget to show the list of cluster policies along with DBR version ([#1013](#1013)). In this code revision, the `assessment` module of the 'databricks/labs/ucx' package has been updated to include a new `PoliciesCrawler` class, which fetches, assesses, and snapshots cluster policies. This class extends `CrawlerBase` and `CheckClusterMixin` and introduces the '_crawl', '_assess_policies', '_try_fetch', and `snapshot` methods. The `PolicyInfo` dataclass has been added to hold policy information, with a structure similar to the `ClusterInfo` dataclass. The `ClusterInfo` dataclass has been updated to include `spark_version` and `policy_id` attributes. A new table for policies has been added, and cluster policies along with the DBR version are loaded into this table. Relevant user documentation, tests, and a Dashboard widget have been added to support this feature. The `create` function in 'fixtures.py' has been updated to enable a Delta preview feature in Spark configurations, and a new SQL file has been included for querying cluster policies. Additionally, a new `crawl_cluster_policies` method has been added to scan and store cluster policies with matching configurations. * Added `migration_status` table to capture a snapshot of migrated tables ([#1041](#1041)). A `migration_status` table has been added to track the status of migrated tables in the database, enabling improved management and tracking of migrations. The new `MigrationStatus` class, which is a dataclass that holds the source and destination schema, table, and updated timestamp, is added. The `TablesMigrate` class now has a new `_migration_status_refresher` attribute that is an instance of the new `MigrationStatusRefresher` class. This class crawls the `migration_status` table and returns a snapshot of the migration status, which is used to refresh the migration status and check if the table is upgraded. Additionally, the `_init_seen_tables` method is updated to get the seen tables from the `_migration_status_refresher` instead of fetching from the table properties. The `MigrationStatusRefresher` class fetches the migration status table and returns a snapshot of the migration status. This change also adds new test functions in the test file for the Hive metastore, which covers various scenarios such as migrating managed tables with and without caching, migrating external tables, and reverting migrated tables. * Added a check for existing inventory database to avoid losing existing, inject installation objects in tests and try fetching existing installation before setting global as default ([#1043](#1043)). In this release, we have added a new method, `_check_inventory_database_exists`, to the `WorkspaceInstallation` class, which checks if an inventory database with a given name already exists in the Workspace. This prevents accidental overwriting of existing data and improves the robustness of handling inventory databases. The `validate_and_run` method has been updated to call `app.current_installation(workspace_client)`, allowing for a more flexible handling of installations. The `Installation` class import has been updated to include `SerdeError`, and the test suite has been updated to inject installation objects and check for existing installations before setting the global installation as default. A new argument `inventory_schema_suffix` has been added to the `factory` method for customization of the inventory schema name. We have also added a new method `check_inventory_database_exists` to the `WorkspaceInstaller` class, which checks if an inventory database already exists for a given installation type and raises an `AlreadyExists` error if it does. The behavior of the `download` method in the `WorkspaceClient` class has been mocked, and the `get_status` method has been updated to return `NotFound` in certain tests. These changes aim to improve the robustness, flexibility, and safety of the installation process in the Workspace. * Added a check for external metastore in SQL warehouse configuration ([#1046](#1046)). In this release, we have added new functionality to the Unity Catalog (UCX) installation process to enable checking for and connecting to an external Hive metastore configuration. A new method, `_get_warehouse_config_with_external_hive_metastore`, has been introduced to retrieve the workspace warehouse config and identify if it is set up for an external Hive metastore. If so, and the user confirms the prompt, UCX will be configured to connect to the external metastore. Additionally, new methods `_extract_external_hive_metastore_sql_conf` and `test_cluster_policy_definition_<cloud_provider>_hms_warehouse()` have been added to handle the external metastore configuration for Azure, AWS, and GCP, and to handle the case when the data_access_config is empty. These changes provide more flexibility and ease of use when installing UCX with external Hive metastore configurations. The new imports `EndpointConfPair`, `GetWorkspaceWarehouseConfigResponse` from the `databricks.sdk.service.sql` package are used to handle the endpoint configuration of the SQL warehouse. * Added integration tests for AWS - create locations ([#1026](#1026)). In this release, we have added comprehensive integration tests for AWS resources and their management in the `tests/unit/assessment/test_aws.py` file. The `AWSResources` class has been updated with new methods (AwsIamRole, add_uc_role, add_uc_role_policy, and validate_connection) and the regular expression for matching S3 resource ARN has been modified. The `create_external_locations` method now allows for creating external locations without validating them, and the `_identify_missing_external_locations` function has been enhanced to match roles with a wildcard pattern. The new tests include validating the integration of AWS services with the system, testing the CLI's behavior when it is missing, and introducing new configuration scenarios with the addition of a Key Management Service (KMS) key during the creation of IAM roles and policies. These changes improve the robustness and reliability of AWS resource integration and handling in our system. * Bump Databricks SDK to v0.22.0 ([#1059](#1059)). In this release, we are bumping the Databricks SDK version to 0.22.0 and upgrading the `databricks-labs-lsql` package to ~0.2.2. The new dependencies for this release include `databricks-sdk==0.22.0`, `databricks-labs-lsql~=0.2.2`, `databricks-labs-blueprint~=0.4.3`, and `PyYAML>=6.0.0,<7.0.0`. In the `fixtures.py` file, we have added `PermissionLevel.CAN_QUERY` to the `CAN_VIEW` and `CAN_MANAGE` permissions in the `_path` function, allowing users to query the endpoint. Additionally, we have updated the `test_endpoints` function in the `test_generic.py` file as part of the integration tests for workspace access. This change updates the permission level for creating a serving endpoint from `CAN_MANAGE` to `CAN_QUERY`, meaning that the assigned group can now only query the endpoint. We have also included the `test_feature_tables` function in the commit, which tests the behavior of feature tables in the Databricks workspace. This change only affects the `test_endpoints` function and its assert statements, and does not impact the functionality of the `test_feature_tables` function. * Changed default UCX installation folder to `/Applications/ucx` from `/Users/<me>/.ucx` to allow multiple users users utilising the same installation ([#854](#854)). In this release, we've added a new advanced feature that allows users to force the installation of UCX over an existing installation using the `UCX_FORCE_INSTALL` environment variable. This variable can take two values `global` and 'user', providing more control and flexibility in installing UCX. The default UCX installation folder has been changed to /Applications/ucx from /Users/<me>/.ucx to enable multiple users to utilize the same installation. A table detailing the expected install location, `install_folder`, and mode for each combination of global and user values has been added to the README file. We've also added user prompts to confirm the installation if UCX is already installed and the `UCX_FORCE_INSTALL` variable is set to 'user'. This feature is useful when users want to install UCX in a specific location or force the installation over an existing installation. However, it is recommended to use this feature with caution, as it can potentially break existing installations if not used correctly. Additionally, several changes to the implementation of the UCX installation process have been made, as well as new tests to ensure that the installation process works correctly in various scenarios. * Fix: Recover lost fix for `webbrowser.open` mock ([#1052](#1052)). A fix has been implemented to address an issue related to the mock for `webbrowser.open` in the tests `test_repair_run` and `test_get_existing_installation_global`. This change prevents the `webbrowser.open` function from being called during these tests, which helps improve test stability and consistency. No new methods have been added, and the existing functionality of these tests has only been modified to include the `webbrowser.open` mock. This modification aims to enhance the reliability and predictability of these specific tests, ensuring accurate and consistent results. * Improved table migrations logic ([#1050](#1050)). This change introduces improvements to table migrations logic by refactoring unit tests to load table mappings from JSON instead of inline structs, adding an `escape_sql_identifier` function where missing, and preparing for ACLs migration. The `uc_grant_sql` method in `grants.py` has been updated to accept optional `object_type` and `object_key` parameters, and the hive-to-UC mapping has been expanded to include mappings for views. Additionally, new JSON files for external source table configuration have been added, and new functions have been introduced for loading fixture data from JSON files and creating mocked `WorkspaceClient` and `TableMapping` objects for testing. The changes improve the maintainability and security of the codebase, prepare it for future migration tasks, and ensure that the code is more adaptable and robust. The changes have been manually tested and verified on the staging environment. * Moved `SqlBackend` implementation to `databricks-labs-lsql` dependency ([#1042](#1042)). In this change, the `SqlBackend` implementation, including classes such as `StatementExecutionBackend` and `RuntimeBackend`, has been moved to a separate library, `databricks-labs-lsql`, which is managed at <https://github.com/databrickslabs/lsql>. This refactoring simplifies the current repository, promotes code reuse, and improves modularity by leveraging an external dependency. The modification includes adding a new line in the .gitignore file to exclude `*.out` files from version control. * Prepare for a PyPI release ([#1038](#1038)). In preparation for a PyPI release, this change introduces a new GitHub Actions workflow that automates the package release process and ensures the integrity of the released packages by signing them with Sigstore. When a new git tag starting with `v` is pushed, this workflow is triggered, building wheels using hatch, drafting a new GitHub release, publishing the package distributions to PyPI, and signing the artifacts with Sigstore. The `pyproject.toml` file is now used for metadata, replacing `setup.cfg` and `setup.py`, and is cached to improve build performance. In addition, the `pyproject.toml` file has been updated with recent metadata in preparation for the release, including updates to the package's authors, development status, classifiers, and dependencies. * Prevent fragile `mock.patch('databricks...')` in the test code ([#1037](#1037)). This change introduces a custom `pylint` checker to improve code flexibility and maintainability by preventing fragile `mock.patch` designs in test code. The new checker discourages the use of `MagicMock` and encourages the use of `create_autospec` to ensure that mocks have the same attributes and methods as the original class. This change has been implemented in multiple test files, including `test_cli.py`, `test_locations.py`, `test_mapping.py`, `test_table_migrate.py`, `test_table_move.py`, `test_workspace_access.py`, `test_redash.py`, `test_scim.py`, and `test_verification.py`, to improve the robustness and maintainability of the test code. Additionally, the commit removes the `verification.py` file, which contained a `VerificationManager` class for verifying applied permissions, scope ACLs, roles, and entitlements for various objects in a Databricks workspace. * Removed `mocker.patch("databricks...)` from `test_cli` ([#1047](#1047)). In this release, we have made significant updates to the library's handling of Azure and AWS workspaces. We have added new parameters `azure_resource_permissions` and `aws_permissions` to the `_execute_for_cloud` function in `cli.py`, which are passed to the `func_azure` and `func_aws` functions respectively. The `create_uber_principal` and `principal_prefix_access` commands have also been updated to include these new parameters. Additionally, the `_azure_setup_uber_principal` and `_aws_setup_uber_principal` functions have been updated to accept the new `azure_resource_permissions` and `aws_resource_permissions` parameters. The `_azure_principal_prefix_access` and `_aws_principal_prefix_access` functions have also been updated similarly. We have also introduced a new `aws_resources` parameter in the `migrate_credentials` command, which is used to migrate Azure Service Principals in ADLS Gen2 locations to UC storage credentials. In terms of testing, we have replaced the `mocker.patch` calls with the creation of `AzureResourcePermissions` and `AWSResourcePermissions` objects, improving the code's readability and maintainability. Overall, these changes significantly enhance the library's functionality and maintainability in handling Azure and AWS workspaces. * Require Hatch v1.9.4 on build machines ([#1049](#1049)). In this release, we have updated the Hatch package version to 1.9.4 on build machines, addressing issue [#1049](#1049). The changes include updating the toolchain dependencies and setup in the `.codegen.json` file, which simplifies the setup process and now relies on a pre-existing Hatch environment and Python 3. The acceptance workflow has also been updated to use the latest version of Hatch and the `databrickslabs/sandbox/acceptance` GitHub action version `v0.1.4`. Hatch is a Python package manager that simplifies package development and management, and this update provides new features and bug fixes that can help improve the reliability and performance of the acceptance workflow. This change requires version 1.9.4 of the Hatch package on build machines, and it will affect the build process for the project but will not have any impact on the functionality of the project itself. As a software engineer adopting this project, it's important to note this change to ensure that the build process runs smoothly and takes advantage of any new features or improvements in Hatch 1.9.4. * Set acceptance tests to timeout after 45 minutes ([#1036](#1036)). As part of issue [#1036](#1036), the acceptance tests in this open-source library now have a 45-minute timeout configured, improving the reliability and stability of the testing environment. This change has been implemented in the `.github/workflows/acceptance.yml` file by adding the `timeout` parameter to the step where the `databrickslabs/sandbox/acceptance` action is called. This ensures that the acceptance tests will not run indefinitely and prevents any potential issues caused by long-running tests. By adopting this project, software engineers can now benefit from a more stable and reliable testing environment, with acceptance tests that are guaranteed to complete within a maximum of 45 minutes. * Updated databricks-labs-blueprint requirement from ~0.4.1 to ~0.4.3 ([#1058](#1058)). In this release, the version requirement for the `databricks-labs-blueprint` library has been updated from ~0.4.1 to ~0.4.3 in the pyproject.toml file. This change is necessary to support issues [#1056](#1056) and [#1057](#1057). The code has been manually tested and is ready for further testing to ensure the compatibility and smooth functioning of the software. It is essential to thoroughly test the latest version of the `databricks-labs-blueprint` library with the existing codebase before deploying it to production. This includes running a comprehensive suite of tests such as unit tests, integration tests, and verification on the staging environment. This modification allows the software to use the latest version of the library, improving its functionality and overall performance. * Use `MockPrompts.extend()` functionality in test_install to supply multiple prompts ([#1057](#1057)). This diff introduces the `MockPrompts.extend()` functionality in the `test_install` module to enable the supplying of multiple prompts for testing purposes. A new `base_prompts` dictionary with default prompts has been added and is extended with additional prompts for specific test cases. This allows for the testing of various scenarios, such as when UCX is already installed on the workspace and the user is prompted to choose between global or user installation. Additionally, new `force_user_environ` and `force_global_env` dictionaries have been added to simulate different installation environments. The functionality of the `WorkspaceInstaller` class and mocking of `webbrowser.open` are also utilized in the test cases. These changes aim to ensure the proper functioning of the configuration process for different installation scenarios.
* Added AWS IAM role support to `databricks labs ucx create-uber-principal` command ([#993](#993)). The `databricks labs ucx create-uber-principal` command now supports AWS Identity and Access Management (IAM) roles for external table migration. This new feature introduces a CLI command to create an `uber-IAM` profile, which checks for the UCX migration cluster policy and updates or adds the migration policy to provide access to the relevant table locations. If no IAM instance profile or role is specified in the cluster policy, a new one is created and the new migration policy is added. This change includes new methods and functions to handle AWS IAM roles, instance profiles, and related trust policies. Additionally, new unit and integration tests have been added and verified on the staging environment. The implementation also identifies all S3 buckets used by the Instance Profiles configured in the workspace. * Added Dashboard widget to show the list of cluster policies along with DBR version ([#1013](#1013)). In this code revision, the `assessment` module of the 'databricks/labs/ucx' package has been updated to include a new `PoliciesCrawler` class, which fetches, assesses, and snapshots cluster policies. This class extends `CrawlerBase` and `CheckClusterMixin` and introduces the '_crawl', '_assess_policies', '_try_fetch', and `snapshot` methods. The `PolicyInfo` dataclass has been added to hold policy information, with a structure similar to the `ClusterInfo` dataclass. The `ClusterInfo` dataclass has been updated to include `spark_version` and `policy_id` attributes. A new table for policies has been added, and cluster policies along with the DBR version are loaded into this table. Relevant user documentation, tests, and a Dashboard widget have been added to support this feature. The `create` function in 'fixtures.py' has been updated to enable a Delta preview feature in Spark configurations, and a new SQL file has been included for querying cluster policies. Additionally, a new `crawl_cluster_policies` method has been added to scan and store cluster policies with matching configurations. * Added `migration_status` table to capture a snapshot of migrated tables ([#1041](#1041)). A `migration_status` table has been added to track the status of migrated tables in the database, enabling improved management and tracking of migrations. The new `MigrationStatus` class, which is a dataclass that holds the source and destination schema, table, and updated timestamp, is added. The `TablesMigrate` class now has a new `_migration_status_refresher` attribute that is an instance of the new `MigrationStatusRefresher` class. This class crawls the `migration_status` table and returns a snapshot of the migration status, which is used to refresh the migration status and check if the table is upgraded. Additionally, the `_init_seen_tables` method is updated to get the seen tables from the `_migration_status_refresher` instead of fetching from the table properties. The `MigrationStatusRefresher` class fetches the migration status table and returns a snapshot of the migration status. This change also adds new test functions in the test file for the Hive metastore, which covers various scenarios such as migrating managed tables with and without caching, migrating external tables, and reverting migrated tables. * Added a check for existing inventory database to avoid losing existing, inject installation objects in tests and try fetching existing installation before setting global as default ([#1043](#1043)). In this release, we have added a new method, `_check_inventory_database_exists`, to the `WorkspaceInstallation` class, which checks if an inventory database with a given name already exists in the Workspace. This prevents accidental overwriting of existing data and improves the robustness of handling inventory databases. The `validate_and_run` method has been updated to call `app.current_installation(workspace_client)`, allowing for a more flexible handling of installations. The `Installation` class import has been updated to include `SerdeError`, and the test suite has been updated to inject installation objects and check for existing installations before setting the global installation as default. A new argument `inventory_schema_suffix` has been added to the `factory` method for customization of the inventory schema name. We have also added a new method `check_inventory_database_exists` to the `WorkspaceInstaller` class, which checks if an inventory database already exists for a given installation type and raises an `AlreadyExists` error if it does. The behavior of the `download` method in the `WorkspaceClient` class has been mocked, and the `get_status` method has been updated to return `NotFound` in certain tests. These changes aim to improve the robustness, flexibility, and safety of the installation process in the Workspace. * Added a check for external metastore in SQL warehouse configuration ([#1046](#1046)). In this release, we have added new functionality to the Unity Catalog (UCX) installation process to enable checking for and connecting to an external Hive metastore configuration. A new method, `_get_warehouse_config_with_external_hive_metastore`, has been introduced to retrieve the workspace warehouse config and identify if it is set up for an external Hive metastore. If so, and the user confirms the prompt, UCX will be configured to connect to the external metastore. Additionally, new methods `_extract_external_hive_metastore_sql_conf` and `test_cluster_policy_definition_<cloud_provider>_hms_warehouse()` have been added to handle the external metastore configuration for Azure, AWS, and GCP, and to handle the case when the data_access_config is empty. These changes provide more flexibility and ease of use when installing UCX with external Hive metastore configurations. The new imports `EndpointConfPair`, `GetWorkspaceWarehouseConfigResponse` from the `databricks.sdk.service.sql` package are used to handle the endpoint configuration of the SQL warehouse. * Added integration tests for AWS - create locations ([#1026](#1026)). In this release, we have added comprehensive integration tests for AWS resources and their management in the `tests/unit/assessment/test_aws.py` file. The `AWSResources` class has been updated with new methods (AwsIamRole, add_uc_role, add_uc_role_policy, and validate_connection) and the regular expression for matching S3 resource ARN has been modified. The `create_external_locations` method now allows for creating external locations without validating them, and the `_identify_missing_external_locations` function has been enhanced to match roles with a wildcard pattern. The new tests include validating the integration of AWS services with the system, testing the CLI's behavior when it is missing, and introducing new configuration scenarios with the addition of a Key Management Service (KMS) key during the creation of IAM roles and policies. These changes improve the robustness and reliability of AWS resource integration and handling in our system. * Bump Databricks SDK to v0.22.0 ([#1059](#1059)). In this release, we are bumping the Databricks SDK version to 0.22.0 and upgrading the `databricks-labs-lsql` package to ~0.2.2. The new dependencies for this release include `databricks-sdk==0.22.0`, `databricks-labs-lsql~=0.2.2`, `databricks-labs-blueprint~=0.4.3`, and `PyYAML>=6.0.0,<7.0.0`. In the `fixtures.py` file, we have added `PermissionLevel.CAN_QUERY` to the `CAN_VIEW` and `CAN_MANAGE` permissions in the `_path` function, allowing users to query the endpoint. Additionally, we have updated the `test_endpoints` function in the `test_generic.py` file as part of the integration tests for workspace access. This change updates the permission level for creating a serving endpoint from `CAN_MANAGE` to `CAN_QUERY`, meaning that the assigned group can now only query the endpoint. We have also included the `test_feature_tables` function in the commit, which tests the behavior of feature tables in the Databricks workspace. This change only affects the `test_endpoints` function and its assert statements, and does not impact the functionality of the `test_feature_tables` function. * Changed default UCX installation folder to `/Applications/ucx` from `/Users/<me>/.ucx` to allow multiple users users utilising the same installation ([#854](#854)). In this release, we've added a new advanced feature that allows users to force the installation of UCX over an existing installation using the `UCX_FORCE_INSTALL` environment variable. This variable can take two values `global` and 'user', providing more control and flexibility in installing UCX. The default UCX installation folder has been changed to /Applications/ucx from /Users/<me>/.ucx to enable multiple users to utilize the same installation. A table detailing the expected install location, `install_folder`, and mode for each combination of global and user values has been added to the README file. We've also added user prompts to confirm the installation if UCX is already installed and the `UCX_FORCE_INSTALL` variable is set to 'user'. This feature is useful when users want to install UCX in a specific location or force the installation over an existing installation. However, it is recommended to use this feature with caution, as it can potentially break existing installations if not used correctly. Additionally, several changes to the implementation of the UCX installation process have been made, as well as new tests to ensure that the installation process works correctly in various scenarios. * Fix: Recover lost fix for `webbrowser.open` mock ([#1052](#1052)). A fix has been implemented to address an issue related to the mock for `webbrowser.open` in the tests `test_repair_run` and `test_get_existing_installation_global`. This change prevents the `webbrowser.open` function from being called during these tests, which helps improve test stability and consistency. No new methods have been added, and the existing functionality of these tests has only been modified to include the `webbrowser.open` mock. This modification aims to enhance the reliability and predictability of these specific tests, ensuring accurate and consistent results. * Improved table migrations logic ([#1050](#1050)). This change introduces improvements to table migrations logic by refactoring unit tests to load table mappings from JSON instead of inline structs, adding an `escape_sql_identifier` function where missing, and preparing for ACLs migration. The `uc_grant_sql` method in `grants.py` has been updated to accept optional `object_type` and `object_key` parameters, and the hive-to-UC mapping has been expanded to include mappings for views. Additionally, new JSON files for external source table configuration have been added, and new functions have been introduced for loading fixture data from JSON files and creating mocked `WorkspaceClient` and `TableMapping` objects for testing. The changes improve the maintainability and security of the codebase, prepare it for future migration tasks, and ensure that the code is more adaptable and robust. The changes have been manually tested and verified on the staging environment. * Moved `SqlBackend` implementation to `databricks-labs-lsql` dependency ([#1042](#1042)). In this change, the `SqlBackend` implementation, including classes such as `StatementExecutionBackend` and `RuntimeBackend`, has been moved to a separate library, `databricks-labs-lsql`, which is managed at <https://github.com/databrickslabs/lsql>. This refactoring simplifies the current repository, promotes code reuse, and improves modularity by leveraging an external dependency. The modification includes adding a new line in the .gitignore file to exclude `*.out` files from version control. * Prepare for a PyPI release ([#1038](#1038)). In preparation for a PyPI release, this change introduces a new GitHub Actions workflow that automates the package release process and ensures the integrity of the released packages by signing them with Sigstore. When a new git tag starting with `v` is pushed, this workflow is triggered, building wheels using hatch, drafting a new GitHub release, publishing the package distributions to PyPI, and signing the artifacts with Sigstore. The `pyproject.toml` file is now used for metadata, replacing `setup.cfg` and `setup.py`, and is cached to improve build performance. In addition, the `pyproject.toml` file has been updated with recent metadata in preparation for the release, including updates to the package's authors, development status, classifiers, and dependencies. * Prevent fragile `mock.patch('databricks...')` in the test code ([#1037](#1037)). This change introduces a custom `pylint` checker to improve code flexibility and maintainability by preventing fragile `mock.patch` designs in test code. The new checker discourages the use of `MagicMock` and encourages the use of `create_autospec` to ensure that mocks have the same attributes and methods as the original class. This change has been implemented in multiple test files, including `test_cli.py`, `test_locations.py`, `test_mapping.py`, `test_table_migrate.py`, `test_table_move.py`, `test_workspace_access.py`, `test_redash.py`, `test_scim.py`, and `test_verification.py`, to improve the robustness and maintainability of the test code. Additionally, the commit removes the `verification.py` file, which contained a `VerificationManager` class for verifying applied permissions, scope ACLs, roles, and entitlements for various objects in a Databricks workspace. * Removed `mocker.patch("databricks...)` from `test_cli` ([#1047](#1047)). In this release, we have made significant updates to the library's handling of Azure and AWS workspaces. We have added new parameters `azure_resource_permissions` and `aws_permissions` to the `_execute_for_cloud` function in `cli.py`, which are passed to the `func_azure` and `func_aws` functions respectively. The `create_uber_principal` and `principal_prefix_access` commands have also been updated to include these new parameters. Additionally, the `_azure_setup_uber_principal` and `_aws_setup_uber_principal` functions have been updated to accept the new `azure_resource_permissions` and `aws_resource_permissions` parameters. The `_azure_principal_prefix_access` and `_aws_principal_prefix_access` functions have also been updated similarly. We have also introduced a new `aws_resources` parameter in the `migrate_credentials` command, which is used to migrate Azure Service Principals in ADLS Gen2 locations to UC storage credentials. In terms of testing, we have replaced the `mocker.patch` calls with the creation of `AzureResourcePermissions` and `AWSResourcePermissions` objects, improving the code's readability and maintainability. Overall, these changes significantly enhance the library's functionality and maintainability in handling Azure and AWS workspaces. * Require Hatch v1.9.4 on build machines ([#1049](#1049)). In this release, we have updated the Hatch package version to 1.9.4 on build machines, addressing issue [#1049](#1049). The changes include updating the toolchain dependencies and setup in the `.codegen.json` file, which simplifies the setup process and now relies on a pre-existing Hatch environment and Python 3. The acceptance workflow has also been updated to use the latest version of Hatch and the `databrickslabs/sandbox/acceptance` GitHub action version `v0.1.4`. Hatch is a Python package manager that simplifies package development and management, and this update provides new features and bug fixes that can help improve the reliability and performance of the acceptance workflow. This change requires version 1.9.4 of the Hatch package on build machines, and it will affect the build process for the project but will not have any impact on the functionality of the project itself. As a software engineer adopting this project, it's important to note this change to ensure that the build process runs smoothly and takes advantage of any new features or improvements in Hatch 1.9.4. * Set acceptance tests to timeout after 45 minutes ([#1036](#1036)). As part of issue [#1036](#1036), the acceptance tests in this open-source library now have a 45-minute timeout configured, improving the reliability and stability of the testing environment. This change has been implemented in the `.github/workflows/acceptance.yml` file by adding the `timeout` parameter to the step where the `databrickslabs/sandbox/acceptance` action is called. This ensures that the acceptance tests will not run indefinitely and prevents any potential issues caused by long-running tests. By adopting this project, software engineers can now benefit from a more stable and reliable testing environment, with acceptance tests that are guaranteed to complete within a maximum of 45 minutes. * Updated databricks-labs-blueprint requirement from ~0.4.1 to ~0.4.3 ([#1058](#1058)). In this release, the version requirement for the `databricks-labs-blueprint` library has been updated from ~0.4.1 to ~0.4.3 in the pyproject.toml file. This change is necessary to support issues [#1056](#1056) and [#1057](#1057). The code has been manually tested and is ready for further testing to ensure the compatibility and smooth functioning of the software. It is essential to thoroughly test the latest version of the `databricks-labs-blueprint` library with the existing codebase before deploying it to production. This includes running a comprehensive suite of tests such as unit tests, integration tests, and verification on the staging environment. This modification allows the software to use the latest version of the library, improving its functionality and overall performance. * Use `MockPrompts.extend()` functionality in test_install to supply multiple prompts ([#1057](#1057)). This diff introduces the `MockPrompts.extend()` functionality in the `test_install` module to enable the supplying of multiple prompts for testing purposes. A new `base_prompts` dictionary with default prompts has been added and is extended with additional prompts for specific test cases. This allows for the testing of various scenarios, such as when UCX is already installed on the workspace and the user is prompted to choose between global or user installation. Additionally, new `force_user_environ` and `force_global_env` dictionaries have been added to simulate different installation environments. The functionality of the `WorkspaceInstaller` class and mocking of `webbrowser.open` are also utilized in the test cases. These changes aim to ensure the proper functioning of the configuration process for different installation scenarios.
…zure Service Principal for migration (#976) ## Changes - Added new cli cmd for create-master-principal in labs.yml, cli.py - Added separate class for AzureApiClient to separate out azure API calls - Added logic to create SPN, secret, roleassignment in resources and update workspace config with spn client_id - added logic to call create spn, update rbac of all storage account to that spn, update ucx cluster policy with spn secret for each storage account - test unit and int test cases Resolves #881 Related issues: - #993 - #693 ### Functionality - [ ] added relevant user documentation - [X] added new CLI command - [ ] modified existing command: `databricks labs ucx ...` - [ ] added a new workflow - [ ] modified existing workflow: `...` - [ ] added a new table - [ ] modified existing table: `...` ### Tests - [X] manually tested - [X] added unit tests - [X] added integration tests - [ ] verified on staging environment (screenshot attached)
…pal` command (#993) ## Changes Added CLI command `databricks labs ucx create-uber-principal` for creating uber-IAM profile for performing external table migration on AWS. Logic: * Stop if UCX migration cluster policy is not found * Collect paths of all locations/paths used in tables (call `external_location.snapshot`) * If cluster policy has an existing iam instance profile/role specified, then add/update migration policy providing access to the locations * If cluster policy does not have iam instance profile/role specified, then create new iam profile/role and migration policy, and add it to the cluster policy ### Linked issues Resolves #879 Related issues: - #976 - #693 ### Functionality - [x] added new CLI command ### Tests - [x] manually tested - [x] added unit tests ### TODO - [x] added integration tests - [x] verified on staging environment (screenshot attached) --------- Co-authored-by: Vuong <[email protected]>
* Added AWS IAM role support to `databricks labs ucx create-uber-principal` command ([#993](#993)). The `databricks labs ucx create-uber-principal` command now supports AWS Identity and Access Management (IAM) roles for external table migration. This new feature introduces a CLI command to create an `uber-IAM` profile, which checks for the UCX migration cluster policy and updates or adds the migration policy to provide access to the relevant table locations. If no IAM instance profile or role is specified in the cluster policy, a new one is created and the new migration policy is added. This change includes new methods and functions to handle AWS IAM roles, instance profiles, and related trust policies. Additionally, new unit and integration tests have been added and verified on the staging environment. The implementation also identifies all S3 buckets used by the Instance Profiles configured in the workspace. * Added Dashboard widget to show the list of cluster policies along with DBR version ([#1013](#1013)). In this code revision, the `assessment` module of the 'databricks/labs/ucx' package has been updated to include a new `PoliciesCrawler` class, which fetches, assesses, and snapshots cluster policies. This class extends `CrawlerBase` and `CheckClusterMixin` and introduces the '_crawl', '_assess_policies', '_try_fetch', and `snapshot` methods. The `PolicyInfo` dataclass has been added to hold policy information, with a structure similar to the `ClusterInfo` dataclass. The `ClusterInfo` dataclass has been updated to include `spark_version` and `policy_id` attributes. A new table for policies has been added, and cluster policies along with the DBR version are loaded into this table. Relevant user documentation, tests, and a Dashboard widget have been added to support this feature. The `create` function in 'fixtures.py' has been updated to enable a Delta preview feature in Spark configurations, and a new SQL file has been included for querying cluster policies. Additionally, a new `crawl_cluster_policies` method has been added to scan and store cluster policies with matching configurations. * Added `migration_status` table to capture a snapshot of migrated tables ([#1041](#1041)). A `migration_status` table has been added to track the status of migrated tables in the database, enabling improved management and tracking of migrations. The new `MigrationStatus` class, which is a dataclass that holds the source and destination schema, table, and updated timestamp, is added. The `TablesMigrate` class now has a new `_migration_status_refresher` attribute that is an instance of the new `MigrationStatusRefresher` class. This class crawls the `migration_status` table and returns a snapshot of the migration status, which is used to refresh the migration status and check if the table is upgraded. Additionally, the `_init_seen_tables` method is updated to get the seen tables from the `_migration_status_refresher` instead of fetching from the table properties. The `MigrationStatusRefresher` class fetches the migration status table and returns a snapshot of the migration status. This change also adds new test functions in the test file for the Hive metastore, which covers various scenarios such as migrating managed tables with and without caching, migrating external tables, and reverting migrated tables. * Added a check for existing inventory database to avoid losing existing, inject installation objects in tests and try fetching existing installation before setting global as default ([#1043](#1043)). In this release, we have added a new method, `_check_inventory_database_exists`, to the `WorkspaceInstallation` class, which checks if an inventory database with a given name already exists in the Workspace. This prevents accidental overwriting of existing data and improves the robustness of handling inventory databases. The `validate_and_run` method has been updated to call `app.current_installation(workspace_client)`, allowing for a more flexible handling of installations. The `Installation` class import has been updated to include `SerdeError`, and the test suite has been updated to inject installation objects and check for existing installations before setting the global installation as default. A new argument `inventory_schema_suffix` has been added to the `factory` method for customization of the inventory schema name. We have also added a new method `check_inventory_database_exists` to the `WorkspaceInstaller` class, which checks if an inventory database already exists for a given installation type and raises an `AlreadyExists` error if it does. The behavior of the `download` method in the `WorkspaceClient` class has been mocked, and the `get_status` method has been updated to return `NotFound` in certain tests. These changes aim to improve the robustness, flexibility, and safety of the installation process in the Workspace. * Added a check for external metastore in SQL warehouse configuration ([#1046](#1046)). In this release, we have added new functionality to the Unity Catalog (UCX) installation process to enable checking for and connecting to an external Hive metastore configuration. A new method, `_get_warehouse_config_with_external_hive_metastore`, has been introduced to retrieve the workspace warehouse config and identify if it is set up for an external Hive metastore. If so, and the user confirms the prompt, UCX will be configured to connect to the external metastore. Additionally, new methods `_extract_external_hive_metastore_sql_conf` and `test_cluster_policy_definition_<cloud_provider>_hms_warehouse()` have been added to handle the external metastore configuration for Azure, AWS, and GCP, and to handle the case when the data_access_config is empty. These changes provide more flexibility and ease of use when installing UCX with external Hive metastore configurations. The new imports `EndpointConfPair`, `GetWorkspaceWarehouseConfigResponse` from the `databricks.sdk.service.sql` package are used to handle the endpoint configuration of the SQL warehouse. * Added integration tests for AWS - create locations ([#1026](#1026)). In this release, we have added comprehensive integration tests for AWS resources and their management in the `tests/unit/assessment/test_aws.py` file. The `AWSResources` class has been updated with new methods (AwsIamRole, add_uc_role, add_uc_role_policy, and validate_connection) and the regular expression for matching S3 resource ARN has been modified. The `create_external_locations` method now allows for creating external locations without validating them, and the `_identify_missing_external_locations` function has been enhanced to match roles with a wildcard pattern. The new tests include validating the integration of AWS services with the system, testing the CLI's behavior when it is missing, and introducing new configuration scenarios with the addition of a Key Management Service (KMS) key during the creation of IAM roles and policies. These changes improve the robustness and reliability of AWS resource integration and handling in our system. * Bump Databricks SDK to v0.22.0 ([#1059](#1059)). In this release, we are bumping the Databricks SDK version to 0.22.0 and upgrading the `databricks-labs-lsql` package to ~0.2.2. The new dependencies for this release include `databricks-sdk==0.22.0`, `databricks-labs-lsql~=0.2.2`, `databricks-labs-blueprint~=0.4.3`, and `PyYAML>=6.0.0,<7.0.0`. In the `fixtures.py` file, we have added `PermissionLevel.CAN_QUERY` to the `CAN_VIEW` and `CAN_MANAGE` permissions in the `_path` function, allowing users to query the endpoint. Additionally, we have updated the `test_endpoints` function in the `test_generic.py` file as part of the integration tests for workspace access. This change updates the permission level for creating a serving endpoint from `CAN_MANAGE` to `CAN_QUERY`, meaning that the assigned group can now only query the endpoint. We have also included the `test_feature_tables` function in the commit, which tests the behavior of feature tables in the Databricks workspace. This change only affects the `test_endpoints` function and its assert statements, and does not impact the functionality of the `test_feature_tables` function. * Changed default UCX installation folder to `/Applications/ucx` from `/Users/<me>/.ucx` to allow multiple users users utilising the same installation ([#854](#854)). In this release, we've added a new advanced feature that allows users to force the installation of UCX over an existing installation using the `UCX_FORCE_INSTALL` environment variable. This variable can take two values `global` and 'user', providing more control and flexibility in installing UCX. The default UCX installation folder has been changed to /Applications/ucx from /Users/<me>/.ucx to enable multiple users to utilize the same installation. A table detailing the expected install location, `install_folder`, and mode for each combination of global and user values has been added to the README file. We've also added user prompts to confirm the installation if UCX is already installed and the `UCX_FORCE_INSTALL` variable is set to 'user'. This feature is useful when users want to install UCX in a specific location or force the installation over an existing installation. However, it is recommended to use this feature with caution, as it can potentially break existing installations if not used correctly. Additionally, several changes to the implementation of the UCX installation process have been made, as well as new tests to ensure that the installation process works correctly in various scenarios. * Fix: Recover lost fix for `webbrowser.open` mock ([#1052](#1052)). A fix has been implemented to address an issue related to the mock for `webbrowser.open` in the tests `test_repair_run` and `test_get_existing_installation_global`. This change prevents the `webbrowser.open` function from being called during these tests, which helps improve test stability and consistency. No new methods have been added, and the existing functionality of these tests has only been modified to include the `webbrowser.open` mock. This modification aims to enhance the reliability and predictability of these specific tests, ensuring accurate and consistent results. * Improved table migrations logic ([#1050](#1050)). This change introduces improvements to table migrations logic by refactoring unit tests to load table mappings from JSON instead of inline structs, adding an `escape_sql_identifier` function where missing, and preparing for ACLs migration. The `uc_grant_sql` method in `grants.py` has been updated to accept optional `object_type` and `object_key` parameters, and the hive-to-UC mapping has been expanded to include mappings for views. Additionally, new JSON files for external source table configuration have been added, and new functions have been introduced for loading fixture data from JSON files and creating mocked `WorkspaceClient` and `TableMapping` objects for testing. The changes improve the maintainability and security of the codebase, prepare it for future migration tasks, and ensure that the code is more adaptable and robust. The changes have been manually tested and verified on the staging environment. * Moved `SqlBackend` implementation to `databricks-labs-lsql` dependency ([#1042](#1042)). In this change, the `SqlBackend` implementation, including classes such as `StatementExecutionBackend` and `RuntimeBackend`, has been moved to a separate library, `databricks-labs-lsql`, which is managed at <https://github.com/databrickslabs/lsql>. This refactoring simplifies the current repository, promotes code reuse, and improves modularity by leveraging an external dependency. The modification includes adding a new line in the .gitignore file to exclude `*.out` files from version control. * Prepare for a PyPI release ([#1038](#1038)). In preparation for a PyPI release, this change introduces a new GitHub Actions workflow that automates the package release process and ensures the integrity of the released packages by signing them with Sigstore. When a new git tag starting with `v` is pushed, this workflow is triggered, building wheels using hatch, drafting a new GitHub release, publishing the package distributions to PyPI, and signing the artifacts with Sigstore. The `pyproject.toml` file is now used for metadata, replacing `setup.cfg` and `setup.py`, and is cached to improve build performance. In addition, the `pyproject.toml` file has been updated with recent metadata in preparation for the release, including updates to the package's authors, development status, classifiers, and dependencies. * Prevent fragile `mock.patch('databricks...')` in the test code ([#1037](#1037)). This change introduces a custom `pylint` checker to improve code flexibility and maintainability by preventing fragile `mock.patch` designs in test code. The new checker discourages the use of `MagicMock` and encourages the use of `create_autospec` to ensure that mocks have the same attributes and methods as the original class. This change has been implemented in multiple test files, including `test_cli.py`, `test_locations.py`, `test_mapping.py`, `test_table_migrate.py`, `test_table_move.py`, `test_workspace_access.py`, `test_redash.py`, `test_scim.py`, and `test_verification.py`, to improve the robustness and maintainability of the test code. Additionally, the commit removes the `verification.py` file, which contained a `VerificationManager` class for verifying applied permissions, scope ACLs, roles, and entitlements for various objects in a Databricks workspace. * Removed `mocker.patch("databricks...)` from `test_cli` ([#1047](#1047)). In this release, we have made significant updates to the library's handling of Azure and AWS workspaces. We have added new parameters `azure_resource_permissions` and `aws_permissions` to the `_execute_for_cloud` function in `cli.py`, which are passed to the `func_azure` and `func_aws` functions respectively. The `create_uber_principal` and `principal_prefix_access` commands have also been updated to include these new parameters. Additionally, the `_azure_setup_uber_principal` and `_aws_setup_uber_principal` functions have been updated to accept the new `azure_resource_permissions` and `aws_resource_permissions` parameters. The `_azure_principal_prefix_access` and `_aws_principal_prefix_access` functions have also been updated similarly. We have also introduced a new `aws_resources` parameter in the `migrate_credentials` command, which is used to migrate Azure Service Principals in ADLS Gen2 locations to UC storage credentials. In terms of testing, we have replaced the `mocker.patch` calls with the creation of `AzureResourcePermissions` and `AWSResourcePermissions` objects, improving the code's readability and maintainability. Overall, these changes significantly enhance the library's functionality and maintainability in handling Azure and AWS workspaces. * Require Hatch v1.9.4 on build machines ([#1049](#1049)). In this release, we have updated the Hatch package version to 1.9.4 on build machines, addressing issue [#1049](#1049). The changes include updating the toolchain dependencies and setup in the `.codegen.json` file, which simplifies the setup process and now relies on a pre-existing Hatch environment and Python 3. The acceptance workflow has also been updated to use the latest version of Hatch and the `databrickslabs/sandbox/acceptance` GitHub action version `v0.1.4`. Hatch is a Python package manager that simplifies package development and management, and this update provides new features and bug fixes that can help improve the reliability and performance of the acceptance workflow. This change requires version 1.9.4 of the Hatch package on build machines, and it will affect the build process for the project but will not have any impact on the functionality of the project itself. As a software engineer adopting this project, it's important to note this change to ensure that the build process runs smoothly and takes advantage of any new features or improvements in Hatch 1.9.4. * Set acceptance tests to timeout after 45 minutes ([#1036](#1036)). As part of issue [#1036](#1036), the acceptance tests in this open-source library now have a 45-minute timeout configured, improving the reliability and stability of the testing environment. This change has been implemented in the `.github/workflows/acceptance.yml` file by adding the `timeout` parameter to the step where the `databrickslabs/sandbox/acceptance` action is called. This ensures that the acceptance tests will not run indefinitely and prevents any potential issues caused by long-running tests. By adopting this project, software engineers can now benefit from a more stable and reliable testing environment, with acceptance tests that are guaranteed to complete within a maximum of 45 minutes. * Updated databricks-labs-blueprint requirement from ~0.4.1 to ~0.4.3 ([#1058](#1058)). In this release, the version requirement for the `databricks-labs-blueprint` library has been updated from ~0.4.1 to ~0.4.3 in the pyproject.toml file. This change is necessary to support issues [#1056](#1056) and [#1057](#1057). The code has been manually tested and is ready for further testing to ensure the compatibility and smooth functioning of the software. It is essential to thoroughly test the latest version of the `databricks-labs-blueprint` library with the existing codebase before deploying it to production. This includes running a comprehensive suite of tests such as unit tests, integration tests, and verification on the staging environment. This modification allows the software to use the latest version of the library, improving its functionality and overall performance. * Use `MockPrompts.extend()` functionality in test_install to supply multiple prompts ([#1057](#1057)). This diff introduces the `MockPrompts.extend()` functionality in the `test_install` module to enable the supplying of multiple prompts for testing purposes. A new `base_prompts` dictionary with default prompts has been added and is extended with additional prompts for specific test cases. This allows for the testing of various scenarios, such as when UCX is already installed on the workspace and the user is prompted to choose between global or user installation. Additionally, new `force_user_environ` and `force_global_env` dictionaries have been added to simulate different installation environments. The functionality of the `WorkspaceInstaller` class and mocking of `webbrowser.open` are also utilized in the test cases. These changes aim to ensure the proper functioning of the configuration process for different installation scenarios.
Changes
Added CLI command
databricks labs ucx create-uber-principalfor creating uber-IAM profile for performing external table migration on AWS.Logic:
external_location.snapshot)Linked issues
Resolves #879
Related issues:
databricks labs ucx create-uber-principalcommand to create Azure Service Principal for migration #976databricks labs ucx create-uber-principalcommand to automate the creation of SPN with storage access to all locations #693Functionality
Tests
TODO