Prefetch all account-level and workspace-level groups #192
Conversation
nfx
left a comment
Thank you for fixing the TODO: calling this can cause issues for the SCIM backend, cache groups instead :)

- adding the `AvailableGroups` type looks redundant; make two properties instead: `self._account_groups` and `self._workspace_groups`.
- BUG: when you `_replace_group`, also delete the item from `self._workspace_groups`, otherwise we have an inconsistent state that may lead to surprising errors. It's already done in `_get_or_create_backup_group`, so it was probably forgotten here.
- don't do `with patch.object(GroupManager, "_list_account_groups", ...)`; rather do `client.api_client.do.return_value = { ... }`.
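A minimal sketch of the cache-consistency fix described in the second bullet. The `Group` and `GroupManager` shapes here are simplified stand-ins for illustration; the real class has more state, but the point is the same: whatever `_replace_group` does against the API, the cached `self._workspace_groups` list must be updated too.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    display_name: str


@dataclass
class GroupManager:
    _workspace_groups: list = field(default_factory=list)
    _account_groups: list = field(default_factory=list)

    def _replace_group(self, name: str) -> None:
        # ... perform the actual replacement against the API ...
        # Fix for the BUG above: also drop the stale entry from the
        # in-memory cache, otherwise later lookups see inconsistent state.
        self._workspace_groups = [
            g for g in self._workspace_groups if g.display_name != name
        ]
```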
```python
group = next(
    iter(method(filter=query_filter, attributes=attributes)),
    None,
```

```python
self._available_groups: AvailableGroups = self._list_available_groups()
```
adding the `AvailableGroups` type looks redundant; make two properties instead: `self._account_groups` and `self._workspace_groups`
```python
def _list_available_groups(self):
    """
    Note - this method will provide all groups during initialization.
    We apply this method to decrease the amount of calls to the SCIM API.
    To decrease API load, we also do not request the group members.
    :return:
    """
    logger.info("Listing available groups...")
    workspace_groups = self._list_workspace_groups()
    account_groups = self._list_account_groups()
    logger.info(f"Found {len(workspace_groups)} workspace groups and {len(account_groups)} account groups")
    return AvailableGroups(workspace=workspace_groups, account=account_groups)
```
this method looks redundant, please assign properties directly
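One way to follow that suggestion is to inline the listing into `__init__` as two plain attributes. This is a hedged sketch: the stub `_list_*` methods below stand in for the real SCIM-backed implementations, which page through the API.

```python
import logging

logger = logging.getLogger(__name__)


class GroupManager:
    def __init__(self, ws, config):
        self._ws = ws
        self._config = config
        # Fetch once at init to reduce SCIM API load; group members are
        # intentionally not requested here.
        logger.info("Listing available groups...")
        self._workspace_groups = self._list_workspace_groups()
        self._account_groups = self._list_account_groups()
        logger.info(
            f"Found {len(self._workspace_groups)} workspace groups "
            f"and {len(self._account_groups)} account groups"
        )

    def _list_workspace_groups(self):
        # stand-in; the real method calls the workspace SCIM API
        return []

    def _list_account_groups(self):
        # stand-in; the real method calls the account-level endpoint
        return []
```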
```python
relevant_level_groups = (
    self._available_groups.workspace if level == GroupLevel.WORKSPACE else self._available_groups.account
)
_group = next((g for g in relevant_level_groups if g.display_name == group_name), None)
```
nit: it's a bit unclear what `next` does, and it may be less readable for people less familiar with Python. Can you replace it with a for loop with an early return?
To be honest, `self._get_account_group(name)` and `self._get_workspace_group(name)` might be more readable, but you don't have to do that in this PR.
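The loop-with-early-return version the reviewer asks for could look like the sketch below; `find_group` and the minimal `Group` dataclass are illustrative names, not the actual method in the PR.

```python
from dataclasses import dataclass


@dataclass
class Group:
    display_name: str


def find_group(groups, group_name):
    """Return the group with the given display name, or None.

    Functionally equivalent to
    next((g for g in groups if g.display_name == group_name), None),
    but written as an explicit loop with an early return.
    """
    for group in groups:
        if group.display_name == group_name:
            return group
    return None
```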
tests/unit/test_group.py
```python
group_conf = GroupsConfig(selected=[""])

assert GroupManager(client, group_conf)._find_eligible_groups() == [users_group.display_name]
with patch.object(GroupManager, "_list_account_groups", return_value=[account_admins_group]):
```
it's a unit-testing anti-pattern to mock out methods of the class under test. The correct way is to mock it through the client, which will cover the `self._ws.api_client.do` call. See `client.groups.list.return_value` a few lines above.
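A sketch of the suggested test style: stub the SDK client that `GroupManager` receives instead of patching its own methods. The response shape for `api_client.do` below is an assumption about what the account-level listing parses; the real payload may carry more fields.

```python
from unittest.mock import MagicMock

# Build a fake WorkspaceClient; GroupManager never knows the difference.
client = MagicMock()

# Workspace-level groups come from the SDK's groups.list iterator.
client.groups.list.return_value = []

# Account-level groups come from a raw api_client.do call, so stub
# its return value rather than patching GroupManager._list_account_groups.
client.api_client.do.return_value = {
    "Resources": [{"id": "1", "displayName": "account_admins"}],
}

# GroupManager(client, group_conf) would now see the mocked responses
# for both workspace- and account-level listings.
```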
nfx
left a comment
Thank you! Way more readable with the proper mocking :)
Addresses #190 and other SCIM-related issues. Conceptually, the following happens:

1. The existing method to list the workspace groups is inconsistent, and the root cause of these issues is somewhere on the SCIM API side. We've raised a ticket, but the resolution timing is unclear.
2. To avoid being time-dependent on the SCIM API, we re-introduce the listing methods, but this time we intentionally don't use `filter`-based conditions and explicitly split the API methods into two.
3. To avoid reaching out to the SCIM API too frequently, we list the whole API only once and then work with in-memory data.

This PR also adds 100% coverage for `GroupManager` and adds integration tests for groups to verify it works as expected.
# Version changelog

## 0.1.0

Features

* Added interactive installation wizard ([#184](#184), [#117](#117)).
* Added schedule of jobs as part of `install.sh` flow and created some documentation ([#187](#187)).
* Added debug notebook companion to troubleshoot the installation ([#191](#191)).
* Added support for Hive Metastore Table ACLs inventory from all databases ([#78](#78), [#122](#122), [#151](#151)).
* Created `$inventory.tables` from Scala notebook ([#207](#207)).
* Added local group migration support for ML-related objects ([#56](#56)).
* Added local group migration support for SQL warehouses ([#57](#57)).
* Added local group migration support for all compute-related resources ([#53](#53)).
* Added local group migration support for security-related objects ([#58](#58)).
* Added local group migration support for workflows ([#54](#54)).
* Added local group migration support for workspace-level objects ([#59](#59)).
* Added local group migration support for dashboards, queries, and alerts ([#144](#144)).

Stability

* Added `codecov.io` publishing ([#204](#204)).
* Added more tests to `group.py` ([#148](#148)).
* Added tests for group state ([#133](#133)).
* Added tests for inventorizer and typed ([#125](#125)).
* Added tests for WorkspaceListing ([#110](#110)).
* Added `make_*_permissions` fixtures ([#159](#159)).
* Added reusable fixtures module ([#119](#119)).
* Added testing for permissions ([#126](#126)).
* Added inventory table manager tests ([#153](#153)).
* Added `product_info` to track as SDK integration ([#76](#76)).
* Added failsafe permission get operations ([#65](#65)).
* Always install the latest `pip` version in `./install.sh` ([#201](#201)).
* Always store inventory in `hive_metastore` and make only `inventory_database` configurable ([#178](#178)).
* Changed default logging level from `TRACE` to `DEBUG` ([#124](#124)).
* Consistently use `WorkspaceClient` from `databricks.sdk` ([#120](#120)).
* Converted pipeline code to use fixtures ([#166](#166)).
* Exclude mixins from coverage ([#130](#130)).
* Fixed `codecov.io` reporting ([#212](#212)).
* Fixed configuration path in job task install code ([#210](#210)).
* Fixed a bug with dependency definitions ([#70](#70)).
* Fixed failing `test_jobs` ([#140](#140)).
* Fixed the issues with experiment listing ([#64](#64)).
* Fixed integration testing configuration ([#77](#77)).
* Make project runnable on nightly testing infrastructure ([#75](#75)).
* Migrated cluster policies to new fixtures ([#174](#174)).
* Migrated clusters to the new fixture framework ([#162](#162)).
* Migrated instance pool to the new fixture framework ([#161](#161)).
* Migrated to `databricks.labs.ucx` package ([#90](#90)).
* Migrated token authorization to new fixtures ([#175](#175)).
* Migrated experiment fixture to standard one ([#168](#168)).
* Migrated jobs test to fixture-based one ([#167](#167)).
* Migrated model fixture to the standard fixtures ([#169](#169)).
* Migrated warehouse fixture to standard one ([#170](#170)).
* Organise modules by domain ([#197](#197)).
* Prefetch all account-level and workspace-level groups ([#192](#192)).
* Programmatically create a dashboard ([#121](#121)).
* Properly integrate Python `logging` facility ([#118](#118)).
* Refactored code to use Databricks SDK for Python ([#27](#27)).
* Refactored configuration and remove global provider state ([#71](#71)).
* Removed `pydantic` dependency ([#138](#138)).
* Removed redundant `pyspark`, `databricks-connect`, `delta-spark`, and `pandas` dependencies ([#193](#193)).
* Removed redundant `typer[all]` dependency and its usages ([#194](#194)).
* Renamed `MigrationGroupsProvider` to `GroupMigrationState` ([#81](#81)).
* Replaced `ratelimit` and `tenacity` dependencies with simpler implementations ([#195](#195)).
* Reorganised integration tests to align more with unit tests ([#206](#206)).
* Run `build` workflow also on `main` branch ([#211](#211)).
* Run integration test with a single group ([#152](#152)).
* Simplify `SqlBackend` and table creation logic ([#203](#203)).
* Updated `migration_config.yml` ([#179](#179)).
* Updated legal information ([#196](#196)).
* Use `make_secret_scope` fixture ([#163](#163)).
* Use fixture factory for `make_table`, `make_schema`, and `make_catalog` ([#189](#189)).
* Use new fixtures for notebooks and folders ([#176](#176)).
* Validate toolkit notebook test ([#183](#183)).

Contributing

* Added a note on external dependencies ([#139](#139)).
* Added ability to run SQL queries on Spark when in Databricks Runtime ([#108](#108)).
* Added some ground rules for contributing ([#82](#82)).
* Added contributing instructions link from main readme ([#109](#109)).
* Added info about environment refreshes ([#155](#155)).
* Clarified documentation ([#137](#137)).
* Enabled merge queue ([#146](#146)).
* Improved `CONTRIBUTING.md` guide ([#135](#135), [#145](#145)).