Prefetch all account-level and workspace-level groups#192

Merged

nfx merged 5 commits intomainfrom

refactor/group-scim-api-usages

Sep 13, 2023

Contributor

renardeinside commented Sep 13, 2023 •

edited

Loading

Addresses #190 and other SCIM-related issues.

Conceptually, the following happens:

Existing method to list the workspace groups is inconsistent, and the root cause of these issues is somewhere on SCIM API side. We've raised a ticket, but the resolution timing is unclear.
To avoid being time-dependent on SCIM API, we re-introduce the listing methods, but this time we intentionally don't use filter-based conditions and explicitly split the API methods into two.
To avoid reaching out to the SCIM API too frequently, we list the whole API only once and then work with an in-memory data.

This PR also adds 100% coverage for GroupManager and adds integration tests for groups to verify it works as expected.


          add integration tests for group

a5e5083

renardeinside requested review from larsgeorge-db and nfx as code owners

September 13, 2023 14:27


          remove filter usages

eac133b

renardeinside linked an issue

that may be closed by this pull request

group matching logic has changed due to changes on the API #190

Closed


          Merge remote-tracking branch 'origin/main' into refactor/group-scim-a…

0d4592a

…pi-usages

nfx requested changes

View reviewed changes

Collaborator

nfx left a comment

Thank you for fixing the TODO: calling this can cause issues for SCIM backend, cache groups instead :)

adding AvailableGroups type looks redundant, make two properties - self._account_groups and self._workspace_groups
BUG: when you _replace_group , also delete item from self._workspace_groups, otherwise we have an inconsistent state, that may lead to surprising errors. it's already done in _get_or_create_backup_group, so probably forgotten to be added here.
don't do with patch.object(GroupManager, "_list_account_groups", but rather do client.api_client.do.return_value = { ... }.

src/databricks/labs/ucx/managers/group.py Outdated

-                      group = next(
-                          iter(method(filter=query_filter, attributes=attributes)),
-                          None,
+                      self._available_groups: AvailableGroups = self._list_available_groups()

Collaborator

nfx Sep 13, 2023

adding AvailableGroups type looks redundant, make two properties - self._account_groups and self._workspace_groups

src/databricks/labs/ucx/managers/group.py Show resolved Hide resolved

src/databricks/labs/ucx/managers/group.py Outdated

Comment on lines +70 to +82

+                  def _list_available_groups(self):
+                      """
+                      Note - this method will provide all groups during initialization.
+                      We apply this method to decrease the amount of calls to the SCIM API.
+                      To decrease API load, we also do not request the group members.
+                      :return:
+                      """
+                      logger.info("Listing available groups...")
+                      workspace_groups = self._list_workspace_groups()
+                      account_groups = self._list_account_groups()
+                      logger.info(f"Found {len(workspace_groups)} workspace groups and {len(account_groups)} account groups")
+                      return AvailableGroups(workspace=workspace_groups, account=account_groups)

Collaborator

nfx Sep 13, 2023 •

edited

Loading

Suggested change

      
                def _list_available_groups(self):
          
                    """
          
                    Note - this method will provide all groups during initialization.
          
                    We apply this method to decrease the amount of calls to the SCIM API.
          
                    To decrease API load, we also do not request the group members.
          
                    :return:
          
                    """
          
                    logger.info("Listing available groups...")
          
                    workspace_groups = self._list_workspace_groups()
          
                    account_groups = self._list_account_groups()
          
                    logger.info(f"Found {len(workspace_groups)} workspace groups and {len(account_groups)} account groups")
          
                    return AvailableGroups(workspace=workspace_groups, account=account_groups)

this method looks redundant, please assign properties directly

src/databricks/labs/ucx/managers/group.py Outdated

+                      relevant_level_groups = (
+                          self._available_groups.workspace if level == GroupLevel.WORKSPACE else self._available_groups.account
                       )
+                      _group = next((g for g in relevant_level_groups if g.display_name == group_name), None)

Collaborator

nfx Sep 13, 2023

nit: it's a bit unclear what next does and it may be less readable by people less familiar with python. can you replace it with a for loop with early return?

to be honest, self._get_account_group(name) and self._get_workspace_group(name) might be better readable, but you don't have to do it this PR

src/databricks/labs/ucx/managers/group.py Show resolved Hide resolved

tests/unit/test_group.py Outdated

                   group_conf = GroupsConfig(selected=[""])
-                  assert GroupManager(client, group_conf)._find_eligible_groups() == [users_group.display_name]
+                  with patch.object(GroupManager, "_list_account_groups", return_value=[account_admins_group]):

Collaborator

nfx Sep 13, 2023

it's a unit testing anti-pattern to mock out methods of the class under test. correct way is to mock it through client, which will cover the self._ws.api_client.do call. see client.groups.list.return_value few lines above

nfx changed the title ~~Refactor SCIM API usages~~ Prefetch all account-level and workspace-level groups

renardeinside added 2 commits

September 13, 2023 20:56


          address review

0017d10


          Merge remote-tracking branch 'origin/main' into refactor/group-scim-a…

df39c52

…pi-usages

nfx approved these changes

View reviewed changes

Collaborator

nfx left a comment

Thank you! Way more readable with the proper mocking :)

nfx added this pull request to the merge queue

Merged via the queue into main with commit b654a3a

renardeinside deleted the refactor/group-scim-api-usages branch

September 13, 2023 20:30

william-conti pushed a commit that referenced this pull request


          Prefetch all account-level and workspace-level groups (#192)

1860e65

Addresses #190 and other SCIM-related issues. 

Conceptually, the following happens:

1. Existing method to list the workspace groups is inconsistent, and the
root cause of these issues is somewhere on SCIM API side. We've raised a
ticket, but the resolution timing is unclear.
2. To avoid being time-dependent on SCIM API, we re-introduce the
listing methods, but this time we intentionally don't use `filter`-based
conditions and explicitly split the API methods into two.
3. To avoid reaching out to the SCIM API too frequently, we list the
whole API only once and then work with an in-memory data.

This PR also adds 100% coverage for `GroupManager` and adds integration
tests for groups to verify it works as expected.

nfx mentioned this pull request

Release v0.1.0 #213

Merged

nfx added a commit that referenced this pull request


          Release v0.1.0 (#213)

c6019ad

# Version changelog

## 0.1.0

Features

* Added interactive installation wizard
([#184](#184),
[#117](#117)).
* Added schedule of jobs as part of `install.sh` flow and created some
documentation ([#187](#187)).
* Added debug notebook companion to troubleshoot the installation
([#191](#191)).
* Added support for Hive Metastore Table ACLs inventory from all
databases ([#78](#78),
[#122](#122),
[#151](#151)).
* Created `$inventory.tables` from Scala notebook
([#207](#207)).
* Added local group migration support for ML-related objects
([#56](#56)).
* Added local group migration support for SQL warehouses
([#57](#57)).
* Added local group migration support for all compute-related resources
([#53](#53)).
* Added local group migration support for security-related objects
([#58](#58)).
* Added local group migration support for workflows
([#54](#54)).
* Added local group migration support for workspace-level objects
([#59](#59)).
* Added local group migration support for dashboards, queries, and
alerts ([#144](#144)).

Stability

* Added `codecov.io` publishing
([#204](#204)).
* Added more tests to group.py
([#148](#148)).
* Added tests for group state
([#133](#133)).
* Added tests for inventorizer and typed
([#125](#125)).
* Added tests WorkspaceListing
([#110](#110)).
* Added `make_*_permissions` fixtures
([#159](#159)).
* Added reusable fixtures module
([#119](#119)).
* Added testing for permissions
([#126](#126)).
* Added inventory table manager tests
([#153](#153)).
* Added `product_info` to track as SDK integration
([#76](#76)).
* Added failsafe permission get operations
([#65](#65)).
* Always install the latest `pip` version in `./install.sh`
([#201](#201)).
* Always store inventory in `hive_metastore` and make only
`inventory_database` configurable
([#178](#178)).
* Changed default logging level from `TRACE` to `DEBUG` log level
([#124](#124)).
* Consistently use `WorkspaceClient` from `databricks.sdk`
([#120](#120)).
* Convert pipeline code to use fixtures.
([#166](#166)).
* Exclude mixins from coverage
([#130](#130)).
* Fixed codecov.io reporting
([#212](#212)).
* Fixed configuration path in job task install code
([#210](#210)).
* Fixed a bug with dependency definitions
([#70](#70)).
* Fixed failing `test_jobs`
([#140](#140)).
* Fixed the issues with experiment listing
([#64](#64)).
* Fixed integration testing configuration
([#77](#77)).
* Make project runnable on nightly testing infrastructure
([#75](#75)).
* Migrated cluster policies to new fixtures
([#174](#174)).
* Migrated clusters to the new fixture framework
([#162](#162)).
* Migrated instance pool to the new fixture framework
([#161](#161)).
* Migrated to `databricks.labs.ucx` package
([#90](#90)).
* Migrated token authorization to new fixtures
([#175](#175)).
* Migrated experiment fixture to standard one
([#168](#168)).
* Migrated jobs test to fixture based one.
([#167](#167)).
* Migrated model fixture to the standard fixtures
([#169](#169)).
* Migrated warehouse fixture to standard one
([#170](#170)).
* Organise modules by domain
([#197](#197)).
* Prefetch all account-level and workspace-level groups
([#192](#192)).
* Programmatically create a dashboard
([#121](#121)).
* Properly integrate Python `logging` facility
([#118](#118)).
* Refactored code to use Databricks SDK for Python
([#27](#27)).
* Refactored configuration and remove global provider state
([#71](#71)).
* Removed `pydantic` dependency
([#138](#138)).
* Removed redundant `pyspark`, `databricks-connect`, `delta-spark`, and
`pandas` dependencies
([#193](#193)).
* Removed redundant `typer[all]` dependency and its usages
([#194](#194)).
* Renamed `MigrationGroupsProvider` to `GroupMigrationState`
([#81](#81)).
* Replaced `ratelimit` and `tenacity` dependencies with simpler
implementations ([#195](#195)).
* Reorganised integration tests to align more with unit tests
([#206](#206)).
* Run `build` workflow also on `main` branch
([#211](#211)).
* Run integration test with a single group
([#152](#152)).
* Simplify `SqlBackend` and table creation logic
([#203](#203)).
* Updated `migration_config.yml`
([#179](#179)).
* Updated legal information
([#196](#196)).
* Use `make_secret_scope` fixture
([#163](#163)).
* Use fixture factory for `make_table`, `make_schema`, and
`make_catalog` ([#189](#189)).
* Use new fixtures for notebooks and folders
([#176](#176)).
* Validate toolkit notebook test
([#183](#183)).

Contributing

* Added a note on external dependencies
([#139](#139)).
* Added ability to run SQL queries on Spark when in Databricks Runtime
([#108](#108)).
* Added some ground rules for contributing
([#82](#82)).
* Added contributing instructions link from main readme
([#109](#109)).
* Added info about environment refreshes
([#155](#155)).
* Clarified documentation
([#137](#137)).
* Enabled merge queue
([#146](#146)).
* Improved `CONTRIBUTING.md` guide
([#135](#135),
[#145](#145)).

FastLee pushed a commit that referenced this pull request


          Release v0.1.0 (#213)

3d4c4a2

# Version changelog

## 0.1.0

Features

* Added interactive installation wizard
([#184](#184),
[#117](#117)).
* Added schedule of jobs as part of `install.sh` flow and created some
documentation ([#187](#187)).
* Added debug notebook companion to troubleshoot the installation
([#191](#191)).
* Added support for Hive Metastore Table ACLs inventory from all
databases ([#78](#78),
[#122](#122),
[#151](#151)).
* Created `$inventory.tables` from Scala notebook
([#207](#207)).
* Added local group migration support for ML-related objects
([#56](#56)).
* Added local group migration support for SQL warehouses
([#57](#57)).
* Added local group migration support for all compute-related resources
([#53](#53)).
* Added local group migration support for security-related objects
([#58](#58)).
* Added local group migration support for workflows
([#54](#54)).
* Added local group migration support for workspace-level objects
([#59](#59)).
* Added local group migration support for dashboards, queries, and
alerts ([#144](#144)).

Stability

* Added `codecov.io` publishing
([#204](#204)).
* Added more tests to group.py
([#148](#148)).
* Added tests for group state
([#133](#133)).
* Added tests for inventorizer and typed
([#125](#125)).
* Added tests WorkspaceListing
([#110](#110)).
* Added `make_*_permissions` fixtures
([#159](#159)).
* Added reusable fixtures module
([#119](#119)).
* Added testing for permissions
([#126](#126)).
* Added inventory table manager tests
([#153](#153)).
* Added `product_info` to track as SDK integration
([#76](#76)).
* Added failsafe permission get operations
([#65](#65)).
* Always install the latest `pip` version in `./install.sh`
([#201](#201)).
* Always store inventory in `hive_metastore` and make only
`inventory_database` configurable
([#178](#178)).
* Changed default logging level from `TRACE` to `DEBUG` log level
([#124](#124)).
* Consistently use `WorkspaceClient` from `databricks.sdk`
([#120](#120)).
* Convert pipeline code to use fixtures.
([#166](#166)).
* Exclude mixins from coverage
([#130](#130)).
* Fixed codecov.io reporting
([#212](#212)).
* Fixed configuration path in job task install code
([#210](#210)).
* Fixed a bug with dependency definitions
([#70](#70)).
* Fixed failing `test_jobs`
([#140](#140)).
* Fixed the issues with experiment listing
([#64](#64)).
* Fixed integration testing configuration
([#77](#77)).
* Make project runnable on nightly testing infrastructure
([#75](#75)).
* Migrated cluster policies to new fixtures
([#174](#174)).
* Migrated clusters to the new fixture framework
([#162](#162)).
* Migrated instance pool to the new fixture framework
([#161](#161)).
* Migrated to `databricks.labs.ucx` package
([#90](#90)).
* Migrated token authorization to new fixtures
([#175](#175)).
* Migrated experiment fixture to standard one
([#168](#168)).
* Migrated jobs test to fixture based one.
([#167](#167)).
* Migrated model fixture to the standard fixtures
([#169](#169)).
* Migrated warehouse fixture to standard one
([#170](#170)).
* Organise modules by domain
([#197](#197)).
* Prefetch all account-level and workspace-level groups
([#192](#192)).
* Programmatically create a dashboard
([#121](#121)).
* Properly integrate Python `logging` facility
([#118](#118)).
* Refactored code to use Databricks SDK for Python
([#27](#27)).
* Refactored configuration and remove global provider state
([#71](#71)).
* Removed `pydantic` dependency
([#138](#138)).
* Removed redundant `pyspark`, `databricks-connect`, `delta-spark`, and
`pandas` dependencies
([#193](#193)).
* Removed redundant `typer[all]` dependency and its usages
([#194](#194)).
* Renamed `MigrationGroupsProvider` to `GroupMigrationState`
([#81](#81)).
* Replaced `ratelimit` and `tenacity` dependencies with simpler
implementations ([#195](#195)).
* Reorganised integration tests to align more with unit tests
([#206](#206)).
* Run `build` workflow also on `main` branch
([#211](#211)).
* Run integration test with a single group
([#152](#152)).
* Simplify `SqlBackend` and table creation logic
([#203](#203)).
* Updated `migration_config.yml`
([#179](#179)).
* Updated legal information
([#196](#196)).
* Use `make_secret_scope` fixture
([#163](#163)).
* Use fixture factory for `make_table`, `make_schema`, and
`make_catalog` ([#189](#189)).
* Use new fixtures for notebooks and folders
([#176](#176)).
* Validate toolkit notebook test
([#183](#183)).

Contributing

* Added a note on external dependencies
([#139](#139)).
* Added ability to run SQL queries on Spark when in Databricks Runtime
([#108](#108)).
* Added some ground rules for contributing
([#82](#82)).
* Added contributing instructions link from main readme
([#109](#109)).
* Added info about environment refreshes
([#155](#155)).
* Clarified documentation
([#137](#137)).
* Enabled merge queue
([#146](#146)).
* Improved `CONTRIBUTING.md` guide
([#135](#135),
[#145](#145)).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet