Skip to content

Prefetch all account-level and workspace-level groups#192

Merged
nfx merged 5 commits intomainfrom
refactor/group-scim-api-usages
Sep 13, 2023
Merged

Prefetch all account-level and workspace-level groups#192
nfx merged 5 commits intomainfrom
refactor/group-scim-api-usages

Conversation

@renardeinside
Copy link
Copy Markdown
Contributor

@renardeinside renardeinside commented Sep 13, 2023

Addresses #190 and other SCIM-related issues.

Conceptually, the following happens:

  1. Existing method to list the workspace groups is inconsistent, and the root cause of these issues is somewhere on SCIM API side. We've raised a ticket, but the resolution timing is unclear.
  2. To avoid being time-dependent on SCIM API, we re-introduce the listing methods, but this time we intentionally don't use filter-based conditions and explicitly split the API methods into two.
  3. To avoid reaching out to the SCIM API too frequently, we list the whole API only once and then work with an in-memory data.

This PR also adds 100% coverage for GroupManager and adds integration tests for groups to verify it works as expected.

@renardeinside renardeinside linked an issue Sep 13, 2023 that may be closed by this pull request
Copy link
Copy Markdown
Collaborator

@nfx nfx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for fixing the TODO: calling this can cause issues for SCIM backend, cache groups instead :)

  1. adding AvailableGroups type looks redundant, make two properties - self._account_groups and self._workspace_groups
  2. BUG: when you _replace_group , also delete item from self._workspace_groups, otherwise we have an inconsistent state, that may lead to surprising errors. it's already done in _get_or_create_backup_group, so probably forgotten to be added here.
  3. don't do with patch.object(GroupManager, "_list_account_groups", but rather do client.api_client.do.return_value = { ... }.

group = next(
iter(method(filter=query_filter, attributes=attributes)),
None,
self._available_groups: AvailableGroups = self._list_available_groups()
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding AvailableGroups type looks redundant, make two properties - self._account_groups and self._workspace_groups

Comment on lines +70 to +82
def _list_available_groups(self):
"""
Note - this method will provide all groups during initialization.
We apply this method to decrease the amount of calls to the SCIM API.
To decrease API load, we also do not request the group members.
:return:
"""
logger.info("Listing available groups...")
workspace_groups = self._list_workspace_groups()
account_groups = self._list_account_groups()

logger.info(f"Found {len(workspace_groups)} workspace groups and {len(account_groups)} account groups")
return AvailableGroups(workspace=workspace_groups, account=account_groups)
Copy link
Copy Markdown
Collaborator

@nfx nfx Sep 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def _list_available_groups(self):
"""
Note - this method will provide all groups during initialization.
We apply this method to decrease the amount of calls to the SCIM API.
To decrease API load, we also do not request the group members.
:return:
"""
logger.info("Listing available groups...")
workspace_groups = self._list_workspace_groups()
account_groups = self._list_account_groups()
logger.info(f"Found {len(workspace_groups)} workspace groups and {len(account_groups)} account groups")
return AvailableGroups(workspace=workspace_groups, account=account_groups)

this method looks redundant, please assign properties directly

relevant_level_groups = (
self._available_groups.workspace if level == GroupLevel.WORKSPACE else self._available_groups.account
)
_group = next((g for g in relevant_level_groups if g.display_name == group_name), None)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it's a bit unclear what next does and it may be less readable by people less familiar with python. can you replace it with a for loop with early return?

to be honest, self._get_account_group(name) and self._get_workspace_group(name) might be better readable, but you don't have to do it this PR

group_conf = GroupsConfig(selected=[""])

assert GroupManager(client, group_conf)._find_eligible_groups() == [users_group.display_name]
with patch.object(GroupManager, "_list_account_groups", return_value=[account_admins_group]):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a unit testing anti-pattern to mock out methods of the class under test. correct way is to mock it through client, which will cover the self._ws.api_client.do call. see client.groups.list.return_value few lines above

@nfx nfx changed the title Refactor SCIM API usages Prefetch all account-level and workspace-level groups Sep 13, 2023
Copy link
Copy Markdown
Collaborator

@nfx nfx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Way more readable with the proper mocking :)

@nfx nfx added this pull request to the merge queue Sep 13, 2023
Merged via the queue into main with commit b654a3a Sep 13, 2023
@renardeinside renardeinside deleted the refactor/group-scim-api-usages branch September 13, 2023 20:30
william-conti pushed a commit that referenced this pull request Sep 15, 2023
Addresses #190 and other SCIM-related issues. 

Conceptually, the following happens:

1. Existing method to list the workspace groups is inconsistent, and the
root cause of these issues is somewhere on SCIM API side. We've raised a
ticket, but the resolution timing is unclear.
2. To avoid being time-dependent on SCIM API, we re-introduce the
listing methods, but this time we intentionally don't use `filter`-based
conditions and explicitly split the API methods into two.
3. To avoid reaching out to the SCIM API too frequently, we list the
whole API only once and then work with an in-memory data.

This PR also adds 100% coverage for `GroupManager` and adds integration
tests for groups to verify it works as expected.
@nfx nfx mentioned this pull request Sep 18, 2023
nfx added a commit that referenced this pull request Sep 18, 2023
# Version changelog

## 0.1.0

Features

* Added interactive installation wizard
([#184](#184),
[#117](#117)).
* Added schedule of jobs as part of `install.sh` flow and created some
documentation ([#187](#187)).
* Added debug notebook companion to troubleshoot the installation
([#191](#191)).
* Added support for Hive Metastore Table ACLs inventory from all
databases ([#78](#78),
[#122](#122),
[#151](#151)).
* Created `$inventory.tables` from Scala notebook
([#207](#207)).
* Added local group migration support for ML-related objects
([#56](#56)).
* Added local group migration support for SQL warehouses
([#57](#57)).
* Added local group migration support for all compute-related resources
([#53](#53)).
* Added local group migration support for security-related objects
([#58](#58)).
* Added local group migration support for workflows
([#54](#54)).
* Added local group migration support for workspace-level objects
([#59](#59)).
* Added local group migration support for dashboards, queries, and
alerts ([#144](#144)).

Stability

* Added `codecov.io` publishing
([#204](#204)).
* Added more tests to group.py
([#148](#148)).
* Added tests for group state
([#133](#133)).
* Added tests for inventorizer and typed
([#125](#125)).
* Added tests WorkspaceListing
([#110](#110)).
* Added `make_*_permissions` fixtures
([#159](#159)).
* Added reusable fixtures module
([#119](#119)).
* Added testing for permissions
([#126](#126)).
* Added inventory table manager tests
([#153](#153)).
* Added `product_info` to track as SDK integration
([#76](#76)).
* Added failsafe permission get operations
([#65](#65)).
* Always install the latest `pip` version in `./install.sh`
([#201](#201)).
* Always store inventory in `hive_metastore` and make only
`inventory_database` configurable
([#178](#178)).
* Changed default logging level from `TRACE` to `DEBUG` log level
([#124](#124)).
* Consistently use `WorkspaceClient` from `databricks.sdk`
([#120](#120)).
* Convert pipeline code to use fixtures.
([#166](#166)).
* Exclude mixins from coverage
([#130](#130)).
* Fixed codecov.io reporting
([#212](#212)).
* Fixed configuration path in job task install code
([#210](#210)).
* Fixed a bug with dependency definitions
([#70](#70)).
* Fixed failing `test_jobs`
([#140](#140)).
* Fixed the issues with experiment listing
([#64](#64)).
* Fixed integration testing configuration
([#77](#77)).
* Make project runnable on nightly testing infrastructure
([#75](#75)).
* Migrated cluster policies to new fixtures
([#174](#174)).
* Migrated clusters to the new fixture framework
([#162](#162)).
* Migrated instance pool to the new fixture framework
([#161](#161)).
* Migrated to `databricks.labs.ucx` package
([#90](#90)).
* Migrated token authorization to new fixtures
([#175](#175)).
* Migrated experiment fixture to standard one
([#168](#168)).
* Migrated jobs test to fixture based one.
([#167](#167)).
* Migrated model fixture to the standard fixtures
([#169](#169)).
* Migrated warehouse fixture to standard one
([#170](#170)).
* Organise modules by domain
([#197](#197)).
* Prefetch all account-level and workspace-level groups
([#192](#192)).
* Programmatically create a dashboard
([#121](#121)).
* Properly integrate Python `logging` facility
([#118](#118)).
* Refactored code to use Databricks SDK for Python
([#27](#27)).
* Refactored configuration and remove global provider state
([#71](#71)).
* Removed `pydantic` dependency
([#138](#138)).
* Removed redundant `pyspark`, `databricks-connect`, `delta-spark`, and
`pandas` dependencies
([#193](#193)).
* Removed redundant `typer[all]` dependency and its usages
([#194](#194)).
* Renamed `MigrationGroupsProvider` to `GroupMigrationState`
([#81](#81)).
* Replaced `ratelimit` and `tenacity` dependencies with simpler
implementations ([#195](#195)).
* Reorganised integration tests to align more with unit tests
([#206](#206)).
* Run `build` workflow also on `main` branch
([#211](#211)).
* Run integration test with a single group
([#152](#152)).
* Simplify `SqlBackend` and table creation logic
([#203](#203)).
* Updated `migration_config.yml`
([#179](#179)).
* Updated legal information
([#196](#196)).
* Use `make_secret_scope` fixture
([#163](#163)).
* Use fixture factory for `make_table`, `make_schema`, and
`make_catalog` ([#189](#189)).
* Use new fixtures for notebooks and folders
([#176](#176)).
* Validate toolkit notebook test
([#183](#183)).

Contributing

* Added a note on external dependencies
([#139](#139)).
* Added ability to run SQL queries on Spark when in Databricks Runtime
([#108](#108)).
* Added some ground rules for contributing
([#82](#82)).
* Added contributing instructions link from main readme
([#109](#109)).
* Added info about environment refreshes
([#155](#155)).
* Clarified documentation
([#137](#137)).
* Enabled merge queue
([#146](#146)).
* Improved `CONTRIBUTING.md` guide
([#135](#135),
[#145](#145)).
FastLee pushed a commit that referenced this pull request Sep 19, 2023
# Version changelog

## 0.1.0

Features

* Added interactive installation wizard
([#184](#184),
[#117](#117)).
* Added schedule of jobs as part of `install.sh` flow and created some
documentation ([#187](#187)).
* Added debug notebook companion to troubleshoot the installation
([#191](#191)).
* Added support for Hive Metastore Table ACLs inventory from all
databases ([#78](#78),
[#122](#122),
[#151](#151)).
* Created `$inventory.tables` from Scala notebook
([#207](#207)).
* Added local group migration support for ML-related objects
([#56](#56)).
* Added local group migration support for SQL warehouses
([#57](#57)).
* Added local group migration support for all compute-related resources
([#53](#53)).
* Added local group migration support for security-related objects
([#58](#58)).
* Added local group migration support for workflows
([#54](#54)).
* Added local group migration support for workspace-level objects
([#59](#59)).
* Added local group migration support for dashboards, queries, and
alerts ([#144](#144)).

Stability

* Added `codecov.io` publishing
([#204](#204)).
* Added more tests to group.py
([#148](#148)).
* Added tests for group state
([#133](#133)).
* Added tests for inventorizer and typed
([#125](#125)).
* Added tests WorkspaceListing
([#110](#110)).
* Added `make_*_permissions` fixtures
([#159](#159)).
* Added reusable fixtures module
([#119](#119)).
* Added testing for permissions
([#126](#126)).
* Added inventory table manager tests
([#153](#153)).
* Added `product_info` to track as SDK integration
([#76](#76)).
* Added failsafe permission get operations
([#65](#65)).
* Always install the latest `pip` version in `./install.sh`
([#201](#201)).
* Always store inventory in `hive_metastore` and make only
`inventory_database` configurable
([#178](#178)).
* Changed default logging level from `TRACE` to `DEBUG` log level
([#124](#124)).
* Consistently use `WorkspaceClient` from `databricks.sdk`
([#120](#120)).
* Convert pipeline code to use fixtures.
([#166](#166)).
* Exclude mixins from coverage
([#130](#130)).
* Fixed codecov.io reporting
([#212](#212)).
* Fixed configuration path in job task install code
([#210](#210)).
* Fixed a bug with dependency definitions
([#70](#70)).
* Fixed failing `test_jobs`
([#140](#140)).
* Fixed the issues with experiment listing
([#64](#64)).
* Fixed integration testing configuration
([#77](#77)).
* Make project runnable on nightly testing infrastructure
([#75](#75)).
* Migrated cluster policies to new fixtures
([#174](#174)).
* Migrated clusters to the new fixture framework
([#162](#162)).
* Migrated instance pool to the new fixture framework
([#161](#161)).
* Migrated to `databricks.labs.ucx` package
([#90](#90)).
* Migrated token authorization to new fixtures
([#175](#175)).
* Migrated experiment fixture to standard one
([#168](#168)).
* Migrated jobs test to fixture based one.
([#167](#167)).
* Migrated model fixture to the standard fixtures
([#169](#169)).
* Migrated warehouse fixture to standard one
([#170](#170)).
* Organise modules by domain
([#197](#197)).
* Prefetch all account-level and workspace-level groups
([#192](#192)).
* Programmatically create a dashboard
([#121](#121)).
* Properly integrate Python `logging` facility
([#118](#118)).
* Refactored code to use Databricks SDK for Python
([#27](#27)).
* Refactored configuration and remove global provider state
([#71](#71)).
* Removed `pydantic` dependency
([#138](#138)).
* Removed redundant `pyspark`, `databricks-connect`, `delta-spark`, and
`pandas` dependencies
([#193](#193)).
* Removed redundant `typer[all]` dependency and its usages
([#194](#194)).
* Renamed `MigrationGroupsProvider` to `GroupMigrationState`
([#81](#81)).
* Replaced `ratelimit` and `tenacity` dependencies with simpler
implementations ([#195](#195)).
* Reorganised integration tests to align more with unit tests
([#206](#206)).
* Run `build` workflow also on `main` branch
([#211](#211)).
* Run integration test with a single group
([#152](#152)).
* Simplify `SqlBackend` and table creation logic
([#203](#203)).
* Updated `migration_config.yml`
([#179](#179)).
* Updated legal information
([#196](#196)).
* Use `make_secret_scope` fixture
([#163](#163)).
* Use fixture factory for `make_table`, `make_schema`, and
`make_catalog` ([#189](#189)).
* Use new fixtures for notebooks and folders
([#176](#176)).
* Validate toolkit notebook test
([#183](#183)).

Contributing

* Added a note on external dependencies
([#139](#139)).
* Added ability to run SQL queries on Spark when in Databricks Runtime
([#108](#108)).
* Added some ground rules for contributing
([#82](#82)).
* Added contributing instructions link from main readme
([#109](#109)).
* Added info about environment refreshes
([#155](#155)).
* Clarified documentation
([#137](#137)).
* Enabled merge queue
([#146](#146)).
* Improved `CONTRIBUTING.md` guide
([#135](#135),
[#145](#145)).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

group matching logic has changed due to changes on the API

2 participants