Paginate all SCIM list requests in the SDK#440
Conversation
nfx
left a comment
There was a problem hiding this comment.
thank you for adding this. few nits
tests/integration/test_iam.py
Outdated
| ("/api/2.0/preview/scim/v2/Users", lambda w: w.users.list(count=1)), | ||
| ("/api/2.0/preview/scim/v2/Groups", lambda w: w.groups.list(count=1)), | ||
| ("/api/2.0/preview/scim/v2/ServicePrincipals", lambda w: w.service_principals.list(count=1)), |
There was a problem hiding this comment.
| ("/api/2.0/preview/scim/v2/Users", lambda w: w.users.list(count=1)), | |
| ("/api/2.0/preview/scim/v2/Groups", lambda w: w.groups.list(count=1)), | |
| ("/api/2.0/preview/scim/v2/ServicePrincipals", lambda w: w.service_principals.list(count=1)), | |
| ("/api/2.0/preview/scim/v2/Users", lambda w: w.users.list(count=10)), | |
| ("/api/2.0/preview/scim/v2/Groups", lambda w: w.groups.list(count=10)), | |
| ("/api/2.0/preview/scim/v2/ServicePrincipals", lambda w: w.service_principals.list(count=10)), |
i think count=1 would make integration tests too slow...
There was a problem hiding this comment.
I tried this, but I wanted to make sure that we actually exercised the pagination flows. This test took around 15 seconds for all test cases combined.
tests/integration/test_iam.py
Outdated
|
|
||
|
|
||
| @pytest.mark.parametrize("path,call", [ | ||
| # there are ~7k users in our aws prod account |
There was a problem hiding this comment.
| # there are ~7k users in our aws prod account |
i don't think it should be in public code
tests/integration/test_iam.py
Outdated
| # there are ~7k users in our aws prod account | ||
| ("/api/2.0/accounts/%s/scim/v2/Users", lambda a: a.users.list(count=1000)), | ||
| ("/api/2.0/accounts/%s/scim/v2/Groups", lambda a: a.groups.list(count=1)), | ||
| # there are ~3k service principals in our aws prod account |
There was a problem hiding this comment.
| # there are ~3k service principals in our aws prod account |
| # deduplicate items that may have been added during iteration | ||
| seen = set() | ||
| query['startIndex'] = 0 | ||
| query['startIndex'] = 1 |
There was a problem hiding this comment.
Is this because the first item starts from number 1 in query? https://docs.databricks.com/api/workspace/groups/list
* Introduce more specific exceptions, like `NotFound`, `AlreadyExists`, `BadRequest`, `PermissionDenied`, `InternalError`, and others ([#376](#376)). This makes it easier to handle errors thrown by the Databricks API. Instead of catching `DatabricksError` and checking the error_code field, you can catch one of these subtypes of `DatabricksError`, which is more ergonomic and removes the need to rethrow exceptions that you don't want to catch. For example: ```python try: return (self._ws .permissions .get(object_type, object_id)) except DatabricksError as e: if e.error_code in [ "RESOURCE_DOES_NOT_EXIST", "RESOURCE_NOT_FOUND", "PERMISSION_DENIED", "FEATURE_DISABLED", "BAD_REQUEST"]: logger.warning(...) return None raise RetryableError(...) from e ``` can be replaced with ```python try: return (self._ws .permissions .get(object_type, object_id)) except PermissionDenied, FeatureDisabled: logger.warning(...) return None except NotFound: raise RetryableError(...) ``` * Paginate all SCIM list requests in the SDK ([#440](#440)). This change ensures that SCIM list() APIs use a default limit of 100 resources, leveraging SCIM's offset + limit pagination to batch requests to the Databricks API. * Added taskValues support in remoteDbUtils ([#406](#406)). * Added more detailed error message on default credentials not found error ([#419](#419)). * Request management token via Azure CLI only for Service Principals and not human users ([#408](#408)). API Changes: * Fixed `create()` method for [w.functions](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/functions.html) workspace-level service and corresponding `databricks.sdk.service.catalog.CreateFunction` and `databricks.sdk.service.catalog.FunctionInfo` dataclasses. * Changed `create()` method for [w.metastores](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/metastores.html) workspace-level service with new required argument order. * Changed `storage_root` field for `databricks.sdk.service.catalog.CreateMetastore` to be optional. * Added `skip_validation` field for `databricks.sdk.service.catalog.UpdateExternalLocation`. * Added `libraries` field for `databricks.sdk.service.compute.CreatePolicy`, `databricks.sdk.service.compute.EditPolicy` and `databricks.sdk.service.compute.Policy`. * Added `init_scripts` field for `databricks.sdk.service.compute.EventDetails`. * Added `file` field for `databricks.sdk.service.compute.InitScriptInfo`. * Added `zone_id` field for `databricks.sdk.service.compute.InstancePoolGcpAttributes`. * Added several dataclasses related to init scripts. * Added `databricks.sdk.service.compute.LocalFileInfo` dataclass. * Replaced `ui_state` field with `edit_mode` for `databricks.sdk.service.jobs.CreateJob` and `databricks.sdk.service.jobs.JobSettings`. * Replaced `databricks.sdk.service.jobs.CreateJobUiState` dataclass with `databricks.sdk.service.jobs.CreateJobEditMode`. * Added `include_resolved_values` field for `databricks.sdk.service.jobs.GetRunRequest`. * Replaced `databricks.sdk.service.jobs.JobSettingsUiState` dataclass with `databricks.sdk.service.jobs.JobSettingsEditMode`. * Removed [a.o_auth_enrollment](https://databricks-sdk-py.readthedocs.io/en/latest/account/o_auth_enrollment.html) account-level service. This was only used to aid in OAuth enablement during the public preview of OAuth. OAuth is now enabled for all AWS E2 accounts, so usage of this API is no longer needed. * Added `network_connectivity_config_id` field for `databricks.sdk.service.provisioning.UpdateWorkspaceRequest`. * Added [a.network_connectivity](https://databricks-sdk-py.readthedocs.io/en/latest/account/network_connectivity.html) account-level service. * Added `string_shared_as` field for `databricks.sdk.service.sharing.SharedDataObject`. Internal changes: * Added regression question to issue template ([#414](#414)). * Made test_auth no longer fail if you have a default profile setup ([#426](#426)). OpenAPI SHA: d136ad0541f036372601bad9a4382db06c3c912d, Date: 2023-11-14
* Introduce more specific exceptions, like `NotFound`, `AlreadyExists`, `BadRequest`, `PermissionDenied`, `InternalError`, and others ([#376](#376)). This makes it easier to handle errors thrown by the Databricks API. Instead of catching `DatabricksError` and checking the error_code field, you can catch one of these subtypes of `DatabricksError`, which is more ergonomic and removes the need to rethrow exceptions that you don't want to catch. For example: ```python try: return (self._ws .permissions .get(object_type, object_id)) except DatabricksError as e: if e.error_code in [ "RESOURCE_DOES_NOT_EXIST", "RESOURCE_NOT_FOUND", "PERMISSION_DENIED", "FEATURE_DISABLED", "BAD_REQUEST"]: logger.warning(...) return None raise RetryableError(...) from e ``` can be replaced with ```python try: return (self._ws .permissions .get(object_type, object_id)) except PermissionDenied, FeatureDisabled: logger.warning(...) return None except NotFound: raise RetryableError(...) ``` * Paginate all SCIM list requests in the SDK ([#440](#440)). This change ensures that SCIM list() APIs use a default limit of 100 resources, leveraging SCIM's offset + limit pagination to batch requests to the Databricks API. * Added taskValues support in remoteDbUtils ([#406](#406)). * Added more detailed error message on default credentials not found error ([#419](#419)). * Request management token via Azure CLI only for Service Principals and not human users ([#408](#408)). API Changes: * Fixed `create()` method for [w.functions](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/functions.html) workspace-level service and corresponding `databricks.sdk.service.catalog.CreateFunction` and `databricks.sdk.service.catalog.FunctionInfo` dataclasses. * Changed `create()` method for [w.metastores](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/metastores.html) workspace-level service with new required argument order. * Changed `storage_root` field for `databricks.sdk.service.catalog.CreateMetastore` to be optional. * Added `skip_validation` field for `databricks.sdk.service.catalog.UpdateExternalLocation`. * Added `libraries` field for `databricks.sdk.service.compute.CreatePolicy`, `databricks.sdk.service.compute.EditPolicy` and `databricks.sdk.service.compute.Policy`. * Added `init_scripts` field for `databricks.sdk.service.compute.EventDetails`. * Added `file` field for `databricks.sdk.service.compute.InitScriptInfo`. * Added `zone_id` field for `databricks.sdk.service.compute.InstancePoolGcpAttributes`. * Added several dataclasses related to init scripts. * Added `databricks.sdk.service.compute.LocalFileInfo` dataclass. * Replaced `ui_state` field with `edit_mode` for `databricks.sdk.service.jobs.CreateJob` and `databricks.sdk.service.jobs.JobSettings`. * Replaced `databricks.sdk.service.jobs.CreateJobUiState` dataclass with `databricks.sdk.service.jobs.CreateJobEditMode`. * Added `include_resolved_values` field for `databricks.sdk.service.jobs.GetRunRequest`. * Replaced `databricks.sdk.service.jobs.JobSettingsUiState` dataclass with `databricks.sdk.service.jobs.JobSettingsEditMode`. * Removed [a.o_auth_enrollment](https://databricks-sdk-py.readthedocs.io/en/latest/account/o_auth_enrollment.html) account-level service. This was only used to aid in OAuth enablement during the public preview of OAuth. OAuth is now enabled for all AWS E2 accounts, so usage of this API is no longer needed. * Added `network_connectivity_config_id` field for `databricks.sdk.service.provisioning.UpdateWorkspaceRequest`. * Added [a.network_connectivity](https://databricks-sdk-py.readthedocs.io/en/latest/account/network_connectivity.html) account-level service. * Added `string_shared_as` field for `databricks.sdk.service.sharing.SharedDataObject`. Internal changes: * Added regression question to issue template ([#414](#414)). * Made test_auth no longer fail if you have a default profile setup ([#426](#426)). OpenAPI SHA: d136ad0541f036372601bad9a4382db06c3c912d, Date: 2023-11-14
|
Hi @mgyucht & @nfx, maybe I am missing something here but I'm no longer able to query users due to hardcoding of startIndex. I have some code that removes inactive users on workspaces, so it iterates over all users. Previously I was using startIndex and count to get 100 users at a time, then query the next 100 users. Now, when I try to query the next 100 users it still shows the first 100 due to startIndex being hardcoded. Is this change working as intended? |
Changes
This PR incorporates two hard-coded changes for the SCIM API in the Python SDK:
one_indexed(defaulting tofalse) andpagination_basis(defaulting toresourcebut can be overridden topage).default_limit, which is useful for any non-paginated APIs that later expose pagination options and allow the SDK to gracefully support those. However, we don't want to encourage folks to use this pattern: all new list APIs are required to be paginated from the start.Tests
make testrun locallymake fmtapplied