Timeout error when applying group permissions to Notebooks #702

@william-conti

Description

Applying permissions back to notebooks sometimes fails with a timeout error:

19:05:19 DEBUG [databricks.sdk] {apply_account_group_permissions_0} PATCH /api/2.0/permissions/notebooks/4038693298171070
> {
>   "access_control_list": [
>     {
>       "group_name": "ucx_gd1EqS",
>       "permission_level": "CAN_RUN"
>     },
>     {
>       "group_name": "ucx_5NTfVg",
>       "permission_level": "CAN_EDIT"
>     },
>     "... (36 additional elements)"
>   ]
> }
< 500 Internal Server Error
< {
<   "error_code": "INTERNAL_ERROR",
<   "message": "java.io.IOException: java.util.concurrent.TimeoutException: Timed out after 5 seconds"
< }
19:05:19 ERROR [databricks.labs.ucx.framework.parallel] {apply_account_group_permissions_0} apply account group permissions task failed: java.io.IOException: java.util.concurrent.TimeoutException: Timed out after 5 seconds
Traceback (most recent call last):
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/framework/parallel.py", line 117, in inner
    return func(*args, **kwargs), None
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/mixins/hardening.py", line 57, in wrapper
    return func(*args, **kwargs)
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/workspace_access/generic.py", line 128, in _applier_task
    update_retried_check(object_type, object_id, acl)
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/sdk/retries.py", line 47, in wrapper
    raise err
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/sdk/retries.py", line 29, in wrapper
    return func(*args, **kwargs)
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/workspace_access/generic.py", line 213, in _safe_update_permissions
    return self._ws.permissions.update(object_type, object_id, access_control_list=acl)
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/sdk/service/iam.py", line 2486, in update
    res = self._api.do('PATCH',
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/sdk/core.py", line 1144, in do
    return retryable(self._perform)(method,
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/sdk/retries.py", line 47, in wrapper
    raise err
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/sdk/retries.py", line 29, in wrapper
    return func(*args, **kwargs)
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/sdk/core.py", line 1236, in _perform
    raise self._make_nicer_error(response=response, **payload) from None
databricks.sdk.errors.mapping.InternalError: java.io.IOException: java.util.concurrent.TimeoutException: Timed out after 5 seconds

The code should retry automatically when it hits a timeout (or any other backend-related error) during this phase.
Additionally, the 5-second server-side timeout is somewhat short; it should be increased to a higher value (30 s to 1 min).
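A minimal sketch of the requested retry behaviour, assuming the permissions update can be wrapped in a zero-argument callable and that server-side failures such as the `InternalError` in the log above are safe to retry. To stay self-contained it defines a stand-in `InternalError` class and a hypothetical `retry_on_transient` helper instead of using the SDK's own `databricks.sdk.retries` module; the overall timeout and backoff values are illustrative assumptions, not values taken from UCX.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class InternalError(Exception):
    """Stand-in for databricks.sdk.errors.InternalError (hypothetical here)."""


def is_transient(err: Exception) -> bool:
    # Assumption: any InternalError (e.g. the backend's 5-second TimeoutException
    # surfaced as HTTP 500) is worth retrying; everything else fails fast.
    return isinstance(err, InternalError)


def retry_on_transient(
    func: Callable[[], T],
    *,
    timeout: float = 60.0,
    initial_delay: float = 1.0,
    max_delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Call func until it succeeds, raises a non-transient error, or the timeout would be exceeded."""
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        try:
            return func()
        except Exception as err:
            # Re-raise if the error is not transient or the next wait would pass the deadline.
            if not is_transient(err) or clock() + delay > deadline:
                raise
            sleep(delay)
            delay = min(delay * 2, max_delay)  # exponential backoff, capped at max_delay


# Hypothetical usage with the SDK call from the traceback:
# retry_on_transient(lambda: ws.permissions.update(object_type, object_id, access_control_list=acl))
```

Injecting `sleep` and `clock` keeps the helper testable without real waiting; exponential backoff gives an overloaded backend time to recover instead of re-sending the same 38-entry ACL immediately.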

Metadata

Labels

migrate/groups: Corresponds to Migrate Groups Step of go/uc/upgrade
platform-issue: issues (potentially) related to the Databricks platform and not this project

Type

No type

Projects

Status

No status

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
