Skip to content

implement _safe_* calls for applier_task in redash, scim, generic#507

Closed
tnguyen-db wants to merge 15 commits intodatabrickslabs:mainfrom
tnguyen-db:feature/retry-redash-permissions
Closed

implement _safe_* calls for applier_task in redash, scim, generic#507
tnguyen-db wants to merge 15 commits intodatabrickslabs:mainfrom
tnguyen-db:feature/retry-redash-permissions

Conversation

@tnguyen-db
Copy link
Copy Markdown
Contributor

@tnguyen-db tnguyen-db commented Oct 26, 2023

@tnguyen-db tnguyen-db requested a review from a team October 26, 2023 07:11
@tnguyen-db tnguyen-db force-pushed the feature/retry-redash-permissions branch from 5c88480 to c2acce5 Compare October 26, 2023 07:14
@codecov
Copy link
Copy Markdown

codecov bot commented Oct 26, 2023

Codecov Report

Merging #507 (06d311d) into main (70aa0f5) will increase coverage by 0.51%.
Report is 4 commits behind head on main.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #507      +/-   ##
==========================================
+ Coverage   80.30%   80.81%   +0.51%     
==========================================
  Files          31       31              
  Lines        3219     3284      +65     
  Branches      617      621       +4     
==========================================
+ Hits         2585     2654      +69     
+ Misses        489      488       -1     
+ Partials      145      142       -3     
Files Coverage Δ
...rc/databricks/labs/ucx/workspace_access/generic.py 92.41% <100.00%> (+1.58%) ⬆️
src/databricks/labs/ucx/workspace_access/redash.py 87.82% <100.00%> (+2.56%) ⬆️
src/databricks/labs/ucx/workspace_access/scim.py 100.00% <100.00%> (+3.57%) ⬆️

... and 1 file with indirect coverage changes

Copy link
Copy Markdown
Contributor

@larsgeorge-db larsgeorge-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please explain the change to me. I do not understand how the loop and the "retries on errors" achieve the same. More inline...

raise RetryableError() from e

@retried(on=[RetryableError])
def _safe_update_permissions(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This handles errors now, but how does this ensure all permissions have been applied successfully?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If no error is returned, can we assume the permissions have been applied?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently not? Please consult with @renardeinside, as there are seemingly cases where the function returns without error, but the permission is not available? I am just making the point that before it looped over presence of permissions, not that it runs without exceptions.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, indeed, there is now _inflight_check to check if the permissions have been applied as expected in _applier_task for each type

Copy link
Copy Markdown
Contributor

@mwojtyczka mwojtyczka Oct 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, when the API returns 200 and no error the permission should be applied. If it's not then it should be reported as a platform bug. What are the scenarios we try to cover here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently, the API can be inconsistent, I don't know if the problem has been raised already to the platform

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, we should fix that but this should also be reported to the platform.

@larsgeorge-db
Copy link
Copy Markdown
Contributor

Also, this adds a bunch of new code paths that the test now covers less. Please add more coverage in the tests.

Copy link
Copy Markdown
Collaborator

@nfx nfx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR introduces regressions, please address all the comments

else:
raise RetryableError() from e

@retried(on=[RetryableError])
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No timeout configured. See how it's configurable for unit testing in other places

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which timeout value do you suggest?
I haven't found examples on how to configure it in the unit tests.
I was rather thinking of mocking @retried in the unit tests, wdyt?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tnguyen-db did you look at this one?

@rate_limited(max_requests=30)
def _rate_limited_put_acl(self, object_id: str, principal: str, permission: workspace.AclPermission):
self._ws.secrets.put_acl(object_id, principal, permission)
retry_on_value_error = retried(on=[ValueError], timeout=self._verify_timeout)
retried_check = retry_on_value_error(self._inflight_check)
retried_check(object_id, principal, permission)

@tnguyen-db
Copy link
Copy Markdown
Contributor Author

@larsgeorge-db
The main problem this PR was trying to solve was to introduce retries for the dash permissions.
self._ws.dbsql_permissions.set only had the 3 attempts in the loop which did not take into consideration the types of errors.
The solution attempts to have a better control on the retry mechanism by lowering them at the level of the API call.

I've then generalized the solution for generic and scim

Comment on lines +132 to +140
set_retry_on_value_error = retried(on=[RetryableError], timeout=self._verify_timeout)
set_retried_check = set_retry_on_value_error(self._safe_set_permissions)
set_retried_check(object_type, object_id, acl)

retry_on_value_error = retried(on=[ValueError], timeout=self._verify_timeout)
retried_check = retry_on_value_error(self._inflight_check)
return retried_check(object_id, object_id, acl)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nfx is it the kind of logic you expect?

Comment on lines +186 to +193
sup = RedashPermissionsSupport(ws=ws, listings=[], verify_timeout=timedelta(seconds=1))
with pytest.raises(TimeoutError) as e:
result = sup._applier_task(
sql.ObjectTypePlural.QUERIES,
"test",
[sql.AccessControl(group_name="group_1", permission_level=sql.PermissionLevel.CAN_RUN)],
)
assert "Timed out after" in str(e.value)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nfx same question as above but for the test

@tnguyen-db tnguyen-db force-pushed the feature/retry-redash-permissions branch from af4e3f1 to 80824c0 Compare October 26, 2023 13:48
@tnguyen-db tnguyen-db force-pushed the feature/retry-redash-permissions branch from 80824c0 to 7a5711e Compare October 26, 2023 13:50
Copy link
Copy Markdown
Collaborator

@nfx nfx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@tnguyen-db tnguyen-db force-pushed the feature/retry-redash-permissions branch from 06d311d to 4924012 Compare October 30, 2023 09:54
@tnguyen-db tnguyen-db force-pushed the feature/retry-redash-permissions branch from 4924012 to 79f7134 Compare October 30, 2023 13:21
@tnguyen-db tnguyen-db temporarily deployed to account-admin October 30, 2023 13:22 — with GitHub Actions Inactive
nfx pushed a commit that referenced this pull request Oct 30, 2023
@nfx
Copy link
Copy Markdown
Collaborator

nfx commented Oct 30, 2023

Closing this as merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add retry to redash permissions

5 participants