Skip to content

Improved cleanup for workspace backup groups by adding more retries on errors#375

Merged
nfx merged 19 commits intomainfrom
fix/improve_retry_and_rate_limits
Oct 4, 2023
Merged

Improved cleanup for workspace backup groups by adding more retries on errors#375
nfx merged 19 commits intomainfrom
fix/improve_retry_and_rate_limits

Conversation

@mwojtyczka
Copy link
Copy Markdown
Contributor

@mwojtyczka mwojtyczka commented Oct 4, 2023

Fixed deletion of backup groups [issue #374].
Added rate limits and retries to group operations [issue #353].
Temp fix for issue #359
Added log messages for better visibility.
Added useful troubleshooting snippets to the docs.

@codecov
Copy link
Copy Markdown

codecov bot commented Oct 4, 2023

Codecov Report

Merging #375 (55e3277) into main (47123a9) will increase coverage by 0.20%.
The diff coverage is 92.68%.

@@            Coverage Diff             @@
##             main     #375      +/-   ##
==========================================
+ Coverage   83.73%   83.94%   +0.20%     
==========================================
  Files          30       30              
  Lines        2337     2360      +23     
  Branches      410      414       +4     
==========================================
+ Hits         1957     1981      +24     
+ Misses        293      291       -2     
- Partials       87       88       +1     
Files Coverage Δ
src/databricks/labs/ucx/runtime.py 55.55% <ø> (+0.76%) ⬆️
src/databricks/labs/ucx/workspace_access/scim.py 100.00% <100.00%> (ø)
...rc/databricks/labs/ucx/workspace_access/generic.py 97.36% <80.00%> (-1.73%) ⬇️
src/databricks/labs/ucx/workspace_access/groups.py 92.45% <93.33%> (+2.17%) ⬆️

Copy link
Copy Markdown
Collaborator

@nfx nfx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you also add an integration test for this?

if group.display_name == group_name:
return group

@retried(on=[IOError])
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't you retry on DatabrocksError instead? what exception message were you getting to add retries, btw? we'll probably be getting more concrete exceptions in SDK soon, so i'd like comments with messages :)

see databricks/databricks-sdk-py#376

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it would be better to use the specific DatabrocksError . Let me update this.

Copy link
Copy Markdown
Contributor Author

@mwojtyczka mwojtyczka Oct 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I double-checked and indeed it was DatabrocksError sub class of IOError. Code was updated accordingly.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed now

@nfx nfx changed the title Bugfix backup group deletion and improve stability Improved cleanup for workspace backup groups Oct 4, 2023
@nfx nfx added the migrate/groups Corresponds to Migrate Groups Step of go/uc/upgrade label Oct 4, 2023
@mwojtyczka
Copy link
Copy Markdown
Contributor Author

mwojtyczka commented Oct 4, 2023

can you also add an integration test for this?

Added clean-up to the test_installation.

@nfx nfx changed the title Improved cleanup for workspace backup groups Improved cleanup for workspace backup groups by adding more retries on errors Oct 4, 2023
@nfx nfx added this pull request to the merge queue Oct 4, 2023
Merged via the queue into main with commit b0e82a0 Oct 4, 2023
zpappa pushed a commit that referenced this pull request Oct 6, 2023
…n errors (#375)

Fixed deletion of backup groups [issue #374].
Added rate limits and retries to group operations [issue #353].
Temp fix for issue #359
Added log messages for better visibility.
Added useful troubleshooting snippets to the docs.
@nfx nfx mentioned this pull request Oct 12, 2023
nfx added a commit that referenced this pull request Oct 12, 2023
# Version changelog

## 0.4.0

* Added exception handling for secret scope not found.
([#418](#418)).
* Added a crawler for creating an inventory of Azure Service Principals
([#326](#326)).
* Added check if account group already exists during failure recovery
([#446](#446)).
* Added checking for index out of range.
([#429](#429)).
* Added hyperlink to UCX releases in the main readme
([#408](#408)).
* Added integration test to check backup groups get deleted
([#387](#387)).
* Added logging of errors during threadpool operations.
([#376](#376)).
* Added recovery mode for workspace-local groups from temporary groups
([#435](#435)).
* Added support for migrating Legacy Table ACLs from workspace-local to
account-level groups
([#412](#412)).
* Added detection for installations of unreleased versions
([#399](#399)).
* Decoupled `PermissionsManager` from `GroupMigrationToolkit`
([#407](#407)).
* Enabled debug logging for every job task run through a file, which is
accessible from both workspace UI and Databricks CLI
([#426](#426)).
* Ensured that table exists, even when crawlers produce zero records
([#373](#373)).
* Extended test suite for HMS->HMS TACL migration
([#439](#439)).
* Fixed handling of secret scope responses
([#431](#431)).
* Fixed `crawl_permissions` task to respect 'workspace_start_path'
config ([#444](#444)).
* Fixed broken logic in `parallel` module and applied hardened error
handling design for parallel code
([#405](#405)).
* Fixed codecov.io reporting
([#403](#403)).
* Fixed integration tests for crawlers
([#379](#379)).
* Improved README.py and logging messages
([#433](#433)).
* Improved cleanup for workspace backup groups by adding more retries on
errors ([#375](#375)).
* Improved dashboard queries to show unsupported storage types.
([#398](#398)).
* Improved documentation for readme notebook
([#257](#257)).
* Improved test coverage for installer
([#371](#371)).
* Introduced deterministic `env_or_skip` fixture for integration tests
([#396](#396)).
* Made HMS & UC fixtures return `CatalogInfo`, `SchemaInfo`, and
`TableInfo` ([#409](#409)).
* Merge `workspace_access.Crawler` and `workspace_access.Applier`
interfaces to `workspace_access.AclSupport`
([#436](#436)).
* Moved examples to docs
([#404](#404)).
* Properly isolated integration testing for workflows on an existing
shared cluster ([#414](#414)).
* Removed thread pool for any IAM Group removals and additions
([#394](#394)).
* Replace plus char with minus in version tag for GCP dev installation
of UCX ([#420](#420)).
* Run integration tests on shared clusters for a faster devloop
([#397](#397)).
* Show difference between serverless and PRO warehouses during
installation ([#385](#385)).
* Split `migrate-groups` workflow into three different stages for
reliability ([#442](#442)).
* Use groups instead of usernames in code owners file
([#389](#389)).
@nfx nfx deleted the fix/improve_retry_and_rate_limits branch October 17, 2023 22:44
FastLee pushed a commit that referenced this pull request Oct 25, 2023
# Version changelog

## 0.4.0

* Added exception handling for secret scope not found.
([#418](#418)).
* Added a crawler for creating an inventory of Azure Service Principals
([#326](#326)).
* Added check if account group already exists during failure recovery
([#446](#446)).
* Added checking for index out of range.
([#429](#429)).
* Added hyperlink to UCX releases in the main readme
([#408](#408)).
* Added integration test to check backup groups get deleted
([#387](#387)).
* Added logging of errors during threadpool operations.
([#376](#376)).
* Added recovery mode for workspace-local groups from temporary groups
([#435](#435)).
* Added support for migrating Legacy Table ACLs from workspace-local to
account-level groups
([#412](#412)).
* Added detection for installations of unreleased versions
([#399](#399)).
* Decoupled `PermissionsManager` from `GroupMigrationToolkit`
([#407](#407)).
* Enabled debug logging for every job task run through a file, which is
accessible from both workspace UI and Databricks CLI
([#426](#426)).
* Ensured that table exists, even when crawlers produce zero records
([#373](#373)).
* Extended test suite for HMS->HMS TACL migration
([#439](#439)).
* Fixed handling of secret scope responses
([#431](#431)).
* Fixed `crawl_permissions` task to respect 'workspace_start_path'
config ([#444](#444)).
* Fixed broken logic in `parallel` module and applied hardened error
handling design for parallel code
([#405](#405)).
* Fixed codecov.io reporting
([#403](#403)).
* Fixed integration tests for crawlers
([#379](#379)).
* Improved README.py and logging messages
([#433](#433)).
* Improved cleanup for workspace backup groups by adding more retries on
errors ([#375](#375)).
* Improved dashboard queries to show unsupported storage types.
([#398](#398)).
* Improved documentation for readme notebook
([#257](#257)).
* Improved test coverage for installer
([#371](#371)).
* Introduced deterministic `env_or_skip` fixture for integration tests
([#396](#396)).
* Made HMS & UC fixtures return `CatalogInfo`, `SchemaInfo`, and
`TableInfo` ([#409](#409)).
* Merge `workspace_access.Crawler` and `workspace_access.Applier`
interfaces to `workspace_access.AclSupport`
([#436](#436)).
* Moved examples to docs
([#404](#404)).
* Properly isolated integration testing for workflows on an existing
shared cluster ([#414](#414)).
* Removed thread pool for any IAM Group removals and additions
([#394](#394)).
* Replace plus char with minus in version tag for GCP dev installation
of UCX ([#420](#420)).
* Run integration tests on shared clusters for a faster devloop
([#397](#397)).
* Show difference between serverless and PRO warehouses during
installation ([#385](#385)).
* Split `migrate-groups` workflow into three different stages for
reliability ([#442](#442)).
* Use groups instead of usernames in code owners file
([#389](#389)).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

migrate/groups Corresponds to Migrate Groups Step of go/uc/upgrade

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

Delete backup groups does not work if run as separate step Tool is failing creating new groups

2 participants