
Improved automation for MANAGED table migration and continued building tables migration component #295

Merged: nfx merged 19 commits into `main` from `add_sync_command` on Sep 28, 2023

Conversation

@william-conti (Contributor)

Fixes #106

@codecov bot commented Sep 25, 2023

Codecov Report

Merging #295 (ebb3c41) into main (b6fc0ab) will increase coverage by 0.19%.
The diff coverage is 90.69%.

❗ Current head ebb3c41 differs from pull request most recent head dd14a17. Consider uploading reports for the commit dd14a17 to get more accurate results

@@            Coverage Diff             @@
##             main     #295      +/-   ##
==========================================
+ Coverage   83.27%   83.47%   +0.19%     
==========================================
  Files          30       30              
  Lines        2146     2184      +38     
  Branches      366      370       +4     
==========================================
+ Hits         1787     1823      +36     
- Misses        279      281       +2     
  Partials       80       80              
| Files | Coverage Δ |
|---|---|
| `src/databricks/labs/ucx/config.py` | 86.50% <100.00%> (+0.21%) ⬆️ |
| `src/databricks/labs/ucx/hive_metastore/tables.py` | 94.59% <90.24%> (-0.08%) ⬇️ |
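The headline numbers in the report can be re-derived from the Hits, Misses and Partials rows. This sketch assumes Codecov's usual definition, coverage = hits / (hits + misses + partials), which matches the Lines row (1787 + 279 + 80 = 2146 and 1823 + 281 + 80 = 2184), and its default of rounding down to two decimals. The +0.19% delta is computed from the unrounded ratios; subtracting the displayed percentages would give 0.20.

```python
import math
from fractions import Fraction


def coverage(hits: int, misses: int, partials: int) -> Fraction:
    """Exact coverage ratio: hits / (hits + misses + partials)."""
    return Fraction(hits, hits + misses + partials)


def as_percent_rounded_down(ratio: Fraction) -> float:
    """Express a ratio as a percentage truncated (floored) to two decimals."""
    return math.floor(ratio * 10000) / 100


base = coverage(hits=1787, misses=279, partials=80)  # main (b6fc0ab)
head = coverage(hits=1823, misses=281, partials=80)  # #295 (ebb3c41)

print(as_percent_rounded_down(base))         # 83.27
print(as_percent_rounded_down(head))         # 83.47
print(as_percent_rounded_down(head - base))  # 0.19
```

Using `Fraction` keeps the arithmetic exact, so the truncation step cannot be thrown off by floating-point error.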

@nfx (Collaborator) left a comment


Can you create a new workflow in `runtime.py` and schedule this as a task? Please also add integration tests that check the results after running the jobs.

See `tests/integration/test_installation.py`.

@william-conti (Contributor, Author)

What do you mean by creating a task? Does it need to create all the UC tables by fetching the tables from the assessment and executing all the SQL returned by `uc_create_sql`?

@nfx (Collaborator) commented Sep 25, 2023

@william-conti yes

@william-conti (Contributor, Author)

Then the scope of this ticket is much larger, because we need to define the target catalog for the tables and views that need to be migrated. AFAIK we do not let customers choose where their tables are going to be stored...

We can start by specifying a default catalog as the target; customers can then run `ALTER TABLE` statements to map these tables onto the data model of their choice.
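To make the default-catalog idea concrete, here is a minimal sketch; it is not the code merged in this PR. It assumes a hypothetical `Table` record for each row of the assessment inventory and a hypothetical target catalog named `ucx_default`. Each `hive_metastore.<database>.<table>` key keeps its schema and table name, and only the catalog changes. `SYNC TABLE ... FROM ...` and `CREATE TABLE ... DEEP CLONE ...` are Databricks SQL statements; using `DEEP CLONE` for `MANAGED` tables is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table:
    """Hypothetical inventory row produced by the assessment crawler."""

    database: str
    name: str
    object_type: str  # assumed values: "MANAGED" or "EXTERNAL"

    @property
    def key(self) -> str:
        # Source name in the legacy Hive metastore
        return f"hive_metastore.{self.database}.{self.name}"


def target_key(table: Table, default_catalog: str) -> str:
    # Keep the schema and table names; only swap the catalog
    return f"{default_catalog}.{table.database}.{table.name}"


def migration_sql(table: Table, default_catalog: str) -> str:
    dest = target_key(table, default_catalog)
    if table.object_type == "EXTERNAL":
        # SYNC registers the external table in Unity Catalog at the same storage location
        return f"SYNC TABLE {dest} FROM {table.key};"
    if table.object_type == "MANAGED":
        # Assumption: managed tables are copied into the target catalog with DEEP CLONE
        return f"CREATE TABLE IF NOT EXISTS {dest} DEEP CLONE {table.key};"
    raise ValueError(f"unsupported object type: {table.object_type}")


if __name__ == "__main__":
    inventory = [
        Table("sales", "orders", "EXTERNAL"),
        Table("sales", "customers", "MANAGED"),
    ]
    for table in inventory:
        print(migration_sql(table, default_catalog="ucx_default"))
```

Emitting one statement per inventory row keeps the generated DDL easy to review before anything is executed.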

@nfx (Collaborator) commented Sep 25, 2023

Hm. Can you at least add an integration test to verify that the sync completed successfully? :) Other PRs can add the task then.

@william-conti (Contributor, Author)

I won't be able to integration-test SYNC with external tables, because creating external tables requires access to S3.

@nfx changed the title from "Add SYNC command" to "Improved automation for MANAGED table migration and continued building tables migration component" Sep 28, 2023
@nfx enabled auto-merge September 28, 2023 17:07
@nfx added this pull request to the merge queue Sep 28, 2023
Merged via the queue into main with commit 2e8a880 Sep 28, 2023
nfx added a commit that referenced this pull request Sep 29, 2023
# Version changelog

## 0.2.0

* Added retrieval of all account-level groups whose names match workspace-level groups when there is no explicit configuration ([#277](#277)).
* Added crawler for Azure Service principals used for direct storage access ([#305](#305)).
* Added more SQL queries to the assessment step dashboard ([#269](#269)).
* Added filtering out for job clusters in the clusters crawler ([#298](#298)).
* Added recording of errors from the `crawl_tables` step in the `$inventory.table_failures` table, with a counter displayed on the dashboard ([#300](#300)).
* Added comprehensive introduction user manual ([#273](#273)).
* Added interactive tutorial for local group migration readme ([#291](#291)).
* Added tutorial links to the landing page of documentation ([#290](#290)).
* Added (internal) support for account-level configuration and multi-cloud workspace list ([#264](#264)).
* Improved order of tasks in the README notebook ([#286](#286)).
* Improved installation script to run in a Windows Git Bash terminal ([#282](#282)).
* Improved installation script by setting log level to uppercase by default ([#271](#271)).
* Improved installation finish messages within installer script ([#267](#267)).
* Improved automation for `MANAGED` table migration and continued building tables migration component ([#295](#295)).
* Fixed debug notebook code with refactored package structure ([#250](#250)) ([#265](#265)).
* Fixed replacement of custom configured database to replicate in the report for external locations ([#296](#296)).
* Removed redundant `notebooks` top-level folder ([#263](#263)).
* Split checking for test failures and linting errors into independent GitHub Actions checks ([#287](#287)).
* Verified query metadata for assessment dashboards during unit tests ([#294](#294)).
@nfx mentioned this pull request Sep 29, 2023
nfx added a commit that referenced this pull request Sep 29, 2023
FastLee pushed a commit that referenced this pull request Sep 29, 2023
…ing tables migration component (#295)

Fixes #106

---------

Co-authored-by: Serge Smertin <[email protected]>
FastLee pushed a commit that referenced this pull request Sep 29, 2023
@nfx deleted the `add_sync_command` branch October 2, 2023 17:20
@pohlposition (Contributor)

This issue is for `MANAGED` tables (Step 6 in go/uc/upgrade).

SYNC is only for `EXTERNAL` tables (Step 5 in go/uc/upgrade).

We should keep the development and workflows for these two separate.
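Following that suggestion, here is a sketch of how the inventory could be split so that the two steps run as separate workflows. The workflow names `migrate_external_tables` and `migrate_managed_tables` and the `(table_key, object_type)` input shape are hypothetical; views and other object types are skipped.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical workflow names, one per migration step
WORKFLOW_FOR_TYPE = {
    "EXTERNAL": "migrate_external_tables",  # Step 5: SYNC-based
    "MANAGED": "migrate_managed_tables",    # Step 6: separate process
}


def plan_workflows(tables: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (table_key, object_type) pairs into one batch per workflow."""
    batches: Dict[str, List[str]] = defaultdict(list)
    for key, object_type in tables:
        workflow = WORKFLOW_FOR_TYPE.get(object_type)
        if workflow is None:
            # Views and other object types are out of scope for both steps
            continue
        batches[workflow].append(key)
    return dict(batches)


if __name__ == "__main__":
    inventory = [
        ("hive_metastore.sales.orders", "EXTERNAL"),
        ("hive_metastore.sales.customers", "MANAGED"),
        ("hive_metastore.sales.daily_totals", "VIEW"),
    ]
    for workflow, keys in plan_workflows(inventory).items():
        print(workflow, keys)
```

Keeping the mapping in one dictionary makes it easy to schedule each batch as its own job, so a failure in the managed-table step does not block the SYNC step.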



Development

Successfully merging this pull request may close these issues.

Use SYNC instead of CREATE TABLE LIKE for migration DDL

4 participants