Fixed databricks labs ucx repair-run command to execute correctly #801

Merged
nfx merged 11 commits into main from feature/fix_repai_run
Jan 19, 2024

Conversation

@prajin-29
Contributor

Changes

Fixes the databricks labs ucx repair-run CLI command. When the CLI tried to repair a job run before the run's response JSON had been updated with a result state of FAILED or SUCCESS, it failed with a NoneType exception.

Added a check to repair_run in install.py that inspects the result state in the response and waits for up to 20 seconds for it to be updated.

Enhanced the code so that an already repaired job run can be repaired again.
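
The failure described above can be reproduced without a workspace. The following is a minimal sketch, not the code from install.py: `RunState` and `ResultState` are hypothetical stand-ins rather than the Databricks SDK types, and the only assumption is that `result_state` stays `None` while a run has not yet finished.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ResultState(Enum):
    # Hypothetical stand-in for the terminal states named in the description.
    SUCCESS = "SUCCESS"
    FAILED = "FAILED"


@dataclass
class RunState:
    # Assumption: result_state is None until the run reaches a terminal state.
    result_state: Optional[ResultState] = None


def read_result_state(state: RunState) -> str:
    # Unguarded access: raises AttributeError when result_state is still None.
    return state.result_state.value


def describe(state: RunState) -> str:
    # Converts the failure into a string so both cases can be compared side by side.
    try:
        return read_result_state(state)
    except AttributeError as err:
        return f"not ready: {err}"


in_progress = describe(RunState())
finished = describe(RunState(ResultState.FAILED))
print(in_progress)
print(finished)
```

The first call hits the NoneType error because the state has not been populated yet; the second succeeds once a terminal state is present.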

Linked issues

closes #787

Resolves #787

Functionality

  • modified the cli command databricks labs ucx repair-run which was failing in regression testing

Tests

  • Manually tested
  • Added unit test test_repair_run_result_state in test_install.py
  • Tested using an integration test
  • Screenshot 2024-01-17 at 11 08 42 AM

@prajin-29 prajin-29 requested review from a team and nsenno-dbr January 17, 2024 07:07
@codecov

codecov bot commented Jan 17, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (af80620) 84.07% compared to head (add2404) 84.13%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #801      +/-   ##
==========================================
+ Coverage   84.07%   84.13%   +0.05%     
==========================================
  Files          39       39              
  Lines        4872     4890      +18     
  Branches      913      916       +3     
==========================================
+ Hits         4096     4114      +18     
  Misses        564      564              
  Partials      212      212              


Collaborator

@nfx nfx left a comment

Rewrite this to use the retried decorator.


while not state.result_state and (time.time() - start_time < timeout):
    logger.info("Waiting for the result_state to update the state")
    time.sleep(10)

Collaborator

This is not unit testable. See how we use the retried() decorator in the workspace access package (dbsql permissions, secrets acls, etc.).
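
To illustrate the testability point, here is a rough sketch of a retry decorator. It is a simplified, hypothetical stand-in (`retry_on`), not the retried() decorator used in the project, and its parameters (`sleep`, `clock`, the 20-second default timeout, the backoff rule) are assumptions made for this example. The idea is that the waiting lives in one reusable place and the sleep function can be swapped out, so a test never has to wait in real time.

```python
import functools
import time
from datetime import timedelta


def retry_on(exceptions, timeout=timedelta(seconds=20), sleep=time.sleep, clock=time.monotonic):
    """Hypothetical, simplified retry decorator for illustration only."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            deadline = clock() + timeout.total_seconds()
            attempt = 0
            while True:
                try:
                    return func(*args, **kwargs)
                except tuple(exceptions) as err:
                    attempt += 1
                    if clock() >= deadline:
                        raise TimeoutError(f"gave up after {attempt} attempts") from err
                    # Linear backoff, capped at 10 seconds per attempt.
                    sleep(min(attempt, 10))

        return wrapper

    return decorator


calls = []
naps = []


# naps.append replaces time.sleep, so the requested delays are recorded instead of slept.
@retry_on([AttributeError], sleep=naps.append)
def flaky():
    # Fails twice, then succeeds, like a run whose result state is not yet populated.
    calls.append(1)
    if len(calls) < 3:
        raise AttributeError("no result state in job run")
    return "SUCCESS"


result = flaky()
print(result, len(calls), naps)
```

With a hand-rolled `while` loop and a hard-coded `time.sleep(10)`, the same test would either take real seconds or require patching the `time` module.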

Contributor Author

@nfx Updated the code with retried logic.

latest_job_run = job_runs[0]
state = latest_job_run.state

while not state.result_state and (time.time() - start_time < timeout):
Collaborator

Refactor this into a private method and decorate it with retried.

Contributor Author

@nfx Refactored this with retried logic.

@prajin-29 prajin-29 requested a review from nfx January 18, 2024 10:08
Comment on lines +890 to +897
def _get_result_state(self, job_id):
    job_runs = list(self._ws.jobs.list_runs(job_id=job_id, limit=1))
    latest_job_run = job_runs[0]
    if not latest_job_run.state.result_state:
        logger.info("Waiting for the result_state to update the state")
        time.sleep(10)
    job_state = latest_job_run.state.result_state.value
    return job_state

Collaborator

@nfx nfx Jan 18, 2024

Suggested change
- def _get_result_state(self, job_id):
-     job_runs = list(self._ws.jobs.list_runs(job_id=job_id, limit=1))
-     latest_job_run = job_runs[0]
-     if not latest_job_run.state.result_state:
-         logger.info("Waiting for the result_state to update the state")
-         time.sleep(10)
-     job_state = latest_job_run.state.result_state.value
-     return job_state
+ def _get_result_state(self, job_id):
+     job_runs = list(self._ws.jobs.list_runs(job_id=job_id, limit=1))
+     if len(job_runs) == 0:
+         raise AttributeError("no job runs found")
+     latest_job_run = job_runs[0]
+     if not latest_job_run.state.result_state:
+         raise AttributeError("no result state in job run")
+     job_state = latest_job_run.state.result_state.value
+     return job_state
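
Because the suggested version raises instead of sleeping, each branch can be checked with a mocked client. The sketch below is only illustrative: the logic is copied into a free function `get_result_state`, and `run_with`, `outcome`, the local `ResultState` enum, and the job id `123` are hypothetical helpers and values, not part of the project or the SDK.

```python
from enum import Enum
from types import SimpleNamespace
from unittest.mock import MagicMock


class ResultState(Enum):
    # Hypothetical stand-in for the SDK's result state enum.
    FAILED = "FAILED"


def get_result_state(ws, job_id):
    # Same logic as the suggested change, written as a free function for this sketch.
    job_runs = list(ws.jobs.list_runs(job_id=job_id, limit=1))
    if len(job_runs) == 0:
        raise AttributeError("no job runs found")
    latest_job_run = job_runs[0]
    if not latest_job_run.state.result_state:
        raise AttributeError("no result state in job run")
    return latest_job_run.state.result_state.value


def run_with(result_state):
    # Builds a fake run object exposing only the attributes the function reads.
    return SimpleNamespace(state=SimpleNamespace(result_state=result_state))


def outcome(runs):
    # Mocks the workspace client so list_runs yields the given fake runs.
    ws = MagicMock()
    ws.jobs.list_runs.return_value = iter(runs)
    try:
        return get_result_state(ws, job_id=123)  # 123 is an arbitrary job id
    except AttributeError as err:
        return f"AttributeError: {err}"


no_runs = outcome([])
pending = outcome([run_with(None)])
done = outcome([run_with(ResultState.FAILED)])
print(no_runs, pending, done, sep="\n")
```

The two `AttributeError` cases are the ones a retry decorator configured for `AttributeError` would retry; the third is the value returned once the run has a result state.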

Collaborator

You have retried(on=[AttributeError], ...), but you don't raise it anywhere.

Contributor Author

If latest_job_run.state is None, then latest_job_run.state.result_state.value raises AttributeError. I have now rewritten the code to raise the exception explicitly.

For job runs, we exit immediately at the initial stage, with a proper message, if there is no job run for the job_id.

# Conflicts:
#	src/databricks/labs/ucx/install.py
@nfx nfx changed the title Fixing the Issue for Repair Run databricks labs ucx repair-run Fixed databricks labs ucx repair-run command to execute correctly Jan 19, 2024
@nfx nfx merged commit b45fa41 into main Jan 19, 2024
@nfx nfx deleted the feature/fix_repai_run branch January 19, 2024 09:14
nfx added a commit that referenced this pull request Jan 19, 2024
* Added `databricks labs ucx validate-groups-membership` command to validate groups to see if they have same membership across account and workspace level ([#772](#772)).
* Added baseline for getting Azure Resource Role Assignments ([#764](#764)).
* Added issue and pull request templates ([#791](#791)).
* Added linked issues to PR template ([#793](#793)).
* Added optional `debug_truncate_bytes` parameter to the config and extend the default log truncation limit ([#782](#782)).
* Added support for crawling grants and applying Hive Metastore UDF ACLs ([#812](#812)).
* Changed Python requirement from 3.10.6 to 3.10 ([#805](#805)).
* Extend error handling of delta issues in crawlers and hive metastore ([#795](#795)).
* Fixed `databricks labs ucx repair-run` command to execute correctly ([#801](#801)).
* Fixed handling of `DELTASHARING` table format ([#802](#802)).
* Fixed listing of workflows via CLI ([#811](#811)).
* Fixed logger import path for DEBUG notebook ([#792](#792)).
* Fixed move table command to delete table/view regardless if permissions are present, skipping corrupted tables when crawling table size and making existing tests more stable ([#777](#777)).
* Fixed the issue of `databricks labs ucx installations` and `databricks labs ucx manual-workspace-info` ([#814](#814)).
* Increase the unit test coverage for cli.py ([#800](#800)).
* Mount Point crawler lists /Volume with four variations which is confusing ([#779](#779)).
* Updated README.md to remove mention of deprecated install.sh ([#781](#781)).
* Updated `bug` issue template ([#797](#797)).
* Fixed writing log readme in multiprocess safe way ([#794](#794)).
@nfx nfx mentioned this pull request Jan 19, 2024
nfx added a commit that referenced this pull request Jan 19, 2024
dmoore247 pushed a commit that referenced this pull request Mar 23, 2024
Development

Successfully merging this pull request may close these issues.

Cannot repair failed assessment job with databricks labs ucx repair-run

2 participants