Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #724 +/- ##
==========================================
+ Coverage 79.45% 79.58% +0.13%
==========================================
Files 42 42
Lines 4517 4546 +29
Branches 845 849 +4
==========================================
+ Hits 3589 3618 +29
Misses 710 710
Partials 218 218 ☔ View full report in Codecov by Sentry. |
nfx
left a comment
There was a problem hiding this comment.
- change the title of the pull request to be changelog-ready. See other commit messages on main branch for consistency examples
|
|
||
| def repair_run(self, workflow): | ||
| try: | ||
| job_id_list = [job_id for step, job_id in self._state.jobs.items() if workflow == step] |
There was a problem hiding this comment.
There's only one job id per step, use job_id = state.jobs.get(step), not the list comprehension
There was a problem hiding this comment.
changed accordingly
| if job_id_list: | ||
| job_id = job_id_list[0] | ||
| job_runs = list(self._ws.jobs.list_runs(job_id=job_id, limit=1)) | ||
| state = job_runs[0].state |
There was a problem hiding this comment.
Bug: will crash when there are no runs
There was a problem hiding this comment.
Changed the code. If there is no runs it will throw a warning message
| job_runs = list(self._ws.jobs.list_runs(job_id=job_id, limit=1)) | ||
| state = job_runs[0].state | ||
| if job_runs and state.result_state.value != "SUCCESS": | ||
| run_id = job_runs[0].run_id |
There was a problem hiding this comment.
Extract first run into a variable
There was a problem hiding this comment.
Changed the code now.
| if job_runs and state.result_state.value != "SUCCESS": | ||
| run_id = job_runs[0].run_id | ||
| self._ws.jobs.repair_run(run_id=run_id, rerun_all_failed_tasks=True) | ||
| logger.info("Exception") |
There was a problem hiding this comment.
Added for testing.Removed now from the code
| def repair_run(self, workflow): | ||
| try: | ||
| job_id_list = [job_id for step, job_id in self._state.jobs.items() if workflow == step] | ||
| if job_id_list: |
There was a problem hiding this comment.
Can you invert the logic a bit: has to be "if job id is not found, show a warning and return immediately". Don't delay that to an else statement. See contributing.md - I think i added explanation why
There was a problem hiding this comment.
inverted the logic.
|
Can you also add an integration test? |
| job_runs = list(self._ws.jobs.list_runs(job_id=job_id, limit=1)) | ||
| if not job_runs: | ||
| logger.warning(f"{workflow} job is not initialized yet. Can't trigger repair run now") | ||
| else: |
There was a problem hiding this comment.
You don't need "else" here, just de-dent the next code block
There was a problem hiding this comment.
Changed according
| @retried(on=[NotFound, InvalidParameterValue, OperationFailed], timeout=timedelta(minutes=10)) | ||
| def test_repair_run_assessment_job(ws, new_installation, caplog): | ||
| install = new_installation() | ||
| install.run_workflow("assessment") |
There was a problem hiding this comment.
It's easier to "break" the destroy database workflow, where you can inject failure by removing the database before starting the job
There was a problem hiding this comment.
used destroy database workflow to test the repair run in Integration test
|
Make pull request title changelog-friendly |
| if state.result_state.value != "SUCCESS": | ||
| run_id = latest_job_run.run_id | ||
| job_run_waiter = self._ws.jobs.repair_run(run_id=run_id, rerun_all_failed_tasks=True) | ||
| job_run_waiter.result() |
There was a problem hiding this comment.
This would timeout in 20 minutes. What you should do instead is construct a url of this new run and open it in browser and return from cli
There was a problem hiding this comment.
removed calling the result.Instead opening the run url in browser.
# Conflicts: # labs.yml # src/databricks/labs/ucx/cli.py
databricks labs ucx repair-run --step ... CLI command for repair run of any failed workflows, like assessment, migrate-groups etc.
|
@nfx All the comments has been implemented.Kindly review the same. |
* Added `databricks labs ucx repair-run --step ...` CLI command for repair run of any failed workflows, like `assessment`, `migrate-groups` etc. ([#724](#724)). * Added `databricks labs ucx revert-migrated-table` command ([#729](#729)). * Allow specifying a group list when group match options are used ([#725](#725)). * Fixed installation issue when upgrading from an older version of the tool and improve logs ([#740](#740)). * Renamed summary panel from Failure Summary to Assessment Summary ([#733](#733)). * Retry internal error when getting permissions and update legacy table ACL documentation ([#728](#728)). * Speedup installer execution ([#727](#727)).
* Added `databricks labs ucx repair-run --step ...` CLI command for repair run of any failed workflows, like `assessment`, `migrate-groups` etc. ([#724](#724)). * Added `databricks labs ucx revert-migrated-table` command ([#729](#729)). * Allow specifying a group list when group match options are used ([#725](#725)). * Fixed installation issue when upgrading from an older version of the tool and improve logs ([#740](#740)). * Renamed summary panel from Failure Summary to Assessment Summary ([#733](#733)). * Retry internal error when getting permissions and update legacy table ACL documentation ([#728](#728)). * Speedup installer execution ([#727](#727)).
…air run of any failed workflows, like `assessment`, `migrate-groups` etc. (#724)
* Added `databricks labs ucx repair-run --step ...` CLI command for repair run of any failed workflows, like `assessment`, `migrate-groups` etc. ([#724](#724)). * Added `databricks labs ucx revert-migrated-table` command ([#729](#729)). * Allow specifying a group list when group match options are used ([#725](#725)). * Fixed installation issue when upgrading from an older version of the tool and improve logs ([#740](#740)). * Renamed summary panel from Failure Summary to Assessment Summary ([#733](#733)). * Retry internal error when getting permissions and update legacy table ACL documentation ([#728](#728)). * Speedup installer execution ([#727](#727)).
…air run of any failed workflows, like `assessment`, `migrate-groups` etc. (#724)
* Added `databricks labs ucx repair-run --step ...` CLI command for repair run of any failed workflows, like `assessment`, `migrate-groups` etc. ([#724](#724)). * Added `databricks labs ucx revert-migrated-table` command ([#729](#729)). * Allow specifying a group list when group match options are used ([#725](#725)). * Fixed installation issue when upgrading from an older version of the tool and improve logs ([#740](#740)). * Renamed summary panel from Failure Summary to Assessment Summary ([#733](#733)). * Retry internal error when getting permissions and update legacy table ACL documentation ([#728](#728)). * Speedup installer execution ([#727](#727)).

This may close #658