Added assessment for the incompatible RunSubmit API usages#849

Merged
nfx merged 22 commits into main from feature/crawl-job-run on Mar 3, 2024
Conversation

@FastLee
Contributor

@FastLee FastLee commented Jan 26, 2024

This PR has the following changes:

  1. Introduction of a new crawler to identify potentially incompatible submit runs: This PR addresses issue #266 (Assessment for RunSubmit API usages) by implementing a crawler that detects submit runs with potential compatibility issues. Not all submit runs are analyzed, since not all of them directly contribute to the migration process. The crawler identifies incompatible groups of submit runs by analyzing their tasks and clustering them by hash value.
  2. Analysis of tasks within submit runs: For each submit run, the crawler analyzes every task included in the run. This analysis determines the compatibility status of the submit run.
  3. Calculation of unique task hashes: For each task within a submit run, the crawler calculates a hash value. The _retrieve_hash_values_from_task function retrieves the task details that feed into the hash.
  4. Coalescing task hashes into submit run hashes: Once the task hashes are known, the crawler combines them into a single hash for the submit run, so that submit runs can be grouped by hash value.
  5. Coalescing submit runs under the same hash into pseudo-jobs: The crawler merges submit runs with the same hash value into a single pseudo-job, which gives a clearer picture of the compatibility status of submit runs.
  6. Returning a list of pseudo-jobs along with assessment results: The crawler returns the pseudo-jobs with their respective assessment results, so users can quickly identify submit runs with potential compatibility issues and take appropriate action.

In summary, this PR introduces a new crawler that detects incompatible submit runs by analyzing their tasks, calculating hashes, and consolidating submit runs with the same hash into pseudo-jobs. It returns the list of pseudo-jobs with their assessment results, so users can make informed decisions about the compatibility of their submit runs.
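
The hashing and grouping steps described above could be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function names, the task fields used for hashing (notebook_path, spark_version), the choice of SHA-256, and sorting task hashes before combining them are all assumptions made for the example.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def task_hash(task: dict) -> str:
    """Hash the task fields that identify it (hypothetical field choice)."""
    # Assumption: these two keys are enough to tell tasks apart;
    # the real crawler may read different details from the task.
    relevant = (
        str(task.get("notebook_path", "")),
        str(task.get("spark_version", "")),
    )
    return hashlib.sha256("|".join(relevant).encode("utf-8")).hexdigest()


def run_hash(tasks: List[dict]) -> str:
    """Combine the task hashes of one submit run into a single hash."""
    # Assumption: sort first so that task order does not change the result.
    hashes = sorted(task_hash(t) for t in tasks)
    return hashlib.sha256("".join(hashes).encode("utf-8")).hexdigest()


def group_runs(runs: List[dict]) -> Dict[str, List[int]]:
    """Group run ids by run hash; each group plays the role of a pseudo-job."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for run in runs:
        groups[run_hash(run["tasks"])].append(run["run_id"])
    return dict(groups)


if __name__ == "__main__":
    a = {"notebook_path": "/etl/load", "spark_version": "9.1"}
    b = {"notebook_path": "/etl/clean", "spark_version": "9.1"}
    runs = [
        {"run_id": 1, "tasks": [a, b]},
        {"run_id": 2, "tasks": [b, a]},  # same tasks, different order
        {"run_id": 3, "tasks": [a]},
    ]
    for digest, run_ids in group_runs(runs).items():
        print(digest[:8], run_ids)
```

With this sketch, runs 1 and 2 end up in the same group because they contain the same tasks, while run 3 forms its own group. An assessment step would then run once per group rather than once per submit run.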

@codecov

codecov bot commented Jan 26, 2024

Codecov Report

Attention: Patch coverage is 88.54962%, with 15 lines in your changes missing coverage. Please review.

Project coverage is 88.10%. Comparing base (c849cdf) to head (7418bbf).

Files                                         Patch %   Lines
src/databricks/labs/ucx/assessment/jobs.py    88.98%    8 Missing and 5 partials ⚠️
src/databricks/labs/ucx/runtime.py            60.00%    2 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##             main     #849    +/-   ##
========================================
  Coverage   88.10%   88.10%            
========================================
  Files          45       45            
  Lines        5622     5743   +121     
  Branches     1017     1043    +26     
========================================
+ Hits         4953     5060   +107     
- Misses        450      458     +8     
- Partials      219      225     +6     

@nfx
Collaborator

nfx commented Jan 29, 2024

Duplicate of #395

@nfx nfx marked this as a duplicate of #395 Jan 29, 2024
@nfx nfx closed this Jan 29, 2024
@nfx nfx reopened this Jan 30, 2024
@renardeinside renardeinside changed the title from "Job Run Crawler" to "Incompatible Submit runs crawler" Feb 2, 2024
@renardeinside renardeinside marked this pull request as ready for review February 2, 2024 10:52
@renardeinside renardeinside requested review from a team and andrascsillag-db February 2, 2024 10:52
@renardeinside
Contributor

@nfx and @FastLee, please review the code and let me know if this part is fine, so that I can add tests and reflect the changes in the dashboard code.

Collaborator

@nfx nfx left a comment

A lot of duplicates, not acceptable shape.

Collaborator

@nfx nfx left a comment

This implementation is incorrect and changes places it must not touch

@FastLee FastLee force-pushed the feature/crawl-job-run branch from a88f738 to 2c5ced4 Compare March 1, 2024 20:54
@FastLee FastLee marked this pull request as ready for review March 1, 2024 20:58
@FastLee FastLee requested a review from nfx March 1, 2024 20:58
"warehouse_id": "123",
"project_directory": "abc/def",
"commands": [
"Sit!",
Collaborator

😂🤪

Collaborator

@nfx nfx left a comment

lgtm

@nfx nfx mentioned this pull request Mar 4, 2024
3 participants