Checking pipeline cluster config and cluster policy in 'crawl_pipelines' task #864
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

@@            Coverage Diff            @@
##             main     #864     +/-  ##
========================================
  Coverage   86.55%   86.55%
========================================
  Files          41       41
  Lines        5162     5171      +9
  Branches      938      943      +5
========================================
+ Hits         4468     4476      +8
+ Misses        481      480      -1
- Partials      213      215      +2
    pipeline_config = pipeline_response.spec.configuration
    if pipeline_config:
        failures.extend(self.check_spark_conf(pipeline_config, "pipeline"))
    pipeline_cluster = pipeline_response.spec.clusters[0]
Can you iterate over clusters instead of picking up the first?
Changed the code to iterate through the clusters.
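The change the reviewer asks for can be sketched roughly as below; `assess_pipeline_clusters` and the `check_spark_conf` callback are illustrative stand-ins for the PR's `PipelinesCrawler` internals, not the actual implementation:

```python
from types import SimpleNamespace

# Hypothetical sketch: iterate over every cluster in the pipeline spec
# instead of only inspecting clusters[0], which would crash on an empty
# list and silently skip additional clusters.
def assess_pipeline_clusters(pipeline_response, check_spark_conf):
    failures = []
    spec = pipeline_response.spec
    if spec.configuration:
        failures.extend(check_spark_conf(spec.configuration, "pipeline"))
    for cluster in spec.clusters or []:  # tolerate specs with no clusters
        if cluster.spark_conf:
            failures.extend(check_spark_conf(cluster.spark_conf, "pipeline cluster"))
    return failures

# Minimal stand-in objects to exercise the logic without the Databricks SDK.
response = SimpleNamespace(
    spec=SimpleNamespace(
        configuration={"spark.master": "local"},
        clusters=[
            SimpleNamespace(spark_conf={"spark.hadoop.fs.azure": "x"}),
            SimpleNamespace(spark_conf=None),
        ],
    )
)
failures = assess_pipeline_clusters(
    response, lambda conf, source: [f"{source}: {key}" for key in conf]
)
```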
    ]

-   ws = Mock()
+   ws = MagicMock()
Can you create_autospec workspace client instead?
Using create_autospec now
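The difference the reviewer is after fits in a few lines. `FakeWorkspaceClient` is a stand-in class for this sketch, since the PR autospecs the real `databricks.sdk.WorkspaceClient`:

```python
from unittest.mock import MagicMock, create_autospec

# An autospec'd mock rejects attributes the real class does not have,
# so typos and API drift fail fast; a plain MagicMock accepts anything.
class FakeWorkspaceClient:  # stand-in for databricks.sdk.WorkspaceClient
    def current_user(self):
        ...

loose = MagicMock()
loose.curent_user()  # typo, but a plain MagicMock happily accepts it

strict = create_autospec(FakeWorkspaceClient)
caught = False
try:
    strict.curent_user()  # same typo now raises AttributeError
except AttributeError:
    caught = True
```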
nfx left a comment:
refactor to workspace_client_mock
    "core.windows.net": "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider",
    "spark.hadoop.fs.azure.sas.fixed.token.abcde.dfs.core.windows.net": "{{secrets/abcde_access/sasFixedToken}}",
    }
    pipeline_cluster = [
can you move out these long responses to a separate file, so that tests are more maintainable? see https://github.com/databrickslabs/ucx/blob/main/tests/unit/assessment/__init__.py
Separated out the long responses into a separate file.
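The fixture-file pattern the reviewer links to can be sketched like this; the helper name and on-disk layout below are illustrative, not the repo's exact ones:

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory

# Sketch: long JSON responses live in files next to the tests, and a
# tiny helper deserializes them, keeping the test bodies short.
def _load_fixture(folder: Path, filename: str) -> dict:
    return json.loads((folder / filename).read_text())

# Demo with a temporary directory; in the repo the JSON file would be
# checked in under the test package instead.
with TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "pipeline.json").write_text(
        json.dumps({"pipeline_id": "123", "spec": {"configuration": {}}})
    )
    fixture = _load_fixture(folder, "pipeline.json")
```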
    ]
    ws.pipelines.get().spec.configuration = config_dict
    ws.pipelines.get().spec.clusters = pipeline_cluster
    ws.cluster_policies.get().definition = (
refactor to start using workspace_client_mock - see https://github.com/databrickslabs/ucx/blame/main/tests/unit/assessment/test_clusters.py#L22-L58
replaced it with workspace_client_mock
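The `workspace_client_mock` factory the reviewer points at can be sketched roughly as below. `FakeWorkspaceClient`, `FakePipelinesAPI`, and the in-memory fixture dict are stand-ins for the real `databricks.sdk.WorkspaceClient` and the JSON fixture files:

```python
from unittest.mock import create_autospec

# Stand-in shapes so the sketch runs without the Databricks SDK.
class FakePipelinesAPI:
    def get(self, pipeline_id: str): ...

class FakeWorkspaceClient:
    pipelines = FakePipelinesAPI()

# Sketch of the factory pattern: one place builds an autospec'd client
# preloaded with fixture data, so individual tests stop hand-wiring
# ws.pipelines.get for every scenario.
def workspace_client_mock(pipeline_specs: dict):
    ws = create_autospec(FakeWorkspaceClient)
    ws.pipelines.get.side_effect = lambda pipeline_id: pipeline_specs[pipeline_id]
    return ws

ws = workspace_client_mock({"p1": {"spec": {"clusters": []}}})
result = ws.pipelines.get("p1")
```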
-   ws = Mock()
+   ws = create_autospec(WorkspaceClient)
    config_dict = {}
    pipeline_cluster = [
refactor to start using workspace_client_mock:
replaced it with workspace_client_mock
    ]

-   ws = Mock()
+   ws = create_autospec(WorkspaceClient)
refactor to start using workspace_client_mock:
replaced it with workspace_client_mock
    )
    ]

    ws = create_autospec(WorkspaceClient)
refactor to start using workspace_client_mock:
replaced it with workspace_client_mock
    ws.pipelines.get().spec.configuration = config_dict
    ws.pipelines.get().spec.clusters = pipeline_cluster

    crawler = PipelinesCrawler(ws, MockBackend(), "ucx")._assess_pipelines(sample_pipelines)
don't invoke private methods in unit tests! it's prohibited.
Removed the private method call from the unit test.
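The fix can be sketched as below: drive the crawler through a public `snapshot()` entry point (the convention ucx crawlers follow) instead of calling `_assess_pipelines()` directly. The class bodies here are minimal stand-ins, not the PR's real `PipelinesCrawler` or `MockBackend`:

```python
# Minimal stand-in backend that records what the crawler persists.
class MockBackend:
    def __init__(self):
        self.saved = []
    def save_table(self, full_name, rows, klass):
        self.saved.extend(rows)

class PipelinesCrawler:
    def __init__(self, ws, backend, schema):
        self._ws, self._backend, self._schema = ws, backend, schema
    def _assess_pipelines(self, pipelines):
        # private detail: tests should not call this directly
        return [{"pipeline_id": p} for p in pipelines]
    def snapshot(self):
        # public entry point the test should exercise instead
        rows = self._assess_pipelines(self._ws)
        self._backend.save_table(f"{self._schema}.pipelines", rows, dict)
        return rows

backend = MockBackend()
rows = PipelinesCrawler(["p1", "p2"], backend, "ucx").snapshot()
```

Asserting on `snapshot()`'s output keeps the test valid even if the private helper is renamed or refactored.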
    ]
    ws.pipelines.get().spec.configuration = config_dict
    ws.pipelines.get().spec.clusters = pipeline_cluster
    crawler = PipelinesCrawler(ws, MockBackend(), "ucx")._assess_pipelines(sample_pipelines)
don't invoke private methods in unit tests! it's prohibited.
Removed the private method call from the unit test.
    ws.workspace.export().content = "JXNoCmVjaG8gIj0="
    ws.dbfs.read().data = "JXNoCmVjaG8gIj0="

    crawler = PipelinesCrawler(ws, MockBackend(), "ucx")._assess_pipelines(sample_pipelines)
don't invoke private methods in unit tests! it's prohibited.
Removed the private method call from the unit test.
    }
    ws.pipelines.get().spec.configuration = config_dict
    ws.pipelines.get().spec.clusters = mock_pipeline_cluster
    ws.cluster_policies.get(policy_id="single-user-with-spn").definition = (
Look at "__init__.py" in the same folder and do it like there. Modify that file if needed.
Modified the code accordingly
tests/unit/assessment/conftest.py (Outdated)

    @pytest.fixture(scope="function")
    def mock_pipeline_cluster():
        yield [
Modify workspace_client_mock to store json in a file for pipelines
modified the file accordingly.
tests/unit/assessment/__init__.py (Outdated)

    ws = create_autospec(WorkspaceClient)
    ws.clusters.list.return_value = _load_list(ClusterDetails, f"../assessment/clusters/{clusters}")
    ws.cluster_policies.get = _cluster_policy
    ws.pipelines.get().spec.clusters = _load_list(PipelineCluster, f"clusters/{pipeline_cluster}")
this is silly and makes no sense at all. mock the entire response, not just a subset - ws.pipelines.get.return_value = _load_list(...
Now capturing the entire PipelineResponse instead of only mocking the cluster part.
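The reviewer's point can be demonstrated in a few lines: assigning through `ws.pipelines.get().spec.clusters` only configures one attribute on an auto-created child mock, while `ws.pipelines.get.return_value = <full response>` makes every field of the response real data. `SimpleNamespace` stands in for the SDK's `GetPipelineResponse` here:

```python
from types import SimpleNamespace
from unittest.mock import MagicMock

ws = MagicMock()
# Mock the entire response object, not just one nested attribute.
response = SimpleNamespace(
    spec=SimpleNamespace(configuration={"k": "v"}, clusters=[])
)
ws.pipelines.get.return_value = response

got = ws.pipelines.get(pipeline_id="p1")
```

With the full response mocked, code under test can read `configuration`, `clusters`, or any other field without the test silently returning further auto-generated mocks.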
tests/unit/assessment/__init__.py (Outdated)

-   def workspace_client_mock(clusters="no-spark-conf.json"):
+   def workspace_client_mock(clusters="no-spark-conf.json", pipeline_cluster="pipeline_cluster.json"):
this makes no sense - you're not mocking the list response. in this case - argument is not required.
We had this parameter to give an option to specify the pipeline cluster JSON. Removed this argument now.
nfx left a comment:
do you read the surrounding code before making changes?...
tests/unit/assessment/__init__.py (Outdated)

    ws = create_autospec(WorkspaceClient)
    ws.clusters.list.return_value = _load_list(ClusterDetails, f"../assessment/clusters/{clusters}")
    ws.cluster_policies.get = _cluster_policy
    ws.pipelines.get.return_value = _load_list(GetPipelineResponse, "clusters/pipeline_cluster.json")[0]
@prajin-29 , do you read the surrounding code before changing it? 🤦
why ws.pipelines.get is different than ws.cluster_policies.get? it's illogical!
Updated ws.pipelines.get to be similar to ws.cluster_policies.get. With this I have included the pipeline spec configuration inside the JSON as well, so that we can access everything in one shot.
nfx left a comment:
Test implementation is incorrect and confusing
tests/unit/assessment/__init__.py (Outdated)

    def _pipeline_cluster(pipeline_id: str):
        pipeline_response = _load_list(GetPipelineResponse, f"clusters/{pipeline_id}.json")[0]
Why is it load_list and not load fixture? Why do you put pipeline test fixtures to a clusters folder? This will confuse people after you
@nfx The code is now rebased with main and uses the pipeline test fixture from the pipelines folder.
    "spark.hadoop.fs.azure.sas.fixed.token.abcde.dfs.core.windows.net": "{{secrets/abcde_access/sasFixedToken}}",
    }
    ws.pipelines.get().spec.configuration = config_dict
    ws = workspace_client_mock(clusters="job-source-cluster.json")
@prajin-29 I've extended workspace_client_mock in #923 to read pipeline spec from a json file in under tests/unit/assessment/pipelines, so you can rebase your code off of that.
Rebased the code to use the change
    ws.workspace.export().content = "JXNoCmVjaG8gIj0="
    ws.dbfs.read().data = "JXNoCmVjaG8gIj0="
do we need these for pipeline tests?
Yup, I added these to cover the negative scenario for full coverage. But with the spec now included in the JSON we can remove them. Removed them from the test.
# Conflicts:
#   tests/unit/assessment/__init__.py
* Added CLI Command `databricks labs ucx save-uc-compatible-roles` ([#863](#863)).
* Added dashboard widget with table count by storage and format ([#852](#852)).
* Added verification of group permissions ([#841](#841)).
* Checking pipeline cluster config and cluster policy in 'crawl_pipelines' task ([#864](#864)).
* Created cluster policy (ucx-policy) to be used by all UCX compute. This may require customers to reinstall UCX. ([#853](#853)).
* Skip scanning objects that were removed on platform side since the last scan time, so that integration tests are less flaky ([#922](#922)).
* Updated assessment documentation ([#873](#873)).

Dependency updates:

* Updated databricks-sdk requirement from ~=0.18.0 to ~=0.19.0 ([#930](#930)).
Changes

Checking pipeline cluster config and cluster policy in the 'crawl_pipelines' task.

Linked issues

Resolves #844

Functionality

databricks labs ucx ...

Tests