Extend error handling of delta issues in crawlers and hive metastore #795
Codecov Report
Additional details and impacted files:
@@ Coverage Diff @@
## main #795 +/- ##
==========================================
- Coverage 82.91% 82.87% -0.05%
==========================================
Files 39 39
Lines 4571 4577 +6
Branches 850 853 +3
==========================================
+ Hits 3790 3793 +3
- Misses 579 581 +2
- Partials 202 203 +1
Please test for `Py4JJavaError`, which gets raised per #778.
Please ensure each log message has context (the `schemaname.tablename` in this case).
(I left an erroneous comment a few minutes ago saying that `Py4JJavaError` isn't under the `Exception` hierarchy; I was actually thinking of `Py4JSecurityException`.)
| @@ -70,10 +70,11 @@ def _safe_get_table_size(self, table_full_name: str) -> int | None: | |||
| try: | |||
| return self._spark._jsparkSession.table(table_full_name).queryExecution().analyzed().stats().sizeInBytes() | |||
| except Exception as e: | |||
Corrected, but shouldn't `Exception` catch all errors, including `Py4JJavaError`?
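On the question above: an `except Exception` clause catches every subclass of `Exception`, so whether it covers `Py4JJavaError` depends only on the class hierarchy. The sketch below does not import py4j; it uses stand-in classes that assume py4j's hierarchy (`Py4JJavaError` derives from `Py4JError`, which derives from `Exception`). The helper `caught_by_exception_handler` is hypothetical and exists only for the demonstration.

```python
class Py4JError(Exception):
    """Stand-in for py4j.protocol.Py4JError (assumed to subclass Exception)."""


class Py4JJavaError(Py4JError):
    """Stand-in for py4j.protocol.Py4JJavaError (assumed to subclass Py4JError)."""


def caught_by_exception_handler(exc: BaseException) -> bool:
    """Return True if an `except Exception` clause would catch `exc`."""
    try:
        raise exc
    except Exception:
        return True
    except BaseException:
        # KeyboardInterrupt, SystemExit and GeneratorExit are not Exception subclasses.
        return False


print(caught_by_exception_handler(Py4JJavaError("boom")))  # True
print(caught_by_exception_handler(KeyboardInterrupt()))  # False
```

Under that assumption, a catch-all `except Exception` does handle `Py4JJavaError`; what it deliberately does not handle are `BaseException`-only signals such as `KeyboardInterrupt`.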
@@ -293,6 +307,18 @@ def test_execute(mocker):
    with pytest.raises(NotFound):
Please test `Py4JJavaError` being thrown, as this was called out in #778.
I recommend adding catch-all error handling and a test for it.
Mocking `Py4JJavaError` is quite complex. I added a test case for a generic `Exception` instead, as it should cover all errors derived from `Exception`.
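One way to exercise the catch-all without constructing a real `Py4JJavaError` is to let a `unittest.mock.MagicMock` stand in for the Spark session and make the first call in the JVM chain raise a plain `Exception`. A minimal sketch, assuming a hypothetical `get_table_size_or_none` that mirrors the `_safe_get_table_size` call chain from the diff and logs the table name on failure; it is not the PR's actual test.

```python
import logging
from typing import Optional
from unittest import mock

logger = logging.getLogger("size_crawler_sketch")


def get_table_size_or_none(spark, table_full_name: str) -> Optional[int]:
    """Hypothetical stand-in for _safe_get_table_size: any failure is logged with the table name."""
    try:
        jvm_table = spark._jsparkSession.table(table_full_name)
        return jvm_table.queryExecution().analyzed().stats().sizeInBytes()
    except Exception as e:  # assumed to cover Py4JJavaError as well
        logger.error(f"Failed to evaluate {table_full_name} table size: {e}")
        return None


def test_generic_exception_is_logged_with_table_name():
    spark = mock.MagicMock()
    # Simulate an arbitrary JVM-side failure without building a Py4JJavaError.
    spark._jsparkSession.table.side_effect = Exception("unexpected failure")
    with mock.patch.object(logger, "error") as log_error:
        assert get_table_size_or_none(spark, "db.tbl") is None
    log_error.assert_called_once()
    # The log message must carry the table name as context.
    assert "db.tbl" in log_error.call_args.args[0]


test_generic_exception_is_logged_with_table_name()
```

With pytest, the same setup can be written with the `mocker` fixture; `side_effect` is what turns the mocked call into a raise.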
if "[TABLE_OR_VIEW_NOT_FOUND]" in str(nf) or "[DELTA_TABLE_NOT_FOUND]" in str(nf):
    logger.error(f"Failed to apply skip marker for Table {schema}.{table}. Table not found.")
else:
    logger.error(nf)
This error log needs context (the table name).
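A minimal sketch of the requested change, assuming a hypothetical helper `log_skip_marker_failure` that wraps the branch above: both branches now name `{schema}.{table}`, and the unexpected-error branch appends the original error text instead of logging the bare exception. The helper returns the message only so it can be checked; the real code may not need that.

```python
import logging

logger = logging.getLogger("skip_marker_sketch")

# Error-class markers matched in the diff above.
NOT_FOUND_MARKERS = ("[TABLE_OR_VIEW_NOT_FOUND]", "[DELTA_TABLE_NOT_FOUND]")


def log_skip_marker_failure(schema: str, table: str, err: Exception) -> str:
    """Hypothetical helper: log a skip-marker failure, naming the table in every branch."""
    if any(marker in str(err) for marker in NOT_FOUND_MARKERS):
        message = f"Failed to apply skip marker for Table {schema}.{table}. Table not found."
    else:
        # Instead of logger.error(nf): keep the error text, but say which table it was.
        message = f"Failed to apply skip marker for Table {schema}.{table}: {err}"
    logger.error(message)
    return message
```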
if "[TABLE_OR_VIEW_NOT_FOUND]" in str(err) or "[DELTA_TABLE_NOT_FOUND]" in str(err):
    logger.error(f"Could not find table {from_table_name}. Table not found.")
else:
    logger.error(err)
This `logger.error` call needs context (the table name).
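To keep this requirement from regressing, a test can assert that the table name actually appears in the emitted log record. A minimal sketch using the standard library's `unittest.TestCase.assertLogs`, assuming a hypothetical `log_move_failure` helper that wraps the branch above; the logger name and message wording are illustrative, not taken from the PR.

```python
import logging
import unittest

logger = logging.getLogger("move_table_sketch")


def log_move_failure(from_table_name: str, err: Exception) -> None:
    """Hypothetical helper: every error log for a failed move names the source table."""
    if "[TABLE_OR_VIEW_NOT_FOUND]" in str(err) or "[DELTA_TABLE_NOT_FOUND]" in str(err):
        logger.error(f"Could not find table {from_table_name}. Table not found.")
    else:
        logger.error(f"Failed to move table {from_table_name}: {err}")


class LogContextTest(unittest.TestCase):
    def test_unexpected_error_names_table(self):
        with self.assertLogs(logger, level="ERROR") as captured:
            log_move_failure("hive_metastore.db.src", RuntimeError("permission denied"))
        # captured.output holds "LEVEL:logger_name:message" strings.
        self.assertIn("hive_metastore.db.src", captured.output[0])
```

Run it with `python -m unittest`; `assertLogs` fails the test if no ERROR record is emitted at all.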
    logger.warning(f"Delta table {table_full_name} is corrupted: missing transaction log.")
    return None
raise RuntimeError(str(e)) from e
logger.error(e)
This `logger.error` call needs context (the table name).
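Putting the size-crawler branches together: a sketch of how the handler could treat a missing transaction log, a missing table, and any other error, while naming the table in every log line. It assumes every failure should yield `None` so crawling continues (the diff suggests `raise RuntimeError(...)` gives way to `logger.error(...)`, but this is not confirmed here). `safe_table_size`, the `fetch_size` callable standing in for the JVM call, and the `raising` demo helper are all hypothetical.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("table_size_sketch")


def safe_table_size(table_full_name: str, fetch_size: Callable[[str], int]) -> Optional[int]:
    """Hypothetical sketch of the size-crawler error handling; fetch_size stands in for the JVM call."""
    try:
        return fetch_size(table_full_name)
    except Exception as e:
        message = str(e)
        if "[DELTA_MISSING_TRANSACTION_LOG]" in message:
            logger.warning(f"Delta table {table_full_name} is corrupted: missing transaction log.")
        elif "[TABLE_OR_VIEW_NOT_FOUND]" in message or "[DELTA_TABLE_NOT_FOUND]" in message:
            logger.warning(f"Failed to evaluate {table_full_name} table size. Table not found.")
        else:
            # Catch-all: keep crawling, but record which table failed and why.
            logger.error(f"Failed to evaluate {table_full_name} table size: {e}")
        return None


def raising(message: str) -> Callable[[str], int]:
    """Build a fetch_size stand-in that always fails with `message` (for demonstration)."""
    def fetch(_: str) -> int:
        raise RuntimeError(message)
    return fetch
```

For example, `safe_table_size("db.t", raising("[DELTA_MISSING_TRANSACTION_LOG] gone"))` returns `None` and logs a warning that names `db.t`.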
* Added `databricks labs ucx validate-groups-membership` command to validate groups to see if they have the same membership across account and workspace level ([#772](#772)).
* Added baseline for getting Azure Resource Role Assignments ([#764](#764)).
* Added issue and pull request templates ([#791](#791)).
* Added linked issues to PR template ([#793](#793)).
* Added optional `debug_truncate_bytes` parameter to the config and extended the default log truncation limit ([#782](#782)).
* Added support for crawling grants and applying Hive Metastore UDF ACLs ([#812](#812)).
* Changed Python requirement from 3.10.6 to 3.10 ([#805](#805)).
* Extend error handling of delta issues in crawlers and hive metastore ([#795](#795)).
* Fixed `databricks labs ucx repair-run` command to execute correctly ([#801](#801)).
* Fixed handling of `DELTASHARING` table format ([#802](#802)).
* Fixed listing of workflows via CLI ([#811](#811)).
* Fixed logger import path for DEBUG notebook ([#792](#792)).
* Fixed move table command to delete table/view regardless of whether permissions are present, skipping corrupted tables when crawling table size and making existing tests more stable ([#777](#777)).
* Fixed the issue of `databricks labs ucx installations` and `databricks labs ucx manual-workspace-info` ([#814](#814)).
* Increase the unit test coverage for cli.py ([#800](#800)).
* Mount Point crawler lists /Volume with four variations which is confusing ([#779](#779)).
* Updated README.md to remove mention of deprecated install.sh ([#781](#781)).
* Updated `bug` issue template ([#797](#797)).
* Fixed writing log readme in a multiprocess-safe way ([#794](#794)).

Changes
Extend error handling of delta issues in crawlers and hive metastore by catching:
`DELTA_TABLE_NOT_FOUND` and `DELTA_MISSING_TRANSACTION_LOG`.
Linked issues
Resolves #778
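The two error classes named under Changes are detected in the diff by substring checks such as `"[DELTA_TABLE_NOT_FOUND]" in str(err)`. As an alternative (not what this PR does), the bracketed error classes can be extracted once and compared against a set. A minimal sketch, assuming, as the substring checks do, that the error class appears in square brackets somewhere in the exception text; `error_classes`, `is_skippable`, and `SKIPPABLE` are hypothetical names.

```python
import re
from typing import Set

# Bracketed upper-case identifiers such as "[DELTA_TABLE_NOT_FOUND]".
ERROR_CLASS = re.compile(r"\[([A-Z][A-Z0-9_]*)\]")

# Error classes that the crawlers treat as "skip this table" rather than fatal.
SKIPPABLE = {"TABLE_OR_VIEW_NOT_FOUND", "DELTA_TABLE_NOT_FOUND", "DELTA_MISSING_TRANSACTION_LOG"}


def error_classes(err: BaseException) -> Set[str]:
    """Return every bracketed error class found anywhere in the exception text."""
    return set(ERROR_CLASS.findall(str(err)))


def is_skippable(err: BaseException) -> bool:
    """True if any error class in the message is one the crawler can skip."""
    return bool(error_classes(err) & SKIPPABLE)
```

Using `findall` rather than anchoring at the start tolerates messages where a Java stack-trace prefix precedes the error class.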
Functionality
Tests