Skip to content

Extend error handling of delta issues in crawlers and hive metastore#795

Merged
nfx merged 8 commits intomainfrom
fix/issue_778
Jan 17, 2024
Merged

Extend error handling of delta issues in crawlers and hive metastore#795
nfx merged 8 commits intomainfrom
fix/issue_778

Conversation

@mwojtyczka
Copy link
Copy Markdown
Contributor

@mwojtyczka mwojtyczka commented Jan 16, 2024

Changes

Extend error handling of delta issues in crawlers and hive metastore by catching: DELTA_TABLE_NOT_FOUND and DELTA_MISSING_TRANSACTION_LOG

Linked issues

Resolves #778

Functionality

  • extend error handling

Tests

  • added unit tests

@mwojtyczka mwojtyczka requested review from a team and priyal-c January 16, 2024 13:37
@codecov
Copy link
Copy Markdown

codecov bot commented Jan 16, 2024

Codecov Report

Attention: 5 lines in your changes are missing coverage. Please review.

Comparison is base (7b4911a) 82.91% compared to head (bb0f75f) 82.87%.

Files Patch % Lines
src/databricks/labs/ucx/hive_metastore/mapping.py 0.00% 2 Missing and 1 partial ⚠️
...c/databricks/labs/ucx/hive_metastore/table_size.py 50.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #795      +/-   ##
==========================================
- Coverage   82.91%   82.87%   -0.05%     
==========================================
  Files          39       39              
  Lines        4571     4577       +6     
  Branches      850      853       +3     
==========================================
+ Hits         3790     3793       +3     
- Misses        579      581       +2     
- Partials      202      203       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@mwojtyczka mwojtyczka changed the title Improve error handling of crawlers to handle delta tables issues Improve error handling of crawlers and hive metastore operations to handle delta tables issues Jan 16, 2024
@mwojtyczka mwojtyczka changed the title Improve error handling of crawlers and hive metastore operations to handle delta tables issues Extend error handling of delta issues in crawlers and hive metastore Jan 16, 2024
Copy link
Copy Markdown
Contributor

@dmoore247 dmoore247 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please test forPy4JJavaError that gets raised per #778
Please ensure each log message has context (the schemaname.tablename in this case)

(I left an erroneous comment a few minutes ago that Py4JJavaError isn't under the Exception hierarchy, I was actually thinking of the Py4JSecurityException)

@@ -70,10 +70,11 @@ def _safe_get_table_size(self, table_full_name: str) -> int | None:
try:
return self._spark._jsparkSession.table(table_full_name).queryExecution().analyzed().stats().sizeInBytes()
except Exception as e:
Copy link
Copy Markdown
Contributor

@dmoore247 dmoore247 Jan 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code above will catch the Py4JJavaError reported in the issue.

image
(My earlier comment had in mind the Py4JSecurityException)
To catch that exception, except: will do, as in:

try:
   self._spark._jsparkSession.table....
except:
    logger.error(....{tablename}...)

Copy link
Copy Markdown
Contributor Author

@mwojtyczka mwojtyczka Jan 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -293,6 +307,18 @@ def test_execute(mocker):
with pytest.raises(NotFound):
Copy link
Copy Markdown
Contributor

@dmoore247 dmoore247 Jan 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please test Py4JJavaError being thrown as this was called out in #778

Recommend adding a catch all error handling and test.

Copy link
Copy Markdown
Contributor Author

@mwojtyczka mwojtyczka Jan 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mocking Py4JJavaError is quite complex. I added a case for Exception as it should cover all possible errors.

if "[TABLE_OR_VIEW_NOT_FOUND]" in str(nf) or "[DELTA_TABLE_NOT_FOUND]" in str(nf):
logger.error(f"Failed to apply skip marker for Table {schema}.{table}. Table not found.")
else:
logger.error(nf)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error log needs context (the table name)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

if "[TABLE_OR_VIEW_NOT_FOUND]" in str(err) or "[DELTA_TABLE_NOT_FOUND]" in str(err):
logger.error(f"Could not find table {from_table_name}. Table not found.")
else:
logger.error(err)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logger.error needs context (the table name)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

logger.warning(f"Delta table {table_full_name} is corrupted: missing transaction log.")
return None
raise RuntimeError(str(e)) from e
logger.error(e)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logger.error needs context (the table name)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

Copy link
Copy Markdown
Collaborator

@nfx nfx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@nfx nfx merged commit b249e86 into main Jan 17, 2024
@nfx nfx deleted the fix/issue_778 branch January 17, 2024 02:18
nfx added a commit that referenced this pull request Jan 19, 2024
* Added `databricks labs ucx validate-groups-membership` command to validate groups to see if they have same membership across acount and workspace level ([#772](#772)).
* Added baseline for getting Azure Resource Role Assignments ([#764](#764)).
* Added issue and pull request templates ([#791](#791)).
* Added linked issues to PR template ([#793](#793)).
* Added optional `debug_truncate_bytes` parameter to the config and extend the default log truncation limit ([#782](#782)).
* Added support for crawling grants and applying Hive Metastore UDF ACLs ([#812](#812)).
* Changed Python requirement from 3.10.6 to 3.10 ([#805](#805)).
* Extend error handling of delta issues in crawlers and hive metastore ([#795](#795)).
* Fixed `databricks labs ucx repair-run` command to execute correctly ([#801](#801)).
* Fixed handling of `DELTASHARING` table format ([#802](#802)).
* Fixed listing of workflows via CLI ([#811](#811)).
* Fixed logger import path for DEBUG notebook ([#792](#792)).
* Fixed move table command to delete table/view regardless if permissions are present, skipping corrupted tables when crawling table size and making existing tests more stable ([#777](#777)).
* Fixed the issue of `databricks labs ucx installations` and `databricks labs ucx manual-workspace-info` ([#814](#814)).
* Increase the unit test coverage for cli.py ([#800](#800)).
* Mount Point crawler lists /Volume with four variations which is confusing ([#779](#779)).
* Updated README.md to remove mention of deprecated install.sh ([#781](#781)).
* Updated `bug` issue template ([#797](#797)).
* Fixed writing log readme in multiprocess safe way ([#794](#794)).
@nfx nfx mentioned this pull request Jan 19, 2024
nfx added a commit that referenced this pull request Jan 19, 2024
* Added `databricks labs ucx validate-groups-membership` command to
validate groups to see if they have same membership across acount and
workspace level
([#772](#772)).
* Added baseline for getting Azure Resource Role Assignments
([#764](#764)).
* Added issue and pull request templates
([#791](#791)).
* Added linked issues to PR template
([#793](#793)).
* Added optional `debug_truncate_bytes` parameter to the config and
extend the default log truncation limit
([#782](#782)).
* Added support for crawling grants and applying Hive Metastore UDF ACLs
([#812](#812)).
* Changed Python requirement from 3.10.6 to 3.10
([#805](#805)).
* Extend error handling of delta issues in crawlers and hive metastore
([#795](#795)).
* Fixed `databricks labs ucx repair-run` command to execute correctly
([#801](#801)).
* Fixed handling of `DELTASHARING` table format
([#802](#802)).
* Fixed listing of workflows via CLI
([#811](#811)).
* Fixed logger import path for DEBUG notebook
([#792](#792)).
* Fixed move table command to delete table/view regardless if
permissions are present, skipping corrupted tables when crawling table
size and making existing tests more stable
([#777](#777)).
* Fixed the issue of `databricks labs ucx installations` and `databricks
labs ucx manual-workspace-info`
([#814](#814)).
* Increase the unit test coverage for cli.py
([#800](#800)).
* Mount Point crawler lists /Volume with four variations which is
confusing ([#779](#779)).
* Updated README.md to remove mention of deprecated install.sh
([#781](#781)).
* Updated `bug` issue template
([#797](#797)).
* Fixed writing log readme in multiprocess safe way
([#794](#794)).
dmoore247 pushed a commit that referenced this pull request Mar 23, 2024
* Added `databricks labs ucx validate-groups-membership` command to
validate groups to see if they have same membership across acount and
workspace level
([#772](#772)).
* Added baseline for getting Azure Resource Role Assignments
([#764](#764)).
* Added issue and pull request templates
([#791](#791)).
* Added linked issues to PR template
([#793](#793)).
* Added optional `debug_truncate_bytes` parameter to the config and
extend the default log truncation limit
([#782](#782)).
* Added support for crawling grants and applying Hive Metastore UDF ACLs
([#812](#812)).
* Changed Python requirement from 3.10.6 to 3.10
([#805](#805)).
* Extend error handling of delta issues in crawlers and hive metastore
([#795](#795)).
* Fixed `databricks labs ucx repair-run` command to execute correctly
([#801](#801)).
* Fixed handling of `DELTASHARING` table format
([#802](#802)).
* Fixed listing of workflows via CLI
([#811](#811)).
* Fixed logger import path for DEBUG notebook
([#792](#792)).
* Fixed move table command to delete table/view regardless if
permissions are present, skipping corrupted tables when crawling table
size and making existing tests more stable
([#777](#777)).
* Fixed the issue of `databricks labs ucx installations` and `databricks
labs ucx manual-workspace-info`
([#814](#814)).
* Increase the unit test coverage for cli.py
([#800](#800)).
* Mount Point crawler lists /Volume with four variations which is
confusing ([#779](#779)).
* Updated README.md to remove mention of deprecated install.sh
([#781](#781)).
* Updated `bug` issue template
([#797](#797)).
* Fixed writing log readme in multiprocess safe way
([#794](#794)).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

TableSizeCrawler explodes on an expired delta share; it also has gap in exception handling & logging

3 participants