TableSizeCrawler explodes on an expired delta share; it also has gaps in exception handling & logging #778
Closed
Description
The newly implemented estimate_table_size_for_migration will propagate errors and crash the entire task via uncatchable Py4JJavaError.
Logically, delta shares should be skipped when crawling for size estimates.
Advice for all table-level calls:
- Avoid calls to APIs that raise an uncatchable Py4JJavaError.
- Catch and log exceptions.
- Do not re-raise.
- Log the schema_name.tablename and the exception message.
- Log them under crawler exceptions.
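A minimal sketch of the catch, log, and skip pattern described above, assuming the failure surfaces in Python as an ordinary exception. The names `safe_table_size`, `estimate_sizes`, the `get_size` callable, and the `crawler_exceptions` logger name are hypothetical and are not part of the ucx code base; `get_size` stands in for whatever call actually asks Spark for the table size.

```python
import logging
from typing import Callable, Dict, Iterable, Optional

# Hypothetical logger name, chosen to match "logged under crawler exceptions".
logger = logging.getLogger("crawler_exceptions")


def safe_table_size(get_size: Callable[[str], int], table_full_name: str) -> Optional[int]:
    """Return the table size in bytes, or None if the lookup fails."""
    try:
        return int(get_size(table_full_name))
    except Exception as e:
        # Broad catch on purpose: log the table name and the message,
        # then skip the table instead of re-raising.
        logger.warning("Failed to get size of table %s: %s", table_full_name, e)
        return None


def estimate_sizes(get_size: Callable[[str], int], table_names: Iterable[str]) -> Dict[str, int]:
    """Collect sizes for every table that can be read; failures are skipped."""
    sizes: Dict[str, int] = {}
    for name in table_names:
        size = safe_table_size(get_size, name)
        if size is not None:
            sizes[name] = size
    return sizes
```

With this shape, one unreadable table (for example an expired delta share) produces a single log entry and the crawl continues with the remaining tables.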
com.google.common.util.concurrent.UncheckedExecutionException: io.delta.sharing.client.util.UnexpectedHttpStatus: HTTP request failed with status: HTTP/1.1 401 Unauthorized {"errorCode":"CUSTOMER_UNAUTHORIZED","message":"Unauthorized"}. It may be caused by an expired token as it has expired at 2023-04-01T00:00:00.000Z
---------------------------------------------------------------------------
Py4JJavaError Traceback (most recent call last)
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/databricks/labs/ucx/hive_metastore/table_size.py:71, in TableSizeCrawler._safe_get_table_size(self, table_full_name)
70 try:
---> 71 return self._spark._jsparkSession.table(table_full_name).queryExecution().analyzed().stats().sizeInBytes()
72 except Exception as e:
File /databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1322, in JavaMember.__call__(self, *args)
1321 answer = self.gateway_client.send_command(command)
-> 1322 return_value = get_return_value(
1323 answer, self.gateway_client, self.target_id, self.name)
1325 for temp_arg in temp_args:
File /databricks/spark/python/pyspark/errors/exceptions/captured.py:188, in capture_sql_exception.<locals>.deco(*a, **kw)
187 try:
--> 188 return f(*a, **kw)
189 except Py4JJavaError as e:
File /databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py:326, in get_return_value(answer, gateway_client, target_id, name)
325 if answer[1] == REFERENCE_TYPE:
--> 326 raise Py4JJavaError(
327 "An error occurred while calling {0}{1}{2}.\n".
328 format(target_id, ".", name), value)
329 else: