Improve the logs when a native transfer falls back to Pandas #1263
Description
Please describe the feature you'd like to see
When loading a CSV file from S3 to Snowflake, a problem occurs: the task fails, but the table is still created and the data is populated. However, the logs are not clear enough about this; please see below:
[2022-11-16, 11:07:46 EST] {base.py:517} WARNING - Loading files failed with Native Support. Falling back to Pandas-based load
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/astro/databases/base.py", line 508, in load_file_to_table_natively_with_fallback
    self.load_file_to_table_natively(
  File "/usr/local/lib/python3.9/site-packages/astro/databases/snowflake.py", line 623, in load_file_to_table_natively
    self.evaluate_results(rows)
  File "/usr/local/lib/python3.9/site-packages/astro/databases/snowflake.py", line 630, in evaluate_results
    raise DatabaseCustomError(rows)
astro.exceptions.DatabaseCustomError: [{'file': 's3://s3-dev-etldata-001/inbound/test/Out_CM_CU.csv', 'status': 'LOAD_FAILED', 'rows_parsed': 168598, 'rows_loaded': 0, 'error_limit': 168598, 'errors_seen': 168598, 'first_error': 'Numeric value \'"RECORDID"\' is not recognized', 'first_error_line': 1, 'first_error_character': 1, 'first_error_column_name': '"OUT_CM_CU"["RECORDID":1]'}]
The logs show 0 rows loaded ('rows_loaded': 0), but that is not true: the table was populated.
Refer to https://astronomer.slack.com/archives/C02B8SPT93K/p1668615614891269
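To illustrate what a more readable message could look like, here is a minimal sketch that condenses result rows like the one in the traceback above into one line per file. `summarize_copy_results` is a hypothetical helper and not part of astro-sdk; the dictionary keys are taken from the log output shown above, and everything else is an assumption.

```python
from typing import Any


def summarize_copy_results(rows: list[dict[str, Any]]) -> str:
    """Build a readable summary of native-load result rows (hypothetical helper)."""
    lines = []
    for row in rows:
        # Keys mirror the result row in the log above; .get() tolerates missing keys.
        lines.append(
            f"{row.get('file')}: status={row.get('status')}, "
            f"rows_parsed={row.get('rows_parsed')}, rows_loaded={row.get('rows_loaded')}, "
            f"first_error={row.get('first_error')!r} at line {row.get('first_error_line')}"
        )
    return "\n".join(lines)
```

A message like this could be logged next to the fallback warning, so the reader can see that `rows_loaded` refers only to the failed native attempt.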
Describe the solution you'd like
- I'd love to see the logs improved to reflect what actually happened: they should mention the fallback to Pandas and show how many rows were actually loaded.
- The customer came back and said that he used enable_native_fallback=False and the data got loaded nevertheless. That's not the expected behaviour, right?
He also mentioned that the documentation confuses him, particularly this part (it doesn't mention falling back to Python; and by the way, what do you mean by that?):
Additional context
Task that was used:
s3_to_snowflake = aql.load_file(
    task_id="s3_to_snowflake",
    input_file=File(path=f"s3://{S3_BUCKET_NAME}/{S3_FILE_NAME}", filetype=FileType.CSV),
    output_table=Table(
        conn_id=SNOWFLAKE_CONN_ID,
        metadata=Metadata(database=SNOWFLAKE_DATABASE, schema=SNOWFLAKE_SCHEMA),
        name=SNOWFLAKE_TABLE_NAME,
    ),
    if_exists="replace",
)
Acceptance Criteria
- Test if enable_native_fallback=False works as expected.
- All checks and tests in the CI should pass
- Unit tests (90% code coverage or more, once available)
- Integration tests (if the feature relates to a new database or external service)
- Example DAG
- Docstrings in reStructuredText for each method, class, function and module-level attribute (including an Example DAG showing how it should be used)
- Exception handling in case of errors
- Logging (are we exposing useful information to the user? e.g. source and destination)
- Improve the documentation (README, Sphinx, and any other relevant)
- How-to guide for the feature (example)
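The behaviour requested in this issue (a clear log line for each outcome, and no fallback when enable_native_fallback=False) could be sketched roughly as below. This is not the SDK's actual implementation: `load_with_fallback`, `native_load` and `pandas_load` are hypothetical names, both loaders are assumed to return the number of rows they loaded, and the default value of the flag is arbitrary here.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def load_with_fallback(
    native_load: Callable[[], int],
    pandas_load: Callable[[], int],
    enable_native_fallback: bool = True,
) -> int:
    """Try the native load first; optionally fall back to Pandas.

    Hypothetical stand-in for the fallback logic. Returns the number of rows loaded.
    """
    try:
        rows = native_load()
        logger.info("Native load succeeded: %d rows loaded.", rows)
        return rows
    except Exception as exc:
        if not enable_native_fallback:
            # Fallback disabled: re-raise the native error so nothing is loaded.
            logger.error("Native load failed and fallback is disabled: %s", exc)
            raise
        logger.warning("Native load failed (%s). Falling back to Pandas-based load.", exc)
    rows = pandas_load()
    # Report the row count from the fallback path, not the failed native attempt.
    logger.info("Pandas-based fallback succeeded: %d rows loaded.", rows)
    return rows
```

A unit test for the first acceptance criterion could pass a failing `native_load` together with `enable_native_fallback=False` and assert that the exception propagates and `pandas_load` is never called.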