This repository was archived by the owner on May 22, 2025. It is now read-only.

Improve the logs when a native transfer falls back to Pandas #1263

@magdagultekin

Description

Please describe the feature you'd like to see
When loading a CSV file from S3 into Snowflake, a problem occurs: the task fails, yet the table still gets created and populated with data. The logs do not make this clear; see below:

[2022-11-16, 11:07:46 EST] {base.py:517} WARNING - Loading files failed with Native Support. Falling back to Pandas-based load
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/astro/databases/base.py", line 508, in load_file_to_table_natively_with_fallback
    self.load_file_to_table_natively(
  File "/usr/local/lib/python3.9/site-packages/astro/databases/snowflake.py", line 623, in load_file_to_table_natively
    self.evaluate_results(rows)
  File "/usr/local/lib/python3.9/site-packages/astro/databases/snowflake.py", line 630, in evaluate_results
    raise DatabaseCustomError(rows)
astro.exceptions.DatabaseCustomError: [{'file': 's3://s3-dev-etldata-001/inbound/test/Out_CM_CU.csv', 'status': 'LOAD_FAILED', 'rows_parsed': 168598, 'rows_loaded': 0, 'error_limit': 168598, 'errors_seen': 168598, 'first_error': 'Numeric value \'"RECORDID"\' is not recognized', 'first_error_line': 1, 'first_error_character': 1, 'first_error_column_name': '"OUT_CM_CU"["RECORDID":1]'}]

The logs report 0 rows loaded, which is not true: the data did get loaded into the table.
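The `rows_loaded: 0` figure in the error payload appears to describe only the native COPY INTO attempt; the Pandas-based fallback announced in the WARNING line runs afterwards and is not reflected in it. One way to make that explicit is to label the count as native when logging it. Below is a minimal sketch of a hypothetical helper, `summarize_copy_results` (not part of astro-sdk), assuming the result rows are dicts with the keys shown in the DatabaseCustomError payload above.

```python
from typing import Any, Mapping, Sequence


def summarize_copy_results(rows: Sequence[Mapping[str, Any]]) -> str:
    """Turn COPY INTO result rows into one readable line per file.

    Hypothetical helper: the keys mirror the DatabaseCustomError payload
    (file, status, rows_parsed, rows_loaded, first_error, ...).
    """
    lines = []
    for row in rows:
        # Label the count as "natively" so it is not mistaken for the final result
        # after a Pandas-based fallback.
        line = (
            f"{row.get('file', '<unknown file>')}: {row.get('status', '<unknown status>')}, "
            f"{row.get('rows_loaded', 0)} of {row.get('rows_parsed', 0)} rows loaded natively"
        )
        if row.get("first_error"):
            line += (
                f"; first error at line {row.get('first_error_line')}, "
                f"column {row.get('first_error_column_name')}: {row['first_error']}"
            )
        lines.append(line)
    return "\n".join(lines)


# The row reported in the log above (subset of keys).
rows = [
    {
        "file": "s3://s3-dev-etldata-001/inbound/test/Out_CM_CU.csv",
        "status": "LOAD_FAILED",
        "rows_parsed": 168598,
        "rows_loaded": 0,
        "first_error": 'Numeric value \'"RECORDID"\' is not recognized',
        "first_error_line": 1,
        "first_error_column_name": '"OUT_CM_CU"["RECORDID":1]',
    }
]
print(summarize_copy_results(rows))
```

Keeping the native count explicitly labeled lets a later log line report the fallback's own row count without the two figures contradicting each other.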

Refer to https://astronomer.slack.com/archives/C02B8SPT93K/p1668615614891269
Describe the solution you'd like

  1. I'd like the logs to reflect what actually happened: mention the fallback to Pandas and show how many rows were actually loaded.

  2. The customer came back and said that he had set enable_native_fallback=False and the data got loaded nevertheless. That is not expected behaviour, is it?
    He also mentioned that the documentation confuses him, particularly this part (it doesn't mention falling back to Python; and by the way, what is meant by that?):
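Both requests above come down to one control-flow rule: if the native load fails, either re-raise (when `enable_native_fallback=False`) or log the fallback explicitly and then report the rows the Pandas path actually wrote. The sketch below illustrates that rule under stated assumptions; it is not the astro-sdk implementation, `load_with_fallback`, `native_load`, and `pandas_load` are hypothetical names, and each loader is assumed to return the number of rows it wrote.

```python
import logging
from typing import Callable

logger = logging.getLogger("load_file_sketch")


def load_with_fallback(
    native_load: Callable[[], int],
    pandas_load: Callable[[], int],
    enable_native_fallback: bool,
    source: str,
    destination: str,
) -> int:
    """Run the native load; fall back to Pandas only when allowed.

    Hypothetical sketch: both callables return the number of rows written.
    """
    try:
        rows = native_load()
        logger.info("Native load of %s into %s wrote %d rows", source, destination, rows)
        return rows
    except Exception as exc:
        if not enable_native_fallback:
            # Fallback disabled: surface the native failure, never touch Pandas.
            logger.error(
                "Native load of %s into %s failed and enable_native_fallback=False: %s",
                source, destination, exc,
            )
            raise
        # Fallback enabled: say so explicitly before switching paths.
        logger.warning(
            "Native load of %s into %s failed (%s). Falling back to Pandas-based load.",
            source, destination, exc,
        )
    rows = pandas_load()
    # Report the count from the path that actually wrote the data.
    logger.info("Pandas-based fallback load of %s into %s wrote %d rows", source, destination, rows)
    return rows


logging.basicConfig(level=logging.INFO, format="%(levelname)s - %(message)s")


def failing_native_load() -> int:
    # Stub standing in for a COPY INTO that reports LOAD_FAILED.
    raise RuntimeError("COPY INTO reported LOAD_FAILED")


# Fallback enabled: logs a WARNING, then the row count returned by the Pandas stub (3).
load_with_fallback(failing_native_load, lambda: 3, True, "s3://bucket/file.csv", "SCHEMA.TABLE")
```

Catching `Exception` broadly keeps the sketch short; a real implementation would likely narrow it to the errors the native path is known to raise (for example `DatabaseCustomError`).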

Additional context
Task that was used:

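from astro import sql as aql
from astro.constants import FileType
from astro.files import File
from astro.table import Metadata, Table

# S3_BUCKET_NAME, S3_FILE_NAME, SNOWFLAKE_CONN_ID, SNOWFLAKE_DATABASE, SNOWFLAKE_SCHEMA
# and SNOWFLAKE_TABLE_NAME are constants defined elsewhere in the DAG file.
s3_to_snowflake = aql.load_file(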
    task_id="s3_to_snowflake",
    input_file=File(path=f"s3://{S3_BUCKET_NAME}/{S3_FILE_NAME}", filetype=FileType.CSV),
    output_table=Table(
        conn_id=SNOWFLAKE_CONN_ID,
        metadata=Metadata(database=SNOWFLAKE_DATABASE, schema=SNOWFLAKE_SCHEMA),
        name=SNOWFLAKE_TABLE_NAME,
    ),
    if_exists="replace",
)

Acceptance Criteria

  • Test if enable_native_fallback=False works as expected.
  • All checks and tests in the CI should pass
  • Unit tests (90% code coverage or more, once available)
  • Integration tests (if the feature relates to a new database or external service)
  • Example DAG
  • Docstrings in reStructuredText for each of the methods, classes, functions, and module-level attributes (including an example DAG showing how it should be used)
  • Exception handling in case of errors
  • Logging (are we exposing useful information to the user? e.g. source and destination)
  • Improve the documentation (README, Sphinx, and any other relevant)
  • How-to guide for the feature (with an example)
