BigQueryToGCSOperator does not wait for completion #29912

@benjyblack

Apache Airflow Provider(s)

google

Versions of Apache Airflow Providers

apache-airflow-providers-google==7.0.0

Apache Airflow version

2.3.2

Operating System

Debian GNU/Linux

Deployment

Official Apache Airflow Helm Chart

Deployment details

No response

What happened

#27683 (Deferrable mode for BigQueryToGCSOperator) changed the BigQueryToGCSOperator so that it no longer waits for the export to complete. This is because the nowait=True parameter is now being set.
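
To illustrate the difference, here is a minimal, self-contained sketch of the two behaviors. It does not use the provider's code: `FakeExtractJob` and `submit_extract` are hypothetical stand-ins, and only the `nowait` name comes from this report.

```python
import time


class FakeExtractJob:
    """Hypothetical stand-in for an export job that finishes after `duration` seconds."""

    def __init__(self, duration: float) -> None:
        self._finish_at = time.monotonic() + duration

    def done(self) -> bool:
        return time.monotonic() >= self._finish_at

    def result(self, poll_interval: float = 0.01) -> None:
        # Block until the job has finished.
        while not self.done():
            time.sleep(poll_interval)


def submit_extract(duration: float, nowait: bool = False) -> FakeExtractJob:
    """Submit a fake job; block until it completes unless nowait=True."""
    job = FakeExtractJob(duration)
    if not nowait:
        job.result()
    return job


# nowait=True returns at once, so the exported files may not exist yet.
pending = submit_extract(duration=60.0, nowait=True)
pending_done = pending.done()
print("nowait=True  -> done on return:", pending_done)

# nowait=False returns only after the job has finished.
finished = submit_extract(duration=0.05, nowait=False)
finished_done = finished.done()
print("nowait=False -> done on return:", finished_done)
```

With `nowait=True` the caller gets the job handle back while the work is still running, which is the situation a downstream task ends up in here.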

What you think should happen instead

This is unexpected behavior. Downstream tasks of the BigQueryToGCSOperator that expect the CSVs to have been written by the time they run may fail (and have done so in our own operations).

The property should at least be configurable.

How to reproduce

  1. Use the BigQueryToGCSOperator in your DAG.
  2. Have it write a large table to a CSV somewhere in GCS.
  3. Notice that the task completes almost immediately, but the CSVs may not exist in GCS until later.
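
The steps above could be sketched as the DAG below. This is an assumption-laden sketch, not the DAG from our deployment: the project, dataset, table, bucket, DAG id, and task ids are placeholders, and the listing task is just one hypothetical way to observe whether the files exist yet.

```python
# Operator arguments; the project, dataset, table, and bucket are placeholders.
EXPORT_KWARGS = {
    "task_id": "export_large_table",
    "source_project_dataset_table": "my-project.my_dataset.large_table",
    "destination_cloud_storage_uris": ["gs://my-bucket/exports/large_table-*.csv"],
    "export_format": "CSV",
}


def build_repro_dag():
    """Build a DAG that exports a table to GCS, then lists the exported files."""
    # Imported lazily so this file can be read without Airflow installed.
    from datetime import datetime

    from airflow import DAG
    from airflow.providers.google.cloud.operators.gcs import GCSListObjectsOperator
    from airflow.providers.google.cloud.transfers.bigquery_to_gcs import (
        BigQueryToGCSOperator,
    )

    with DAG(
        dag_id="bq_to_gcs_nowait_repro",
        start_date=datetime(2023, 1, 1),
        schedule_interval=None,  # trigger manually
        catchup=False,
    ) as dag:
        export = BigQueryToGCSOperator(**EXPORT_KWARGS)

        # Downstream task that expects the CSVs to be present already.
        list_files = GCSListObjectsOperator(
            task_id="list_exported_files",
            bucket="my-bucket",
            prefix="exports/large_table-",
        )

        export >> list_files

    return dag
```

In a DAG file, assign the result at module level (`dag = build_repro_dag()`) so that Airflow picks it up. If the behavior described here applies, `list_exported_files` may run before all of the CSVs have been written.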

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
