BigQueryInsertJobOperator is broken on any type of job except query #23826

@vaaalik

Apache Airflow Provider(s)

google

Versions of Apache Airflow Providers

apache-airflow-providers-google==7.0.0

Apache Airflow version

2.2.5

Operating System

macOS 12.2.1

Deployment

Official Apache Airflow Helm Chart

Deployment details

No response

What happened

We use BigQueryInsertJobOperator to load data from Parquet files in Google Cloud Storage, with a configuration like this:

from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

# destination_table and source_files are defined elsewhere in the DAG.
BigQueryInsertJobOperator(
    task_id="load_to_bq",
    configuration={
        "load": {
            "writeDisposition": "WRITE_APPEND",
            "createDisposition": "CREATE_IF_NEEDED",
            "destinationTable": destination_table,
            "sourceUris": source_files,
            "sourceFormat": "PARQUET",
        }
    },
)

After upgrading to apache-airflow-providers-google==7.0.0, all of our load jobs fail. I believe the problem lies in this line:

table = job.to_api_repr()["configuration"]["query"]["destinationTable"]

The operator always reads the destination table from the "query" section of the job configuration. That section exists only for query jobs, so any other job type (load, copy, extract) fails with a KeyError.
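A minimal sketch of a lookup that tolerates other job types, assuming the dictionary has the shape returned by job.to_api_repr(). The helper name find_destination_table is hypothetical and is not the provider's actual fix. It only illustrates checking each job type that can carry a "destinationTable" instead of hard-coding "query":

```python
from typing import Any, Dict, Optional


def find_destination_table(api_repr: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the destinationTable of a job's API representation, if any.

    Hypothetical helper for illustration only. It assumes `api_repr` looks
    like the dict returned by `job.to_api_repr()`.
    """
    configuration = api_repr.get("configuration", {})
    # Query, load, and copy configurations can carry a destinationTable;
    # extract jobs write to Cloud Storage URIs, so they have none.
    for job_type in ("query", "load", "copy"):
        table = configuration.get(job_type, {}).get("destinationTable")
        if table:
            return table
    return None


# A load job like the one above has no "query" key in its configuration.
load_job_repr = {
    "configuration": {
        "load": {
            "destinationTable": {
                "projectId": "my-project",  # placeholder values
                "datasetId": "my_dataset",
                "tableId": "my_table",
            },
            "sourceFormat": "PARQUET",
        }
    }
}

table = find_destination_table(load_job_repr)
```

With a lookup like this, the calling code would need to handle a None result for job types that have no destination table, rather than assuming one is always present.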

What you think should happen instead

No response

How to reproduce

Use BigQueryInsertJobOperator to submit any job type other than a query job (for example, a load job).

Anything else

Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/providers/google/cloud/operators/bigquery.py", line 2170, in execute
    table = job.to_api_repr()["configuration"]["query"]["destinationTable"]
KeyError: 'query'

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
