
BigQueryInsertJobOperator fails when there are templated variables in default args #23129


Description

@leahecole

Apache Airflow Provider(s)

google

Versions of Apache Airflow Providers

apache-airflow-providers-google==6.8.0

Apache Airflow version

2.2.3

Operating System

n/a

Deployment

Composer

Deployment details

Hi! I'm using composer-2.0.6-airflow-2.2.3 - it's a Public IP environment without any configuration overrides. This is a super basic sandbox environment I use for testing.

What happened

I was experimenting with the BigQueryInsertJobOperator and hit a failure when I tried to use Airflow variables within a job configuration.
Error

google.api_core.exceptions.BadRequest: 400 POST https://bigquery.googleapis.com/bigquery/v2/projects/%7B%7Bvar.value.gcp_project%7D%7D/jobs?prettyPrint=false: Invalid project ID '{{var.value.gcp_project}}'. Project IDs must contain 6-63 lowercase letters, digits, or dashes. Some project IDs also include domain name separated by a colon. IDs must start with a letter and may not end with a dash.

DAG pseudocode
(I copy-pasted the relevant bits of my DAG.)

  • BQ_DESTINATION_TABLE_NAME and BQ_DESTINATION_DATASET_NAME are plain strings, not Airflow variables, so they're doing great.
  • WEATHER_HOLIDAYS_JOIN_QUERY is a SQL query, also defined as a string, and as far as I can tell it's also doing great.
  • PROJECT_NAME uses a templated Airflow variable that is defined and is successfully being used in other operators in this and other DAGs.
from airflow.providers.google.cloud.operators import bigquery

# Templated Airflow variable, expected to be rendered by Jinja at run time
PROJECT_NAME = '{{var.value.gcp_project}}'

bq_join_holidays_weather_data = bigquery.BigQueryInsertJobOperator(
    task_id="bq_join_holidays_weather_data",
    configuration={
        "query": {
            "query": WEATHER_HOLIDAYS_JOIN_QUERY,
            "useLegacySql": False,
            # Fully qualified TableReference for the query's destination
            "destinationTable": {
                "projectId": PROJECT_NAME,
                "datasetId": BQ_DESTINATION_DATASET_NAME,
                "tableId": BQ_DESTINATION_TABLE_NAME,
            },
        }
    },
    location="US",
)
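As a sanity check on what actually gets rendered, one can print the operator's template_fields: only constructor arguments listed there go through Jinja, so a project_id coming from default args is only rendered if it appears in that tuple. A minimal check (the exact contents depend on the provider version, so treat the output as the source of truth):

from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

# In apache-airflow-providers-google==6.8.0 this includes "configuration";
# whether "project_id" is listed determines if a templated default arg renders.
print(BigQueryInsertJobOperator.template_fields)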

Some things I tried/researched

I experimented a bit with adding "configuration.query.destination_table": "json" to the operator's template_fields_renderers, but did not have success. I also checked out the DataprocSubmitJobOperator to see if I could find some clues, because Dataproc configurations also often have many nested dictionaries and I'm like 90% certain I've templated values there. I had to timebox this, though, because I do have a workaround (just not using the templated Airflow variable; a sketch follows), and I thought I'd open an issue to see if someone more familiar with the underlying template rendering might be able to more easily decipher what's happening.
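One possible shape of such a workaround (a sketch, not necessarily the exact one I used): resolve the variable at DAG-parse time with Variable.get instead of relying on Jinja rendering inside the operator.

from airflow.models import Variable

# Resolved when the DAG file is parsed, so the operator receives a plain
# string and no template rendering is involved (assumes the Airflow
# variable "gcp_project" exists).
PROJECT_NAME = Variable.get("gcp_project")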

What you think should happen instead

I think that I should be allowed to use an Airflow variable here 😁

How to reproduce

Run a query job using the BigQueryInsertJobOperator that writes the query result to a destination table via a fully qualified TableReference object, pass the projectId parameter in as a templated Airflow variable, and also set project as a default arg pointing to the templated variable (a minimal sketch is below).
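A minimal reproduction sketch, assuming an Airflow variable named gcp_project exists; the DAG id, dataset, and table names are hypothetical placeholders:

import datetime

from airflow import models
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

default_dag_args = {
    # project_id as a default arg pointing at a templated variable --
    # this is the part that triggers the 400 above
    "project_id": "{{var.value.gcp_project}}",
}

with models.DAG(
    "bq_insert_job_repro",  # hypothetical DAG id
    schedule_interval=None,
    start_date=datetime.datetime(2022, 1, 1),
    default_args=default_dag_args,
) as dag:
    BigQueryInsertJobOperator(
        task_id="repro",
        configuration={
            "query": {
                "query": "SELECT 1",
                "useLegacySql": False,
                "destinationTable": {
                    # projectId passed as a templated Airflow variable
                    "projectId": "{{var.value.gcp_project}}",
                    "datasetId": "my_dataset",  # hypothetical
                    "tableId": "my_table",      # hypothetical
                },
            }
        },
        location="US",
    )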

Anything else

I am willing to submit a PR, but if someone else also wants to, they might be faster than I would be, especially between now and the summit.

Also, it's been a while since I submitted an issue and this form is INCREDIBLE, well done friends

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
