Description
Apache Airflow Provider(s)
Versions of Apache Airflow Providers
apache-airflow-providers-google==6.8.0
Apache Airflow version
2.2.3
Operating System
n/a
Deployment
Composer
Deployment details
Hi! I'm using composer-2.0.6-airflow-2.2.3 - it's a Public IP environment without any configuration overrides. This is a super basic sandbox environment I use for testing.
What happened
I was experimenting with the BigQueryInsertJobOperator and had a failure when I tried to utilize Airflow variables within a Job configuration.
Error
google.api_core.exceptions.BadRequest: 400 POST https://bigquery.googleapis.com/bigquery/v2/projects/%7B%7Bvar.value.gcp_project%7D%7D/jobs?prettyPrint=false: Invalid project ID '{{var.value.gcp_project}}'. Project IDs must contain 6-63 lowercase letters, digits, or dashes. Some project IDs also include domain name separated by a colon. IDs must start with a letter and may not end with a dash.
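For clarity, URL-decoding the request path from the error shows that the literal, unrendered template string reached the BigQuery API, and checking it against the project-ID rule quoted in the error explains the 400. (The regex below is my own reading of that rule, not Google's exact validator.)

```python
import re
from urllib.parse import unquote

# Decode the project segment of the failing request URL.
encoded = "%7B%7Bvar.value.gcp_project%7D%7D"
decoded = unquote(encoded)
print(decoded)  # {{var.value.gcp_project}}

# Rough check of the rule quoted in the error: 6-63 lowercase letters,
# digits, or dashes; must start with a letter and may not end with a
# dash. (My interpretation, not Google's exact validator.)
PROJECT_ID_RE = re.compile(r"^[a-z][a-z0-9-]{4,61}[a-z0-9]$")

print(bool(PROJECT_ID_RE.match(decoded)))              # False
print(bool(PROJECT_ID_RE.match("my-sandbox-project"))) # True
```

So the template was never rendered before the request was built, which is why the API rejects it as an invalid project ID.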
DAG pseudocode
(I copy pasted the relevant bits of my DAG)
BQ_DESTINATION_TABLE_NAME and BQ_DESTINATION_DATASET_NAME are strings, not Airflow variables, so they're doing great. WEATHER_HOLIDAYS_JOIN_QUERY is a SQL query, also defined as a string, and as far as I can tell is also doing great. PROJECT_NAME is using a templated Airflow variable that is defined and is successfully being used in other operators in this and other DAGs.
PROJECT_NAME = '{{var.value.gcp_project}}'
bq_join_holidays_weather_data = bigquery.BigQueryInsertJobOperator(
    task_id="bq_join_holidays_weather_data",
    configuration={
        "query": {
            "query": WEATHER_HOLIDAYS_JOIN_QUERY,
            "useLegacySql": False,
            "destinationTable": {
                "projectId": PROJECT_NAME,
                "datasetId": BQ_DESTINATION_DATASET_NAME,
                "tableId": BQ_DESTINATION_TABLE_NAME,
            },
        },
    },
    location="US",
)
Some things I tried/researched
I experimented a little with adding "configuration.query.destination_table": "json" to this line, but did not have success. I also checked out the DataprocSubmitJobOperator to see if I could find some clues, because Dataproc configurations also often have many nested dictionaries and I'm fairly certain I've templated values there. I had to timebox this, though, because I do have a workaround (just not using the Airflow variable), and I thought I'd open an issue to see if someone more familiar with the underlying template rendering might more easily decipher what's happening.
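For context on why I expected this to work: Airflow's template rendering does recurse into dicts and lists inside a templated field like configuration, so the nesting by itself shouldn't block rendering. A minimal pure-Python sketch of that recursion (a simplification for illustration, not Airflow's actual implementation):

```python
# Simplified sketch of recursive template rendering over nested
# structures, in the spirit of Airflow's render_template handling of
# dict/list template_fields (illustration only, not Airflow's code).
def render_nested(value, context):
    if isinstance(value, str):
        # Stand-in for Jinja rendering of "{{key}}" placeholders.
        for key, val in context.items():
            value = value.replace("{{" + key + "}}", val)
        return value
    if isinstance(value, dict):
        return {k: render_nested(v, context) for k, v in value.items()}
    if isinstance(value, list):
        return [render_nested(v, context) for v in value]
    return value

configuration = {
    "query": {
        "useLegacySql": False,
        "destinationTable": {
            "projectId": "{{var.value.gcp_project}}",
            "datasetId": "my_dataset",
            "tableId": "my_table",
        },
    }
}

rendered = render_nested(
    configuration, {"var.value.gcp_project": "my-sandbox-project"}
)
print(rendered["query"]["destinationTable"]["projectId"])  # my-sandbox-project
```

Since recursion like this handles the nested projectId fine, my guess is the unrendered value is coming from somewhere outside the configuration field's rendering path.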
What you think should happen instead
I think that I should be allowed to use an Airflow variable here 😁
How to reproduce
Run a query job using the BigQueryInsertJobOperator that writes the query result to a destination table, using a fully qualified TableReference object and passing the projectId parameter as a templated Airflow variable, while also having project as a default arg pointing to a templated variable.
Anything else
I am willing to submit a PR, but if someone else also wants to, they might be faster than I will be, especially between now and the summit.
Also, it's been a while since I submitted an issue, and this form is INCREDIBLE. Well done, friends!
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct