Create an Airflow cluster on GKE, following Rodel van Rooijen's post on Medium, but using Terraform and Helm instead of gcloud and kubectl commands.
This repo differs from the post in a few ways:
- The Airflow web UI is exposed over the Internet.
- There is no need to kubectl apply a Kubernetes Persistent Volume. The Compute Engine disk is created by the Kubernetes Persistent Volume Claim. See Persistent volumes and dynamic provisioning for details.
- Logs are only available during the lifetime of the pod. If you want to keep logs, see Airflow Helm: Manage logs.
- No Compute Engine disk is used for managing DAG files. For better ways to manage DAG files, see Airflow Helm: Manage DAGs files.
Prepare the following command line tools.
- gcloud
- docker
- kubectl
- helm
Please be aware of the following points.
- The Terraform code deletes the ~/.kube directory in order to overwrite the GKE credentials. Back up the ~/.kube directory if necessary.
- The GCP project needs to be created before terraform apply, and all required GCP APIs must be enabled.
This is not production-level code, for the following reasons:
- The default VPC network should be replaced with a custom VPC for security.
- The Cloud SQL password should be passed in a more secure way, such as Secret Manager, rather than as a Terraform variable.
- The Terraform state file should be stored on GCS, not in the local environment.
- Managing DAG files with git-sync is more convenient.
- Logs are only available during the lifetime of the pod.
- And so on.
You might use this Airflow environment for developing DAGs.
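As a rough illustration of two of the points above (remote state on GCS and a Cloud SQL password taken from Secret Manager), the Terraform side could look something like the sketch below. It is not the code in this repo. The bucket name, prefix, secret name, instance name, and user name are all hypothetical placeholders, and it assumes the google provider is already configured and that the bucket and secret already exist.

```hcl
# Sketch only: every name below is a hypothetical placeholder.

terraform {
  # Keep the state file in a GCS bucket instead of the local directory.
  backend "gcs" {
    bucket = "my-terraform-state-bucket" # hypothetical, must already exist
    prefix = "airflow-gke"               # hypothetical path inside the bucket
  }
}

# Read the latest version of a secret that was created outside Terraform.
data "google_secret_manager_secret_version" "airflow_db_password" {
  secret = "airflow-db-password" # hypothetical secret name
}

# Use the secret value for the Cloud SQL user instead of a Terraform variable.
resource "google_sql_user" "airflow" {
  name     = "airflow"    # hypothetical user name
  instance = "airflow-db" # hypothetical Cloud SQL instance name
  password = data.google_secret_manager_secret_version.airflow_db_password.secret_data
}
```

Note that the secret value read this way still ends up in the Terraform state, so access to the state bucket should be restricted as well.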
- Two Airflow Helm charts exist: the official chart and the community chart. This repo uses the official Helm chart.
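For orientation, installing the official chart through the Terraform helm provider might look like the sketch below. It is an illustration under assumptions, not this repo's actual configuration: the release name, namespace, and Git repository URL are hypothetical, the helm provider is assumed to be configured against the GKE cluster already, and the values keys (dags.gitSync.*) should be checked against the chart version you install.

```hcl
# Sketch only: release name, namespace, and repo URL are hypothetical.
resource "helm_release" "airflow" {
  name             = "airflow" # hypothetical release name
  namespace        = "airflow" # hypothetical namespace
  create_namespace = true

  # Official Apache Airflow chart (not the community chart).
  repository = "https://airflow.apache.org"
  chart      = "airflow"

  # Chart values passed as YAML. Here git-sync is enabled for DAG files.
  values = [
    yamlencode({
      dags = {
        gitSync = {
          enabled = true
          repo    = "https://github.com/example/airflow-dags.git" # hypothetical
          branch  = "main"
          subPath = "dags"
        }
      }
    })
  ]
}
```

Passing values through yamlencode keeps the settings in one place and avoids depending on the syntax of set blocks, which differs between helm provider versions.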
Deploying Airflow on GKE using Helm
Workload Identity in GKE with Terraform
Dynamic Provisioning and Storage Classes in Kubernetes
Persistent volumes and dynamic provisioning
Apache Airflow ETL in Google Cloud
Alternative: link Kubernetes ServiceAccounts to IAM
Deploying Airflow on Google Kubernetes Engine with Helm
Deploying Airflow on Google Kubernetes Engine with Helm — Part Two
Airflow Helm: Production Guide
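These are just notes on what I discovered during development.
- Compute Engine disks do not support the ReadWriteMany access mode for PVCs.
- The standard-rwo Storage Class does not create a Compute Engine disk immediately after the PVC is created. It uses the WaitForFirstConsumer volume binding mode, so the disk is provisioned only once a pod that uses the PVC is scheduled.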
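The two notes above can be made concrete with a Persistent Volume Claim declared through the Terraform kubernetes provider. This is a minimal sketch rather than code from this repo: the claim name, namespace, and size are hypothetical, the provider is assumed to be configured for the GKE cluster, and it assumes the GKE built-in Storage Class named standard-rwo.

```hcl
# Sketch only: claim name, namespace, and size are hypothetical.
resource "kubernetes_persistent_volume_claim" "example" {
  metadata {
    name      = "airflow-example-pvc" # hypothetical
    namespace = "airflow"             # hypothetical
  }

  spec {
    # Compute Engine disks cannot back a ReadWriteMany claim,
    # so the claim requests ReadWriteOnce.
    access_modes       = ["ReadWriteOnce"]
    storage_class_name = "standard-rwo"

    resources {
      requests = {
        storage = "10Gi" # hypothetical size
      }
    }
  }

  # The disk is not provisioned until a pod uses the claim, so do not
  # make terraform apply wait for the claim to become bound.
  wait_until_bound = false
}
```

Without wait_until_bound = false, terraform apply would wait for a binding that cannot happen until some pod mounts the claim, and would eventually time out.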