Description
Apache Airflow version
2.3.0 (latest released)
What happened
The web UI is slow to load the default (grid) view for DAGs when there are mapped tasks with a high number of expansions.
I did some testing with DAGs that have a variable number of tasks, along with changing the webserver resources to see how this affects the load times.
Here is a graph showing that testing. Let me know if you have any other questions about this.
My findings based on what I'm seeing here:
Going from 5 to 10 AUs improves load times, but going from 10 to 20 AUs does not. These diminishing returns from scaling up the webserver lead me to believe that, once the webserver has enough resources, database performance becomes the limiting factor.
On a log scale, the 10 AU and 20 AU lines on the plot are almost perfectly linear. This leads me to believe that the load time is a direct function of the number of task expansions for a mapped task.
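One way to check the second finding numerically is to fit a line through the measurements in log space. The sketch below assumes the plot is logarithmic on both axes, in which case the fitted slope is the exponent k in time ∝ expansions^k, and a slope near 1 means the load time grows linearly with the number of expansions. The function name `loglog_slope` is hypothetical, and the numbers in the usage lines are synthetic placeholders, not the measurements from this issue.

```python
import math


def loglog_slope(expansions, seconds):
    """Least-squares slope of log(seconds) against log(expansions)."""
    xs = [math.log(n) for n in expansions]
    ys = [math.log(t) for t in seconds]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic placeholder data: load time exactly proportional to expansions.
synthetic_expansions = [128, 256, 512, 1024, 2048, 4096]
synthetic_seconds = [0.01 * n for n in synthetic_expansions]
slope = loglog_slope(synthetic_expansions, synthetic_seconds)
print(round(slope, 6))
```

Substituting the measured load times for each webserver size would show whether the 10 AU and 20 AU series really have a slope close to 1.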
What you think should happen instead
The web UI should load in a reasonable amount of time. Given the current performance, anything under 10 seconds would be acceptable; ideally it would load in under 2-3 seconds, if possible.
How to reproduce
from datetime import datetime

from airflow.models import DAG
from airflow.operators.empty import EmptyOperator
from airflow.operators.python import PythonOperator

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'email_on_failure': False,
    'email_on_retry': False,
}

initial_scale = 7
max_scale = 12
scaling_factor = 2

# Generate one DAG per scale; its mapped task expands into
# scaling_factor**scale task instances.
for scale in range(initial_scale, max_scale + 1):
    dag_id = f"dynamic_task_mapping_{scale}"
    with DAG(
        dag_id=dag_id,
        default_args=default_args,
        catchup=False,
        schedule_interval=None,
        start_date=datetime(1970, 1, 1),
        render_template_as_native_obj=True,
    ) as dag:
        start = EmptyOperator(task_id="start")
        mapped = PythonOperator.partial(
            task_id="mapped",
            python_callable=lambda m: print(m),
        ).expand(
            op_args=[[x] for x in range(scaling_factor**scale)]
        )
        end = EmptyOperator(task_id="end")
        start >> mapped >> end
    # Expose each DAG at module level so Airflow discovers it.
    globals()[dag_id] = dag
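For reference, the number of mapped task instances each generated DAG produces follows directly from the constants in the script above. This small sketch only repeats that arithmetic and does not need Airflow installed; the `expansions` dictionary is a name introduced here for illustration.

```python
initial_scale = 7
max_scale = 12
scaling_factor = 2

# Map each generated dag_id to the number of expansions of its "mapped" task.
expansions = {
    f"dynamic_task_mapping_{scale}": scaling_factor**scale
    for scale in range(initial_scale, max_scale + 1)
}

for dag_id, count in expansions.items():
    print(f"{dag_id}: {count} mapped task instances")
```

The six DAGs therefore range from 128 expansions (`dynamic_task_mapping_7`) to 4096 expansions (`dynamic_task_mapping_12`), doubling at each step.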
Operating System
Debian
Versions of Apache Airflow Providers
n/a
Deployment
Astronomer
Deployment details
No response
Anything else
No response
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct
