get_tree_view can consume extreme amounts of memory. #41505

@mobuchowski

Apache Airflow version

2.10.0rc1

If "Other Airflow 2 version" selected, which one?

No response

What happened?

In degenerate cases, get_tree_view can take a lot of memory.

For a DAG like this:

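    from airflow.models.baseoperator import chain
    from airflow.models.dag import DAG

    # LongEmptyOperator is not defined in this snippet.
    with DAG("aaa_big_get_tree_view", schedule=None) as dag: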
        first_set = [LongEmptyOperator(task_id=f"hello_{i}_{'a' * 230}") for i in range(900)]
        chain(*first_set)

        last_task_in_first_set = first_set[-1]

        chain(
            last_task_in_first_set, [LongEmptyOperator(task_id=f"world_{i}_{'a' * 230}") for i in range(900)]
        )

        chain(
            last_task_in_first_set, [LongEmptyOperator(task_id=f"this_{i}_{'a' * 230}") for i in range(900)]
        )

        chain(last_task_in_first_set, [LongEmptyOperator(task_id=f"is_{i}_{'a' * 230}") for i in range(900)])

        chain(
            last_task_in_first_set, [LongEmptyOperator(task_id=f"silly_{i}_{'a' * 230}") for i in range(900)]
        )

        chain(
            last_task_in_first_set, [LongEmptyOperator(task_id=f"stuff_{i}_{'a' * 230}") for i in range(900)]
        )

serializing it can take 2.7 GiB of memory:

root@a24bae3584cb:/opt/airflow# pytest --memray tests/providers/openlineage/utils/test_utils.py::test_get_dag_tree_large_dag
=========================================================================================================================================================================== test session starts ============================================================================================================================================================================
platform linux -- Python 3.12.5, pytest-8.3.2, pluggy-1.5.0 -- /usr/local/bin/python
cachedir: .pytest_cache
rootdir: /opt/airflow
configfile: pyproject.toml
plugins: memray-1.7.0, timeouts-1.2.1, icdiff-0.9, mock-3.14.0, rerunfailures-14.0, requests-mock-1.12.1, xdist-3.6.1, asyncio-0.23.8, anyio-4.4.0, instafail-0.5.0, cov-5.0.0, time-machine-2.15.0, custom-exit-code-0.3.0
asyncio: mode=Mode.STRICT
setup timeout: 0.0s, execution timeout: 0.0s, teardown timeout: 0.0s
collected 1 item

tests/providers/openlineage/utils/test_utils.py::test_get_dag_tree_large_dag PASSED                                                                                                                                                                                                                                                                                  [100%]


============================================================================================================================================================================== MEMRAY REPORT ===============================================================================================================================================================================
Allocation results for tests/providers/openlineage/utils/test_utils.py::test_get_dag_tree_large_dag at the high watermark

	 📦 Total memory allocated: 5.4GiB
	 📏 Total allocations: 23
	 📊 Histogram of allocation sizes: |▁▁█  |
	 🥇 Biggest allocating functions:
		- _safe_get_dag_tree_view:/opt/airflow/airflow/providers/openlineage/utils/utils.py:446 -> 2.7GiB
		- get_tree_view:/opt/airflow/airflow/models/dag.py:2445 -> 2.7GiB
		- __setattr__:/opt/airflow/airflow/models/baseoperator.py:1191 -> 1.3MiB
		- __setattr__:/opt/airflow/airflow/models/baseoperator.py:1191 -> 1.3MiB
		- __setattr__:/opt/airflow/airflow/models/baseoperator.py:1191 -> 1.3MiB


=================================================================================================================================================================== Warning summary. Total: 3, Unique: 3 ===================================================================================================================================================================
airflow: total 1, unique 1
  collect: total 1, unique 1
other: total 2, unique 2
  collect: total 2, unique 2
Warnings saved into /opt/airflow/tests/warnings.txt file.
============================================================================================================================================================================ 1 passed in 8.60s =============================================================================================================================================================================

#41494

What you think should happen instead?

I think the tree_view format should be changed to one that does not require an extraordinary amount of whitespace in deeply nested cases.

It would be good to know where it is being used, though.
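
To illustrate the whitespace growth, here is a minimal sketch that does not use Airflow. `render_tree` and `build_chain_with_fanout` are hypothetical helpers, not Airflow's implementation, and the adjacency mapping is only one possible alternative format. The graph is a toy tree (a chain followed by a fan-out), much smaller than the DAG above.

```python
from __future__ import annotations

import json


def render_tree(children: dict[str, list[str]], root: str, indent: int = 4) -> str:
    """Render an indented tree view: each line is prefixed with indent * depth spaces."""
    lines = []
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        lines.append(" " * (indent * depth) + node)
        # Push children in reverse order so they are emitted in sorted order.
        for child in sorted(children.get(node, []), reverse=True):
            stack.append((child, depth + 1))
    return "\n".join(lines) + "\n"


def build_chain_with_fanout(depth: int, fanout: int) -> dict[str, list[str]]:
    """Build a chain t0 -> t1 -> ... whose last task has `fanout` leaf children."""
    children = {f"t{i}": [f"t{i + 1}"] for i in range(depth - 1)}
    children[f"t{depth - 1}"] = [f"leaf{j}" for j in range(fanout)]
    return children


graph = build_chain_with_fanout(depth=200, fanout=200)

tree = render_tree(graph, "t0")
whitespace = tree.count(" ")  # indentation only; task names contain no spaces

# A flat alternative: the same structure as an adjacency mapping, no indentation.
flat = json.dumps(graph, separators=(",", ":"))

print(f"tree view: {len(tree)} chars, of which {whitespace} are indentation")
print(f"adjacency mapping: {len(flat)} chars")
```

In the indented format, every leaf under the deep chain carries the full indentation of its depth, so the output is dominated by spaces. The flat mapping grows only with the number of tasks and edges.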

How to reproduce

You can use above dag.
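
If memray is not available, peak allocation can be roughly checked with the standard library's `tracemalloc`. This is a sketch under assumptions: `peak_bytes` and `indented_lines` are hypothetical helpers, and `indented_lines` only mimics depth-based indentation rather than calling Airflow.

```python
import tracemalloc


def peak_bytes(func, *args):
    """Return the peak number of bytes traced by tracemalloc while func runs."""
    tracemalloc.start()
    try:
        func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def indented_lines(depth, width=4):
    """Build one line per level, indented by width * level spaces (toy stand-in)."""
    return "".join(" " * (width * level) + f"task_{level}\n" for level in range(depth))


shallow = peak_bytes(indented_lines, 100)
deep = peak_bytes(indented_lines, 1000)

print(f"depth 100:  peak {shallow} bytes")
print(f"depth 1000: peak {deep} bytes")
```

The same `peak_bytes` wrapper could be pointed at a real call such as `dag.get_tree_view` in an environment where Airflow is installed.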

Operating System

Docker/Breeze on macOS

Versions of Apache Airflow Providers

No response

Deployment

Other

Deployment details

No response

Anything else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
