Airflow 3 - DAG schedule with multiple Assets is causing parsing errors #47615

@kacpermuda

Description

Apache Airflow version

main (development)

If "Other Airflow 2 version" selected, which one?

No response

What happened?

When testing asset scheduling on Airflow 3, I encountered some errors. DAGs scheduled on multiple datasets, like:

from airflow.providers.standard.operators.bash import BashOperator
from airflow.datasets import Dataset
from airflow import DAG
import datetime as dt

with DAG(
    dag_id='dataset_consumer_two_datasets_as_list',
    start_date=dt.datetime(2024, 7, 3),
    schedule=[Dataset("ds1", {"some_extra": 1}), Dataset("ds2")],
) as dag2:
    task_2 = BashOperator(
        task_id="task_2",
        bash_command="exit 0;",
    )

display this error in the UI:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/urllib/parse.py", line 121, in _decode_args
    return tuple(x.decode(encoding, errors) if x else '' for x in args)
  File "/usr/local/lib/python3.9/urllib/parse.py", line 121, in <genexpr>
    return tuple(x.decode(encoding, errors) if x else '' for x in args)
AttributeError: 'dict' object has no attribute 'decode'
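Just a guess, but the traceback suggests the asset's `extra` dict is being passed into `urllib.parse` where a URI string is expected (anything non-`str` is treated as bytes and `.decode()` is attempted on it). A minimal snippet reproducing the same `AttributeError` outside Airflow:

```python
import urllib.parse

# urllib.parse coerces non-str input as if it were bytes; a dict
# therefore hits _decode_args and fails with the same message as above.
try:
    urllib.parse.urlsplit({"some_extra": 1})
except AttributeError as exc:
    print(exc)  # 'dict' object has no attribute 'decode'
```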

The same happens for any schedule using more than one dataset:

with DAG(
    dag_id='dataset_consumer_two_datasets_with_or',
    start_date=dt.datetime(2024, 7, 3),
    schedule=(Dataset("ds1", {"some_extra": 1}) | Dataset("ds2")),
) as dag3:
    task_3 = BashOperator(
        task_id="task_3",
        bash_command="exit 0;",
    )

with DAG(
    dag_id='dataset_consumer_complex_logical_dependency',
    start_date=dt.datetime(2024, 7, 3),
    schedule=(
        (Dataset("ds1", {"some_extra": 1}) | Dataset("ds2"))
        & (Dataset("ds3") | Dataset("ds4"))
    ),
) as dag4:
    task_4 = BashOperator(
        task_id="task_4",
        bash_command="exit 0;",
    )

# Imports assumed from current main; AssetOrTimeSchedule replaced DatasetOrTimeSchedule.
from airflow.timetables.assets import AssetOrTimeSchedule
from airflow.timetables.trigger import CronTriggerTimetable

with DAG(
    dag_id='dataset_consumer_dataset_or_time_schedule',
    start_date=dt.datetime(2024, 7, 3),
    schedule=AssetOrTimeSchedule(
        timetable=CronTriggerTimetable("0 0 * 4 *", timezone="UTC"),
        # datasets=(
        assets=(
            (Dataset("ds1", {"some_extra": 1}) | Dataset("ds2"))
            & (Dataset("ds3") | Dataset("ds4", {"another_extra": 345}))
        ),
    ),
) as dag5:
    task_5 = BashOperator(
        task_id="task_5",
        bash_command="exit 0;",
    )

The interesting part is that a single Dataset seems to work just fine:

with DAG(
    dag_id='dataset_consumer_single_dataset',
    start_date=dt.datetime(2024, 7, 3),
    schedule=Dataset("ds1"),
) as dag0:
    task_0 = BashOperator(
        task_id="task_0",
        bash_command="exit 0;",
    )

with DAG(
    dag_id='dataset_consumer_single_dataset_as_list',
    start_date=dt.datetime(2024, 7, 3),
    schedule=[Dataset("ds1")],
) as dag1:
    task_1 = BashOperator(
        task_id="task_1",
        bash_command="exit 0;",
    )

# Imports assumed from current main; dag_id and variable names changed here
# to avoid clashing with the failing example of the same name above.
from airflow.timetables.assets import AssetOrTimeSchedule
from airflow.timetables.trigger import CronTriggerTimetable

with DAG(
    dag_id='dataset_consumer_single_asset_or_time_schedule',
    start_date=dt.datetime(2024, 7, 3),
    schedule=AssetOrTimeSchedule(
        timetable=CronTriggerTimetable("0 0 * 4 *", timezone="UTC"),
        assets=Dataset("ds1"),
    ),
) as dag6:
    task_6 = BashOperator(
        task_id="task_6",
        bash_command="exit 0;",
    )

What you think should happen instead?

No response

How to reproduce

Run Breeze on main and try running any of the DAGs above.

Operating System

macOS 15.3.1 (24D70)

Versions of Apache Airflow Providers

No response

Deployment

Virtualenv installation

Deployment details

Breeze

Anything else?

Maybe I'm doing something wrong or my installation is somehow broken; let me know.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
