Closed
Labels
area:core, area:data-aware-scheduling, assets/datasets/AIP-48, kind:bug, needs-triage
Description
Apache Airflow version
main (development)
If "Other Airflow 2 version" selected, which one?
No response
What happened?
When testing asset scheduling on Airflow 3, I encountered some errors. DAGs with multiple datasets like this one:
from airflow.providers.standard.operators.bash import BashOperator
from airflow.datasets import Dataset
from airflow import DAG
import datetime as dt

with DAG(
    dag_id='dataset_consumer_two_datasets_as_list',
    start_date=dt.datetime(2024, 7, 3),
    schedule=[Dataset("ds1", {"some_extra": 1}), Dataset("ds2")],
) as dag2:
    task_2 = BashOperator(
        task_id="task_2",
        bash_command="exit 0;",
    )
display this error in the UI:
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/urllib/parse.py", line 121, in _decode_args
    return tuple(x.decode(encoding, errors) if x else '' for x in args)
  File "/usr/local/lib/python3.9/urllib/parse.py", line 121, in <genexpr>
    return tuple(x.decode(encoding, errors) if x else '' for x in args)
AttributeError: 'dict' object has no attribute 'decode'
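The traceback suggests that the dataset's extra dict ends up somewhere urllib expects a URI string. A minimal sketch of that failure mode (my guess at the mechanism, not the actual Airflow call site) reproduces the exact line-121 error:

```python
from urllib.parse import urlsplit

# Hypothetical repro: pass a dict (like a Dataset's extra) where
# urlsplit() expects a str/bytes URI. Because the dict is not a str,
# urllib takes its bytes path and tries to call .decode() on it.
try:
    urlsplit({"some_extra": 1})
except AttributeError as exc:
    print(exc)  # 'dict' object has no attribute 'decode'
```

Which would fit the observation below that only the DAGs whose schedules carry an extra dict alongside other datasets trigger the error.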
The same happens for any schedule using more than one dataset:
with DAG(
    dag_id='dataset_consumer_two_datasets_as_list',
    start_date=dt.datetime(2024, 7, 3),
    schedule=[Dataset("ds1", {"some_extra": 1}), Dataset("ds2")],
) as dag2:
    task_2 = BashOperator(
        task_id="task_2",
        bash_command="exit 0;",
    )

with DAG(
    dag_id='dataset_consumer_two_datasets_with_or',
    start_date=dt.datetime(2024, 7, 3),
    schedule=(Dataset("ds1", {"some_extra": 1}) | Dataset("ds2")),
) as dag3:
    task_3 = BashOperator(
        task_id="task_3",
        bash_command="exit 0;",
    )

with DAG(
    dag_id='dataset_consumer_complex_logical_dependency',
    start_date=dt.datetime(2024, 7, 3),
    schedule=(
        (Dataset("ds1", {"some_extra": 1}) | Dataset("ds2"))
        & (Dataset("ds3") | Dataset("ds4"))
    ),
) as dag4:
    task_4 = BashOperator(
        task_id="task_4",
        bash_command="exit 0;",
    )
# on main, AssetOrTimeSchedule lives in airflow.timetables.assets
from airflow.timetables.assets import AssetOrTimeSchedule
from airflow.timetables.trigger import CronTriggerTimetable

with DAG(
    dag_id='dataset_consumer_dataset_or_time_schedule',
    start_date=dt.datetime(2024, 7, 3),
    schedule=AssetOrTimeSchedule(
        timetable=CronTriggerTimetable("0 0 * 4 *", timezone="UTC"),
        # datasets=(
        assets=(
            (Dataset("ds1", {"some_extra": 1}) | Dataset("ds2"))
            & (Dataset("ds3") | Dataset("ds4", {"another_extra": 345}))
        )
    ),
) as dag5:
    task_5 = BashOperator(
        task_id="task_5",
        bash_command="exit 0;",
    )
The interesting part is that a single Dataset seems to work just fine:
with DAG(
    dag_id='dataset_consumer_single_dataset',
    start_date=dt.datetime(2024, 7, 3),
    schedule=Dataset("ds1"),
) as dag0:
    task_0 = BashOperator(
        task_id="task_0",
        bash_command="exit 0;",
    )

with DAG(
    dag_id='dataset_consumer_single_dataset_as_list',
    start_date=dt.datetime(2024, 7, 3),
    schedule=[Dataset("ds1")],
) as dag1:
    task_1 = BashOperator(
        task_id="task_1",
        bash_command="exit 0;",
    )

with DAG(
    dag_id='dataset_consumer_dataset_or_time_schedule',
    start_date=dt.datetime(2024, 7, 3),
    schedule=AssetOrTimeSchedule(
        timetable=CronTriggerTimetable("0 0 * 4 *", timezone="UTC"),
        assets=Dataset("ds1")
    ),
) as dag5:
    task_5 = BashOperator(
        task_id="task_5",
        bash_command="exit 0;",
    )
What you think should happen instead?
No response
How to reproduce
Run breeze on main and try running any of the above DAGs.
Operating System
macOS 15.3.1 (24D70)
Versions of Apache Airflow Providers
No response
Deployment
Virtualenv installation
Deployment details
Breeze
Anything else?
Maybe I'm doing something wrong, or my installation is somehow broken; let me know.
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct