Description
Apache Airflow version
2.3.3
What happened
If you try to backfill a DAG that uses any deferrable operators, those tasks will get indefinitely stuck in a "scheduled" state.
If I watch the Grid View, I can see the task state change: "scheduled" (or sometimes "queued") -> "deferred" -> "scheduled". I've tried leaving it in this state for over an hour, but there are no further state changes.
When the task is stuck like this, the log appears as empty in the web UI. The corresponding log file does exist on the worker, but it does not contain any errors or warnings that might point to the source of the problem.
Ctrl-C-ing the backfill at this point seems to hang on "Shutting down LocalExecutor; waiting for running tasks to finish." Force-killing and restarting the backfill will "unstick" the stuck tasks. However, any deferrable operators downstream of the first will get back into that stuck state, requiring multiple restarts to get everything to complete successfully.
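For anyone trying to confirm the same symptom without watching the Grid View, here is a rough sketch of how the "scheduled" -> "deferred" -> "scheduled" pattern could be detected from periodic samples of the metadata database. The names `Observation`, `stuck_after_deferral`, and `sample_states` are hypothetical helpers made up for this illustration and are not part of Airflow. The query assumes the `TaskInstance` model and `create_session` helper as they exist in Airflow 2.3, and the one-hour threshold is just the wait time mentioned above.

```python
import datetime
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple


class Observation(NamedTuple):
    """One sampled task-instance state (hypothetical helper type)."""
    task_id: str
    run_id: str
    state: Optional[str]
    seen_at: datetime.datetime


def stuck_after_deferral(
    history: Iterable[Observation],
    now: datetime.datetime,
    threshold: datetime.timedelta,
) -> List[Tuple[str, str]]:
    """Return (task_id, run_id) pairs that were deferred, went back to
    "scheduled", and have stayed there for longer than `threshold`."""
    by_task: Dict[Tuple[str, str], List[Observation]] = defaultdict(list)
    for obs in history:
        by_task[(obs.task_id, obs.run_id)].append(obs)

    stuck = []
    for key, samples in by_task.items():
        was_deferred = False
        scheduled_since = None
        for obs in sorted(samples, key=lambda o: o.seen_at):
            if obs.state == "deferred":
                was_deferred = True
                scheduled_since = None
            elif obs.state == "scheduled":
                # Remember only the first sample of an unbroken "scheduled" run.
                if scheduled_since is None:
                    scheduled_since = obs.seen_at
            else:
                # Any other state means the task moved on; start over.
                was_deferred = False
                scheduled_since = None
        if was_deferred and scheduled_since is not None:
            if now - scheduled_since > threshold:
                stuck.append(key)
    return sorted(stuck)


def sample_states(dag_id: str) -> List[Observation]:
    """Read the current task-instance states for one DAG from the metadata DB."""
    # Imported lazily so the pure helper above can run without Airflow installed.
    from airflow.models import TaskInstance
    from airflow.utils.session import create_session

    now = datetime.datetime.now(datetime.timezone.utc)
    with create_session() as session:
        rows = (
            session.query(TaskInstance.task_id, TaskInstance.run_id, TaskInstance.state)
            .filter(TaskInstance.dag_id == dag_id)
            .all()
        )
    return [Observation(task_id, run_id, state, now) for task_id, run_id, state in rows]
```

One way to use it: call `sample_states('test_backfill')` every minute or so while the backfill runs, accumulate the results in a list, and pass that list to `stuck_after_deferral` with `threshold=datetime.timedelta(hours=1)`. This only reports the symptom; it does not unstick anything.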
What you think should happen instead
Deferrable operators should work as normal when backfilling.
How to reproduce
#!/usr/bin/env python3
import datetime
import logging

import pendulum
from airflow.decorators import dag, task
from airflow.sensors.time_sensor import TimeSensorAsync

logger = logging.getLogger(__name__)


@dag(
    schedule_interval='@daily',
    start_date=datetime.datetime(2022, 8, 10),
)
def test_backfill():
    time_sensor = TimeSensorAsync(
        task_id='time_sensor',
        target_time=datetime.time(0).replace(tzinfo=pendulum.UTC),  # midnight - should succeed immediately when the trigger first runs
    )

    @task
    def some_task():
        logger.info('hello')

    time_sensor >> some_task()


dag = test_backfill()

if __name__ == '__main__':
    dag.cli()
airflow dags backfill test_backfill -s 2022-08-01 -e 2022-08-04
Operating System
CentOS Stream 8
Versions of Apache Airflow Providers
None
Deployment
Other
Deployment details
Self-hosted/standalone
Anything else
I was able to reproduce this with the following configurations:
- standalone mode + SQLite backend + SequentialExecutor
- standalone mode + Postgres backend + LocalExecutor
- Production deployment (self-hosted) + Postgres backend + CeleryExecutor
I have not yet found anything telling in any of the backend logs.
Possibly related:
- Airflow backfill command not working; Tasks are stuck in scheduled forever. #23693
- Task stuck in "scheduled" when running in backfill job #23145
- Task stuck in "scheduled" or "queued" state, pool has all slots queued, nothing is executing #13542
- A modified version of the workaround mentioned in this comment works to unstick the first stuck task. However, if you run it multiple times to try to unstick any downstream tasks, it causes the backfill command to crash.
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct