
@ephraimbuddy
Contributor

@ephraimbuddy ephraimbuddy commented Nov 4, 2022

We have a case where the mini scheduler tries to expand a mapped task even when its upstream tasks are not yet done.

The mini scheduler works on a partial subset of the DAG, and in the process some upstream tasks are dropped.
If one of the remaining tasks is a mapped task, its expansion fails, because expansion needs the output of those upstream tasks. When the expansion fails, the task is marked as `upstream_failed`, and the failure cascades: the tasks downstream of it are marked as `upstream_failed` too.
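
To make the failure mode concrete, here is a toy model in plain Python; it is not Airflow code. It assumes, as a simplification, that the mini scheduler keeps the finished task's direct downstream tasks, everything downstream of them, and the direct upstreams of those direct-downstream tasks. `ToyTask`, `toy_partial_subset` and `can_expand` are hypothetical names. With `a -> b -> d` and `c -> d`, where `d` is mapped over the output of `c`, finishing `a` yields a subset without `c`, so `d` cannot be expanded even though `c` has not failed.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class ToyTask:
    """Hypothetical stand-in for a task; not Airflow's API."""

    task_id: str
    upstream: Set[str] = field(default_factory=set)
    mapped_over: Optional[str] = None  # task whose output drives expansion


def downstream_of(tasks: Dict[str, ToyTask], task_id: str) -> Set[str]:
    """Direct downstream task ids of task_id."""
    return {t.task_id for t in tasks.values() if task_id in t.upstream}


def toy_partial_subset(tasks: Dict[str, ToyTask], finished_id: str) -> Dict[str, ToyTask]:
    """Assumed simplification of the mini scheduler's partial subset."""
    roots = downstream_of(tasks, finished_id)
    keep = set(roots)
    frontier = list(roots)
    while frontier:  # walk everything downstream of the roots
        for child in downstream_of(tasks, frontier.pop()):
            if child not in keep:
                keep.add(child)
                frontier.append(child)
    for root in roots:  # add only the roots' direct upstreams
        keep |= tasks[root].upstream
    return {tid: tasks[tid] for tid in keep}


def can_expand(dag: Dict[str, ToyTask], task_id: str) -> bool:
    """A mapped task can only expand if the task it maps over is present."""
    source = dag[task_id].mapped_over
    return source is None or source in dag


tasks = {
    "a": ToyTask("a"),
    "b": ToyTask("b", {"a"}),
    "c": ToyTask("c"),
    "d": ToyTask("d", {"b", "c"}, mapped_over="c"),
}
subset = toy_partial_subset(tasks, "a")
print(sorted(subset))           # ['a', 'b', 'd']: 'c' was dropped
print(can_expand(tasks, "d"))   # True in the full DAG
print(can_expand(subset, "d"))  # False in the partial subset
```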

The solution is to ignore this error, and not mark the mapped task as `upstream_failed`, when the expansion fails and the DAG is a partial subset.
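
The decision described above can be sketched as follows. This is a hypothetical illustration, not the code from the merged commit: `ExpansionError`, `decide_state` and the `dag_is_partial` flag are made-up names, and the returned strings only stand in for task-instance states. Returning `None` models "leave the task alone", presumably so that the regular scheduler, which sees the whole DAG, can expand it later.

```python
from typing import Callable, Optional


class ExpansionError(Exception):
    """Hypothetical: the mapped task's upstream output could not be resolved."""


def decide_state(expand: Callable[[], None], dag_is_partial: bool) -> Optional[str]:
    """Return the state to record for a mapped task, or None to leave it untouched."""
    try:
        expand()
    except ExpansionError:
        if dag_is_partial:
            # Upstream tasks may simply be missing from the subset, so this is
            # not a real failure; do not record any state.
            return None
        return "upstream_failed"
    return "expanded"


def missing_upstream() -> None:
    raise ExpansionError("upstream output not available")


print(decide_state(missing_upstream, dag_is_partial=True))   # None
print(decide_state(missing_upstream, dag_is_partial=False))  # upstream_failed
print(decide_state(lambda: None, dag_is_partial=True))       # expanded
```

Leaving the state unset in the partial case, rather than failing, is what stops the `upstream_failed` cascade from starting.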

closes: #27449

@boring-cyborg boring-cyborg bot added the area:Scheduler including HA (high availability) scheduler label Nov 4, 2022
@ephraimbuddy ephraimbuddy force-pushed the fix-mini-scheduler branch 3 times, most recently from fafa450 to c381035 Compare November 8, 2022 08:26
Member

@ashb ashb left a comment


Few minor points, but LGTM!

@ashb ashb added this to the Airflow 2.4.3 milestone Nov 9, 2022
@ephraimbuddy ephraimbuddy merged commit ed92e5d into apache:main Nov 9, 2022
@ephraimbuddy ephraimbuddy deleted the fix-mini-scheduler branch November 9, 2022 14:06
ephraimbuddy added a commit that referenced this pull request Nov 9, 2022
Co-authored-by: Ash Berlin-Taylor
(cherry picked from commit ed92e5d)
@ephraimbuddy ephraimbuddy added the type:bug-fix Changelog: Bug Fixes label Nov 9, 2022

Labels

area:Scheduler (including HA (high availability) scheduler), type:bug-fix (Changelog: Bug Fixes)

Development

Successfully merging this pull request may close these issues.

Dynamic tasks marked as upstream_failed when none of their upstream tasks are failed or upstream_failed

4 participants