Retry for TaskGroup #21867

@potiuk

Discussed in #21333

Originally posted by sartyukhov February 4, 2022

Description

Hello!

Previously, SubDags were used to organize tasks into groups. Now you've introduced TaskGroups to the world.
It's nice and very clever. But it has one big disadvantage compared to the SubDag: it can't be repeated (retried as a whole).

Use case/motivation

For example:

In a project I have two tasks (A >> B):
A - collect data (PythonOperator)
B - update a materialized view in Postgres (PostgresOperator)

'A' could collect only part of the data and mark itself as failed (there is no "half-failed" status, as far as I know). But task 'B' should run regardless of A's result (trigger_rule="all_done", for example) to update the matview with the partial data.
In about an hour I would like to repeat that process (A >> B).

With SubDag I could do that:

  • initiate SubDag with parameter retries=10
  • add DummyTask 'C' with trigger_rule="all_success"
  • change flow to A >> B >> C and A >> C

and that's it: C marks the SubDag as failed, which triggers it to retry.
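For illustration, here is a minimal plain-Python sketch of the behaviour the recipe above produces. It is only a model of the semantics, not Airflow code: `run_group`, `collect` and `refresh` are hypothetical names, and the wait between attempts (Airflow's `retry_delay`) is left out.

```python
from typing import Callable, Tuple


def run_group(
    collect: Callable[[], bool],
    refresh: Callable[[], bool],
    retries: int = 10,
) -> Tuple[bool, int]:
    """Run A then B as one unit and retry the whole unit on failure.

    Returns (succeeded, attempts_made). `retries` counts retries after
    the first try, so at most `retries + 1` attempts are made.
    """
    attempts = 0
    for attempts in range(1, retries + 2):
        a_ok = collect()  # task A: may collect only part of the data and fail
        b_ok = refresh()  # task B: always runs, like trigger_rule="all_done"
        if a_ok and b_ok:  # task C: passes only if A and B passed ("all_success")
            return True, attempts
    return False, attempts


# Usage: A fails twice, then succeeds; B runs on every attempt.
a_results = iter([False, False, True])
b_runs = []


def fake_refresh() -> bool:
    b_runs.append("refreshed")
    return True


ok, attempts = run_group(lambda: next(a_results), fake_refresh, retries=10)
```

In this model B refreshes the view on every attempt, including the failed ones, and the group stops retrying once the retry budget is used up.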

But TaskGroup does not have a retry parameter.
I also can't retry the whole DAG, because it's big.
I also don't want to update the materialized view inside task 'A', because then I can't do [A0, A1..An] >> B (update the materialized view just once for several collects).

I hope it's possible. Or maybe it could be done some other way.
Thanks in advance.

Additional explanation on the use case (from #21333)

I have a specific use case where this feature would be useful. It is like:

There is a task that does one thing.
There is a second task (which depends on the first one) that does another thing; if this one fails, I need to re-run the entire dag. I can't do both processes in the same task due to some limitations (I work with different Java drivers in each one), and retrying the same task doesn't solve the problem, because the result of this task determines whether or not the first dag needs a re-execution.
Clearing the previous task(s) also isn't good, because it causes an infinite loop until everything succeeds, which is not what I want: I need only some 3-5 retries, after which it should stay in a failed state.

My workaround for this was creating a dag that triggers this dag, so if the triggered dag's state is failed, it is re-executed up to the number of times I set. However, as you can see, this requires creating 2 dags to solve the problem.

Related issues

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
