Perform "mini scheduling run" after task has finished #11589

jhtimmins · 2020-10-16T16:20:13Z

In order to further reduce intra-dag task scheduling lag we add an
optimization: when a task has just finished executing (success or
failure) we can look at the downstream tasks of just that task, and then
make scheduling decisions for those tasks there -- we've already got the
dag loaded, and we know they are likely actionable as we just finished.

We should set tasks to scheduled if we can (but no further, i.e. not to
queued, as the scheduler has to make that decision with info about the
Pool usage etc.).
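
To make the idea concrete, here is a minimal sketch of such a "mini scheduling run". It is a toy model only: the graph is a plain dict, states are strings, and `mini_scheduling_run` is a hypothetical name, not the function added by this PR. It also assumes an all-upstream-succeeded rule, which is only one of the trigger rules the real scheduler handles.

```python
from typing import Dict, List, Optional

# Toy stand-ins, not Airflow's real models: the DAG is a mapping of
# task_id -> list of direct upstream task_ids, and states are plain strings.
SUCCESS, SCHEDULED = "success", "scheduled"


def mini_scheduling_run(
    upstream: Dict[str, List[str]],
    states: Dict[str, Optional[str]],
    finished_task: str,
) -> List[str]:
    """Mark direct children of ``finished_task`` as scheduled when all of
    their upstream tasks have succeeded (assumed all-success rule)."""
    newly_scheduled = []
    for task_id, parents in upstream.items():
        if finished_task not in parents:
            continue  # only look at direct children of the finished task
        if states.get(task_id) is not None:
            continue  # already has a state; leave it alone
        if all(states.get(p) == SUCCESS for p in parents):
            # Stop at "scheduled": moving to "queued" needs pool information
            # that only the main scheduler has.
            states[task_id] = SCHEDULED
            newly_scheduled.append(task_id)
    return newly_scheduled


upstream = {"a": [], "b": ["a"], "c": ["a", "x"], "x": []}
states = {"a": SUCCESS, "b": None, "c": None, "x": None}
scheduled = mini_scheduling_run(upstream, states, "a")
```

Here `b` becomes scheduled because its only parent finished, while `c` is left untouched because `x` has not run yet.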

Co-authored-by: Ash Berlin-Taylor [email protected]

github-actions · 2020-10-16T16:33:32Z

The Workflow run is cancelling this PR. It is an earlier duplicate of 1029499 run.

github-actions · 2020-10-16T16:33:33Z

The Workflow run is cancelling this PR. It is an earlier duplicate of 2794935 run.

github-actions · 2020-10-16T17:29:20Z

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks$,^Build docs$,^Spell check docs$,^Backport packages$,^Checks: Helm tests$,^Test OpenAPI*.

mik-laj · 2020-10-17T00:27:11Z

airflow/jobs/scheduler_job.py

It seems to me that we should also move the comment. Now it has lost context.

This comment doesn't make sense on an (instance) method of DagRun, as that almost by definition only operates on a single dag run. The comment is kept here in the scheduler because that's where one might think we want to batch the queries up, but shouldn't.

mik-laj · 2020-10-17T00:28:53Z

airflow/models/dagrun.py

Is there a reason we need to use TypedDict? NamedTuple is much easier to use in many cases.

No reason -- just happened to be the example we were looking at in PoolStats and copied that.

github-actions · 2020-10-27T06:12:16Z

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks$,^Build docs$,^Spell check docs$,^Backport packages$,^Checks: Helm tests$,^Test OpenAPI*.

airflow/models/dagrun.py

github-actions · 2020-10-28T04:33:36Z

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

ashb

We can also possibly reduce the duplication in tests between test_dagrun_fast_follow and test_dagrun_fast_follow_deactiveated by using @parameterized.expand and with conf_vars(...):
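
A rough sketch of the shape this could take. To stay runnable with only the standard library it uses `unittest`'s `subTest` in place of `@parameterized.expand`, and a hypothetical `override_conf` context manager in place of `conf_vars`; the `CONF` dict, its key name, and `downstream_state_after_finish` are all made up for illustration.

```python
import contextlib
import unittest

# Hypothetical stand-in for configuration; not Airflow's conf object.
CONF = {"schedule_after_task_execution": True}


@contextlib.contextmanager
def override_conf(**overrides):
    """Temporarily override CONF entries, restoring them on exit."""
    saved = dict(CONF)
    CONF.update(overrides)
    try:
        yield
    finally:
        CONF.clear()
        CONF.update(saved)


def downstream_state_after_finish():
    """Toy code under test: the child is scheduled only if the flag is on."""
    return "scheduled" if CONF["schedule_after_task_execution"] else None


class TestFastFollow(unittest.TestCase):
    def test_fast_follow(self):
        # One test body covers both the enabled and the deactivated case.
        cases = [(True, "scheduled"), (False, None)]
        for enabled, expected in cases:
            with self.subTest(enabled=enabled):
                with override_conf(schedule_after_task_execution=enabled):
                    self.assertEqual(downstream_state_after_finish(), expected)


result = unittest.TextTestRunner().run(
    unittest.defaultTestLoader.loadTestsFromTestCase(TestFastFollow)
)
```

The point is only that the two near-identical tests collapse into one body driven by an (enabled, expected) table.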

Oh also, these new tests should probably be in tests/models/test_taskinstance.py -- as the code we are testing is in TaskInstance primarily.

ashb · 2020-10-28T11:40:21Z

tests/models/test_dagrun.py

Suggested change

- task_instance_a.task = dag.get_task(task_a.task_id)
+ task_instance_a.task = task_a

ashb · 2020-10-28T11:40:36Z

tests/models/test_dagrun.py

Suggested change

- task_instance_b.task = dag.get_task(task_b.task_id)
+ task_instance_b.task = task_b

ashb · 2020-10-28T11:43:42Z

tests/models/test_dagrun.py

Do we need this block? I think this test would be clearer if we instead just directly set task_instance_a to a runnable state. For example:

task_instance_a.state = State.QUEUED
session.commit()

My reason here is that this is the "pre-condition/setup" for the test, not part of what we are actually testing here, so by having these asserts and calling the scheduler job code we are not testing this feature in isolation.

(For this to work the TI would need to be attached to the session, so you would need to pass session=session to dag_run.get_task_instance)

github-actions · 2020-10-29T19:09:53Z

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

kaxil · 2020-10-30T21:15:38Z

Whoops, a lot of tests are failing: https://github.com/apache/airflow/pull/11589/checks?check_run_id=1330520635

ashb · 2020-10-31T10:28:54Z

airflow/models/dag.py

Suggested change

-            if include_direct_parents and not include_upstream:
-                also_include += t.get_flat_relatives(upstream=True, recurse=False)
+            elif include_direct_parents:
+                also_include += t.upstream_list

And then we don't need to add recurse=True parameter to get_flat_relatives (the "flat" part is to do the flattening/recursing part)
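
As a toy illustration of the difference between the two lookups (one hop versus the flattened, recursive set), assuming the `elif` follows an `if include_upstream:` branch. The graph and helper names below are hypothetical analogues, not the DAG API itself.

```python
from typing import Dict, List, Set

# Toy graph: task_id -> direct upstream task_ids (not a real DAG object).
UPSTREAM: Dict[str, List[str]] = {"a": [], "b": ["a"], "c": ["b"]}


def direct_parents(task_id: str) -> List[str]:
    """Analogue of ``upstream_list``: one hop only."""
    return list(UPSTREAM[task_id])


def flat_upstream(task_id: str) -> Set[str]:
    """Analogue of a flattened, recursive upstream lookup."""
    seen: Set[str] = set()
    stack = list(UPSTREAM[task_id])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(UPSTREAM[parent])
    return seen


def also_include(
    task_id: str, include_upstream: bool, include_direct_parents: bool
) -> Set[str]:
    """Pick extra tasks to include, mirroring the if/elif in the suggestion."""
    if include_upstream:
        return flat_upstream(task_id)
    elif include_direct_parents:
        return set(direct_parents(task_id))
    return set()
```

With this graph, `also_include("c", False, True)` yields only `b`, while `also_include("c", True, False)` yields both `a` and `b`.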

ashb · 2020-10-31T10:30:08Z

airflow/models/dagrun.py

Suggested change

- class _TISchedulingDecision(NamedTuple):
+ class TISchedulingDecision(NamedTuple):

I don't know why I made this "private" -- and it looks a bit odd to use it in a return value as it was
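
For context, a small sketch of a NamedTuple used as a public return type. The class and field names here are assumptions for illustration, not the fields of the class in this PR.

```python
from typing import Dict, List, NamedTuple, Optional


class SchedulingDecision(NamedTuple):
    """Illustrative only; field names are assumptions."""

    schedulable: List[str]
    finished: List[str]


def decide(states: Dict[str, Optional[str]]) -> SchedulingDecision:
    """Split task ids into finished ones and ones that have no state yet."""
    finished = [t for t, s in states.items() if s == "success"]
    schedulable = [t for t, s in states.items() if s is None]
    return SchedulingDecision(schedulable=schedulable, finished=finished)


decision = decide({"a": "success", "b": None})
schedulable, finished = decision  # plain tuple unpacking still works
```

Callers get attribute access (`decision.schedulable`) and can name the type in their own annotations, which is awkward if the class is underscore-prefixed.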

ashb · 2020-10-31T10:31:00Z

tests/models/test_dagrun.py

Is this still needed, or can we revert the changes to this file?

ashb · 2020-10-31T10:32:39Z

tests/models/test_taskinstance.py

Could you add a comment here saying what we're testing, why we expect B to not be scheduled etc -- this will help Future Us

github-actions · 2020-11-02T15:57:24Z

The PR needs to run all tests because it modifies core of Airflow! Please rebase it to latest master or ask committer to re-run it!

turbaszek · 2020-11-02T17:50:16Z

airflow/models/taskinstance.py

How about encapsulating this into a separate method? The _run_raw_task is already a long one 😉

Yeah that's a good point. Will do.

turbaszek · 2020-11-02T17:55:42Z

airflow/models/taskinstance.py

As a user I would be worried about seeing such info logs. Should it be debug?

I think it depends what the exception was

It will always be a "database exception", which is a rather critical one imho. And here we are telling users "your database refused something but you don't have to worry about it". I think we have to make the message less "critical", like Skipping mini scheduling run due to exception: %s. But still, logging the exception will show a problem with the database...

The most likely case I expect here is a "cannot reach DB" network error.

But yeah, I like your message better.
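
Roughly what that handling could look like, as a sketch. `FakeSession`, `DatabaseError` and `run_mini_scheduler` are hypothetical stand-ins so the snippet runs on its own; only the log message text comes from this thread.

```python
import logging

log = logging.getLogger("mini_scheduler_sketch")


class DatabaseError(Exception):
    """Stand-in for a driver-level database exception."""


class FakeSession:
    """Hypothetical stand-in for a database session."""

    def __init__(self):
        self.rolled_back = False

    def rollback(self):
        self.rolled_back = True


def run_mini_scheduler(session, schedule_children) -> bool:
    """Return True if the mini scheduling run completed, False if skipped."""
    try:
        schedule_children(session)
        return True
    except DatabaseError as e:
        # Assumption: the task's own state was committed before this call,
        # so skipping here only delays the children until the next regular
        # scheduler loop.
        log.info("Skipping mini scheduling run due to exception: %s", e)
        session.rollback()
        return False


def unreachable(session):
    """Simulate the 'cannot reach DB' case mentioned above."""
    raise DatabaseError("cannot reach DB")


session = FakeSession()
ok = run_mini_scheduler(session, unreachable)
```

The exception text still ends up in the log, so a real database problem stays visible without the message itself sounding alarming.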

turbaszek · 2020-11-02T17:56:54Z

airflow/models/dagrun.py

Will it work for operators that inherit from DummyOperator?

No, there is already an issue for that though

turbaszek · 2020-11-02T17:57:40Z

airflow/models/dagrun.py

Is this comment still valid?

No, not any more. Good catch

turbaszek · 2020-11-03T09:33:23Z

airflow/models/taskinstance.py

        session.commit()

+        self._run_mini_scheduler_on_child_tasks(session)


It may happen that we do a rollback on a session we have already committed. Is this expected, @jhtimmins?

boring-cyborg bot added the area:Scheduler including HA (high availability) scheduler label Oct 16, 2020

jhtimmins closed this Oct 16, 2020

jhtimmins reopened this Oct 16, 2020

jhtimmins changed the title ~~Perform "mini scheduling run" after task has finished~~ WIP: Perform "mini scheduling run" after task has finished Oct 16, 2020

mik-laj reviewed Oct 17, 2020

kaxil added the AIP-15 label Oct 22, 2020

jhtimmins force-pushed the mini-scheduler-after-task-completed branch from 6a08a30 to 4a1d48d Compare October 27, 2020 05:40

ashb reviewed Oct 27, 2020

jhtimmins changed the title ~~WIP: Perform "mini scheduling run" after task has finished~~ Perform "mini scheduling run" after task has finished Oct 27, 2020

jhtimmins commented Oct 27, 2020

jhtimmins force-pushed the mini-scheduler-after-task-completed branch from 58894bc to df73a15 Compare October 28, 2020 03:20

ashb requested changes Oct 28, 2020

jhtimmins force-pushed the mini-scheduler-after-task-completed branch 2 times, most recently from a2cc203 to 929e47d Compare October 29, 2020 18:50

jhtimmins force-pushed the mini-scheduler-after-task-completed branch from 929e47d to d12b898 Compare October 30, 2020 05:47

jhtimmins force-pushed the mini-scheduler-after-task-completed branch 2 times, most recently from 1c060f0 to 78668d9 Compare October 31, 2020 01:27

ashb reviewed Oct 31, 2020

ashb reviewed Oct 31, 2020

jhtimmins force-pushed the mini-scheduler-after-task-completed branch from b407901 to c5b046b Compare November 1, 2020 00:11

jhtimmins force-pushed the mini-scheduler-after-task-completed branch from c5b046b to 21d6b47 Compare November 2, 2020 15:19

paolaperaza added this to the Airflow 2.0.0-beta1 milestone Nov 2, 2020

ashb approved these changes Nov 2, 2020

github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Nov 2, 2020

turbaszek reviewed Nov 2, 2020

houqp approved these changes Nov 2, 2020

jhtimmins and others added 13 commits November 2, 2020 12:35

Integrate merge conflict fixes.

84d4348

Add fast follow tests.

2bf9f8d

Remove dupe 'Upstream Failed'.

c72109c

Simplify fast follow tests.

7640757

Fast follow should handle parent tasks properly.

76c1651

Us namedtuple to store scheduling decision.

b7a4555

Add DagRuns to tests.

1084bd1

Turn off fast-follow for tests.

08e90f2

Fix bad merge.

ca350ff

Simplify getting siblings.

c8387a6

Simplify include_direct_upstream.

6490f40

Move mini scheduler to standalone method.

613da86

jhtimmins force-pushed the mini-scheduler-after-task-completed branch from 21d6b47 to 613da86 Compare November 2, 2020 20:36

ashb merged commit eea6c4f into apache:master Nov 3, 2020

turbaszek reviewed Nov 3, 2020

ashb deleted the mini-scheduler-after-task-completed branch November 12, 2020 12:37

luoyuliuyin mentioned this pull request Oct 4, 2024

fix schedule_downstream_tasks bug #42582

Merged
