Allow CeleryExecutor to "adopt" an orphaned queued or running task #10949

ashb · 2020-09-15T09:32:06Z

A task can be orphaned when it is enqueued by one scheduler's executor
and that scheduler then dies/exits.

The default fallback behaviour is unchanged -- queued tasks are
cleared and then later rescheduled.

But for Celery we can do better -- if we record the Celery-generated
task_id, we can then re-create the AsyncResult objects for orphaned
tasks at a later date.
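
A minimal sketch of the adoption step described above, not the code from this PR. It assumes each orphaned task instance carries the Celery task id recorded at enqueue time (the `external_executor_id` column added by the migration in this PR). The function name `adopt_orphans`, the `TaskKey` shape and the `make_result` callable are hypothetical; with Celery, `make_result` could be `lambda task_id: AsyncResult(task_id, app=app)`.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple, TypeVar

R = TypeVar("R")
# Hypothetical key identifying a task instance: (dag_id, task_id, execution_date).
TaskKey = Tuple[str, str, str]


def adopt_orphans(
    orphans: Iterable[Tuple[TaskKey, Optional[str]]],
    make_result: Callable[[str], R],
) -> Tuple[Dict[TaskKey, R], List[TaskKey]]:
    """Split orphaned tasks into adopted result handles and tasks to reset."""
    adopted: Dict[TaskKey, R] = {}
    not_adoptable: List[TaskKey] = []
    for key, external_id in orphans:
        if external_id is None:
            # No Celery id was recorded, so only the default
            # clear-and-reschedule fallback is available.
            not_adoptable.append(key)
        else:
            # Re-create a result handle from the recorded Celery task id.
            adopted[key] = make_result(external_id)
    return adopted, not_adoptable
```

Passing the handle factory in as a parameter keeps the sketch runnable without a broker; anything left in `not_adoptable` would go through the unchanged fallback path.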

However since Celery just reports all AsyncResult as "PENDING", even if
they aren't tasks currently in the broker queue, we need to apply a
timeout to "unblock" these tasks in case they never actually made it to
the Celery broker.
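
The timeout can be sketched roughly as follows. This is an illustration under stated assumptions rather than the PR's implementation: `find_stalled`, `adopted_at` and `states` are hypothetical names, and the state strings are assumed to be the ones Celery reports for each task id.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


def find_stalled(
    adopted_at: Dict[str, datetime],
    states: Dict[str, str],
    timeout: timedelta,
    now: Optional[datetime] = None,
) -> List[str]:
    """Return adopted task ids still PENDING after the timeout has elapsed."""
    now = now or datetime.now(timezone.utc)
    stalled = []
    for task_id, since in adopted_at.items():
        # An id the broker never received also reports PENDING, so an
        # absent state is treated the same way here.
        still_pending = states.get(task_id, "PENDING") == "PENDING"
        if still_pending and now - since > timeout:
            stalled.append(task_id)
    return stalled
```

Tasks returned by this check would be handed back to the scheduler to be cleared and rescheduled, since PENDING alone cannot distinguish "waiting in the queue" from "never reached the broker".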

This all means that we can adopt tasks that were enqueued by another
CeleryExecutor if it dies, without having to clear the task and slow
things down. This is especially useful as the task may have already
started running, and while clearing it would stop it, it's better if we
don't have to reset it!

Part of #9630


XD-DENG

A minor suggestion

airflow/executors/celery_executor.py

airflow/migrations/versions/e1a11ece99cc_add_external_executor_id_to_ti.py

ashb · 2020-09-16T09:40:23Z

The doc tests are now failing with:

/opt/airflow/docs/_api/airflow/executors/base_executor/index.rst:198:more than one target found for cross-reference 'TaskInstance': airflow.models.TaskInstance, airflow.models.taskinstance.TaskInstance

But I didn't change the imports in that file :/

Oh I missed Kaxil's earlier fix in my rebase+force-push

airflow/executors/base_executor.py

airflow/config_templates/config.yml

airflow/executors/celery_executor.py

airflow/jobs/scheduler_job.py

airflow/utils/sqlalchemy.py

turbaszek

Looks good to me 👍

XD-DENG

One more minor comment.

airflow/executors/celery_executor.py

ashb · 2020-09-16T16:45:07Z

Right, final rebase done, will merge once tests are green.

(I wish Github had a button for that)

Co-authored-by: Kaxil Naik

boring-cyborg bot added area:docs area:Scheduler including HA (high availability) scheduler labels Sep 15, 2020

ashb requested review from mik-laj, potiuk and turbaszek and removed request for mik-laj and turbaszek September 15, 2020 09:32

ashb added the AIP-15 label Sep 15, 2020

ashb force-pushed the adopt-dont-reset-celery-tasks branch from b58fac4 to c1a3f9f Compare September 15, 2020 09:33

ashb requested review from mik-laj and turbaszek September 15, 2020 09:40

kaxil force-pushed the adopt-dont-reset-celery-tasks branch 4 times, most recently from 4f3cf22 to 627207a Compare September 15, 2020 12:17

kaxil mentioned this pull request Sep 15, 2020

Remove test dependency in TestApiKerberos #10950

Merged

kaxil force-pushed the adopt-dont-reset-celery-tasks branch from 627207a to 2871b48 Compare September 15, 2020 13:24

ashb mentioned this pull request Sep 15, 2020

Fully support running more than one scheduler concurrently #10956

Merged

8 tasks

ashb requested review from XD-DENG and houqp September 15, 2020 19:10

XD-DENG reviewed Sep 15, 2020

airflow/executors/celery_executor.py (outdated, resolved)

XD-DENG reviewed Sep 15, 2020

airflow/executors/celery_executor.py (outdated, resolved)

XD-DENG reviewed Sep 15, 2020

airflow/migrations/versions/e1a11ece99cc_add_external_executor_id_to_ti.py (outdated, resolved)

ashb force-pushed the adopt-dont-reset-celery-tasks branch from 2871b48 to 12debd5 Compare September 16, 2020 09:01

ashb requested a review from XD-DENG September 16, 2020 09:02

ashb mentioned this pull request Sep 16, 2020

Officially support HA for scheduler component (AIP-15) #9630

Closed

10 tasks

ashb force-pushed the adopt-dont-reset-celery-tasks branch 2 times, most recently from bbb6f5f to 03a9290 Compare September 16, 2020 10:17

XD-DENG requested changes Sep 16, 2020

airflow/executors/base_executor.py (outdated, resolved)

ashb force-pushed the adopt-dont-reset-celery-tasks branch from 03a9290 to 7d13295 Compare September 16, 2020 11:49

ashb requested a review from XD-DENG September 16, 2020 13:02

ashb force-pushed the adopt-dont-reset-celery-tasks branch 2 times, most recently from 64c2215 to 7c05b37 Compare September 16, 2020 13:57

turbaszek reviewed Sep 16, 2020

airflow/config_templates/config.yml (outdated, resolved)

turbaszek reviewed Sep 16, 2020

airflow/executors/celery_executor.py (outdated, resolved)

turbaszek reviewed Sep 16, 2020

airflow/executors/celery_executor.py (outdated, resolved)

turbaszek reviewed Sep 16, 2020

airflow/executors/celery_executor.py (outdated, resolved)

turbaszek reviewed Sep 16, 2020

airflow/jobs/scheduler_job.py (resolved)

turbaszek reviewed Sep 16, 2020

airflow/utils/sqlalchemy.py (outdated, resolved)

ashb requested a review from turbaszek September 16, 2020 15:58

turbaszek approved these changes Sep 16, 2020

XD-DENG reviewed Sep 16, 2020

airflow/executors/celery_executor.py (outdated, resolved)

XD-DENG approved these changes Sep 16, 2020

ashb force-pushed the adopt-dont-reset-celery-tasks branch from 3f4b2ea to f1967f6 Compare September 16, 2020 16:44

ashb force-pushed the adopt-dont-reset-celery-tasks branch from f1967f6 to 092b9c0 Compare September 16, 2020 17:45

kaxil merged commit 59dad1a into apache:master Sep 16, 2020

kaxil mentioned this pull request Sep 23, 2020

Kubernetes executor can adopt tasks from other schedulers #10996

Merged

ashb deleted the adopt-dont-reset-celery-tasks branch December 7, 2020 17:26

Kytha mentioned this pull request Sep 4, 2024

Remove state sync during celery task processing #41870

Merged
