Skip to content

Conversation

@pingzh
Copy link
Contributor

@pingzh pingzh commented Aug 31, 2020

it consists of CeleryExecutor and KubernetesExecutor, which allows users
to route their tasks to either Kubernetes or Celery based on the queue
defined on a task


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

@boring-cyborg boring-cyborg bot added the area:Scheduler including HA (high availability) scheduler label Aug 31, 2020
@pingzh
Copy link
Contributor Author

pingzh commented Aug 31, 2020

Hi @kaxil and @dimberman

could you please take a look at this PR? I think I also need another pr to indicate which methods are private in the executor so that those methods are not supposed to be invoked outside the executor. Please let me know your thoughts on this.

@ashb
Copy link
Member

ashb commented Sep 9, 2020

@potiuk Worth resurrecting your "multi-executor" PR from earlier this year to make it more generic, or do you think this is Good Enough?

@potiuk
Copy link
Member

potiuk commented Sep 9, 2020

@potiuk Worth resurrecting your "multi-executor" PR from earlier this year to make it more generic, or do you think this is Good Enough?

I think Multi-executor was too-generic. There are hardly use cases for a lot of combinations of the executors and the increased complexity of the configuration and potential side effects and issues if someone tries to combine for example Sequential Executor with KubernetesExecutor are well.... scary.

Also, it gives a very good message IMHO -> if you are in Production use Celery and Kubernetes executor. Full Stop.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be configurable? What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree the default should be "kubernetes" though.

@turbaszek
Copy link
Member

+1 for this approach instead of "generic-multi-executor". @pingzh can you add please some documentation under docs/executor?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
CeleryKubernetesExecutor consists of celery_executor and kubernetes_executor.
CeleryKubernetesExecutor consists of CeleryExecutor and KubernetesExecutor.

Comment on lines 31 to 32
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
When the queue is `kubernetes`, kubernetes_executor is selected to run the task,
otherwise, celery_executor is used.
When the queue is `kubernetes`, KubernetesExecutor is selected to run the task,
otherwise, CeleryExecutor is used.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I have to do this 😄 I was thinking about this yesterday but was not sure... Do you think that this name is good? On one hand, it's explicit on the other... I can imagine questions like "Is this for Celery on Kubernetes?", "Can I use CeleryExecutor on kubernetes? Or do I have to use this one?"

Do I have any better ideas? CeleryAndKubernetesExecutor? Is this better? I don't know just want to hear what others think 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point about the naming. i like it, CeleryAndKubernetesExecutor which indicates celery and kubernetes are both used in this class.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No strong opinion on the naming of the class, I am fine with either since we have the first line of the docstring explaining it :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@turbaszek @pingzh Maybe we can name it the "MultiExecutor" or "JoinedExecutor"? I could see a case where in the future we route even more executors into this (like Dask)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dimberman thanks for the suggestion. I think we can go further to have a BaseComposeddExecutor, which serves as the base class for any composed executor. The only two methods for the subclass to override is the __init__ and also the _route method.

However, I am thinking of whether we over-optimize it in this case. I don't want to have a too generic name at the beginning. We can come back to refactor the logic when we have another use case like this.

it consists of CeleryExecutor and KubernetesExecutor, which allows users
to route their tasks to either Kubernetes or Celery based on the queue
defined on a task
@pingzh
Copy link
Contributor Author

pingzh commented Sep 10, 2020

Strange, i started to get this static check error after I rebased the master:

Run mypy..........................................................................................................................Failed
- hook id: mypy
- exit code: 1

airflow/models/dagrun.py:72: error: unused 'type: ignore' comment
            primaryjoin=and_(TI.dag_id == dag_id, TI.execution_date == exe...
            ^
Found 1 error in 1 file (checked 2408 source files)

@potiuk
Copy link
Member

potiuk commented Sep 12, 2020

Strange, i started to get this static check error after I rebased the master:

Please rebase it again. We had a MyPy error kicking in and it was fixed in #10879

@pingzh
Copy link
Contributor Author

pingzh commented Sep 12, 2020

(I re-forked the apache/airflow repo, mine was the incubator-airflow)

the new PR is: #10901

@turbaszek
Copy link
Member

@pingzh should we close this one?

@pingzh
Copy link
Contributor Author

pingzh commented Sep 12, 2020

@pingzh should we close this one?

Let us close this one. Sorry about this as it caused the comments not showing up in the new PR.

@pingzh pingzh closed this Sep 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:Scheduler including HA (high availability) scheduler

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants