-
-
Notifications
You must be signed in to change notification settings - Fork 750
Use Enum for worker, scheduler and nanny status. #3853
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Mostly an experiment/suggestion, I'm perfectly ok if you are not open to such a change. This was mostly inspired by trying to figure out if it is possible to "drain" the workers, and wait until they are in a idle state, to try to fix the issue when retiring worker cannot replicate all the keys and fails. I ended up trying to list all the "status" and where they were used and this helped me a bit. I think I would also probably be more explicit with "WorkerStatus", "SchedulerStatus" and "NannyStatus", unless some of these states need to be unions of the current ones. The other routes would have been properties, though I like the explicitness of Enums. |
Looking at the code it was hard to find all the states a worker, nanny
scheduler could be
in, and in particular it is easy to forget 'closing_gracefully' for
example when skimming over a `self.status.startswith('clos')`, and it
also hard to fine that in some places the status can be "None", "init",
"starting" and "stopped"
So I just took a pass at seeing in how many places status was
assigned/compared in worker, nanny and scheduler and replace that by an
Enum I'm assuming the status each can see is not shared so we end up
I thus tried to do separate Status Enums for Scheduler/Nanny/Worker.
Note that even if Enums have values for their variants, the variants _do
not_ compare equal to their values. That is to say `Status.running =!
'running'`, but `Status.running.value == 'running'`,
So I implemented a custom `__eq__` method that can compare with
strings/None for backward compatibility. It works both when the LHS or
RHS is str/None as the default impl of str/None forward to our
implementation. This also make sure the value you compare with is also
in the list of possible state.
The main other code changes were updating some of the comparisons to use
the enums instead of the strings directly, and replace the "startswith"
with the `in` operator, which is slightly more verbose.
I believe despite being a bit more verbose, it is a bit more explicit and
may help navigate the codebase and if you want to go this route you
probably want to remove the custom comparison but that would be an API
change that could break other project.
|
Actually it seem from the more extensive testsuite than the one I ran the actually values get compared between Worker and Scheduler. So I'll have to revise this. |
|
In principle an approach like this seems sensible to me. It probably makes sense to make things a bit more formal today. This might be a good opportunity for us to think about our statuses more broadly and see if there are opportunities to unify them (it would be nice to have only one Status type). cc'ing @jcrist , who I think will be made happy by this PR |
If possible, I agree this would be a good thing to do. The scheduler, worker, and nanny all should have similar states they transition through during execution. The |
My hope is that we can have one set of states that is consistently used across all components. That would also solve this:
|
|
Thanks, I wasn't sure whether the states were supposed to be shared or at least have a common subset. We Could have a core Enum, and deved enums is my guess (though they will need to not inherit from Enum), and/or restrict the stated on Scheduler/Nany/Worker by making status a property. Let me update that with a single unified status type first and then let's see if we can shave some states or refactor. |
Status get compared with each other and we likely want to consolidate all the status.
|
@lesteve @jacobtomlinson @jcrist thoughts on applying this same change to downstream dask-foo cluster managers? |
|
Dask-Gateway has a status field that's mostly for debouncing close operations - would be fine to use the enum here instead. Dask-Yarn currently has no status field, state is tracked via a couple internal event flags. |
|
It looks like dask-kubernetes doesn't modify status, but dask-cloudprovider
does. I wonder if we might be able to handle some of this by using things
like `super().start()` and `super().close()` and relying on behavior there.
…On Thu, Jun 4, 2020 at 11:49 AM Jim Crist-Harif ***@***.***> wrote:
Dask-Gateway has a status field that's mostly for debouncing close
operations - would be fine to use the enum here instead. Dask-Yarn
currently has no status field, state is tracked via a couple internal event
flags.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#3853 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AACKZTBZB3GHMATHEO2IIJTRU7UDXANCNFSM4NSDIBLA>
.
|
|
A safe first step would be to make all of those |
|
Which is also relatively easy as they are all subclass of |
|
Can one of you kick the failing travis build ? |
Fine with me, we rely on |
Yeah I support making these changes downstream. And also pulling dask-kubernetes inline. |
| starting = "starting" | ||
| stopped = "stopped" | ||
| stopping = "stopping" | ||
| undefined = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like closing/closed and stopping/stopped might be redundant. I would be in favor of unifying around one pair of values, presumably closing/closed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a sense for how often users check these values and if there is a way that we should make this change smoothly. I'm inclined to make a small breaking change here, just because I don't think that many people depend on these values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are many ways of handling this, in the properties you can emit a warning if you get a given value assigned tell users to use the Enum, and same in the __eq__ method, if you see comparison with the deprecated values, you warn.
Those warnings can be turned into error in tests.
You can also help refactor with having two variants of the enum map to the same value.
| _instances = weakref.WeakSet() | ||
| process = None | ||
| status = None | ||
| status = Status.undefined |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if Status.undefined and Status.init should be unified.
|
Do you want to do all the unification as part of this PR, or do you want to get something smaller in master and then bikesheb on which one is renamed how later ? My guess is that the renaming/dropping is backward incompatible and easier to discuss on its own/revert if necessary. |
Good point. I'm happy to continue on this PR without these changes. |
|
I think that there is a release in progress today. Let's let that pass through and then we can merge this and let it sit in master for a while to see if any issues arise. |
|
No problem, and no hurry. |
|
OK, merging this in 24 hours if there are no further comments. |
|
Thanks, sounds great, once done I can change in more places and start adding warnings. |
|
Once this is in we should open issues on downstream projects to track implementing this there too. This blog post I've submitted should be a good overview of the existing cluster managers. |
|
OK, this is in. I'm curious to see how much activity this generates :) |
Turn this into an error in the test suite. We want to make sure of various things: 1) original behavior of comparing/assign with string still works but emit warnings. 1b) assigning strings convert to proper enum variant. 2) assign/comparison with invalid strings should fail 2) warnings are errors in test suite to make sure we don't re-introduce strings. This is the continuation of dask#3853
Turn this into an error in the test suite. We want to make sure of various things: 1) original behavior of comparing/assign with string still works but emit warnings. 1b) assigning strings convert to proper enum variant. 2) assign/comparison with invalid strings should fail 2) warnings are errors in test suite to make sure we don't re-introduce strings. This is the continuation of dask#3853 Maybe Cluster in cluster.py should also get status as a property
Turn this into an error in the test suite. We want to make sure of various things: 1) original behavior of comparing/assign with string still works but emit warnings. 1b) assigning strings convert to proper enum variant. 2) assign/comparison with invalid strings should fail 2) warnings are errors in test suite to make sure we don't re-introduce strings. This is the continuation of dask#3853 Maybe Cluster in cluster.py should also get status as a property
Turn this into an error in the test suite. We want to make sure of various things: 1) original behavior of comparing/assign with string still works but emit warnings. 1b) assigning strings convert to proper enum variant. 2) assign/comparison with invalid strings should fail 2) warnings are errors in test suite to make sure we don't re-introduce strings. This is the continuation of dask#3853 Maybe Cluster in cluster.py should also get status as a property
Turn this into an error in the test suite. We want to make sure of various things: 1) original behavior of comparing/assign with string still works but emit warnings. 1b) assigning strings convert to proper enum variant. 2) assign/comparison with invalid strings should fail 2) warnings are errors in test suite to make sure we don't re-introduce strings. This is the continuation of #3853 Maybe Cluster in cluster.py should also get status as a property
Looking at the code it was hard to find all the states a worker, nanny
scheduler could be
in, and in particular it is easy to forget 'closing_gracefully' for
example when skimming over a
self.status.startswith('clos'), and italso hard to fine that in some places the status can be "None", "init",
"starting" and "stopped"
So I just took a pass at seeing in how many places status was
assigned/compared in worker, nanny and scheduler and replace that by an
Enum I'm assuming the status each can see is not shared so we end up
I thus tried to do separate Status Enums for Scheduler/Nanny/Worker.
Note that even if Enums have values for their variants, the variants do
not compare equal to their values. That is to say
Status.running =! 'running', butStatus.running.value == 'running',So I implemented a custom
__eq__method that can compare withstrings/None for backward compatibility. It works both when the LHS or
RHS is str/None as the default impl of str/None forward to our
implementation. This also make sure the value you compare with is also
in the list of possible state.
The main other code changes were updating some of the comparisons to use
the enums instead of the strings directly, and replace the "startswith"
with the
inoperator, which is slightly more verbose.I believe despite being a bit more verbose, it is a bit more explicit and
may help navigate the codebase and if you want to go this route you
probably want to remove the custom comparison but that would be an API
change that could break other project.