Long running occupancy #5395

fjetter · 2021-10-07T09:58:51Z

We're removing long-running tasks from a worker's occupancy. However, because the occupancy becomes inaccurate over time, we occasionally recalculate it to account for new information. This reevaluation does not take long-running tasks into account, since the scheduler does not break the processing state down any further. Similar to the executing substate, which we track on a per-worker basis, we should do the same for the long-running substate.

This builds on top of #5392 since I pulled together the occupancy calculation over there.

The fixes specific to this PR are in the last commit, 9c20b32.

Closes #5332 ("No parallelism with long-running, seceded tasks; high occupancy prevents work assignment?")
Tests added / passed
Passes pre-commit run --all-files
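
To illustrate the bookkeeping the description refers to, here is a minimal sketch, not the actual scheduler code. `ToyWorkerState` and `recompute_occupancy` are hypothetical names introduced for illustration, and the durations are made-up numbers. It assumes each worker keeps its tasks in `processing` and additionally tracks which of them are long-running, so a full recalculation can skip them.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWorkerState:
    """Hypothetical stand-in for a per-worker record on the scheduler."""

    # task key -> expected duration in seconds (illustrative values only)
    processing: dict = field(default_factory=dict)
    # keys of tasks that are long-running (e.g. seceded from the thread pool)
    long_running: set = field(default_factory=set)
    occupancy: float = 0.0


def recompute_occupancy(ws: ToyWorkerState) -> float:
    """Recalculate occupancy from scratch, skipping long-running tasks."""
    ws.occupancy = sum(
        duration
        for key, duration in ws.processing.items()
        if key not in ws.long_running
    )
    return ws.occupancy


ws = ToyWorkerState(processing={"a": 1.0, "b": 2.0, "c": 30.0})
recompute_occupancy(ws)   # all three tasks count towards occupancy

ws.long_running.add("c")  # "c" is now long-running but stays in processing
recompute_occupancy(ws)   # only "a" and "b" count now
```

Without the per-worker `long_running` set, the second recalculation would add the 30 seconds back, which is the drift the description is concerned with.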

gjoseph92

LGTM, though I'm a little curious why the task needs to stay in processing on the WorkerState

gjoseph92 · 2021-12-08T23:32:11Z

distributed/scheduler.py


+    @ccall
+    @exceptval(check=False)
+    def _reevaluate_occupancy_worker(self, ws: WorkerState):


What was the reason for moving this to be a method instead of a standalone function? I agree methods are more readable, I'm just not sure about the Cython consequences. cc @jakirkham

just readability

Both will work. However, there was extra overhead from the method call compared with the function call, which is why it was moved to a function. Since this gets called a lot, that overhead mattered in our benchmarks.
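
For anyone who wants to measure this kind of call overhead themselves, the following is a rough pure-Python sketch using `timeit`. It does not reproduce the Cython benchmarks mentioned above, and results in the compiled scheduler may look quite different. `ToyScheduler`, `reevaluate` and `time_calls` are hypothetical names, and the function bodies are placeholders.

```python
import timeit


class ToyScheduler:
    """Hypothetical scheduler with the logic written as a method."""

    def reevaluate(self, x):
        return x + 1


def reevaluate(state, x):
    """The same placeholder logic as a standalone function."""
    return x + 1


sched = ToyScheduler()


def time_calls(number=100_000):
    """Return (method_seconds, function_seconds) for `number` calls each."""
    method_s = timeit.timeit(lambda: sched.reevaluate(1), number=number)
    function_s = timeit.timeit(lambda: reevaluate(sched, 1), number=number)
    return method_s, function_s


method_s, function_s = time_calls()
print(f"method:   {method_s:.4f}s")
print(f"function: {function_s:.4f}s")
```

The absolute numbers depend on the interpreter and machine, so only the relative difference across repeated runs is meaningful.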


jakirkham · 2021-12-15T17:15:52Z

distributed/stealing.py

+                self.scheduler._reevaluate_occupancy_worker(thief)
+                self.scheduler._reevaluate_occupancy_worker(victim)


Since it looks like _reevaluate_occupancy_worker was removed (even as a method), should we be doing something else here?

Ahh, I missed this one. I think this is why I put it in as a method: to avoid cyclic imports.

FWIW I did like the idea you had of getting rid of this function, if that is an option. Though admittedly this test might need more thought.

Intuitively, it feels like this should not even be here: it interacts pretty deeply with the scheduler, whereas work stealing should be an extension and not interact at such a deep level. However, the only way to avoid doing this here would be to rely solely on the eventual update via a callback. I have no idea how disruptive that would be.
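
The trade-off in this thread can be sketched with a toy model. This is an illustration under stated assumptions, not the stealing extension's code: `ToyWorker`, `recompute` and `move_task` are hypothetical names, and the "eventual update" is simulated by calling `recompute` by hand.

```python
class ToyWorker:
    """Hypothetical worker record with a cached occupancy value."""

    def __init__(self, name):
        self.name = name
        self.processing = {}  # task key -> expected duration in seconds
        self.occupancy = 0.0


def recompute(ws):
    """Refresh the cached occupancy from the tasks currently assigned."""
    ws.occupancy = sum(ws.processing.values())


def move_task(key, victim, thief, eager=True):
    """Move one task from victim to thief, optionally refreshing both."""
    thief.processing[key] = victim.processing.pop(key)
    if eager:
        # Analogous to reevaluating thief and victim right after a steal.
        recompute(victim)
        recompute(thief)


victim, thief = ToyWorker("victim"), ToyWorker("thief")
victim.processing = {"x": 5.0, "y": 5.0}
recompute(victim)

# Deferred variant: the cached values stay stale after the move ...
move_task("x", victim, thief, eager=False)
stale = (victim.occupancy, thief.occupancy)

# ... until something refreshes them later, as a periodic callback would.
recompute(victim)
recompute(thief)
fresh = (victim.occupancy, thief.occupancy)
```

In the deferred variant, any decision made between the move and the refresh sees the old occupancy values, which is the possible disruption raised above.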

fjetter · 2021-12-17T13:32:07Z

Test failures due to

fjetter mentioned this pull request Oct 7, 2021

No parallelism with long-running, seceded tasks; high occupancy prevents work assignment? #5332

Closed

fjetter linked an issue Oct 7, 2021 that may be closed by this pull request

No parallelism with long-running, seceded tasks; high occupancy prevents work assignment? #5332

Closed

fjetter mentioned this pull request Oct 8, 2021

Fix a race condition which would allow a rescheduled task to be reported missing even though it is not #5160

Merged

fjetter force-pushed the long_running_occupancy branch from e1c1ee9 to c15051c Compare October 8, 2021 10:08

fjetter self-assigned this Oct 22, 2021

fjetter force-pushed the long_running_occupancy branch from 486719f to ab517ea Compare October 25, 2021 08:32

gjoseph92 reviewed Dec 8, 2021

fjetter added 4 commits December 10, 2021 15:35

Ensure long running tasks to not contribute to occupancy

98d6348

Ensure tasks are homogeneously spread across workers if scheduled from a worker client

01823b4

get unknown task duration from config

fc8242f

Review comments

a151a4b

fjetter force-pushed the long_running_occupancy branch from ab517ea to a151a4b Compare December 10, 2021 14:38

jakirkham reviewed Dec 15, 2021

fjetter force-pushed the long_running_occupancy branch from 0eda3c9 to a151a4b Compare December 15, 2021 17:46

fjetter merged commit 3494c2b into dask:main Dec 17, 2021

fjetter deleted the long_running_occupancy branch December 17, 2021 13:32

jakirkham mentioned this pull request Feb 3, 2022

Drop support for cythonized scheduler #5685

Closed

fjetter added the stealing label Jun 20, 2022

Review comments (in thread order): gjoseph92 (Dec 8, 2021), fjetter (Dec 9, 2021), jakirkham (Dec 10, 2021), jakirkham (Dec 15, 2021), fjetter (Dec 15, 2021), jakirkham (Dec 15, 2021), fjetter (Dec 15, 2021)

3 participants