Skip to content

Conversation

@gjoseph92
Copy link
Collaborator

@gjoseph92 gjoseph92 self-assigned this Mar 24, 2022
@github-actions
Copy link
Contributor

Unit Test Results

       12 files  ±0         12 suites  ±0   5h 53m 8s ⏱️ - 35m 4s
  2 670 tests +1    2 584 ✔️  -   3    81 💤 ±0    3 +  3  2 🔥 +1 
15 932 runs  +6  15 050 ✔️  - 12  860 💤  - 2  20 +19  2 🔥 +1 

For more details on these failures and errors, see this check.

Results for commit 137bee4. ± Comparison against base commit 5c7d555.

@fjetter
Copy link
Member

fjetter commented Mar 25, 2022

There are related failures

FAILED distributed/tests/test_client.py::test_bad_tasks_fail - SystemExit: 0
FAILED distributed/tests/test_scheduler.py::test_log_tasks_during_restart - a...
FAILED distributed/tests/test_scheduler.py::test_failing_task_increments_suspicious

@mrocklin
Copy link
Member

I have some vague memory of being very careful to except Exception in Dask in order to handle KeyboardInterrupts better than historically. Based on the comments it sounds like you're already thinking about this. It might be a good idea (if you haven't already) to spin up a local cluster, hit Ctrl-C from time to time, and make sure that it still feels right.

@gjoseph92 gjoseph92 force-pushed the baseexception-in-tasks branch from 137bee4 to 1e3cec0 Compare November 18, 2022 03:33
@gjoseph92
Copy link
Collaborator Author

This passes besides distributed/tests/test_worker.py::test_close_while_executing[False].

The fundamental problem is that we don't distinguish between an async user task raising CancelledError because of a real error (HTTP request timed out, etc.), versus Worker.cancel was called, so whatever the user task was awaiting on raises CancelledError.

In the first case, the task should transition to error. In the second case (worker is currently closing), the task should not report as failed, because we cancelled it. The exception should be ignored and it should be rescheduled.

I don't know how we'd distinguish this. I also don't think we care too much because async user tasks are not a very complete feature right now. We just need some choice to resolve the inconsistency. The easiest but least correct answer is "just let it be erred".

I could imagine a if self.status == Status.closing conditional somewhere, but that feels very wrong to add to the state machine itself.

@gjoseph92
Copy link
Collaborator Author

Added a heuristic for detecting when we've cancelled user async tasks due to worker.close. It's a bit weird but should make CI pass?

@github-actions
Copy link
Contributor

github-actions bot commented Nov 18, 2022

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

       15 files  ±  0         15 suites  ±0   6h 35m 50s ⏱️ + 2m 2s
  3 233 tests +  8    3 147 ✔️ +  7    83 💤  - 1  3 +2 
23 909 runs  +64  22 997 ✔️ +62  908 💤  - 1  4 +3 

For more details on these failures, see this check.

Results for commit 0bcc3a3. ± Comparison against base commit 2311d4d.

♻️ This comment has been updated with latest results.

)
return RescheduleEvent(
key=ts.key, stimulus_id=f"cancelled-by-worker-close-{time()}"
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a cosmetic tweak to this.
Could you add a unit test for it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already covered by distributed/tests/test_worker.py::test_close_while_executing[False]. I could write another test for it, but it would look almost exactly the same.

The CancelledError when not closing case is also covered by test_base_exception_in_task.

Copy link
Collaborator

@crusaderky crusaderky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.

@crusaderky crusaderky merged commit c23ee62 into dask:main Nov 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BaseException in task leads to task never completing

4 participants