Closed
Labels
- bug: Something is broken
- deadlock: The cluster appears to not make any progress
- stability: Issue or feature related to cluster stability (e.g. deadlock)
Description
What happened:
If a task raises a BaseException (KeyboardInterrupt, SystemExit, etc.), the task will appear to be processing forever.
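For context, a minimal sketch of why such exceptions can slip through ordinary error handling. In Python, `KeyboardInterrupt` and `SystemExit` derive from `BaseException` but not from `Exception`, so a handler written as `except Exception` never sees them. Whether the worker's task-running code has exactly this shape is an assumption; `run_catching_exception` is a hypothetical stand-in, not a function from distributed.

```python
def run_catching_exception(fn):
    """Hypothetical wrapper that only guards against Exception subclasses."""
    try:
        return {"status": "OK", "result": fn()}
    except Exception as e:
        # ValueError, RuntimeError, etc. land here; BaseException does not.
        return {"status": "error", "exception": e}


def raises_value_error():
    raise ValueError("ordinary failure")


def raises_base_exception():
    raise BaseException("this could be a KeyboardInterrupt!")


# An ordinary exception is converted into an error message.
ordinary = run_catching_exception(raises_value_error)

# A bare BaseException passes straight through the wrapper to the caller.
try:
    run_catching_exception(raises_base_exception)
    escaped = False
except BaseException:
    escaped = True
```

Here `ordinary["status"]` is `"error"`, while `escaped` ends up `True`: the wrapper never produced a result or an error message for the second call.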
What you expected to happen:
The task should definitely not deadlock. I'm not sure what should happen instead, though; it could go two ways:
- The entire worker should shut down gracefully.
- We should catch them just like any other exceptions and error the task.
More discussion in comments.
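As a rough illustration of the second option, the sketch below catches `BaseException` around the user function and turns it into an error message, using a plain standard-library thread pool rather than a real worker. `apply_and_capture` and the message layout are hypothetical and only loosely modelled on the `apply_function_simple` frame in the traceback; this is not distributed's actual implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def apply_and_capture(fn, *args, **kwargs):
    """Hypothetical helper: run fn and report any raised exception as a message."""
    try:
        return {"status": "OK", "result": fn(*args, **kwargs)}
    except BaseException as e:  # deliberately broader than `except Exception`
        return {"status": "error", "exception": e}


def raiser():
    raise BaseException("this could be a KeyboardInterrupt!")


with ThreadPoolExecutor(max_workers=1) as pool:
    # The failing task resolves with an error message instead of hanging.
    failed = pool.submit(apply_and_capture, raiser).result(timeout=5)
    # The same thread keeps serving later tasks.
    succeeded = pool.submit(apply_and_capture, lambda: 1).result(timeout=5)
```

The trade-off is that a genuine `KeyboardInterrupt` or `SystemExit` raised inside a task would also be swallowed and reported as a task error, which is the argument for the first option (shutting the worker down) instead.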
Minimal Complete Verifiable Example:
In [1]: import distributed
In [2]: client = distributed.Client(n_workers=1)
In [3]: def raiser():
...:     raise BaseException("this could be a KeyboardInterrupt!")
...:
In [4]: f = client.submit(raiser)
In [5]: Exception in callback IOLoop.add_future.<locals>.<lambda>(<Task finishe...dInterrupt!')>) at /Users/gabe/miniconda3/envs/dask-distributed/lib/python3.9/site-packages/tornado/ioloop.py:688
handle: <Handle IOLoop.add_future.<locals>.<lambda>(<Task finishe...dInterrupt!')>) at /Users/gabe/miniconda3/envs/dask-distributed/lib/python3.9/site-packages/tornado/ioloop.py:688>
Traceback (most recent call last):
File "/Users/gabe/miniconda3/envs/dask-distributed/lib/python3.9/asyncio/events.py", line 80, in _run
self._context.run(self._callback, *self._args)
File "/Users/gabe/miniconda3/envs/dask-distributed/lib/python3.9/site-packages/tornado/ioloop.py", line 688, in <lambda>
lambda f: self._run_callback(functools.partial(callback, future))
File "/Users/gabe/miniconda3/envs/dask-distributed/lib/python3.9/site-packages/tornado/ioloop.py", line 741, in _run_callback
ret = callback()
File "/Users/gabe/miniconda3/envs/dask-distributed/lib/python3.9/site-packages/tornado/ioloop.py", line 765, in _discard_future_result
future.result()
File "/Users/gabe/dev/distributed/distributed/worker.py", line 3504, in execute
result = await self.loop.run_in_executor(
File "/Users/gabe/dev/distributed/distributed/_concurrent_futures_thread.py", line 65, in run
result = self.fn(*self.args, **self.kwargs)
File "/Users/gabe/dev/distributed/distributed/worker.py", line 4503, in apply_function
msg = apply_function_simple(function, args, kwargs, time_delay)
File "/Users/gabe/dev/distributed/distributed/worker.py", line 4525, in apply_function_simple
result = function(*args, **kwargs)
File "<ipython-input-3-7f701e80695b>", line 2, in raiser
BaseException: this could be a KeyboardInterrupt!
In [5]:
In [5]: client.processing()
Out[5]: {'tcp://127.0.0.1:58316': ('raiser-e3b7ab59305f9e4ddb4ecddd75c55f85',)}
In [6]: client.call_stack()
Out[6]:
{'tcp://127.0.0.1:58316': {'raiser-e3b7ab59305f9e4ddb4ecddd75c55f85': (' File "/Users/gabe/miniconda3/envs/dask-distributed/lib/python3.9/threading.py", line 912, in _bootstrap\n\tself._bootstrap_inner()\n',
' File "/Users/gabe/miniconda3/envs/dask-distributed/lib/python3.9/threading.py", line 954, in _bootstrap_inner\n\tself.run()\n',
' File "/Users/gabe/miniconda3/envs/dask-distributed/lib/python3.9/threading.py", line 892, in run\n\tself._target(*self._args, **self._kwargs)\n',
' File "/Users/gabe/dev/distributed/distributed/threadpoolexecutor.py", line 51, in _worker\n\ttask = work_queue.get(timeout=1)\n',
' File "/Users/gabe/miniconda3/envs/dask-distributed/lib/python3.9/queue.py", line 180, in get\n\tself.not_empty.wait(remaining)\n',
' File "/Users/gabe/miniconda3/envs/dask-distributed/lib/python3.9/threading.py", line 316, in wait\n\tgotit = waiter.acquire(True, timeout)\n')}}
In [7]: client.submit(lambda: 1).result(timeout=5) # the worker still works fine; just that task is stuck now
Out[7]: 1
Anything else we need to know?:
Environment:
- Dask version: 2022.2.1
- Python version: 3.9.5
- Operating System: macOS
- Install method (conda, pip, source): source