Skip to content

Cancelled.__context__ can leak in from other tasks #2649

@njsmith

Description

@njsmith

Suppose task 1 does:

try:
    raise RuntimeError("whoops")
except Exception as task1_exc:
    my_cscope.cancel()

Meanwhile, task 2 is sitting in:

with my_cscope:
    await trio.sleep_forever()

Internally, my_cscope.cancel() marks the scope as cancelled and then recalculates state, which triggers a call to task2._attempt_delivery_of_any_pending_cancel(), which calls task2._attempt_abort(raise_cancel), which calls sleep_forever's abort_func, which succeeds. And since it succeeds, it then immediately calls reschedule(task2, outcome.capture(raise_cancel). Which invokes raise_cancel.

And since this is all happening way down inside task 1's call to my_cscope.cancel(), it's all happening with the Python interpreter in a state where task1_exc is marked as pending. So when raise_cancel does raise Cancelled._create(), then the new Cancelled exception in task 2 gets its __context__ set to task1_exc. Which is weird and unexpected.

(It's also inconsistent, because it only happens on one particular abort pathway -- if task 2 had been sleeping in an operation that wasn't cancellable, or had deferred cancellation like an IOCP/io_uring operation, then it would still eventually end up with a Cancelled exception but with __context__ = None.)

Given our existing APIs, I think the only reasonable solution is to tweak _attempt_abort so it does:

error = capture(raise_cancel)
error.error.__context__ = None
self._runner.reschedule(self, error)

It's also worth thinking a bit about how we got here: this only happens because of our weird API where we represent a pending cancellation as a callable that raises the actual exception. The reason for this is:

  • For operations that have deferred cancellation, like IOCP/io_uring, the abort_func has to request the cancellation, and then some time later it gets notified about whether the cancellation succeeded. If it did, then it now has to raise Cancelled. But:
    • Now we're in user code, without any special access to Trio's internals
    • Cancelled has no public constructor; and besides, the abort might have been caused by either a KeyboardInterrupt or a Cancelled, so we have to pass in the exception to the user code somehow
    • If it's a KeyboardInterrupt, those have exactly-once semantics, so the core code needs to know whether the user code ended up raising it or not
    • So that's why we end up representing the exception-to-be-raised as a function that does the raise -- so the core code can find out whether the exception is raised or not

In #733 we decided that we liked the idea of not injecting KI through the cancellation system, but instead injecting it into the root nursery, so that it acts like a regular exception and isn't intertwined with the cancellation system the same way. If we did that, then we could (eventually) change the abort_func interface so it just takes the exception to be raised (or just provide another way to raise Cancelled). And then this whole thing could have been avoided.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions