Handle cancelled on_complete for host subtasks by jellevandenhooff · Pull Request #12632 · bytecodealliance/wasmtime

jellevandenhooff · 2026-02-21T01:43:37Z

While working on a program with many outgoing DNS requests, some of which got cancelled, I ran into a race in subtask management:

crates/wasmtime/src/runtime/component/concurrent.rs:5108:37:
     called `Result::unwrap()` on an `Err` value: NotPresent

The included reproducing test fails before the fix and works after.

Code and commit message by Claude. It looks sane to me, but I am not sure about eating the error in the branch; please consider if this makes sense. If the commit message is too long I am happy to rewrite it human-style.

Per the component-model spec (CanonicalABI.md), `subtask.cancel` on a subtask that has already resolved collects the pending event and returns `RETURNED`. A subsequent `subtask.drop` is valid because `resolve_delivered` is true at that point.

In the implementation, when an async-lowered host function's future completes, a `WorkerFunction` (on_complete) is scheduled via the high-priority work queue to lower the result and deliver the `Returned` event. Between the future completing and on_complete running, another work item in the same batch (e.g. a `ResumeFiber` delivering a different subtask's event) may allow the guest to `subtask.cancel` + `subtask.drop` this task, removing it from the table. When on_complete then runs, it tries to look up the deleted task's scope via `call_context`, causing a `NotPresent` panic in `validate_scope_exit`.

Guard on_complete by checking whether the task still exists in the table and whether its `join_handle` is still present (taken by `subtask.cancel`). In either case, the guest already observed the resolution and `cancel_scope` released any outstanding borrows, so on_complete is a no-op.

A new test (`cancel_completed_host_task_does_not_crash`) exercises the race deterministically: two async host functions that yield once then complete; the guest waits for the first, then cancels the second whose on_complete is still queued.
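The guard described above can be sketched in isolation. This is a minimal model, not the wasmtime code: `HostTask`, `Table`, `Outcome`, and this `on_complete` signature are hypothetical stand-ins, and the join handle is reduced to a plain integer. It only shows the two skip conditions (entry missing, or `join_handle` already taken).

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a host subtask's table entry.
struct HostTask {
    /// `Some` while the task still owns its host future; cancellation takes it.
    join_handle: Option<u32>,
}

/// Hypothetical stand-in for the table that maps handles to host tasks.
#[derive(Default)]
struct Table {
    tasks: HashMap<u32, HostTask>,
}

#[derive(Debug, PartialEq)]
enum Outcome {
    /// The result was lowered and the `Returned` event delivered.
    Delivered,
    /// The guest already observed the resolution; nothing to do.
    Skipped,
}

/// Completion callback for the task stored under `id` (assumed shape).
fn on_complete(table: &mut Table, id: u32) -> Outcome {
    match table.tasks.get_mut(&id) {
        // The entry is gone: the guest cancelled and dropped the subtask.
        None => Outcome::Skipped,
        Some(task) => match task.join_handle.take() {
            // The handle was already taken by cancellation.
            None => Outcome::Skipped,
            // Normal path: this callback owns the completion.
            Some(_) => Outcome::Delivered,
        },
    }
}
```

In this model, a live task with its handle present yields `Outcome::Delivered`, while a missing entry or an entry whose handle was taken yields `Outcome::Skipped` instead of panicking.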

jellevandenhooff · 2026-02-21T06:08:00Z

Okay, I then ran into a related issue where a cancelled task's on_complete handler was able to steal a replacement task's on_complete. The test is kind of gnarly and I am concerned it's not deterministic. What it tries to show is that:

1. task A would run
2. task A gets cancelled
3. task B would run, with task A's original handle
4. task A's on_complete would run... and signal success to what is now task B's handle
5. task B's on_complete would never run
6. now the guest is very confused because B's results are garbage

The epoch fix seems clean, but I am sure there might be other approaches. Without the fixes in either commit both tests fail.

The previous guard checked `join_handle.is_none()` or table lookup failure, but this doesn't catch the case where a cancelled+dropped host task's table slot is reused by a new host task before the stale on_complete runs. The new entry has `join_handle = Some`, so the guard passes and the stale closure steals the new task's join_handle, writes to the wrong retptr, and fires a spurious Returned event.

Add a monotonic `epoch` field to HostTask, incremented for each new host task. The on_complete closure captures the epoch at creation time and compares it against the current occupant's epoch. If they differ, the slot was reused and the closure bails out.

Add a regression test that deterministically reproduces the slot reuse scenario using FuturesUnordered LIFO polling order.
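The slot-reuse problem and the epoch check can be illustrated with a small model. Everything here is an assumption made for illustration: the `Table` type, its slot-reuse policy, and the method names are hypothetical, and the join handle is replaced by a task name so the outcome is visible. Only the idea of capturing an epoch at creation and comparing it on completion comes from the commit message.

```rust
/// Hypothetical host-task entry tagged with the epoch it was created in.
struct HostTask {
    epoch: u64,
    join_handle: Option<&'static str>,
}

/// Hypothetical table that reuses the lowest free slot, like a handle table.
#[derive(Default)]
struct Table {
    slots: Vec<Option<HostTask>>,
    next_epoch: u64,
}

impl Table {
    /// Inserts a task and returns `(slot, epoch)`; a closure would capture both.
    fn push(&mut self, name: &'static str) -> (usize, u64) {
        let epoch = self.next_epoch;
        self.next_epoch += 1; // monotonic: never reused across tasks
        let task = HostTask { epoch, join_handle: Some(name) };
        match self.slots.iter().position(|s| s.is_none()) {
            Some(i) => {
                self.slots[i] = Some(task);
                (i, epoch)
            }
            None => {
                self.slots.push(Some(task));
                (self.slots.len() - 1, epoch)
            }
        }
    }

    /// Models `subtask.cancel` + `subtask.drop` freeing the slot.
    fn remove(&mut self, slot: usize) {
        self.slots[slot] = None;
    }

    /// Completion callback: returns the name of the task it delivered for.
    fn on_complete(&mut self, slot: usize, captured_epoch: u64) -> Option<&'static str> {
        let task = self.slots.get_mut(slot)?.as_mut()?;
        if task.epoch != captured_epoch {
            // The slot now belongs to a different task: bail out.
            return None;
        }
        task.join_handle.take()
    }
}
```

With this model, pushing task A, removing it, and pushing task B puts B in A's old slot. A stale `on_complete(slot, epoch_a)` then returns `None` and leaves B's handle in place, so B's own callback can still deliver its result.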

alexcrichton · 2026-02-23T16:32:48Z

Thanks for the PR (and tests!)

Upon reading this it's actually related to what I was thinking of when I was reviewing the internals of #12631. I think the fix I have in mind there will resolve these issues too. So, like that PR, I'll work a bit locally and post back here with results. Many thanks for the report & tests & fix!

This commit refactors some of the internals of `subtask.cancel` with respect to host subtasks. Notably a few panics and semantic bugs are fixed here.

The main bug was that host subtasks could be aborted but their completion might have still been queued up which would produce the result somewhere or assert that the task exists. Cancellation is changed to use `wait_for_event` to ensure that this completion is executed before `subtask.cancel` returns. This helps keep host subtasks looking more similar to guest subtasks in that respect.

Co-authored-by: Jelle van den Hooff <[email protected]>
Closes bytecodealliance#12631
Closes bytecodealliance#12632
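The ordering this commit message describes, where a queued completion runs before cancellation returns, can be modelled very roughly as follows. This is a synchronous toy, not how `wait_for_event` works in wasmtime: `Scheduler`, `CancelResult`, and all method names are hypothetical, and `cancel` here also removes the task to stand in for a following `subtask.drop`.

```rust
use std::collections::{HashMap, VecDeque};

#[derive(Debug, PartialEq)]
enum CancelResult {
    /// The task had already resolved; its result was delivered first.
    Returned,
    /// The task was still running and was aborted.
    Cancelled,
}

/// Hypothetical scheduler holding live tasks and queued completions.
#[derive(Default)]
struct Scheduler {
    /// Live host subtasks: id -> whether the result has been delivered.
    tasks: HashMap<u32, bool>,
    /// Completions that are queued but have not run yet.
    pending: VecDeque<u32>,
}

impl Scheduler {
    fn spawn(&mut self, id: u32) {
        self.tasks.insert(id, false);
    }

    /// The host future finished; its completion is queued, not yet run.
    fn future_finished(&mut self, id: u32) {
        self.pending.push_back(id);
    }

    fn run_completion(&mut self, id: u32) {
        let delivered = self.tasks.get_mut(&id).expect("completion for a missing task");
        *delivered = true;
    }

    /// Runs any queued completion for `id` before the task goes away, so no
    /// stale completion can fire later against a missing or reused entry.
    fn cancel(&mut self, id: u32) -> CancelResult {
        if let Some(pos) = self.pending.iter().position(|&p| p == id) {
            let _ = self.pending.remove(pos);
            self.run_completion(id);
        }
        match self.tasks.remove(&id) {
            Some(true) => CancelResult::Returned,
            _ => CancelResult::Cancelled,
        }
    }
}
```

The design point is that cancellation, rather than the completion callback, takes responsibility for ordering: once `cancel` returns, nothing about that task remains in the queue, so neither a table-presence guard nor an epoch is needed in this model.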


alexcrichton · 2026-02-23T18:00:26Z

Ok I've pushed up a "more official fix" to #12640 which includes the tests here and should resolve them. Thanks again @jellevandenhooff!


jellevandenhooff requested a review from a team as a code owner February 21, 2026 01:43

jellevandenhooff requested review from pchickey and removed request for a team February 21, 2026 01:43

github-actions bot added the wasmtime:api Related to the API of the `wasmtime` crate itself label Feb 21, 2026

jellevandenhooff force-pushed the fix-on-complete-after-cancel branch from fd4e5f8 to 25f1380 Compare February 21, 2026 06:06

jellevandenhooff changed the title ~~Skip on_complete for already-cancelled host subtasks~~ Handle cancelled on_complete for host subtasks Feb 21, 2026

jellevandenhooff force-pushed the fix-on-complete-after-cancel branch from 25f1380 to 42ab9bf Compare February 21, 2026 06:23

alexcrichton mentioned this pull request Feb 23, 2026

Fix/improve host subtask cancellation #12640

Merged

jellevandenhooff closed this Feb 23, 2026

jellevandenhooff wants to merge 2 commits into bytecodealliance:main from jellevandenhooff:fix-on-complete-after-cancel
