Handle cancelled on_complete for host subtasks#12632
Closed
jellevandenhooff wants to merge 2 commits intobytecodealliance:mainfrom
Closed
Handle cancelled on_complete for host subtasks#12632jellevandenhooff wants to merge 2 commits intobytecodealliance:mainfrom
jellevandenhooff wants to merge 2 commits intobytecodealliance:mainfrom
Conversation
Per the component-model spec (CanonicalABI.md), `subtask.cancel` on a subtask that has already resolved collects the pending event and returns `RETURNED`. A subsequent `subtask.drop` is valid because `resolve_delivered` is true at that point. In the implementation, when an async-lowered host function's future completes, a `WorkerFunction` (on_complete) is scheduled via the high-priority work queue to lower the result and deliver the `Returned` event. Between the future completing and on_complete running, another work item in the same batch (e.g. a `ResumeFiber` delivering a different subtask's event) may allow the guest to `subtask.cancel` + `subtask.drop` this task, removing it from the table. When on_complete then runs, it tries to look up the deleted task's scope via `call_context`, causing a `NotPresent` panic in `validate_scope_exit`. Guard on_complete by checking whether the task still exists in the table and whether its `join_handle` is still present (taken by `subtask.cancel`). In either case, the guest already observed the resolution and `cancel_scope` released any outstanding borrows, so on_complete is a no-op. A new test (`cancel_completed_host_task_does_not_crash`) exercises the race deterministically: two async host functions that yield once then complete; the guest waits for the first, then cancels the second whose on_complete is still queued.
fd4e5f8 to
25f1380
Compare
Contributor
Author
|
Okay, then ran into a similar but related issue where a cancelled task's on_complete handler was able to steal a replacement task's on_complete. The test is kind of gnarly and I am concerned it's not deterministic. What it tries to show is that:
|
The previous guard checked `join_handle.is_none()` or table lookup failure, but this doesn't catch the case where a cancelled+dropped host task's table slot is reused by a new host task before the stale on_complete runs. The new entry has `join_handle = Some`, so the guard passes and the stale closure steals the new task's join_handle, writes to the wrong retptr, and fires a spurious Returned event. Add a monotonic `epoch` field to HostTask, incremented for each new host task. The on_complete closure captures the epoch at creation time and compares it against the current occupant's epoch. If they differ, the slot was reused and the closure bails out. Add a regression test that deterministically reproduces the slot reuse scenario using FuturesUnordered LIFO polling order.
25f1380 to
42ab9bf
Compare
Member
|
Thanks for the PR (and tests!) Upon reading this it's actually related to what I was thinking of when I was reviewing the internals of #12631. I think the fix I have in mind there will resolve these issues too. So, like that PR, I'll work a bit locally and post back here with results. Many thanks for the report & tests & fix! |
alexcrichton
added a commit
to alexcrichton/wasmtime
that referenced
this pull request
Feb 23, 2026
This commit refactors some of the internals of `subtask.cancel` with respect to host subtasks. Notably a few panics and semantic bugs are fixed here. The main bug was that host subtasks could be aborted but their completion might have still been queued up which would produce the result somewhere or assert that the task exists. Cancellation is changed to use `wait_for_event` to ensure that this completion is executed before `subtask.cancel` returns. This helps keep host subtasks looking more similar to guest subtasks in that respect. Co-authored-by: Jelle van den Hooff <[email protected]> Closes bytecodealliance#12631 Closes bytecodealliance#12632
alexcrichton
added a commit
to alexcrichton/wasmtime
that referenced
this pull request
Feb 23, 2026
This commit refactors some of the internals of `subtask.cancel` with respect to host subtasks. Notably a few panics and semantic bugs are fixed here. The main bug was that host subtasks could be aborted but their completion might have still been queued up which would produce the result somewhere or assert that the task exists. Cancellation is changed to use `wait_for_event` to ensure that this completion is executed before `subtask.cancel` returns. This helps keep host subtasks looking more similar to guest subtasks in that respect. Closes bytecodealliance#12631 Closes bytecodealliance#12632 Co-authored-by: Jelle van den Hooff <[email protected]>
Member
|
Ok I've pushed up a "more official fix" to #12640 which includes the tests here and should resolve them. Thanks again @jellevandenhooff! |
github-merge-queue bot
pushed a commit
that referenced
this pull request
Feb 23, 2026
This commit refactors some of the internals of `subtask.cancel` with respect to host subtasks. Notably a few panics and semantic bugs are fixed here. The main bug was that host subtasks could be aborted but their completion might have still been queued up which would produce the result somewhere or assert that the task exists. Cancellation is changed to use `wait_for_event` to ensure that this completion is executed before `subtask.cancel` returns. This helps keep host subtasks looking more similar to guest subtasks in that respect. Closes #12631 Closes #12632 Co-authored-by: Jelle van den Hooff <[email protected]>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
While working on a program with many outgoing DNS requests that also got cancelled I ran into a race with subtask management:
The included reproducing test fails before the fix and works after.
Code and commit message by Claude. It looks sane to me, but I am not sure about eating the error in the branch; please consider if this makes sense. If the commit message is too long I am happy to rewrite it human-style.