Cranelift: remove multiple return value support in order to allow feasible implementation of exception handling? #10488

I am currently implementing exception handling for Cranelift ([WIP branch](https://github.com/cfallin/wasmtime/tree/lets-be-truly-exceptional)) and I believe I have hit a complexity wall that merits further discussion about Cranelift's IR design.

Currently, we permit signatures to have arbitrary numbers of return values, and we lower these to use stack slots. This adds a nontrivial bit of complexity to ABI handling, but we manage. It means that after a true `call` instruction, we may have loads that are nominally part of the callsite but are separate VCode instructions.
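
To make the "retval instructions" concrete, here is a minimal sketch of how return values spill from registers into a stack return area. It is not Cranelift's actual ABI code: `RetLoc`, `assign_ret_locs`, `loads_after_call`, the 8-byte slot alignment, and the register count are all hypothetical, chosen only for illustration.

```rust
/// Where a single return value lives immediately after the call returns.
/// (Hypothetical type for illustration; not a Cranelift definition.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RetLoc {
    /// Returned in the n-th return register.
    Reg(u8),
    /// Returned in a stack return area, at this byte offset.
    Stack(u32),
}

/// Assign a location to each return value, given its size in bytes.
/// The first `num_ret_regs` values go in registers; the rest go to
/// 8-byte-aligned slots in the stack return area (assumed alignment).
fn assign_ret_locs(sizes: &[u32], num_ret_regs: u8) -> Vec<RetLoc> {
    let mut next_reg = 0u8;
    let mut next_off = 0u32;
    sizes
        .iter()
        .map(|&size| {
            if next_reg < num_ret_regs {
                let loc = RetLoc::Reg(next_reg);
                next_reg += 1;
                loc
            } else {
                let loc = RetLoc::Stack(next_off);
                // Round the slot size up to a multiple of 8 bytes.
                next_off += (size + 7) & !7;
                loc
            }
        })
        .collect()
}

/// Number of separate loads that must follow the call instruction:
/// one per stack-located return value.
fn loads_after_call(locs: &[RetLoc]) -> usize {
    locs.iter().filter(|l| matches!(l, RetLoc::Stack(_))).count()
}
```

With two return registers and four return values, the last two values land on the stack, so two loads would have to follow the call. Those trailing loads are exactly what has no place to go once the call is a terminator.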

However, adding `try_call` instructions, which define return values as block-call args for their successors, has led me into a difficult spot when these extra "return-value instructions" exist -- I have tried all of the below approaches:

- We cannot emit these as instructions after the `try_call` in the same block, because the `try_call` is a terminator: it acts as a branch to either the normal-return target or the exceptional-return targets. This is a hard constraint and unlikely to change (e.g. via EBBs).
- Perhaps we could emit the loads as part of the one call VCode instruction. This is the path that was taken for a post-call SP adjustment for tail-call support. However:
  - We can't hold a `Vec<Inst>` inside the `CallInfo` or whatever: Pulley's instruction type is parameterized on a generic parameter that selects between 32- and 64-bit targets, we need to define the inst enum in ISLE, and ISLE doesn't have generics.
  - We can't add a notion of "PReg or stack offset" to the `CallRetPair`s, and emit loads directly: we need to support an arbitrarily large number of vreg definitions, and if they're all constrained as registers, this breaks down after we fill the machine registers (regalloc panics because the program is un-allocatable per its constraints).
    - We can't add these loads with "any" constraints, because then we need to codegen loads from stack offsets to spillslots, and at that point we are re-inventing regalloc's handling of the various cases of memory-to-memory moves with or without temporaries available.
- We can't split control-flow edges with critical-edge blocks and emit the retval instructions into the edge block when processing the main branch, because VCode emission has very tight constraints about instruction lowering order, contiguous instruction IDs, and instruction ID ordering being the same as block ordering.
  - In the latest version of my WIP branch above I have a notion of "block instruction range overrides" and a post-patching mechanism that allows the branch lowering to enqueue a "prepend" for VCode finalization. Note that we lower backwards, so when we reach the `try_call` we've already seen the edge block; this is why it's a post-patch mechanism. Unfortunately this leads to a RA2 panic during liverange construction because we depend on the instruction ID ranges being ordered for a nice linear-time algorithm to be possible.
  - If we instead try to emit the retval insts when we see the edge block, that implies knowing everything about the call before we reach it; we need to set up context to lower an instruction from block B1 while processing block B2; infeasible.
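
The ordering constraint that the post-patching approach violates can be sketched as a simple check over per-block instruction ranges. This is only an illustration of the invariant described above, not regalloc2's actual validation; `InstRange` and `ranges_are_ordered` are hypothetical names.

```rust
/// Half-open range `[start, end)` of instruction indices for one block.
/// (Hypothetical type for illustration.)
#[derive(Debug, Clone, Copy)]
struct InstRange {
    start: u32,
    end: u32,
}

/// Returns true if the blocks, visited in block order, cover a
/// contiguous and strictly ordered sequence of instruction indices
/// starting at 0. A linear scan over instructions relies on this.
fn ranges_are_ordered(blocks: &[InstRange]) -> bool {
    let mut expected_start = 0u32;
    for r in blocks {
        if r.start != expected_start || r.end < r.start {
            return false;
        }
        expected_start = r.end;
    }
    true
}
```

A layout such as `[0, 3)`, `[3, 5)`, `[5, 9)` passes. If an edge block in the middle of the block order is instead pointed at instructions appended at the end (say `[0, 3)`, `[9, 11)`, `[3, 9)`), the check fails, which is the shape of problem that a range override introduces.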

In summary: I believe I've explored every branch of the possibility tree, and the constraints cannot all be satisfied at once. We're doing too much in too few passes and/or supporting too-general input. We need to relax something:

- We could support, at the base CLIF level, only function signatures whose returns can all be placed in registers, or at worst, a fixed number of returns that can all be handled in the "call pseudoinst" with register defs. This would avoid all of the above by removing the notion of "retval instructions". This is a little unfortunate from the PoV of flexibility, but it's not too bad: it would mean that Wasmtime would basically inherit this functionality instead, explicitly allocating stackslots and passing pointers when there are more than one (or two) returns. In essence we're saying: let's legalize the sequence into separate CLIF instructions, so we don't have the complexity of lowering the one super-complex CLIF instruction.
- We could add a notion of "ABI legalization". Old-world Cranelift had this much more pervasively. Maybe we do it for arbitrary returns (and stackrets? and other weird corner cases we handle now?) to make core ABI code more manageable.
- We could go back to atomic-VCode-pseudoinst with loads as part of the `try_call`, "any" constraints, and allocate a temp reg as another def, and just live with the poor codegen.
- We could find a way through the above possibility-space that I've missed.
- We could support exceptions only for a limited set of signatures. I don't like this at all; it's unclear how we'd lower to it from Wasmtime.
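
As a rough illustration of the first option, the producer-side rewrite could look something like the sketch below: returns beyond a register limit are dropped from the signature and replaced by one extra pointer parameter to a caller-allocated return area. None of this is an existing Cranelift or Wasmtime API; `Ty`, `Sig`, `LegalizedSig`, `legalize_returns`, the natural-alignment layout, and the choice to append the pointer as the last parameter are all assumptions made for the example.

```rust
/// A tiny stand-in for value types (hypothetical, not Cranelift's `Type`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Ty {
    I32,
    I64,
    F32,
    F64,
}

impl Ty {
    /// Size in bytes; also used as the alignment in this sketch.
    fn bytes(self) -> u32 {
        match self {
            Ty::I32 | Ty::F32 => 4,
            Ty::I64 | Ty::F64 => 8,
        }
    }
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct Sig {
    params: Vec<Ty>,
    returns: Vec<Ty>,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct LegalizedSig {
    /// Signature with at most `max_reg_returns` returns.
    sig: Sig,
    /// Byte offset in the return area for each removed return value.
    ret_area_offsets: Vec<u32>,
    /// Total size of the return area the caller must allocate.
    ret_area_size: u32,
}

/// Keep the first `max_reg_returns` returns; move the rest to a
/// caller-allocated return area passed via an extra trailing pointer.
fn legalize_returns(sig: &Sig, max_reg_returns: usize, ptr_ty: Ty) -> LegalizedSig {
    if sig.returns.len() <= max_reg_returns {
        return LegalizedSig {
            sig: sig.clone(),
            ret_area_offsets: Vec::new(),
            ret_area_size: 0,
        };
    }
    let (in_regs, spilled) = sig.returns.split_at(max_reg_returns);
    let mut offsets = Vec::with_capacity(spilled.len());
    let mut size = 0u32;
    for ty in spilled {
        let align = ty.bytes();
        // Align the running offset up to this value's natural alignment.
        size = (size + align - 1) & !(align - 1);
        offsets.push(size);
        size += ty.bytes();
    }
    let mut params = sig.params.clone();
    params.push(ptr_ty);
    LegalizedSig {
        sig: Sig { params, returns: in_regs.to_vec() },
        ret_area_offsets: offsets,
        ret_area_size: size,
    }
}
```

The caller would then allocate a stackslot of `ret_area_size` bytes, pass its address as the extra argument, and emit ordinary loads at `ret_area_offsets` in the normal-return successor block. Since those loads are plain CLIF instructions in a separate block, they never need to be attached to the `try_call` itself.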

Thoughts?
