Place-less cg_ssa intrinsics

# Proposal

Today, intrinsics in `cg_ssa` are always given a *place* into which to write their results.

That means that the analysis work that otherwise allows emitting SSA values directly can't work with them.

For example, compare these: <https://rust.godbolt.org/z/TbqYoP8Ya>
```rust
pub fn demo1(num: u16) -> u16 {
    bswap(num)
}
pub fn demo2(num: u16) -> u16 {
    !num
}
```
the former using an intrinsic is forced to go through stack and back
```llvm
define i16 @demo1(i16 %num) unnamed_addr {
start:
  %0 = alloca [2 x i8], align 2
  %1 = call i16 @llvm.bswap.i16(i16 %num)
  store i16 %1, ptr %0, align 2    ; <--
  %_0 = load i16, ptr %0, align 2    ; <--
  ret i16 %_0
}
```
for no good reason, while the latter is a native MIR operator and doesn't
```llvm
define i16 @demo2(i16 %num) unnamed_addr {
start:
  %_0 = xor i16 %num, -1
  ret i16 %_0
}
```

This MCP proposes that rather than always giving a `PlaceRef` in <https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/mir/struct.FunctionCx.html#method.codegen_intrinsic_call>, we split it in two:
1. The normal case is that it will return an `OperandValue`.  Most intrinsics need only this, as they either return unit (like `copy` or `prefetch_read_data`) or one/two immediates (`atomic_umin` is only a scalar, `arith_offset` a thin pointer, the float intrinsics return floats, the vector intrinsics usually return a single vector immediate, etc).
2. There will still be a secondary path for things like `volatile_load` which *might* need [`OperandValue::Ref`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/mir/operand/enum.OperandValue.html#variant.Ref) depending on the actual return type.  This would be done purely by return type: such an intrinsic would be expected to return `Scalar`/`ScalarPair` when the type allows, but the always-passed-a-PlaceRef form would still be called for things like `volatile_load<[u32; 10]>`.

Doing this will (hopefully) improve codegen speed by reducing the number of `alloca`s that need to be removed when optimizing (or emitted in unoptimized code), and importantly will reduce the incentive to move things into being MIR primitives when they don't need to be just to stop pessimizing codegen.

I would expect this come out roughly neutral in complexity.  The existing intrinsics that hit the common path (<https://github.com/rust-lang/rust/blob/e0cb264b814526acb82def4b5810e394a2ed294f/compiler/rustc_codegen_ssa/src/mir/intrinsic.rs#L591-L597>) of just returning a single backend value get at most more complex by wrapping that in `OperandValue::Scalar`.  But things that need to do more (like <https://github.com/rust-lang/rust/blob/e0cb264b814526acb82def4b5810e394a2ed294f/compiler/rustc_codegen_ssa/src/mir/intrinsic.rs#L420-L425>) that can't use the common path should get simpler as they can return `OperandValue::Pair` instead of needing to project and write into the output themselves.

NOT part of this MCP, but something we could consider in future, would be saying that the path from 2 is just removed entirely, in favour of either having the intrinsic add its own alloca and return `OperandValue::Ref`, or changing those few intrinsics that need it to be written with explicit out parameters.  At this time I don't have confidence that either is certainly the correct path, so I'd leave those alone for now and reevaluate later once we see how this goes.


# Mentors or Reviewers

I don't expect this to be complex, just enough churn that I wouldn't want to just submit PRs without getting a "sounds plausible" from people.

Unsure who would be interested in reviewing, maybe @workingjubilee ?


# Process

The main points of the [Major Change Process][MCP] are as follows:

* [x] File an issue describing the proposal.
* [ ] A compiler team member who is knowledgeable in the area can **second** by writing `@rustbot second` or kickoff a team FCP with `@rfcbot fcp $RESOLUTION`.
    * Refer to [Proposals, Approvals and Stabilization](https://forge.rust-lang.org/compiler/proposals-and-stabilization.html) docs for when a second is sufficient, or when a full team FCP is required.
* [ ] Once an MCP is seconded, the **Final Comment Period** begins.
    * Final Comment Period lasts for 10 days after all outstanding concerns are solved.
    * Outstanding concerns will block the Final Comment Period from finishing. Once all concerns are resolved, the 10 day countdown is restarted.
    * If no concerns are raised after 10 days since the resolution of the last outstanding concern, the MCP is considered **approved**.

You can read [more about Major Change Proposals on forge][MCP].

[MCP]: https://forge.rust-lang.org/compiler/proposals-and-stabilization.html#how-do-i-submit-an-mcp


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Place-less cg_ssa intrinsics #970

Proposal

Mentors or Reviewers

Process

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Place-less cg_ssa intrinsics #970

Description

Proposal

Mentors or Reviewers

Process

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions