Proposal
Today, intrinsics in cg_ssa are always given a place into which to write their results.
That means that the analysis work that otherwise allows emitting SSA values directly can't work with them.
For example, compare these: https://rust.godbolt.org/z/TbqYoP8Ya
pub fn demo1(num: u16) -> u16 {
bswap(num)
}
pub fn demo2(num: u16) -> u16 {
!num
}
the former using an intrinsic is forced to go through stack and back
define i16 @demo1(i16 %num) unnamed_addr {
start:
%0 = alloca [2 x i8], align 2
%1 = call i16 @llvm.bswap.i16(i16 %num)
store i16 %1, ptr %0, align 2 ; <--
%_0 = load i16, ptr %0, align 2 ; <--
ret i16 %_0
}
for no good reason, while the latter is a native MIR operator and doesn't
define i16 @demo2(i16 %num) unnamed_addr {
start:
%_0 = xor i16 %num, -1
ret i16 %_0
}
This MCP proposes that rather than always giving a PlaceRef in https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/mir/struct.FunctionCx.html#method.codegen_intrinsic_call, we split it in two:
- The normal case is that it will return an
OperandValue. Most intrinsics need only this, as they either return unit (like copy or prefetch_read_data) or one/two immediates (atomic_umin is only a scalar, arith_offset a thin pointer, the float intrinsics return floats, the vector intrinsics usually return a single vector immediate, etc).
- There will still be a secondary path for things like
volatile_load which might need OperandValue::Ref depending on the actual return type. This would be done purely by return type: such an intrinsic would be expected to return Scalar/ScalarPair when the type allows, but the always-passed-a-PlaceRef form would still be called for things like volatile_load<[u32; 10]>.
Doing this will (hopefully) improve codegen speed by reducing the number of allocas that need to be removed when optimizing (or emitted in unoptimized code), and importantly will reduce the incentive to move things into being MIR primitives when they don't need to be just to stop pessimizing codegen.
I would expect this come out roughly neutral in complexity. The existing intrinsics that hit the common path (https://github.com/rust-lang/rust/blob/e0cb264b814526acb82def4b5810e394a2ed294f/compiler/rustc_codegen_ssa/src/mir/intrinsic.rs#L591-L597) of just returning a single backend value get at most more complex by wrapping that in OperandValue::Scalar. But things that need to do more (like https://github.com/rust-lang/rust/blob/e0cb264b814526acb82def4b5810e394a2ed294f/compiler/rustc_codegen_ssa/src/mir/intrinsic.rs#L420-L425) that can't use the common path should get simpler as they can return OperandValue::Pair instead of needing to project and write into the output themselves.
NOT part of this MCP, but something we could consider in future, would be saying that the path from 2 is just removed entirely, in favour of either having the intrinsic add its own alloca and return OperandValue::Ref, or changing those few intrinsics that need it to be written with explicit out parameters. At this time I don't have confidence that either is certainly the correct path, so I'd leave those alone for now and reevaluate later once we see how this goes.
Mentors or Reviewers
I don't expect this to be complex, just enough churn that I wouldn't want to just submit PRs without getting a "sounds plausible" from people.
Unsure who would be interested in reviewing, maybe @workingjubilee ?
Process
The main points of the Major Change Process are as follows:
You can read more about Major Change Proposals on forge.
Proposal
Today, intrinsics in
cg_ssaare always given a place into which to write their results.That means that the analysis work that otherwise allows emitting SSA values directly can't work with them.
For example, compare these: https://rust.godbolt.org/z/TbqYoP8Ya
the former using an intrinsic is forced to go through stack and back
for no good reason, while the latter is a native MIR operator and doesn't
This MCP proposes that rather than always giving a
PlaceRefin https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/mir/struct.FunctionCx.html#method.codegen_intrinsic_call, we split it in two:OperandValue. Most intrinsics need only this, as they either return unit (likecopyorprefetch_read_data) or one/two immediates (atomic_uminis only a scalar,arith_offseta thin pointer, the float intrinsics return floats, the vector intrinsics usually return a single vector immediate, etc).volatile_loadwhich might needOperandValue::Refdepending on the actual return type. This would be done purely by return type: such an intrinsic would be expected to returnScalar/ScalarPairwhen the type allows, but the always-passed-a-PlaceRef form would still be called for things likevolatile_load<[u32; 10]>.Doing this will (hopefully) improve codegen speed by reducing the number of
allocas that need to be removed when optimizing (or emitted in unoptimized code), and importantly will reduce the incentive to move things into being MIR primitives when they don't need to be just to stop pessimizing codegen.I would expect this come out roughly neutral in complexity. The existing intrinsics that hit the common path (https://github.com/rust-lang/rust/blob/e0cb264b814526acb82def4b5810e394a2ed294f/compiler/rustc_codegen_ssa/src/mir/intrinsic.rs#L591-L597) of just returning a single backend value get at most more complex by wrapping that in
OperandValue::Scalar. But things that need to do more (like https://github.com/rust-lang/rust/blob/e0cb264b814526acb82def4b5810e394a2ed294f/compiler/rustc_codegen_ssa/src/mir/intrinsic.rs#L420-L425) that can't use the common path should get simpler as they can returnOperandValue::Pairinstead of needing to project and write into the output themselves.NOT part of this MCP, but something we could consider in future, would be saying that the path from 2 is just removed entirely, in favour of either having the intrinsic add its own alloca and return
OperandValue::Ref, or changing those few intrinsics that need it to be written with explicit out parameters. At this time I don't have confidence that either is certainly the correct path, so I'd leave those alone for now and reevaluate later once we see how this goes.Mentors or Reviewers
I don't expect this to be complex, just enough churn that I wouldn't want to just submit PRs without getting a "sounds plausible" from people.
Unsure who would be interested in reviewing, maybe @workingjubilee ?
Process
The main points of the Major Change Process are as follows:
@rustbot secondor kickoff a team FCP with@rfcbot fcp $RESOLUTION.You can read more about Major Change Proposals on forge.