-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Winch: Add trunc_sat
instructions for x64 with AVX
#10226
Winch: Add trunc_sat
instructions for x64 with AVX
#10226
Conversation
Subscribe to Label Action
This issue or pull request has been labeled: "winch"
Thus the following users have been cc'd because of the following labels:
To subscribe or unsubscribe from this label, edit the |
winch/codegen/src/isa/x64/masm.rs
Outdated
.xmm_vex_rr(AvxOpcode::Vpxor, reg.to_reg(), scratch.to_reg(), reg); | ||
} | ||
V128TruncSatKind::F32x4U => { | ||
let reg2 = writable!(context.any_fpr(self)?); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused about the lifetime of this register, I was expecting a context.free_reg
given that it seems to be temporary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
@@ -2543,6 +2543,183 @@ impl Masm for MacroAssembler { | |||
Ok(()) | |||
} | |||
|
|||
fn v128_trunc_sat( | |||
&mut self, | |||
context: &mut CodeGenContext<Emission>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually CodeGenContext
is passed when there's a special ISA needs for lowering, but I don't think that the case here? If I'm reading the code correctly, it seems that the only reason why we need the context is to pop a temporary register when kind == F32x4U
?
If that's the case, could we instead avoid passing the context and pass an Option<Reg>
at each call site where appropriate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would we still want to allocate the extra register on AArch64 for F32x4U
when it isn't necessary on that ISA?
Otherwise I'm not sure how to handle only allocating the register on x64 if we're doing the allocation outside the x64 macroassembler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would we still want to allocate the extra register on AArch64 for F32x4U when it isn't necessary on that ISA?
I was under the impression that we'd still need an extra reg for aarch64, is that not the case? More generally, maybe it's fine to leave it as is and we can refactor this once we add SIMD support for other backends.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICT, we could implement this instruction with just a single emission of FCVTZU on AArch64 and we would only need the one register.
let reg = writable!(context.pop_to_reg(self, None)?.reg); | ||
let scratch = writable!(regs::scratch_xmm()); | ||
|
||
match kind { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each arm here is considerably long, could we extract each into it's own helper? I had a bit of a hard time following along, having each arm in its own function will make it easier (at least personally) to reason about each invariant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
Part of #8093. Adds implementations for the following instructions:
i32x4.trunc_sat_f32x4_s
i32x4.trunc_sat_f32x4_u
i32x4.trunc_sat_f64x2_s_zero
i32x4.trunc_sat_f64x2_u_zero