Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Winch: Add trunc_sat instructions for x64 with AVX #10226

Merged
merged 5 commits into from
Feb 13, 2025

Conversation

jeffcharles
Copy link
Contributor

Part of #8093. Adds implementations for the following instructions:

  • i32x4.trunc_sat_f32x4_s
  • i32x4.trunc_sat_f32x4_u
  • i32x4.trunc_sat_f64x2_s_zero
  • i32x4.trunc_sat_f64x2_u_zero

@jeffcharles jeffcharles requested review from a team as code owners February 12, 2025 20:56
@jeffcharles jeffcharles requested review from cfallin and pchickey and removed request for a team February 12, 2025 20:56
@github-actions github-actions bot added the winch Winch issues or pull requests label Feb 12, 2025
Copy link

Subscribe to Label Action

cc @saulecabrera

This issue or pull request has been labeled: "winch"

Thus the following users have been cc'd because of the following labels:

  • saulecabrera: winch

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

.xmm_vex_rr(AvxOpcode::Vpxor, reg.to_reg(), scratch.to_reg(), reg);
}
V128TruncSatKind::F32x4U => {
let reg2 = writable!(context.any_fpr(self)?);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused about the lifetime of this register, I was expecting a context.free_reg given that it seems to be temporary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

@@ -2543,6 +2543,183 @@ impl Masm for MacroAssembler {
Ok(())
}

fn v128_trunc_sat(
&mut self,
context: &mut CodeGenContext<Emission>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually CodeGenContext is passed when there's a special ISA needs for lowering, but I don't think that the case here? If I'm reading the code correctly, it seems that the only reason why we need the context is to pop a temporary register when kind == F32x4U?

If that's the case, could we instead avoid passing the context and pass an Option<Reg> at each call site where appropriate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we still want to allocate the extra register on AArch64 for F32x4U when it isn't necessary on that ISA?

Otherwise I'm not sure how to handle only allocating the register on x64 if we're doing the allocation outside the x64 macroassembler.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we still want to allocate the extra register on AArch64 for F32x4U when it isn't necessary on that ISA?

I was under the impression that we'd still need an extra reg for aarch64, is that not the case? More generally, maybe it's fine to leave it as is and we can refactor this once we add SIMD support for other backends.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT, we could implement this instruction with just a single emission of FCVTZU on AArch64 and we would only need the one register.

let reg = writable!(context.pop_to_reg(self, None)?.reg);
let scratch = writable!(regs::scratch_xmm());

match kind {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each arm here is considerably long, could we extract each into it's own helper? I had a bit of a hard time following along, having each arm in its own function will make it easier (at least personally) to reason about each invariant.

Copy link
Member

@saulecabrera saulecabrera left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@saulecabrera saulecabrera added this pull request to the merge queue Feb 13, 2025
Merged via the queue into bytecodealliance:main with commit 7f93c1e Feb 13, 2025
39 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
winch Winch issues or pull requests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants