Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Winch: Add splat instructions to x64 using AVX2 #10028

Merged
merged 7 commits into from
Jan 16, 2025

Conversation

jeffcharles
Copy link
Contributor

Part of #8093. Implements the following instructions on x64 with AVX2 support:

  • i8x16.splat
  • i16x8.splat
  • i32x4.splat
  • i64x2.splat
  • f32x4.splat
  • f64x2.splat

@jeffcharles jeffcharles requested review from a team as code owners January 15, 2025 22:33
@jeffcharles jeffcharles requested review from abrown and fitzgen and removed request for a team January 15, 2025 22:33
@github-actions github-actions bot added the winch Winch issues or pull requests label Jan 15, 2025
Copy link

Subscribe to Label Action

cc @saulecabrera

This issue or pull request has been labeled: "winch"

Thus the following users have been cc'd because of the following labels:

  • saulecabrera: winch

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

@saulecabrera saulecabrera requested review from saulecabrera and removed request for abrown and fitzgen January 16, 2025 10:48
@@ -168,6 +168,18 @@ impl From<OperandSize> for Option<ExtMode> {
}
}

impl SplatKind {
/// Get the operand size for `vpbroadcast`.
pub(super) fn vpbroadcast_operand_size(&self) -> OperandSize {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of making this specific to x64, can we instead call this lane_size or something along those lines and move the implementation to masm.rs? I don't think we'd only need this for x64.

Comment on lines 1327 to 1331
// Results in the first 4 bytes and second 4 bytes being
// swapped and then the swapped bytes being copied.
// [d0, d1, d2, d3, d4, d5, d6, d7, ...] yields
// [d4, d5, d6, d7, d0, d1, d2, d3, d4, d5, d6, d7, d0, d1, d2, d3].
let mask = 0b0100_0100;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm not wrong, this is the exact same comment/mask value used in load_splat. Could we extract this as a helper?

Comment on lines 1272 to 1279
/// Takes the int in a `src` operand and replicates it across lanes of
/// `size` in a destination result.
fn splat_int(&mut self, context: &mut CodeGenContext<Emission>, size: SplatKind) -> Result<()>;

/// Takes the `src` operand and replicates it across lanes of `size` in
/// `dst`.
fn splat(&mut self, dst: WritableReg, src: RegImm, size: SplatKind) -> Result<()>;

Copy link
Member

@saulecabrera saulecabrera Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking a look at the callsites, and trying to think about future backends, I'm tempted to suggest adding a single entry here, splat, defined as

fn splat(&mut self, context: &mut CodeGenContext, kind: SplatKind) {
  ...
}

That's potentially the most flexible approach, and that would potentially make the definition more uniform across backends: as far as I can tell, the only reason why we have a special splat_int is because of the requirements in x64.

For that suggestion to work though, we'd probably need to extend SplatKind to encode the source type, (e.g., SplatKind::I8x16) -- I don't think this changes anything fundamental, since we'd still be able to derive the lane_size.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The one thing with changing SplatKind to encode an I or F is the v128.load*_splat instructions aren't necessarily operating on integers or floats. It's operating on whatever state was stored in the memory. That said, it should still be fine to have the visitor specify a SplatKind::I32x4 for v128.load32_splat even if it's operating on 4 floats, it'll still have the same result.

Copy link
Member

@saulecabrera saulecabrera Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be explicit, we could rename the current SplatKind to SplatLoadKind and introduce the type-aware-version in SplatKind. That would tighten the implementation and keep the semantics as close as possible to the Wasm semantics.

Comment on lines +485 to +490
match self {
Imm::I32(n) => n.to_le_bytes().to_vec(),
Imm::I64(n) => n.to_le_bytes().to_vec(),
Imm::F32(n) => n.to_le_bytes().to_vec(),
Imm::F64(n) => n.to_le_bytes().to_vec(),
Imm::V128(n) => n.to_le_bytes().to_vec(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you return a slice here instead? The constant pool will copy the bytes into a Vec<u8>, but returning a slice will at least prevent heap allocations in case this gets used for anything else that's not constant pool related.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be more specific, using as_slice() instead of to_vec(), similar to the other uses of add_constant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I use as_slice() and return a &[u8], I get this message from the compiler: cannot return value referencing temporary value. Likely because the array that's created by to_le_bytes() gets dropped when the method returns. The arrays are also of different lengths between the variants. We could maybe use some unsafe code to create a reference to the underlying bytes. Otherwise I'm not sure how to avoid the heap allocation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that this is not used for anything that the constant pool, I don't think we need to introduce any unsafe code. Perhaps adding a comment stating that this method heap allocates and it's intended to be used mostly with the constant pool?

@saulecabrera saulecabrera added this pull request to the merge queue Jan 16, 2025
Merged via the queue into bytecodealliance:main with commit be6c6c8 Jan 16, 2025
39 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
winch Winch issues or pull requests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants