Winch: Add splat instructions to x64 using AVX2 #10028
Conversation
winch/codegen/src/isa/x64/asm.rs (outdated)
@@ -168,6 +168,18 @@ impl From<OperandSize> for Option<ExtMode> {
    }
}

impl SplatKind {
    /// Get the operand size for `vpbroadcast`.
    pub(super) fn vpbroadcast_operand_size(&self) -> OperandSize {
Instead of making this specific to x64, can we instead call this `lane_size` or something along those lines and move the implementation to masm.rs? I don't think we'd only need this for x64.
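A minimal sketch of what that rename could look like, assuming `SplatKind` lives in (or moves to) masm.rs; the variant names here are assumptions based on this discussion, not the PR's final definitions:

```rust
// Hypothetical sketch of the suggested `lane_size` helper in masm.rs;
// variant names are assumptions, not the merged code.
impl SplatKind {
    /// Operand size of a single lane of the splatted vector.
    pub fn lane_size(&self) -> OperandSize {
        match self {
            SplatKind::I8x16 => OperandSize::S8,
            SplatKind::I16x8 => OperandSize::S16,
            SplatKind::I32x4 | SplatKind::F32x4 => OperandSize::S32,
            SplatKind::I64x2 | SplatKind::F64x2 => OperandSize::S64,
        }
    }
}
```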
winch/codegen/src/isa/x64/masm.rs (outdated)
// Results in the first 4 bytes and second 4 bytes being
// swapped and then the swapped bytes being copied.
// [d0, d1, d2, d3, d4, d5, d6, d7, ...] yields
// [d4, d5, d6, d7, d0, d1, d2, d3, d4, d5, d6, d7, d0, d1, d2, d3].
let mask = 0b0100_0100;
If I'm not wrong, this is the exact same comment/mask value used in `load_splat`. Could we extract this as a helper?
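For context on what the shared mask encodes, here is a small standalone model (not Winch code) of pshufd/vpshufd lane selection: each 2-bit field of the immediate picks a 32-bit source lane for the corresponding destination lane. How the bytes end up arranged also depends on where the preceding load left the value in the source register.

```rust
// Standalone illustration of pshufd-style 32-bit lane selection, per the
// x86 definition; this is a model for discussion, not part of the PR.
fn pshufd(src: [u32; 4], imm8: u8) -> [u32; 4] {
    let mut dst = [0u32; 4];
    for lane in 0..4 {
        // Bits [2*lane + 1 : 2*lane] of the immediate select a source lane.
        let sel = ((imm8 >> (2 * lane)) & 0b11) as usize;
        dst[lane] = src[sel];
    }
    dst
}

fn main() {
    // 0b0100_0100 selects source lanes [0, 1, 0, 1], replicating the low
    // 64 bits (two 32-bit lanes) across the whole vector.
    assert_eq!(pshufd([10, 11, 12, 13], 0b0100_0100), [10, 11, 10, 11]);
}
```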
winch/codegen/src/masm.rs (outdated)
/// Takes the int in a `src` operand and replicates it across lanes of
/// `size` in a destination result.
fn splat_int(&mut self, context: &mut CodeGenContext<Emission>, size: SplatKind) -> Result<()>;

/// Takes the `src` operand and replicates it across lanes of `size` in
/// `dst`.
fn splat(&mut self, dst: WritableReg, src: RegImm, size: SplatKind) -> Result<()>;
Taking a look at the callsites, and trying to think about future backends, I'm tempted to suggest adding a single entry here, `splat`, defined as

fn splat(&mut self, context: &mut CodeGenContext, kind: SplatKind) {
    ...
}

That's potentially the most flexible approach, and it would make the definition more uniform across backends: as far as I can tell, the only reason we have a special `splat_int` is because of the requirements in x64.

For that suggestion to work, though, we'd probably need to extend `SplatKind` to encode the source type (e.g., `SplatKind::I8x16`) -- I don't think this changes anything fundamental, since we'd still be able to derive the `lane_size`.
The one thing with changing `SplatKind` to encode an `I` or `F` is that the `v128.load*_splat` instructions aren't necessarily operating on integers or floats; they operate on whatever was stored in memory. That said, it should still be fine to have the visitor specify a `SplatKind::I32x4` for `v128.load32_splat` even if it's operating on 4 floats; the result is the same either way.
To be explicit, we could rename the current `SplatKind` to `SplatLoadKind` and introduce the type-aware version as `SplatKind`. That would tighten the implementation and keep the semantics as close as possible to the Wasm semantics.
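A rough sketch of how that split might look; the names and variants here are assumptions drawn from this thread, not the merged code:

```rust
// Hypothetical shapes for the suggested split; not the PR's final definitions.

/// Lane width for `v128.load*_splat`, which only cares about how many bytes
/// are read from memory (the role of the current `SplatKind`).
pub enum SplatLoadKind {
    S8,
    S16,
    S32,
    S64,
}

/// Type-aware kind consumed by a single `splat` masm entry point; the lane
/// size can still be derived from the variant.
pub enum SplatKind {
    I8x16,
    I16x8,
    I32x4,
    I64x2,
    F32x4,
    F64x2,
}
```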
match self {
    Imm::I32(n) => n.to_le_bytes().to_vec(),
    Imm::I64(n) => n.to_le_bytes().to_vec(),
    Imm::F32(n) => n.to_le_bytes().to_vec(),
    Imm::F64(n) => n.to_le_bytes().to_vec(),
    Imm::V128(n) => n.to_le_bytes().to_vec(),
Could you return a slice here instead? The constant pool will copy the bytes into a `Vec<u8>`, but returning a slice will at least prevent heap allocations in case this gets used for anything else that's not constant-pool related.
To be more specific, using `as_slice()` instead of `to_vec()`, similar to the other uses of `add_constant`.
If I use `as_slice()` and return a `&[u8]`, I get this message from the compiler: `cannot return value referencing temporary value`. That's likely because the array created by `to_le_bytes()` gets dropped when the method returns. The arrays are also of different lengths between the variants. We could maybe use some unsafe code to create a reference to the underlying bytes, but otherwise I'm not sure how to avoid the heap allocation.
Given that this is not used for anything other than the constant pool, I don't think we need to introduce any unsafe code. Perhaps add a comment stating that this method heap-allocates and is intended to be used mostly with the constant pool?
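A small standalone sketch of the constraint discussed above (the `Imm` enum here is a simplified stand-in, not Winch's actual definition): `to_le_bytes()` returns a fixed-size array owned by the method's stack frame, and the variants produce arrays of different lengths, so a borrowed `&[u8]` can't be returned without extra machinery; the heap-allocating `Vec<u8>` plus a clarifying comment is the pragmatic choice.

```rust
// Simplified stand-in for the discussion above; not Winch's actual `Imm`.
enum Imm {
    I32(u32),
    I64(u64),
}

impl Imm {
    /// Note: heap-allocates; intended mostly for constant-pool use.
    fn to_bytes(&self) -> Vec<u8> {
        match self {
            // `to_le_bytes()` yields a `[u8; 4]` / `[u8; 8]` temporary, so a
            // `&[u8]` borrowed from it would not outlive this method.
            Imm::I32(n) => n.to_le_bytes().to_vec(),
            Imm::I64(n) => n.to_le_bytes().to_vec(),
        }
    }
}

fn main() {
    assert_eq!(Imm::I32(1).to_bytes(), vec![1, 0, 0, 0]);
}
```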
Part of #8093. Implements the following instructions on x64 with AVX2 support:
i8x16.splat
i16x8.splat
i32x4.splat
i64x2.splat
f32x4.splat
f64x2.splat