Sve2 Scatters need a temp register for indices by a74nh · Pull Request #124865 · dotnet/runtime

a74nh · 2026-02-25T17:45:52Z

There are three forms of scatter instructions supported by CoreCLR:

* Vector of addresses
* A single address plus a vector of indices (vector length offsets)
* A single address plus a vector of byte offsets

There are encodings for all of these in SVE1.

SVE2 duplicates all the scatter instructions, providing non-temporal versions of them. The encodings all match SVE1, except for the indices version, which is missing. This can be replicated by shifting the indices into byte offsets before calling the instruction (which is exactly what happens in the equivalent C++ intrinsics).

Therefore, ensure there is a temp register to hold the shifted value.
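
As a rough scalar model of the equivalence described above (this is not JIT code; `scatterByteOffsets`, `scatterIndices` and `log2Size` are hypothetical names used only for illustration), an indices-form scatter can be expressed as a byte-offset scatter once each index is shifted left by log2 of the stored element size. The shifted values go into a separate temporary so the caller's indices are left untouched, which mirrors why a temp register is wanted here:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Scalar model of the byte-offset form: each active lane stores its element
// at base + byteOffsets[lane].
template <typename T>
void scatterByteOffsets(std::uint8_t*                     base,
                        const std::vector<std::uint64_t>& byteOffsets,
                        const std::vector<T>&             data,
                        const std::vector<bool>&          mask)
{
    assert(byteOffsets.size() == data.size() && mask.size() == data.size());
    for (std::size_t lane = 0; lane < data.size(); lane++)
    {
        if (mask[lane])
        {
            std::memcpy(base + byteOffsets[lane], &data[lane], sizeof(T));
        }
    }
}

// log2 of a power-of-two element size (1 -> 0, 2 -> 1, 4 -> 2, 8 -> 3).
constexpr unsigned log2Size(std::size_t size)
{
    return (size <= 1) ? 0 : 1 + log2Size(size / 2);
}

// Indices form emulated on top of the byte-offset form. The shifted offsets
// are written to a temporary; the input indices are not modified.
template <typename T>
void scatterIndices(std::uint8_t*                     base,
                    const std::vector<std::uint64_t>& indices,
                    const std::vector<T>&             data,
                    const std::vector<bool>&          mask)
{
    std::vector<std::uint64_t> temp(indices.size());
    for (std::size_t lane = 0; lane < indices.size(); lane++)
    {
        temp[lane] = indices[lane] << log2Size(sizeof(T));
    }
    scatterByteOffsets(base, temp, data, mask);
}
```

Here `T` stands for the element type that is actually stored, so for a 16-bit store the shift is 1, and for an 8-bit store the shift is 0 and the indices already are byte offsets.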

Fixes dotnet#124750

a74nh · 2026-02-25T17:47:00Z

src/coreclr/jit/lsraarm64.cpp

    // Build any immediates
    BuildHWIntrinsicImmediate(intrinsicTree, intrin);

+    // Build any additional special cases


I really don't like special casing here (there are no other special cases in the function). Ideally I'd add a hwintrinsic flag, but we're running out of space for them.

dotnet-policy-service · 2026-02-25T17:47:03Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

a74nh · 2026-02-25T17:47:28Z

src/coreclr/jit/hwintrinsiccodegenarm64.cpp

                break;
            }

+            case NI_Sve2_Scatter16BitWithByteOffsetsNarrowingNonTemporal:


Split these out to make the code easier to read

a74nh · 2026-02-25T17:48:09Z

@dotnet/arm64-contrib @jakobbotsch

Copilot

Pull request overview

This PR updates the ARM64 JIT’s SVE2 non-temporal scatter codegen/LSRA to account for the missing “base + indices” encoding in SVE2 by materializing byte offsets via a shifted temporary register.

Changes:

Adds LSRA handling to reserve an internal FP/SIMD temp for certain SVE2 non-temporal scatters.
Updates SVE2 scatter non-temporal codegen to shift indices into a temp register before emitting the store.
Splits SVE2 “with byte offsets” scatter intrinsics into a separate codegen case that does not require index conversion.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File	Description
src/coreclr/jit/lsraarm64.cpp	Reserves an internal float/SIMD temp register for selected SVE2 non-temporal scatter intrinsics.
src/coreclr/jit/hwintrinsiccodegenarm64.cpp	Converts indices to byte offsets via `lsl` into a temp register for SVE2 non-temporal scatters; separates byte-offset variants.

Copilot · 2026-02-25T17:51:06Z

src/coreclr/jit/hwintrinsiccodegenarm64.cpp

+                    // SVE2 instruction only directly support byte offsets. Convert indices to bytes.
+                    regNumber tempReg = internalRegisters.GetSingle(node, RBM_ALLFLOAT);
                    if (intrin.id == NI_Sve2_Scatter16BitNarrowingNonTemporal)
                    {
-                        GetEmitter()->emitIns_R_R_I(INS_sve_lsl, emitSize, op3Reg, op3Reg, 1, opt);
+                        GetEmitter()->emitIns_R_R_I(INS_sve_lsl, emitSize, tempReg, op3Reg, 1, opt);


tempReg is always used as the offsets register for the indices-form SVE2 scatters, but it is only initialized for the 16-bit, 32-bit, and 64-bit (ScatterNonTemporal) cases. For NI_Sve2_Scatter8BitNarrowingNonTemporal, no shift/move is emitted, so tempReg contains an unrelated value when passed to the scatter instruction (and LSRA also doesn’t reserve an internal temp for this intrinsic). Handle the 8-bit case by using op3Reg directly (no temp needed) or by explicitly copying/initializing tempReg before use and ensuring LSRA reserves it.
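
A minimal sketch of the first option (use the original offsets register when no shift is needed), written as standalone C++ rather than against the JIT's real types. `OffsetPlan` and `planOffsets` are hypothetical names, and the mapping from stored element size to shift amount is an assumption based on the description above, not taken from the PR's code:

```cpp
#include <cassert>

// Hypothetical description of how the offsets operand should be prepared.
struct OffsetPlan
{
    unsigned shift;     // left-shift applied to each index to get a byte offset
    bool     needsTemp; // true only when a shifted copy has to be materialized
};

// Assumption: the shift is log2 of the stored element size in bytes, so an
// 8-bit store needs no shift and can use the indices register directly.
constexpr OffsetPlan planOffsets(unsigned storedElemBytes)
{
    unsigned shift = 0;
    while ((1u << shift) < storedElemBytes)
    {
        shift++;
    }
    return OffsetPlan{shift, shift != 0};
}
```

Whichever option is chosen, register allocation and codegen would need to agree on the same condition: a temp is reserved exactly when `needsTemp` is true, and it is read only after the shift has written it.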

Copilot · 2026-02-25T17:51:07Z

src/coreclr/jit/lsraarm64.cpp

+        case NI_Sve2_Scatter16BitNarrowingNonTemporal:
+        case NI_Sve2_Scatter32BitNarrowingNonTemporal:
+        case NI_Sve2_ScatterNonTemporal:
+            buildInternalFloatRegisterDefForNode(intrinsicTree, internalFloatRegCandidates());


LSRA reserves an internal float register for these SVE2 scatter intrinsics unconditionally, but codegen only needs the temp for the indices-form overload (!varTypeIsSIMD(intrin.op2->gtType)). Reserving an unused internal register for the vector-of-addresses form increases register pressure and can cause unnecessary spills; consider gating buildInternalFloatRegisterDefForNode on the operand type (and only for the IDs that actually need shifting).

Suggested change

-            buildInternalFloatRegisterDefForNode(intrinsicTree, internalFloatRegCandidates());
+            // For these SVE2 scatter intrinsics, an internal float register is only
+            // required for the indices-form overload (where the second operand is
+            // not a SIMD type). Avoid reserving it for the vector-of-addresses
+            // form to reduce register pressure.
+            if (!varTypeIsSIMD(intrin.op2->gtType))
+            {
+                buildInternalFloatRegisterDefForNode(intrinsicTree, internalFloatRegCandidates());
+            }

Copilot · 2026-02-25T17:51:07Z

src/coreclr/jit/hwintrinsiccodegenarm64.cpp

                    assert(intrin.numOperands == 4);

-                    // Calculate the byte offsets if using indices.
+                    // SVE2 instruction only directly support byte offsets. Convert indices to bytes.


Grammar: "instruction only directly support" should be "instruction only directly supports".

Suggested change

-                    // SVE2 instruction only directly support byte offsets. Convert indices to bytes.
+                    // SVE2 instruction only directly supports byte offsets. Convert indices to bytes.

Copilot AI review requested due to automatic review settings February 25, 2026 17:45

dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Feb 25, 2026

github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 25, 2026

a74nh wants to merge 1 commit into dotnet:main from a74nh:ntscatter2_github
