Skip to content

Conversation

@saucecontrol
Copy link
Member

@saucecontrol saucecontrol commented Dec 16, 2024

This is another round of improvements for xarch HWIntrinsics containment. In summary it:

  • Simplifies IsContainableHWIntrinsicOp to use tuple type for size calculation where possible.
  • Fixes several more places where a load was improperly contained when the instruction memory size requirement was larger than the load.
  • Enables containment in more places it would have been valid but was missed before.
  • Enables EVEX embedded broadcast for 64-bit constant vectors on x86.

Diffs look good.

Notable Fixes

var svf1 = Unsafe.Read<Vector128<float>>(floatTable.inArray1Ptr);
var svf2 = Unsafe.Read<Vector128<float>>(floatTable.inArray2Ptr);
var svf3 = Avx.Compare(svf1, svf2, FloatComparisonMode.OrderedEqualNonSignaling);
Unsafe.Write(floatTable.outArrayPtr, svf3);
var svd1 = Unsafe.Read<Vector128<double>>(doubleTable.inArray1Ptr);
var svd2 = Unsafe.Read<Vector128<double>>(doubleTable.inArray2Ptr);
var svd3 = Avx.Compare(svd1, svd2, FloatComparisonMode.OrderedEqualNonSignaling);

These were overreading before and are fixed now:

-       vmovups  xmm0, xmmword ptr [rbp-0x190]
-       vcmpps   ymm0, ymm0, ymmword ptr [rax], 0
+       vmovups  xmm0, xmmword ptr [rax]
+       vmovups  xmm1, xmmword ptr [rbp-0x190]
+       vcmpps   ymm0, ymm1, ymm0, 0

-       vmovups  xmm0, xmmword ptr [rbp-0x190]
-       vcmppd   ymm0, ymm0, ymmword ptr [rax], 0
+       vmovups  xmm0, xmmword ptr [rax]
+       vmovups  xmm1, xmmword ptr [rbp-0x190]
+       vcmppd   ymm0, ymm1, ymm0, 0

(actually, this was both an overread bug and an incorrect SIMD size bug -- both fixed now)

Similarly:

static unsafe Vector256<long> ShouldNotContainBroadcast(int* ptr)
{
    return Avx2.BroadcastScalarToVector256(Sse2.LoadScalarVector128(ptr).AsInt64());
}
-       vpbroadcastq ymm0, qword ptr [rdx]
+       vmovd        xmm0, dword ptr [rdx]
+       vpbroadcastq ymm0, ymm0

And:

static unsafe Vector128<double> ShouldNotOverReadCreateScalarUnsafe(float* ptr)
{
    return Vector128.CreateScalarUnsafe(Sse.LoadScalarVector128(ptr).AsDouble().ToScalar());
}
-       vmovsd   xmm0, qword ptr [rdx]
+       vmovss   xmm0, dword ptr [rdx]

@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Dec 16, 2024
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Dec 16, 2024
@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@saucecontrol saucecontrol changed the title [JIT] Improve HWIntrinsic containment logic [JIT] Improve x86 HWIntrinsic containment Dec 16, 2024
Copy link
Member Author

@saucecontrol saucecontrol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tannergooding this should be ready now -- just needs outer loop run.

I'll follow up with the ToScalar/GetLower containment I mentioned in another PR since this one's already quite involved.

@saucecontrol saucecontrol marked this pull request as ready for review December 16, 2024 23:38
@tannergooding
Copy link
Member

/azp run runtime-coreclr jitstress-isas-x86

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@saucecontrol
Copy link
Member Author

saucecontrol commented Dec 17, 2024

jitstress failures are more variations of the bogus assert fixed in #92183

Codegen for the test method on main is

movups   xmm0, xmmword ptr [rcx+0x08]
movups   xmm1, xmmword ptr [reloc @RWD00]
movaps   xmm2, xmm0
pblendvb xmm0, xmm1
movups   xmmword ptr [rdx], xmm0
mov      rax, rdx
ret

PR codegen with the assert corrected is the also valid

movups   xmm0, xmmword ptr [rcx+0x08]
movaps   xmm1, xmm0
pblendvb xmm0, xmmword ptr [reloc @RWD00]
movups   xmmword ptr [rdx], xmm0
mov      rax, rdx
ret

Containment of CNS_VEC with non-VEX encoding is one of the improvements in the update.

@tannergooding
Copy link
Member

CC. @dotnet/jit-contrib for secondary review

@tannergooding tannergooding merged commit 08ea199 into dotnet:main Jan 10, 2025
111 of 116 checks passed
@saucecontrol saucecontrol deleted the containment2 branch January 10, 2025 02:10
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants