Skip to content

Conversation

@saucecontrol
Copy link
Member

@saucecontrol saucecontrol commented Nov 2, 2024

This adds containment support for the AVX-512 integer widening loads and fixes a few problems with the existing logic. In summary, it:

  • Replaces a faulty calculation that prevented containment of full vector loads for some of the added AVX-512 instructions.
  • Supports SIMD scalar load containment in more situations.
  • Prevents containment in places it was previously allowed and shouldn't have been.

Examples:

Enabled containment of scalar loads for all qualifying widening instructions

Scalar load containment was previously disabled for all widening instructions except movddup.

static unsafe Vector128<short> ShouldContainScalarLoad(byte* ptr)
{
    return Sse41.ConvertToVector128Int16(Sse2.LoadScalarVector128((ulong*)ptr).AsByte());
}

Before

vmovq    xmm0, qword ptr [rdx]
vpmovzxbw xmm0, xmm0

After

vpmovzxbw xmm0, qword ptr [rdx]

Disabled containment of scalar loads smaller than the instruction requirement

Scalar load containment was enabled unconditionally for movddup. This example should not be contained because it reads only 4 bytes, while the contained load will read 8.

static unsafe Vector128<double> ShouldNotContainDdup(byte* ptr)
{
    return Sse3.MoveAndDuplicate(Sse.LoadScalarVector128((float*)ptr).AsDouble());
}

Before

vmovddup xmm0, qword ptr [rdx]

After

vmovss   xmm0, dword ptr [rdx]
vmovddup xmm0, xmm0

Disabled containment of aligned loads smaller than 16 bytes in MinOpts

In MinOpts, we typically allow containment of aligned loads on non-VEX hardware because the instructions will fault on unaligned addresses, however this is not true for instructions that load smaller values. This example was previously contained with MinOpts, EnableAVX=0.

static unsafe Vector128<short> ShouldNotContainAlignedLoad(byte* ptr)
{
    return Sse41.ConvertToVector128Int16(Sse2.LoadAlignedVector128(ptr));
}

Before

pmovzxbw xmm0, qword ptr [rax]

After

movdqa   xmm0, xmmword ptr [rax]
pmovzxbw xmm0, xmm0

Remaining diffs are all improvements.

@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Nov 2, 2024
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Nov 2, 2024
@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Comment on lines -9467 to -9468
supportsAlignedSIMDLoads = !comp->canUseVexEncoding() || !comp->opts.MinOpts();
supportsUnalignedSIMDLoads = comp->canUseVexEncoding();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These instructions are AVX+ so they imply VEX encoding.

@tannergooding
Copy link
Member

CC. @dotnet/jit-contrib for secondary review.

@BruceForstall BruceForstall merged commit 975f4ea into dotnet:main Nov 21, 2024
107 of 108 checks passed
@saucecontrol saucecontrol deleted the widening-containment branch November 21, 2024 23:25
mikelle-rogers pushed a commit to mikelle-rogers/runtime that referenced this pull request Dec 10, 2024
* improve containment for widening intrinsics

* apply formatting patch

* tidying

* use tuple type for load size factor

* apply formatting patch

* fix build

* whitespace

* revert emitter changes
@github-actions github-actions bot locked and limited conversation to collaborators Dec 22, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants