Improve codegen for inserting zeros. #53713

@tannergooding

Description

As per https://github.com/dotnet/runtime/pull/53578/files#r645282551, we are currently emitting a pslldq followed by a psrldq to zero the 4th element.
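To illustrate why the shift pair clears the top lane, here is a minimal Python model of the two byte shifts on a 16-byte register. It is only a sketch of the instruction semantics, not the JIT's implementation; the function names simply mirror the mnemonics, and the 4-byte shift count and `zero_top_lane_with_shifts` helper are assumptions made for the example.

```python
import struct

def pslldq(reg: bytes, n: int) -> bytes:
    """Model of PSLLDQ: shift 16 bytes toward higher byte indices, zero-filling the low end."""
    assert len(reg) == 16
    n = min(n, 16)
    return (bytes(n) + reg)[:16]

def psrldq(reg: bytes, n: int) -> bytes:
    """Model of PSRLDQ: shift 16 bytes toward lower byte indices, zero-filling the high end."""
    assert len(reg) == 16
    n = min(n, 16)
    return (reg + bytes(n))[n:]

def zero_top_lane_with_shifts(lanes):
    """Hypothetical helper: clear lane 3 of four floats using the shift pair."""
    reg = struct.pack("<4f", *lanes)   # lane 0 occupies the lowest 4 bytes
    reg = pslldq(reg, 4)               # lane 3 falls off the top of the register
    reg = psrldq(reg, 4)               # lanes 0..2 return to place, top lane is now zero
    return struct.unpack("<4f", reg)

result = zero_top_lane_with_shifts((1.0, 2.0, 3.0, 4.0))
```

The model makes the cost visible: two dependent shifts are needed just to clear one lane.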

This is decent codegen for the SSE/SSE2 baseline but is inefficient compared to other patterns we could generate for modern hardware.

We should likely replace this with the relevant logic from vector.WithElement(3, 0.0f). On hardware with SSE4.1, that can generate an insertps xmm, xmm, 0b00_00_1000, which preserves the other values in xmm and zeroes the fourth element (index 3).
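The effect of that immediate can be checked with a small Python model of insertps on four-lane vectors. This is a sketch of the documented imm8 layout (bits 7:6 select the source lane, bits 5:4 select the destination lane, bits 3:0 form the zero mask), not the code the JIT emits; the function name and list representation are assumptions for illustration.

```python
def insertps(dst, src, imm8):
    """Model of INSERTPS for register operands holding four floats each."""
    count_s = (imm8 >> 6) & 0b11   # which source lane to copy
    count_d = (imm8 >> 4) & 0b11   # which destination lane receives it
    zmask = imm8 & 0b1111          # bit i set means lane i is forced to zero

    out = list(dst)
    out[count_d] = src[count_s]
    return [0.0 if (zmask >> i) & 1 else out[i] for i in range(4)]

v = [1.0, 2.0, 3.0, 4.0]
# Same register as both operands: lane 0 is copied onto itself,
# and zero-mask bit 3 clears the lane at index 3.
zeroed = insertps(v, v, 0b00_00_1000)
```

With both operands being the same register, the copy step is a no-op and only the zero mask has a visible effect, so one instruction does the work of the shift pair.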

category:cq
theme:codegen
skill-level:expert
cost:small
impact:small

Metadata

Assignees

No one assigned

    Labels

    Priority:3 (Work that is nice to have)
    area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI)
    optimization
