
[AArch64] manual deinterleaving ld2 not recognized #181514

@folkertdev

A manual 16-bit ld4 (a normal load followed by a deinterleaving shuffle) is recognized and lowered to ld4. For some reason the same is not true for ld2: the manual version is not turned into ld2 and compiles to a longer ldr/ext/uzp1/uzp2 sequence instead.

https://godbolt.org/z/danjGfMb9

manual2:
        ldr     q1, [x0]
        ext     v2.16b, v1.16b, v1.16b, #8
        uzp1    v0.4h, v1.4h, v2.4h
        uzp2    v1.4h, v1.4h, v2.4h
        ret

intrin2:
        ld2     { v0.4h, v1.4h }, [x0]
        ret

manual4:
        ld4     { v0.4h, v1.4h, v2.4h, v3.4h }, [x0]
        stp     d0, d1, [x8]
        stp     d2, d3, [x8, #16]
        ret

intrin4:
        ld4     { v0.4h, v1.4h, v2.4h, v3.4h }, [x0]
        stp     d0, d1, [x8]
        stp     d2, d3, [x8, #16]
        ret
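For reference, here is a portable sketch of what the manual2 and intrin2 listings above are expected to compute. It is not the source behind the godbolt link (that code is not reproduced here); the function name `deinterleave2_model` and the plain-array types are assumptions made for illustration only.

```rust
/// Scalar model of a 2-way deinterleave ("ld2") over eight 16-bit lanes.
/// Hypothetical helper for illustration; not the code from the linked example.
fn deinterleave2_model(input: [u16; 8]) -> ([u16; 4], [u16; 4]) {
    let mut even = [0u16; 4];
    let mut odd = [0u16; 4];
    for i in 0..4 {
        even[i] = input[2 * i]; // lanes 0, 2, 4, 6
        odd[i] = input[2 * i + 1]; // lanes 1, 3, 5, 7
    }
    (even, odd)
}
```

The two outputs correspond to the two registers of `ld2 { v0.4h, v1.4h }`: one holds the even-indexed lanes, the other the odd-indexed lanes.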

The issue is that the VectorCombinePass turns

  %0 = shufflevector <8 x i16> %tmp.sroa.0.0.copyload.i, <8 x i16> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
  %1 = shufflevector <8 x i16> %tmp.sroa.0.0.copyload.i, <8 x i16> poison, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
  %2 = bitcast <4 x i16> %0 to <8 x i8>
  %3 = bitcast <4 x i16> %1 to <8 x i8>

into

  %0 = bitcast <8 x i16> %tmp.sroa.0.0.copyload.i to <16 x i8>
  %1 = shufflevector <16 x i8> %0, <16 x i8> poison, <8 x i32> <i32 0, i32 1, i32 4, i32 5, i32 8, i32 9, i32 12, i32 13>
  %2 = bitcast <8 x i16> %tmp.sroa.0.0.copyload.i to <16 x i8>
  %3 = shufflevector <16 x i8> %2, <16 x i8> poison, <8 x i32> <i32 2, i32 3, i32 6, i32 7, i32 10, i32 11, i32 14, i32 15>

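which presumably breaks the ld2 pattern recognition, since the shuffles now operate on i8 lanes with a stride that no longer looks like a 2-way deinterleave.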
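The rewritten masks are the original i16 masks scaled to byte lanes: each index m becomes the pair 2m, 2m+1. A small sketch of that relationship, with hypothetical helper names (`widen_mask` and `shuffle` are not LLVM APIs) and assuming little-endian lane layout for the bitcast:

```rust
/// Rewrite a shuffle mask over wide lanes into the equivalent mask over
/// narrower lanes, where each wide lane covers `scale` narrow lanes.
/// Hypothetical helper; models the mask arithmetic only.
fn widen_mask(mask: &[usize], scale: usize) -> Vec<usize> {
    mask.iter()
        .flat_map(|&m| (0..scale).map(move |b| m * scale + b))
        .collect()
}

/// Single-source shuffle: pick `src[mask[i]]` for each output lane.
fn shuffle<T: Copy>(src: &[T], mask: &[usize]) -> Vec<T> {
    mask.iter().map(|&i| src[i]).collect()
}
```

With `scale = 2`, the mask `[0, 2, 4, 6]` becomes `[0, 1, 4, 5, 8, 9, 12, 13]` and `[1, 3, 5, 7]` becomes `[2, 3, 6, 7, 10, 11, 14, 15]`, which are the masks in the transformed IR. The values produced are the same; only the shape of the shuffle changes.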
