[AArch64][SVE] Avoid redundant extend of unsigned i8/i16 extracts.#165863
Merged
Extracts of unsigned i8 or i16 elements from the bottom 128 bits of
a scalable register lead to the zero-extend being transformed to an AND
mask. The mask is redundant since UMOV already zeroes the high bits of
the destination register.
For example:
```c
int foo(svuint8_t x) {
return x[3];
}
```
Currently:
```gas
foo:
umov w8, v0.b[3]
and w0, w8, #0xff
ret
```
Becomes:
```gas
foo:
umov w0, v0.b[3]
ret
```
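To make the redundancy concrete, here is a minimal C sketch of the behaviour the patterns rely on. It is a hand-written model, not LLVM code: `model_umov_b`, `model_umov_h`, `model_x_after_w_write` and `count_mask_changes` are hypothetical helper names introduced only for illustration. The model assumes that UMOV zero-extends the selected lane into the W register, and that a write to a W register zeroes the upper 32 bits of the corresponding X register, which is why the i64 cases in the tests can also use a plain `umov w0, ...`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of "umov wD, vN.b[i]": the selected byte lane is
   zero-extended into a 32-bit register, so bits 8..31 are already zero. */
static uint32_t model_umov_b(const uint8_t lanes[16], unsigned index) {
    assert(index < 16); /* only the bottom 128 bits are addressable */
    return (uint32_t)lanes[index];
}

/* Hypothetical model of "umov wD, vN.h[i]": bits 16..31 are already zero. */
static uint32_t model_umov_h(const uint16_t lanes[8], unsigned index) {
    assert(index < 8);
    return (uint32_t)lanes[index];
}

/* Assumed AArch64 rule: writing wD clears the upper 32 bits of xD. */
static uint64_t model_x_after_w_write(uint32_t w) {
    return (uint64_t)w;
}

/* Counts the lane values for which the AND mask would change the value
   produced by the modelled UMOV. A result of 0 means the mask is a no-op. */
static unsigned count_mask_changes(void) {
    unsigned changes = 0;
    uint8_t bytes[16] = {0};
    uint16_t halves[8] = {0};
    for (uint32_t v = 0; v <= 0xffffu; ++v) {
        bytes[3] = (uint8_t)v;
        halves[3] = (uint16_t)v;
        uint32_t b = model_umov_b(bytes, 3);
        uint32_t h = model_umov_h(halves, 3);
        if ((b & 0xffu) != b)
            ++changes;
        if ((h & 0xffffu) != h)
            ++changes;
    }
    return changes;
}
```

Under these assumptions `count_mask_changes()` finds no value that the mask alters, which is the property that lets the `and` be folded into the `umov`.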
@llvm/pr-subscribers-backend-aarch64
Author: Ricardo Jesus (rj-jesus)
2 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
index 3b268dcbca600..6933303037716 100644
--- a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
@@ -3592,6 +3592,19 @@ let Predicates = [HasSVE_or_SME] in {
def : Pat<(sext (i32 (vector_extract nxv4i32:$vec, VectorIndexS:$index))),
(SMOVvi32to64 (v4i32 (EXTRACT_SUBREG ZPR:$vec, zsub)), VectorIndexS:$index)>;
+
+ // Extracts of ``unsigned'' i8 or i16 elements lead to the zero-extend being
+ // transformed to an AND mask. The mask is redundant since UMOV already zeroes
+ // the high bits of the destination register.
+ // We do something similar in the Neon versions of these patterns.
+ def : Pat<(i32 (and (vector_extract nxv16i8:$vec, VectorIndexB:$index), 0xff)),
+ (UMOVvi8 (v16i8 (EXTRACT_SUBREG ZPR:$vec, zsub)), VectorIndexB:$index)>;
+ def : Pat<(i32 (and (vector_extract nxv8i16:$vec, VectorIndexH:$index), 0xffff)),
+ (UMOVvi16 (v8i16 (EXTRACT_SUBREG ZPR:$vec, zsub)), VectorIndexH:$index)>;
+ def : Pat<(i64 (and (i64 (anyext (i32 (vector_extract nxv16i8:$vec, VectorIndexB:$index)))), (i64 0xff))),
+ (SUBREG_TO_REG (i64 0), (i32 (UMOVvi8 (v16i8 (EXTRACT_SUBREG ZPR:$vec, zsub)), VectorIndexB:$index)), sub_32)>;
+ def : Pat<(i64 (and (i64 (anyext (i32 (vector_extract nxv8i16:$vec, VectorIndexH:$index)))), (i64 0xffff))),
+ (SUBREG_TO_REG (i64 0), (i32 (UMOVvi16 (v8i16 (EXTRACT_SUBREG ZPR:$vec, zsub)), VectorIndexH:$index)), sub_32)>;
} // End HasNEON
// Extract first element from vector.
diff --git a/llvm/test/CodeGen/AArch64/sve-extract-element.ll b/llvm/test/CodeGen/AArch64/sve-extract-element.ll
index c340df1385124..0cc2e04bfb315 100644
--- a/llvm/test/CodeGen/AArch64/sve-extract-element.ll
+++ b/llvm/test/CodeGen/AArch64/sve-extract-element.ll
@@ -12,6 +12,26 @@ define i8 @test_lane0_16xi8(<vscale x 16 x i8> %a) #0 {
ret i8 %b
}
+define i32 @test_lane0_16xi8_zext_i32(<vscale x 16 x i8> %a) #0 {
+; CHECK-LABEL: test_lane0_16xi8_zext_i32:
+; CHECK: // %bb.0:
+; CHECK-NEXT: umov w0, v0.b[0]
+; CHECK-NEXT: ret
+ %b = extractelement <vscale x 16 x i8> %a, i32 0
+ %c = zext i8 %b to i32
+ ret i32 %c
+}
+
+define i64 @test_lane0_16xi8_zext_i64(<vscale x 16 x i8> %a) #0 {
+; CHECK-LABEL: test_lane0_16xi8_zext_i64:
+; CHECK: // %bb.0:
+; CHECK-NEXT: umov w0, v0.b[0]
+; CHECK-NEXT: ret
+ %b = extractelement <vscale x 16 x i8> %a, i32 0
+ %c = zext i8 %b to i64
+ ret i64 %c
+}
+
define i8 @test_lane15_16xi8(<vscale x 16 x i8> %a) #0 {
; CHECK-LABEL: test_lane15_16xi8:
; CHECK: // %bb.0:
@@ -21,6 +41,26 @@ define i8 @test_lane15_16xi8(<vscale x 16 x i8> %a) #0 {
ret i8 %b
}
+define i32 @test_lane15_16xi8_zext_i32(<vscale x 16 x i8> %a) #0 {
+; CHECK-LABEL: test_lane15_16xi8_zext_i32:
+; CHECK: // %bb.0:
+; CHECK-NEXT: umov w0, v0.b[15]
+; CHECK-NEXT: ret
+ %b = extractelement <vscale x 16 x i8> %a, i32 15
+ %c = zext i8 %b to i32
+ ret i32 %c
+}
+
+define i64 @test_lane15_16xi8_zext_i64(<vscale x 16 x i8> %a) #0 {
+; CHECK-LABEL: test_lane15_16xi8_zext_i64:
+; CHECK: // %bb.0:
+; CHECK-NEXT: umov w0, v0.b[15]
+; CHECK-NEXT: ret
+ %b = extractelement <vscale x 16 x i8> %a, i32 15
+ %c = zext i8 %b to i64
+ ret i64 %c
+}
+
define i8 @test_lane16_16xi8(<vscale x 16 x i8> %a) #0 {
; CHECK-LABEL: test_lane16_16xi8:
; CHECK: // %bb.0:
@@ -31,6 +71,32 @@ define i8 @test_lane16_16xi8(<vscale x 16 x i8> %a) #0 {
ret i8 %b
}
+; FIXME: FMOV+AND -> UMOV.
+define i32 @test_lane16_16xi8_zext_i32(<vscale x 16 x i8> %a) #0 {
+; CHECK-LABEL: test_lane16_16xi8_zext_i32:
+; CHECK: // %bb.0:
+; CHECK-NEXT: mov z0.b, z0.b[16]
+; CHECK-NEXT: fmov w8, s0
+; CHECK-NEXT: and w0, w8, #0xff
+; CHECK-NEXT: ret
+ %b = extractelement <vscale x 16 x i8> %a, i32 16
+ %c = zext i8 %b to i32
+ ret i32 %c
+}
+
+; FIXME: FMOV+AND -> UMOV.
+define i64 @test_lane16_16xi8_zext_i64(<vscale x 16 x i8> %a) #0 {
+; CHECK-LABEL: test_lane16_16xi8_zext_i64:
+; CHECK: // %bb.0:
+; CHECK-NEXT: mov z0.b, z0.b[16]
+; CHECK-NEXT: fmov w8, s0
+; CHECK-NEXT: and x0, x8, #0xff
+; CHECK-NEXT: ret
+ %b = extractelement <vscale x 16 x i8> %a, i32 16
+ %c = zext i8 %b to i64
+ ret i64 %c
+}
+
define i16 @test_lane0_8xi16(<vscale x 8 x i16> %a) #0 {
; CHECK-LABEL: test_lane0_8xi16:
; CHECK: // %bb.0:
@@ -40,6 +106,26 @@ define i16 @test_lane0_8xi16(<vscale x 8 x i16> %a) #0 {
ret i16 %b
}
+define i32 @test_lane0_8xi16_zext_i32(<vscale x 8 x i16> %a) #0 {
+; CHECK-LABEL: test_lane0_8xi16_zext_i32:
+; CHECK: // %bb.0:
+; CHECK-NEXT: umov w0, v0.h[0]
+; CHECK-NEXT: ret
+ %b = extractelement <vscale x 8 x i16> %a, i32 0
+ %c = zext i16 %b to i32
+ ret i32 %c
+}
+
+define i64 @test_lane0_8xi16_zext_i64(<vscale x 8 x i16> %a) #0 {
+; CHECK-LABEL: test_lane0_8xi16_zext_i64:
+; CHECK: // %bb.0:
+; CHECK-NEXT: umov w0, v0.h[0]
+; CHECK-NEXT: ret
+ %b = extractelement <vscale x 8 x i16> %a, i32 0
+ %c = zext i16 %b to i64
+ ret i64 %c
+}
+
define i16 @test_lane7_8xi16(<vscale x 8 x i16> %a) #0 {
; CHECK-LABEL: test_lane7_8xi16:
; CHECK: // %bb.0:
@@ -49,6 +135,26 @@ define i16 @test_lane7_8xi16(<vscale x 8 x i16> %a) #0 {
ret i16 %b
}
+define i32 @test_lane7_8xi16_zext_i32(<vscale x 8 x i16> %a) #0 {
+; CHECK-LABEL: test_lane7_8xi16_zext_i32:
+; CHECK: // %bb.0:
+; CHECK-NEXT: umov w0, v0.h[7]
+; CHECK-NEXT: ret
+ %b = extractelement <vscale x 8 x i16> %a, i32 7
+ %c = zext i16 %b to i32
+ ret i32 %c
+}
+
+define i64 @test_lane7_8xi16_zext_i64(<vscale x 8 x i16> %a) #0 {
+; CHECK-LABEL: test_lane7_8xi16_zext_i64:
+; CHECK: // %bb.0:
+; CHECK-NEXT: umov w0, v0.h[7]
+; CHECK-NEXT: ret
+ %b = extractelement <vscale x 8 x i16> %a, i32 7
+ %c = zext i16 %b to i64
+ ret i64 %c
+}
+
define i16 @test_lane8_8xi16(<vscale x 8 x i16> %a) #0 {
; CHECK-LABEL: test_lane8_8xi16:
; CHECK: // %bb.0:
@@ -59,6 +165,32 @@ define i16 @test_lane8_8xi16(<vscale x 8 x i16> %a) #0 {
ret i16 %b
}
+; FIXME: FMOV+AND -> UMOV.
+define i32 @test_lane8_8xi16_zext_i32(<vscale x 8 x i16> %a) #0 {
+; CHECK-LABEL: test_lane8_8xi16_zext_i32:
+; CHECK: // %bb.0:
+; CHECK-NEXT: mov z0.h, z0.h[8]
+; CHECK-NEXT: fmov w8, s0
+; CHECK-NEXT: and w0, w8, #0xffff
+; CHECK-NEXT: ret
+ %b = extractelement <vscale x 8 x i16> %a, i32 8
+ %c = zext i16 %b to i32
+ ret i32 %c
+}
+
+; FIXME: FMOV+AND -> UMOV.
+define i64 @test_lane8_8xi16_zext_i64(<vscale x 8 x i16> %a) #0 {
+; CHECK-LABEL: test_lane8_8xi16_zext_i64:
+; CHECK: // %bb.0:
+; CHECK-NEXT: mov z0.h, z0.h[8]
+; CHECK-NEXT: fmov w8, s0
+; CHECK-NEXT: and x0, x8, #0xffff
+; CHECK-NEXT: ret
+ %b = extractelement <vscale x 8 x i16> %a, i32 8
+ %c = zext i16 %b to i64
+ ret i64 %c
+}
+
define i32 @test_lane0_4xi32(<vscale x 4 x i32> %a) #0 {
; CHECK-LABEL: test_lane0_4xi32:
; CHECK: // %bb.0:
paulwalker-arm approved these changes on Nov 7, 2025
ckoparkar added a commit to ckoparkar/llvm-project that referenced this pull request on Nov 10, 2025