[IA] Remove recursive [de]interleaving support #143875
Conversation
Now that the loop vectorizer emits just a single llvm.vector.[de]interleaveN intrinsic after llvm#141865, we can remove the need to recognise recursively [de]interleaved intrinsics. No in-tree target currently has instructions to emit an interleaved access with a factor > 8, and I'm not aware of any other passes that emit recursive interleave patterns, so this code is effectively dead. Some tests have been converted from the recursive form to a single intrinsic, and others that are no longer needed, e.g. those exercising the recursive tree structure, have been deleted. This closes off the work started in llvm#139893.
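For a concrete picture of the change (a sketch with illustrative types, mirroring the updated tests in the diff below): a factor-4 deinterleave that was previously expressed as a recursive tree of llvm.vector.deinterleave2 calls is now a single flat intrinsic:

```llvm
; Recursive form (no longer recognised): two levels of factor-2 deinterleaves
%d0   = call { <4 x i32>, <4 x i32> } @llvm.vector.deinterleave2.v8i32(<8 x i32> %v)
%d0.0 = extractvalue { <4 x i32>, <4 x i32> } %d0, 0
%d0.1 = extractvalue { <4 x i32>, <4 x i32> } %d0, 1
%d1   = call { <2 x i32>, <2 x i32> } @llvm.vector.deinterleave2.v4i32(<4 x i32> %d0.0)
%d2   = call { <2 x i32>, <2 x i32> } @llvm.vector.deinterleave2.v4i32(<4 x i32> %d0.1)

; Flat form the loop vectorizer now emits: one factor-4 intrinsic
%d = call { <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32> } @llvm.vector.deinterleave4.v8i32(<8 x i32> %v)
```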
@llvm/pr-subscribers-llvm-transforms

Author: Luke Lau (lukel97)

Changes

Now that the loop vectorizer emits just a single llvm.vector.[de]interleaveN intrinsic after #141865, we can remove the need to recognise recursively [de]interleaved intrinsics. No in-tree target currently has instructions to emit an interleaved access with a factor > 8, and I'm not aware of any other passes that will emit recursive interleave patterns, so this code is effectively dead. Some tests have been converted from the recursive form to a single intrinsic, and others that are no longer needed, e.g. those exercising the recursive tree structure, have been deleted. This closes off the work started in #139893.

Patch is 98.56 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/143875.diff

10 Files Affected:
diff --git a/llvm/lib/CodeGen/InterleavedAccessPass.cpp b/llvm/lib/CodeGen/InterleavedAccessPass.cpp
index 49f1504d244ed..9c4c86cebe7e5 100644
--- a/llvm/lib/CodeGen/InterleavedAccessPass.cpp
+++ b/llvm/lib/CodeGen/InterleavedAccessPass.cpp
@@ -629,173 +629,12 @@ static unsigned getIntrinsicFactor(const IntrinsicInst *II) {
}
}
-// For an (de)interleave tree like this:
-//
-// A C B D
-// |___| |___|
-// |_____|
-// |
-// A B C D
-//
-// We will get ABCD at the end while the leaf operands/results
-// are ACBD, which are also what we initially collected in
-// getVectorInterleaveFactor / getVectorDeinterleaveFactor. But TLI
-// hooks (e.g. lowerDeinterleaveIntrinsicToLoad) expect ABCD, so we need
-// to reorder them by interleaving these values.
-static void interleaveLeafValues(MutableArrayRef<Value *> SubLeaves) {
- unsigned NumLeaves = SubLeaves.size();
- assert(isPowerOf2_32(NumLeaves) && NumLeaves > 1);
- if (NumLeaves == 2)
- return;
-
- const unsigned HalfLeaves = NumLeaves / 2;
- // Visit the sub-trees.
- interleaveLeafValues(SubLeaves.take_front(HalfLeaves));
- interleaveLeafValues(SubLeaves.drop_front(HalfLeaves));
-
- SmallVector<Value *, 8> Buffer;
- // a0 a1 a2 a3 b0 b1 b2 b3
- // -> a0 b0 a1 b1 a2 b2 a3 b3
- for (unsigned i = 0U; i < NumLeaves; ++i)
- Buffer.push_back(SubLeaves[i / 2 + (i % 2 ? HalfLeaves : 0)]);
-
- llvm::copy(Buffer, SubLeaves.begin());
-}
-
-static bool
-getVectorInterleaveFactor(IntrinsicInst *II, SmallVectorImpl<Value *> &Operands,
- SmallVectorImpl<Instruction *> &DeadInsts) {
- assert(isInterleaveIntrinsic(II->getIntrinsicID()));
-
- // Visit with BFS
- SmallVector<IntrinsicInst *, 8> Queue;
- Queue.push_back(II);
- while (!Queue.empty()) {
- IntrinsicInst *Current = Queue.front();
- Queue.erase(Queue.begin());
-
- // All the intermediate intrinsics will be deleted.
- DeadInsts.push_back(Current);
-
- for (unsigned I = 0; I < getIntrinsicFactor(Current); ++I) {
- Value *Op = Current->getOperand(I);
- if (auto *OpII = dyn_cast<IntrinsicInst>(Op))
- if (OpII->getIntrinsicID() == Intrinsic::vector_interleave2) {
- Queue.push_back(OpII);
- continue;
- }
-
- // If this is not a perfectly balanced tree, the leaf
- // result types would be different.
- if (!Operands.empty() && Op->getType() != Operands.back()->getType())
- return false;
-
- Operands.push_back(Op);
- }
- }
-
- const unsigned Factor = Operands.size();
- // Currently we only recognize factors 2...8 and other powers of 2.
- // FIXME: should we assert here instead?
- if (Factor <= 1 ||
- (!isPowerOf2_32(Factor) && Factor != getIntrinsicFactor(II)))
- return false;
-
- // Recursively interleaved factors need to have their values reordered
- // TODO: Remove once the loop vectorizer no longer recursively interleaves
- // factors 4 + 8
- if (isPowerOf2_32(Factor) && getIntrinsicFactor(II) == 2)
- interleaveLeafValues(Operands);
- return true;
-}
-
-static bool
-getVectorDeinterleaveFactor(IntrinsicInst *II,
- SmallVectorImpl<Value *> &Results,
- SmallVectorImpl<Instruction *> &DeadInsts) {
- assert(isDeinterleaveIntrinsic(II->getIntrinsicID()));
- using namespace PatternMatch;
- if (!II->hasNUses(getIntrinsicFactor(II)))
- return false;
-
- // Visit with BFS
- SmallVector<IntrinsicInst *, 8> Queue;
- Queue.push_back(II);
- while (!Queue.empty()) {
- IntrinsicInst *Current = Queue.front();
- Queue.erase(Queue.begin());
- assert(Current->hasNUses(getIntrinsicFactor(Current)));
-
- // All the intermediate intrinsics will be deleted from the bottom-up.
- DeadInsts.insert(DeadInsts.begin(), Current);
-
- SmallVector<ExtractValueInst *> EVs(getIntrinsicFactor(Current), nullptr);
- for (User *Usr : Current->users()) {
- if (!isa<ExtractValueInst>(Usr))
- return 0;
-
- auto *EV = cast<ExtractValueInst>(Usr);
- // Intermediate ExtractValue instructions will also be deleted.
- DeadInsts.insert(DeadInsts.begin(), EV);
- ArrayRef<unsigned> Indices = EV->getIndices();
- if (Indices.size() != 1)
- return false;
-
- if (!EVs[Indices[0]])
- EVs[Indices[0]] = EV;
- else
- return false;
- }
-
- // We have legal indices. At this point we're either going
- // to continue the traversal or push the leaf values into Results.
- for (ExtractValueInst *EV : EVs) {
- // Continue the traversal. We're playing safe here and matching only the
- // expression consisting of a perfectly balanced binary tree in which all
- // intermediate values are only used once.
- if (EV->hasOneUse() &&
- match(EV->user_back(),
- m_Intrinsic<Intrinsic::vector_deinterleave2>()) &&
- EV->user_back()->hasNUses(2)) {
- auto *EVUsr = cast<IntrinsicInst>(EV->user_back());
- Queue.push_back(EVUsr);
- continue;
- }
-
- // If this is not a perfectly balanced tree, the leaf
- // result types would be different.
- if (!Results.empty() && EV->getType() != Results.back()->getType())
- return false;
-
- // Save the leaf value.
- Results.push_back(EV);
- }
- }
-
- const unsigned Factor = Results.size();
- // Currently we only recognize factors of 2...8 and other powers of 2.
- // FIXME: should we assert here instead?
- if (Factor <= 1 ||
- (!isPowerOf2_32(Factor) && Factor != getIntrinsicFactor(II)))
- return 0;
-
- // Recursively interleaved factors need to have their values reordered
- // TODO: Remove once the loop vectorizer no longer recursively interleaves
- // factors 4 + 8
- if (isPowerOf2_32(Factor) && getIntrinsicFactor(II) == 2)
- interleaveLeafValues(Results);
- return true;
-}
-
static Value *getMask(Value *WideMask, unsigned Factor,
ElementCount LeafValueEC) {
if (auto *IMI = dyn_cast<IntrinsicInst>(WideMask)) {
- SmallVector<Value *, 8> Operands;
- SmallVector<Instruction *, 8> DeadInsts;
- if (getVectorInterleaveFactor(IMI, Operands, DeadInsts)) {
- assert(!Operands.empty());
- if (Operands.size() == Factor && llvm::all_equal(Operands))
- return Operands[0];
+ if (isInterleaveIntrinsic(IMI->getIntrinsicID()) &&
+ getIntrinsicFactor(IMI) == Factor && llvm::all_equal(IMI->args())) {
+ return IMI->getArgOperand(0);
}
}
@@ -830,13 +669,19 @@ bool InterleavedAccessImpl::lowerDeinterleaveIntrinsic(
if (!LoadedVal->hasOneUse() || !isa<LoadInst, VPIntrinsic>(LoadedVal))
return false;
- SmallVector<Value *, 8> DeinterleaveValues;
- SmallVector<Instruction *, 8> DeinterleaveDeadInsts;
- if (!getVectorDeinterleaveFactor(DI, DeinterleaveValues,
- DeinterleaveDeadInsts))
+ const unsigned Factor = getIntrinsicFactor(DI);
+ if (!DI->hasNUses(Factor))
return false;
-
- const unsigned Factor = DeinterleaveValues.size();
+ SmallVector<Value *, 8> DeinterleaveValues(Factor);
+ for (auto *User : DI->users()) {
+ auto *Extract = dyn_cast<ExtractValueInst>(User);
+ if (!Extract || Extract->getNumIndices() != 1)
+ return false;
+ unsigned Idx = Extract->getIndices()[0];
+ if (DeinterleaveValues[Idx])
+ return false;
+ DeinterleaveValues[Idx] = Extract;
+ }
if (auto *VPLoad = dyn_cast<VPIntrinsic>(LoadedVal)) {
if (VPLoad->getIntrinsicID() != Intrinsic::vp_load)
@@ -869,7 +714,9 @@ bool InterleavedAccessImpl::lowerDeinterleaveIntrinsic(
return false;
}
- DeadInsts.insert_range(DeinterleaveDeadInsts);
+ for (Value *V : DeinterleaveValues)
+ DeadInsts.insert(cast<Instruction>(V));
+ DeadInsts.insert(DI);
// We now have a target-specific load, so delete the old one.
DeadInsts.insert(cast<Instruction>(LoadedVal));
return true;
@@ -883,12 +730,8 @@ bool InterleavedAccessImpl::lowerInterleaveIntrinsic(
if (!isa<StoreInst, VPIntrinsic>(StoredBy))
return false;
- SmallVector<Value *, 8> InterleaveValues;
- SmallVector<Instruction *, 8> InterleaveDeadInsts;
- if (!getVectorInterleaveFactor(II, InterleaveValues, InterleaveDeadInsts))
- return false;
-
- const unsigned Factor = InterleaveValues.size();
+ SmallVector<Value *, 8> InterleaveValues(II->args());
+ const unsigned Factor = getIntrinsicFactor(II);
if (auto *VPStore = dyn_cast<VPIntrinsic>(StoredBy)) {
if (VPStore->getIntrinsicID() != Intrinsic::vp_store)
@@ -922,7 +765,7 @@ bool InterleavedAccessImpl::lowerInterleaveIntrinsic(
// We now have a target-specific store, so delete the old one.
DeadInsts.insert(cast<Instruction>(StoredBy));
- DeadInsts.insert_range(InterleaveDeadInsts);
+ DeadInsts.insert(II);
return true;
}
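For reference, the behaviour of the removed interleaveLeafValues helper above can be sketched in standalone C++ (a simplified re-implementation for illustration, not the pass's actual code): leaves collected from a balanced [de]interleave2 tree arrive as [A, C, B, D], but the TLI hooks expect [A, B, C, D], so each recursion level riffles the two halves together.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Reorder leaves collected from a balanced [de]interleave2 tree so that
// the i-th output corresponds to the i-th interleaved lane. Recurse into
// each half, then riffle the two halves:
//   a0 a1 a2 a3 b0 b1 b2 b3  ->  a0 b0 a1 b1 a2 b2 a3 b3
template <typename T>
void interleaveLeafValues(std::vector<T> &Leaves, size_t Begin, size_t End) {
  size_t N = End - Begin;
  assert(N > 1 && (N & (N - 1)) == 0 && "factor must be a power of two");
  if (N == 2)
    return;

  size_t Half = N / 2;
  // Visit the sub-trees first.
  interleaveLeafValues(Leaves, Begin, Begin + Half);
  interleaveLeafValues(Leaves, Begin + Half, End);

  // Riffle the two halves into a buffer, then copy back.
  std::vector<T> Buffer;
  for (size_t I = 0; I < N; ++I)
    Buffer.push_back(Leaves[Begin + I / 2 + (I % 2 ? Half : 0)]);
  for (size_t I = 0; I < N; ++I)
    Leaves[Begin + I] = Buffer[I];
}
```

For a factor-8 tree such as the one in the vector_interleave_store_factor8_recursive test below, a breadth-first collection yields [a, e, c, g, b, f, d, h], which this reordering turns back into [a, b, c, d, e, f, g, h].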
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-deinterleave-load.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-deinterleave-load.ll
index c2ae1ce491389..3e822d357b667 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-deinterleave-load.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-deinterleave-load.ll
@@ -293,31 +293,6 @@ define { <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8> } @vector_deinterleave_load_fact
ret { <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8> } %res3
}
-; TODO: Remove once recursive deinterleaving support is removed
-define { <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8> } @vector_deinterleave_load_factor4_recursive(ptr %p) {
-; CHECK-LABEL: vector_deinterleave_load_factor4_recursive:
-; CHECK: # %bb.0:
-; CHECK-NEXT: vsetivli zero, 8, e8, mf2, ta, ma
-; CHECK-NEXT: vlseg4e8.v v8, (a0)
-; CHECK-NEXT: ret
- %vec = load <32 x i8>, ptr %p
- %d0 = call {<16 x i8>, <16 x i8>} @llvm.vector.deinterleave2.v32i8(<32 x i8> %vec)
- %d0.0 = extractvalue { <16 x i8>, <16 x i8> } %d0, 0
- %d0.1 = extractvalue { <16 x i8>, <16 x i8> } %d0, 1
- %d1 = call {<8 x i8>, <8 x i8>} @llvm.vector.deinterleave2.v16i8(<16 x i8> %d0.0)
- %t0 = extractvalue { <8 x i8>, <8 x i8> } %d1, 0
- %t2 = extractvalue { <8 x i8>, <8 x i8> } %d1, 1
- %d2 = call {<8 x i8>, <8 x i8>} @llvm.vector.deinterleave2.v16i8(<16 x i8> %d0.1)
- %t1 = extractvalue { <8 x i8>, <8 x i8> } %d2, 0
- %t3 = extractvalue { <8 x i8>, <8 x i8> } %d2, 1
-
- %res0 = insertvalue { <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8> } undef, <8 x i8> %t0, 0
- %res1 = insertvalue { <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8> } %res0, <8 x i8> %t1, 1
- %res2 = insertvalue { <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8> } %res1, <8 x i8> %t2, 2
- %res3 = insertvalue { <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8> } %res2, <8 x i8> %t3, 3
- ret { <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8> } %res3
-}
-
define { <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8> } @vector_deinterleave_load_factor5(ptr %p) {
; CHECK-LABEL: vector_deinterleave_load_factor5:
; CHECK: # %bb.0:
@@ -414,45 +389,3 @@ define { <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8>, <
%res7 = insertvalue { <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8> } %res6, <8 x i8> %t6, 7
ret { <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8> } %res7
}
-
-; TODO: Remove once recursive deinterleaving support is removed
-define {<2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>} @vector_deinterleave_load_factor8_recursive(ptr %ptr) {
-; CHECK-LABEL: vector_deinterleave_load_factor8_recursive:
-; CHECK: # %bb.0:
-; CHECK-NEXT: vsetivli zero, 2, e32, mf2, ta, ma
-; CHECK-NEXT: vlseg8e32.v v8, (a0)
-; CHECK-NEXT: ret
- %vec = load <16 x i32>, ptr %ptr
- %d0 = call { <8 x i32>, <8 x i32> } @llvm.vector.deinterleave2.v16i32(<16 x i32> %vec)
- %d0.0 = extractvalue { <8 x i32>, <8 x i32> } %d0, 0
- %d0.1 = extractvalue { <8 x i32>, <8 x i32> } %d0, 1
- %d1 = call { <4 x i32>, <4 x i32> } @llvm.vector.deinterleave2.v8i32(<8 x i32> %d0.0)
- %d1.0 = extractvalue { <4 x i32>, <4 x i32> } %d1, 0
- %d1.1 = extractvalue { <4 x i32>, <4 x i32> } %d1, 1
- %d2 = call { <4 x i32>, <4 x i32> } @llvm.vector.deinterleave2.v8i32(<8 x i32> %d0.1)
- %d2.0 = extractvalue { <4 x i32>, <4 x i32> } %d2, 0
- %d2.1 = extractvalue { <4 x i32>, <4 x i32> } %d2, 1
-
- %d3 = call { <2 x i32>, <2 x i32> } @llvm.vector.deinterleave2.v4i32(<4 x i32> %d1.0)
- %t0 = extractvalue { <2 x i32>, <2 x i32> } %d3, 0
- %t4 = extractvalue { <2 x i32>, <2 x i32> } %d3, 1
- %d4 = call { <2 x i32>, <2 x i32> } @llvm.vector.deinterleave2.v4i32(<4 x i32> %d1.1)
- %t2 = extractvalue { <2 x i32>, <2 x i32> } %d4, 0
- %t6 = extractvalue { <2 x i32>, <2 x i32> } %d4, 1
- %d5 = call { <2 x i32>, <2 x i32> } @llvm.vector.deinterleave2.v4i32(<4 x i32> %d2.0)
- %t1 = extractvalue { <2 x i32>, <2 x i32> } %d5, 0
- %t5 = extractvalue { <2 x i32>, <2 x i32> } %d5, 1
- %d6 = call { <2 x i32>, <2 x i32> } @llvm.vector.deinterleave2.v4i32(<4 x i32> %d2.1)
- %t3 = extractvalue { <2 x i32>, <2 x i32> } %d6, 0
- %t7 = extractvalue { <2 x i32>, <2 x i32> } %d6, 1
-
- %res0 = insertvalue { <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32> } undef, <2 x i32> %t0, 0
- %res1 = insertvalue { <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32> } %res0, <2 x i32> %t1, 1
- %res2 = insertvalue { <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32> } %res1, <2 x i32> %t2, 2
- %res3 = insertvalue { <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32> } %res2, <2 x i32> %t3, 3
- %res4 = insertvalue { <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32> } %res3, <2 x i32> %t4, 4
- %res5 = insertvalue { <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32> } %res4, <2 x i32> %t5, 5
- %res6 = insertvalue { <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32> } %res5, <2 x i32> %t6, 6
- %res7 = insertvalue { <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32> } %res6, <2 x i32> %t7, 7
- ret { <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32> } %res7
-}
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-interleave-store.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-interleave-store.ll
index c394e7aa2e3e8..a49eeed3605c5 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-interleave-store.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-interleave-store.ll
@@ -203,20 +203,6 @@ define void @vector_interleave_store_factor4(<4 x i32> %a, <4 x i32> %b, <4 x i3
ret void
}
-; TODO: Remove once recursive interleaving support is removed
-define void @vector_interleave_store_factor4_recursive(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c, <4 x i32> %d, ptr %p) {
-; CHECK-LABEL: vector_interleave_store_factor4_recursive:
-; CHECK: # %bb.0:
-; CHECK-NEXT: vsetivli zero, 4, e32, m1, ta, ma
-; CHECK-NEXT: vsseg4e32.v v8, (a0)
-; CHECK-NEXT: ret
- %v0 = call <8 x i32> @llvm.vector.interleave2.v8i32(<4 x i32> %a, <4 x i32> %c)
- %v1 = call <8 x i32> @llvm.vector.interleave2.v8i32(<4 x i32> %b, <4 x i32> %d)
- %v2 = call <16 x i32> @llvm.vector.interleave2.v16i32(<8 x i32> %v0, <8 x i32> %v1)
- store <16 x i32> %v2, ptr %p
- ret void
-}
-
define void @vector_interleave_store_factor5(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c, <4 x i32> %d, <4 x i32> %e, ptr %p) {
; CHECK-LABEL: vector_interleave_store_factor5:
; CHECK: # %bb.0:
@@ -260,23 +246,3 @@ define void @vector_interleave_store_factor8(<4 x i32> %a, <4 x i32> %b, <4 x i3
store <32 x i32> %v, ptr %p
ret void
}
-
-; TODO: Remove once recursive interleaving support is removed
-define void @vector_interleave_store_factor8_recursive(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c, <4 x i32> %d, <4 x i32> %e, <4 x i32> %f, <4 x i32> %g, <4 x i32> %h, ptr %p) {
-; CHECK-LABEL: vector_interleave_store_factor8_recursive:
-; CHECK: # %bb.0:
-; CHECK-NEXT: vsetivli zero, 4, e32, m1, ta, ma
-; CHECK-NEXT: vsseg8e32.v v8, (a0)
-; CHECK-NEXT: ret
- %v0 = call <8 x i32> @llvm.vector.interleave2.v8i32(<4 x i32> %a, <4 x i32> %e)
- %v1 = call <8 x i32> @llvm.vector.interleave2.v8i32(<4 x i32> %c, <4 x i32> %g)
- %v2 = call <16 x i32> @llvm.vector.interleave2.v16i32(<8 x i32> %v0, <8 x i32> %v1)
-
- %v3 = call <8 x i32> @llvm.vector.interleave2.v8i32(<4 x i32> %b, <4 x i32> %f)
- %v4 = call <8 x i32> @llvm.vector.interleave2.v8i32(<4 x i32> %d, <4 x i32> %h)
- %v5 = call <16 x i32> @llvm.vector.interleave2.v16i32(<8 x i32> %v3, <8 x i32> %v4)
-
- %v6 = call <32 x i32> @llvm.vector.interleave2.v32i32(<16 x i32> %v2, <16 x i32> %v5)
- store <32 x i32> %v6, ptr %p
- ret void
-}
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-interleaved-access.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-interleaved-access.ll
index 8ac4c7447c7d4..5e3ae2faf1a53 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-interleaved-access.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-interleaved-access.ll
@@ -302,15 +302,11 @@ define {<2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>} @vpload_factor4_intrinsics(p
; CHECK-NEXT: vlseg4e32.v v8, (a0)
; CHECK-NEXT: ret
%wide.masked.load = call <8 x i32> @llvm.vp.load.v8i32.p0(ptr %ptr, <8 x i1> splat (i1 true), i32 8)
- %d0 = call { <4 x i32>, <4 x i32> } @llvm.vector.deinterleave2.v8i32(<8 x i32> %wide.masked.load)
- %d0.0 = extractvalue { <4 x i32>, <4 x i32> } %d0, 0
- %d0.1 = extractvalue { <4 x i32>, <4 x i32> } %d0, 1
- %d1 = call { <2 x i32>, <2 x i32> } @llvm.vector.deinterleave2.v4i32(<4 x i32> %d0.0)
- %t0 = extractvalue { <2 x i32>, <2 x i32> } %d1, 0
- %t2 = extractvalue { <2 x i32>, <2 x i32> } %d1, 1
- %d2 = call { <2 x i32>, <2 x i32> } @llvm.vector.deinterleave2.v4i32(<4 x i32> %d0.1)
- %t1 = extractvalue { <2 x i32>, <2 x i32> } %d2, 0
- %t3 = extractvalue { <2 x i32>, <2 x i32> } %d2, 1
+ %d = call { <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32> } @llvm.vector.deinterleave4.v8i32(<8 x i32> %wide.masked.load)
+ %t0 = extractvalue { <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32> } %d, 0
+ %t1 = extractvalue { <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32> } %d, 1
+ %t2 = extractvalue { <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32> } %d, 2
+ %t3 = extractvalue { <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32> } %d, 3
%res0 = insertvalue { <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32> } poison, <2 x i32> %t0, 0
%res1 = insertvalue { <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32> } %res0, <2 x i32> %t1, 1
diff --git a/llvm/test/CodeGen/RISCV/rvv/vector-deinterleave-load.ll b/llvm/test/CodeGen/RISCV/rvv/vector-deinterleave-load.ll
index 9344c52098684..b11db3d61f693 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vector-deinterleave-load.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vector-deinterleave-load.ll
@@ -380,31 +380,6 @@ define { <vscale x 8 x i8>, <vscale x 8 x i8>, <vscale x 8 x i8>, <vscale x 8 x
ret { <vscale x 8 x i8>, <vscale x 8 x i8>, <vscale x 8 x i8>, <vscale x 8 x i8> } %res3
}
-; TODO: Remove once recursive deinterleaving support is removed
-define { <vscale x 8 x i8>, <vscale x 8 x i8>, <vscale x 8 x i8>, <vscale x 8 x i8> } @vector_deinterleave_load_factor4_recursive(ptr %p) {
-; CHECK-LABEL: vector_deinterleave_load_factor4_recursive:
-; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli a1, zero, e8, m1, ta, ma
-; CHECK-NEXT: vlseg4e8.v v8, (a0)
-; CHECK-NEXT: ret
- %vec = load <vscale x 32 x i8>, ptr %p
- %d0 = call {<vscale x 16 x i8>, <vscale x 16 x i8>} @llvm.vector.deinterleave2.nxv32i8(<vscale x 32 x i8> %vec)
- %d0.0 = extractvalue { <vscale x 16 x i8>, <vscale x 16 x i8> } %d0, 0
- %d0.1 = extractvalue { <vscale x 16 x i8>, <vscale x 16 x i8> } %d0, 1
- %d1 = call {<vscale x 8 x i8>, <vscale x 8 x i8>} @llvm.vector.deinterleave2.nxv16i8(<vscale x 16 x i8> %d0.0)
- %t0 = extractvalue { <vscale x 8 x i8>, <vscale x 8 x i8> } %d1, 0
- %t2 = extractvalue { <vscale x 8 x i8>, <vscale x 8 x i8> } %d1, 1
- %d2 = call {<vscale x 8 x i8>, <vscale x 8 x i8>} @llvm.vector.deinterleave2.nxv16i8(<vscale x 16 x i8> %d0.1)
- %t1 = extractvalue { <vscale x 8 x i8>, <vscale x 8 x i8> } %d2, 0
- %t3 = extractvalue { <vscale x 8 x i8>, <vscale x 8 x i8> } %d2, 1
-
- %res0 = insertvalue { <vscale x 8 x i8>, <vscale x 8 x i8>, <vscale x 8 x i8>...
[truncated]
Gentle ping