[LV][EVL] Support in-loop reduction using tail folding with EVL.#90184
@llvm/pr-subscribers-llvm-ir @llvm/pr-subscribers-llvm-transforms

Author: Mel Chen (Mel-Chen)

Changes

Following from #87816, add VPReductionEVLRecipe to describe vector predication reduction. Address one of the TODOs from #76172.

Patch is 148.24 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/90184.diff

12 Files Affected:
diff --git a/llvm/include/llvm/IR/IRBuilder.h b/llvm/include/llvm/IR/IRBuilder.h
index b6534a1962a2f5..4db1fe5ff93aef 100644
--- a/llvm/include/llvm/IR/IRBuilder.h
+++ b/llvm/include/llvm/IR/IRBuilder.h
@@ -746,49 +746,68 @@ class IRBuilderBase {
private:
CallInst *getReductionIntrinsic(Intrinsic::ID ID, Value *Src);
+ // Helper function for creating VP reduce intrinsic call.
+ CallInst *getReductionIntrinsic(Intrinsic::ID ID, Value *Acc, Value *Src,
+ Value *Mask, Value *EVL);
+
public:
/// Create a sequential vector fadd reduction intrinsic of the source vector.
/// The first parameter is a scalar accumulator value. An unordered reduction
/// can be created by adding the reassoc fast-math flag to the resulting
/// sequential reduction.
CallInst *CreateFAddReduce(Value *Acc, Value *Src);
+ CallInst *CreateFAddReduce(Value *Acc, Value *Src, Value *EVL,
+ Value *Mask = nullptr);
/// Create a sequential vector fmul reduction intrinsic of the source vector.
/// The first parameter is a scalar accumulator value. An unordered reduction
/// can be created by adding the reassoc fast-math flag to the resulting
/// sequential reduction.
CallInst *CreateFMulReduce(Value *Acc, Value *Src);
+ CallInst *CreateFMulReduce(Value *Acc, Value *Src, Value *EVL,
+ Value *Mask = nullptr);
/// Create a vector int add reduction intrinsic of the source vector.
CallInst *CreateAddReduce(Value *Src);
+ CallInst *CreateAddReduce(Value *Src, Value *EVL, Value *Mask = nullptr);
/// Create a vector int mul reduction intrinsic of the source vector.
CallInst *CreateMulReduce(Value *Src);
+ CallInst *CreateMulReduce(Value *Src, Value *EVL, Value *Mask = nullptr);
/// Create a vector int AND reduction intrinsic of the source vector.
CallInst *CreateAndReduce(Value *Src);
+ CallInst *CreateAndReduce(Value *Src, Value *EVL, Value *Mask = nullptr);
/// Create a vector int OR reduction intrinsic of the source vector.
CallInst *CreateOrReduce(Value *Src);
+ CallInst *CreateOrReduce(Value *Src, Value *EVL, Value *Mask = nullptr);
/// Create a vector int XOR reduction intrinsic of the source vector.
CallInst *CreateXorReduce(Value *Src);
+ CallInst *CreateXorReduce(Value *Src, Value *EVL, Value *Mask = nullptr);
/// Create a vector integer max reduction intrinsic of the source
/// vector.
CallInst *CreateIntMaxReduce(Value *Src, bool IsSigned = false);
+ CallInst *CreateIntMaxReduce(Value *Src, Value *EVL, bool IsSigned = false,
+ Value *Mask = nullptr);
/// Create a vector integer min reduction intrinsic of the source
/// vector.
CallInst *CreateIntMinReduce(Value *Src, bool IsSigned = false);
+ CallInst *CreateIntMinReduce(Value *Src, Value *EVL, bool IsSigned = false,
+ Value *Mask = nullptr);
/// Create a vector float max reduction intrinsic of the source
/// vector.
CallInst *CreateFPMaxReduce(Value *Src);
+ CallInst *CreateFPMaxReduce(Value *Src, Value *EVL, Value *Mask = nullptr);
/// Create a vector float min reduction intrinsic of the source
/// vector.
CallInst *CreateFPMinReduce(Value *Src);
+ CallInst *CreateFPMinReduce(Value *Src, Value *EVL, Value *Mask = nullptr);
/// Create a vector float maximum reduction intrinsic of the source
/// vector. This variant follows the NaN and signed zero semantic of
diff --git a/llvm/include/llvm/Transforms/Utils/LoopUtils.h b/llvm/include/llvm/Transforms/Utils/LoopUtils.h
index 187ace3a0cbedf..5003fa66100b46 100644
--- a/llvm/include/llvm/Transforms/Utils/LoopUtils.h
+++ b/llvm/include/llvm/Transforms/Utils/LoopUtils.h
@@ -403,6 +403,9 @@ Value *getShuffleReduction(IRBuilderBase &Builder, Value *Src, unsigned Op,
/// Fast-math-flags are propagated using the IRBuilder's setting.
Value *createSimpleTargetReduction(IRBuilderBase &B, Value *Src,
RecurKind RdxKind);
+Value *createSimpleTargetReduction(IRBuilderBase &B, Value *Src,
+ RecurKind RdxKind, Value *EVL,
+ Value *Mask = nullptr);
/// Create a target reduction of the given vector \p Src for a reduction of the
/// kind RecurKind::IAnyOf or RecurKind::FAnyOf. The reduction operation is
@@ -423,6 +426,9 @@ Value *createTargetReduction(IRBuilderBase &B, const RecurrenceDescriptor &Desc,
Value *createOrderedReduction(IRBuilderBase &B,
const RecurrenceDescriptor &Desc, Value *Src,
Value *Start);
+Value *createOrderedReduction(IRBuilderBase &B,
+ const RecurrenceDescriptor &Desc, Value *Src,
+ Value *Start, Value *EVL, Value *Mask = nullptr);
/// Get the intersection (logical and) of all of the potential IR flags
/// of each scalar operation (VL) that will be converted into a vector (I).
diff --git a/llvm/lib/IR/IRBuilder.cpp b/llvm/lib/IR/IRBuilder.cpp
index d6746d1d438242..90f637940d00da 100644
--- a/llvm/lib/IR/IRBuilder.cpp
+++ b/llvm/lib/IR/IRBuilder.cpp
@@ -414,6 +414,20 @@ CallInst *IRBuilderBase::getReductionIntrinsic(Intrinsic::ID ID, Value *Src) {
return CreateCall(Decl, Ops);
}
+CallInst *IRBuilderBase::getReductionIntrinsic(Intrinsic::ID ID, Value *Acc,
+ Value *Src, Value *Mask,
+ Value *EVL) {
+ Module *M = GetInsertBlock()->getParent()->getParent();
+ auto *SrcTy = cast<VectorType>(Src->getType());
+ EVL = CreateIntCast(EVL, getInt32Ty(), /*isSigned=*/false);
+ if (!Mask)
+ Mask = CreateVectorSplat(SrcTy->getElementCount(), getTrue());
+ Value *Ops[] = {Acc, Src, Mask, EVL};
+ Type *Tys[] = {SrcTy};
+ auto Decl = Intrinsic::getDeclaration(M, ID, Tys);
+ return CreateCall(Decl, Ops);
+}
+
CallInst *IRBuilderBase::CreateFAddReduce(Value *Acc, Value *Src) {
Module *M = GetInsertBlock()->getParent()->getParent();
Value *Ops[] = {Acc, Src};
@@ -422,6 +436,11 @@ CallInst *IRBuilderBase::CreateFAddReduce(Value *Acc, Value *Src) {
return CreateCall(Decl, Ops);
}
+CallInst *IRBuilderBase::CreateFAddReduce(Value *Acc, Value *Src, Value *EVL,
+ Value *Mask) {
+ return getReductionIntrinsic(Intrinsic::vp_reduce_fadd, Acc, Src, Mask, EVL);
+}
+
CallInst *IRBuilderBase::CreateFMulReduce(Value *Acc, Value *Src) {
Module *M = GetInsertBlock()->getParent()->getParent();
Value *Ops[] = {Acc, Src};
@@ -430,46 +449,149 @@ CallInst *IRBuilderBase::CreateFMulReduce(Value *Acc, Value *Src) {
return CreateCall(Decl, Ops);
}
+CallInst *IRBuilderBase::CreateFMulReduce(Value *Acc, Value *Src, Value *EVL,
+ Value *Mask) {
+ return getReductionIntrinsic(Intrinsic::vp_reduce_fmul, Acc, Src, Mask, EVL);
+}
+
CallInst *IRBuilderBase::CreateAddReduce(Value *Src) {
return getReductionIntrinsic(Intrinsic::vector_reduce_add, Src);
}
+CallInst *IRBuilderBase::CreateAddReduce(Value *Src, Value *EVL, Value *Mask) {
+ auto *SrcTy = cast<VectorType>(Src->getType());
+ auto *EltTy = SrcTy->getElementType();
+ return getReductionIntrinsic(Intrinsic::vp_reduce_add,
+ ConstantInt::get(EltTy, 0), Src, Mask, EVL);
+}
+
CallInst *IRBuilderBase::CreateMulReduce(Value *Src) {
return getReductionIntrinsic(Intrinsic::vector_reduce_mul, Src);
}
+CallInst *IRBuilderBase::CreateMulReduce(Value *Src, Value *EVL, Value *Mask) {
+ auto *SrcTy = cast<VectorType>(Src->getType());
+ auto *EltTy = SrcTy->getElementType();
+ return getReductionIntrinsic(Intrinsic::vp_reduce_mul,
+ ConstantInt::get(EltTy, 1), Src, Mask, EVL);
+}
+
CallInst *IRBuilderBase::CreateAndReduce(Value *Src) {
return getReductionIntrinsic(Intrinsic::vector_reduce_and, Src);
}
+CallInst *IRBuilderBase::CreateAndReduce(Value *Src, Value *EVL, Value *Mask) {
+ auto *SrcTy = cast<VectorType>(Src->getType());
+ auto *EltTy = SrcTy->getElementType();
+ return getReductionIntrinsic(Intrinsic::vp_reduce_and,
+ Constant::getAllOnesValue(EltTy), Src, Mask,
+ EVL);
+}
+
CallInst *IRBuilderBase::CreateOrReduce(Value *Src) {
return getReductionIntrinsic(Intrinsic::vector_reduce_or, Src);
}
+CallInst *IRBuilderBase::CreateOrReduce(Value *Src, Value *EVL, Value *Mask) {
+ auto *SrcTy = cast<VectorType>(Src->getType());
+ auto *EltTy = SrcTy->getElementType();
+ return getReductionIntrinsic(Intrinsic::vp_reduce_or,
+ ConstantInt::get(EltTy, 0), Src, Mask, EVL);
+}
+
CallInst *IRBuilderBase::CreateXorReduce(Value *Src) {
return getReductionIntrinsic(Intrinsic::vector_reduce_xor, Src);
}
+CallInst *IRBuilderBase::CreateXorReduce(Value *Src, Value *EVL, Value *Mask) {
+ auto *SrcTy = cast<VectorType>(Src->getType());
+ auto *EltTy = SrcTy->getElementType();
+ return getReductionIntrinsic(Intrinsic::vp_reduce_xor,
+ ConstantInt::get(EltTy, 0), Src, Mask, EVL);
+}
+
CallInst *IRBuilderBase::CreateIntMaxReduce(Value *Src, bool IsSigned) {
auto ID =
IsSigned ? Intrinsic::vector_reduce_smax : Intrinsic::vector_reduce_umax;
return getReductionIntrinsic(ID, Src);
}
+CallInst *IRBuilderBase::CreateIntMaxReduce(Value *Src, Value *EVL,
+ bool IsSigned, Value *Mask) {
+ auto *SrcTy = cast<VectorType>(Src->getType());
+ auto *EltTy = SrcTy->getElementType();
+ return getReductionIntrinsic(
+ IsSigned ? Intrinsic::vp_reduce_smax : Intrinsic::vp_reduce_umax,
+ IsSigned ? ConstantInt::get(EltTy, APInt::getSignedMinValue(
+ EltTy->getIntegerBitWidth()))
+ : ConstantInt::get(EltTy, 0),
+ Src, Mask, EVL);
+}
+
CallInst *IRBuilderBase::CreateIntMinReduce(Value *Src, bool IsSigned) {
auto ID =
IsSigned ? Intrinsic::vector_reduce_smin : Intrinsic::vector_reduce_umin;
return getReductionIntrinsic(ID, Src);
}
+CallInst *IRBuilderBase::CreateIntMinReduce(Value *Src, Value *EVL,
+ bool IsSigned, Value *Mask) {
+ auto *SrcTy = cast<VectorType>(Src->getType());
+ auto *EltTy = SrcTy->getElementType();
+ return getReductionIntrinsic(
+ IsSigned ? Intrinsic::vp_reduce_smin : Intrinsic::vp_reduce_umin,
+ IsSigned ? ConstantInt::get(EltTy, APInt::getSignedMaxValue(
+ EltTy->getIntegerBitWidth()))
+ : Constant::getAllOnesValue(EltTy),
+ Src, Mask, EVL);
+}
+
CallInst *IRBuilderBase::CreateFPMaxReduce(Value *Src) {
return getReductionIntrinsic(Intrinsic::vector_reduce_fmax, Src);
}
+CallInst *IRBuilderBase::CreateFPMaxReduce(Value *Src, Value *EVL,
+ Value *Mask) {
+ auto *SrcTy = cast<VectorType>(Src->getType());
+ auto *EltTy = SrcTy->getElementType();
+ FastMathFlags FMF = getFastMathFlags();
+ Value *Neutral;
+ if (FMF.noNaNs())
+ Neutral = FMF.noInfs()
+ ? ConstantFP::get(
+ EltTy, APFloat::getLargest(EltTy->getFltSemantics(),
+ /*Negative=*/true))
+ : ConstantFP::getInfinity(EltTy, true);
+ else
+ Neutral = ConstantFP::getQNaN(EltTy, /*Negative=*/true);
+
+ return getReductionIntrinsic(Intrinsic::vp_reduce_fmax, Neutral, Src, Mask,
+ EVL);
+}
+
CallInst *IRBuilderBase::CreateFPMinReduce(Value *Src) {
return getReductionIntrinsic(Intrinsic::vector_reduce_fmin, Src);
}
+CallInst *IRBuilderBase::CreateFPMinReduce(Value *Src, Value *EVL,
+ Value *Mask) {
+ auto *SrcTy = cast<VectorType>(Src->getType());
+ auto *EltTy = SrcTy->getElementType();
+ FastMathFlags FMF = getFastMathFlags();
+ Value *Neutral;
+ if (FMF.noNaNs())
+ Neutral = FMF.noInfs()
+ ? ConstantFP::get(
+ EltTy, APFloat::getLargest(EltTy->getFltSemantics(),
+ /*Negative=*/false))
+ : ConstantFP::getInfinity(EltTy, false);
+ else
+ Neutral = ConstantFP::getQNaN(EltTy, /*Negative=*/false);
+
+ return getReductionIntrinsic(Intrinsic::vp_reduce_fmin, Neutral, Src, Mask,
+ EVL);
+}
+
CallInst *IRBuilderBase::CreateFPMaximumReduce(Value *Src) {
return getReductionIntrinsic(Intrinsic::vector_reduce_fmaximum, Src);
}
diff --git a/llvm/lib/Transforms/Utils/LoopUtils.cpp b/llvm/lib/Transforms/Utils/LoopUtils.cpp
index 73c5d636782294..d0abcdfb1440ab 100644
--- a/llvm/lib/Transforms/Utils/LoopUtils.cpp
+++ b/llvm/lib/Transforms/Utils/LoopUtils.cpp
@@ -1204,6 +1204,48 @@ Value *llvm::createSimpleTargetReduction(IRBuilderBase &Builder, Value *Src,
}
}
+Value *llvm::createSimpleTargetReduction(IRBuilderBase &Builder, Value *Src,
+ RecurKind RdxKind, Value *EVL,
+ Value *Mask) {
+ auto *SrcVecEltTy = cast<VectorType>(Src->getType())->getElementType();
+ switch (RdxKind) {
+ case RecurKind::Add:
+ return Builder.CreateAddReduce(Src, EVL, Mask);
+ case RecurKind::Mul:
+ return Builder.CreateMulReduce(Src, EVL, Mask);
+ case RecurKind::And:
+ return Builder.CreateAndReduce(Src, EVL, Mask);
+ case RecurKind::Or:
+ return Builder.CreateOrReduce(Src, EVL, Mask);
+ case RecurKind::Xor:
+ return Builder.CreateXorReduce(Src, EVL, Mask);
+ case RecurKind::FMulAdd:
+ case RecurKind::FAdd:
+ return Builder.CreateFAddReduce(ConstantFP::getNegativeZero(SrcVecEltTy),
+ Src, EVL, Mask);
+ case RecurKind::FMul:
+ return Builder.CreateFMulReduce(ConstantFP::get(SrcVecEltTy, 1.0), Src, EVL,
+ Mask);
+ case RecurKind::SMax:
+ return Builder.CreateIntMaxReduce(Src, EVL, true, Mask);
+ case RecurKind::SMin:
+ return Builder.CreateIntMinReduce(Src, EVL, true, Mask);
+ case RecurKind::UMax:
+ return Builder.CreateIntMaxReduce(Src, EVL, false, Mask);
+ case RecurKind::UMin:
+ return Builder.CreateIntMinReduce(Src, EVL, false, Mask);
+ case RecurKind::FMax:
+ return Builder.CreateFPMaxReduce(Src, EVL, Mask);
+ case RecurKind::FMin:
+ return Builder.CreateFPMinReduce(Src, EVL, Mask);
+ case RecurKind::FMinimum:
+ case RecurKind::FMaximum:
+ assert(0 && "FMaximum/FMinimum reduction VP intrinsic is not supported.");
+ default:
+ llvm_unreachable("Unhandled opcode");
+ }
+}
+
Value *llvm::createTargetReduction(IRBuilderBase &B,
const RecurrenceDescriptor &Desc, Value *Src,
PHINode *OrigPhi) {
@@ -1232,6 +1274,20 @@ Value *llvm::createOrderedReduction(IRBuilderBase &B,
return B.CreateFAddReduce(Start, Src);
}
+Value *llvm::createOrderedReduction(IRBuilderBase &B,
+ const RecurrenceDescriptor &Desc,
+ Value *Src, Value *Start, Value *EVL,
+ Value *Mask) {
+ assert((Desc.getRecurrenceKind() == RecurKind::FAdd ||
+ Desc.getRecurrenceKind() == RecurKind::FMulAdd) &&
+ "Unexpected reduction kind");
+ assert(Src->getType()->isVectorTy() && "Expected a vector type");
+ assert(!Start->getType()->isVectorTy() && "Expected a scalar type");
+ assert(EVL->getType()->isIntegerTy() && "Expected an integer type");
+
+ return B.CreateFAddReduce(Start, Src, EVL, Mask);
+}
+
void llvm::propagateIRFlags(Value *I, ArrayRef<Value *> VL, Value *OpValue,
bool IncludeWrapFlags) {
auto *VecOp = dyn_cast<Instruction>(I);
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 33c4decd58a6c2..1db531e170a4bf 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -1526,6 +1526,17 @@ class LoopVectorizationCostModel {
ForceTailFoldingStyle.getValue());
if (ForceTailFoldingStyle != TailFoldingStyle::DataWithEVL)
return;
+
+ // Block folding with EVL since vector-predication intrinsics do not yet
+ // support FMinimum and FMaximum reductions.
+ // FIXME: remove this check once llvm.vp.reduce.fminimum/fmaximum are
+ // supported.
+ bool ContainsFMinimumOrFMaximumReduction =
+ any_of(Legal->getReductionVars(), [&](auto &Reduction) {
+ const RecurrenceDescriptor &RdxDesc = Reduction.second;
+ RecurKind Kind = RdxDesc.getRecurrenceKind();
+ return Kind == RecurKind::FMinimum || Kind == RecurKind::FMaximum;
+ });
// Override forced styles if needed.
// FIXME: use actual opcode/data type for analysis here.
// FIXME: Investigate opportunity for fixed vector factor.
@@ -1535,8 +1546,7 @@ class LoopVectorizationCostModel {
!EnableVPlanNativePath &&
// FIXME: implement support for max safe dependency distance.
Legal->isSafeForAnyVectorWidth() &&
- // FIXME: remove this once reductions are supported.
- Legal->getReductionVars().empty();
+ !ContainsFMinimumOrFMaximumReduction;
if (!EVLIsLegal) {
// If for some reason EVL mode is unsupported, fallback to
// DataWithoutLaneMask to try to vectorize the loop with folded tail
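The legality change above replaces "no reductions at all" with "no FMinimum/FMaximum reductions". A minimal sketch of that gate, assuming a toy enum `ToyRecurKind` in place of `llvm::RecurKind` and modeling only the two conditions visible in the hunk (the real `EVLIsLegal` expression checks more):

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for the llvm::RecurKind values that matter here; the
// real enum (llvm/Analysis/IVDescriptors.h) has many more members.
enum class ToyRecurKind { Add, Mul, FAdd, FMax, FMin, FMinimum, FMaximum };

// Mirrors the any_of check added by the patch: fminimum/fmaximum reductions
// have no llvm.vp.reduce.* counterpart yet, so they block EVL tail folding.
bool containsFMinimumOrFMaximum(const std::vector<ToyRecurKind> &Kinds) {
  return std::any_of(Kinds.begin(), Kinds.end(), [](ToyRecurKind K) {
    return K == ToyRecurKind::FMinimum || K == ToyRecurKind::FMaximum;
  });
}

// Post-patch gate (partial: other conditions such as the VPlan native path
// check are omitted).
bool evlIsLegal(bool SafeForAnyVectorWidth,
                const std::vector<ToyRecurKind> &Kinds) {
  return SafeForAnyVectorWidth && !containsFMinimumOrFMaximum(Kinds);
}

// Pre-patch gate, for comparison: any reduction at all disabled EVL.
bool evlIsLegalBeforePatch(bool SafeForAnyVectorWidth,
                           const std::vector<ToyRecurKind> &Kinds) {
  return SafeForAnyVectorWidth && Kinds.empty();
}
```

Under the old rule a loop with an ordinary integer or `FAdd` reduction was rejected for EVL and fell back to `DataWithoutLaneMask`; under the new rule only loops with FMinimum/FMaximum reductions take that fallback.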
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index c74329a0bcc4ac..a444064dab692a 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -843,6 +843,7 @@ class VPSingleDefRecipe : public VPRecipeBase, public VPValue {
case VPRecipeBase::VPDerivedIVSC:
case VPRecipeBase::VPExpandSCEVSC:
case VPRecipeBase::VPInstructionSC:
+ case VPRecipeBase::VPReductionEVLSC:
case VPRecipeBase::VPReductionSC:
case VPRecipeBase::VPReplicateSC:
case VPRecipeBase::VPScalarIVStepsSC:
@@ -2110,6 +2111,12 @@ class VPReductionRecipe : public VPSingleDefRecipe {
VPSlotTracker &SlotTracker) const override;
#endif
+ /// Return the recurrence descriptor for the in-loop reduction.
+ const RecurrenceDescriptor &getRecurrenceDescriptor() const {
+ return RdxDesc;
+ }
+ /// Return true if the in-loop reduction is ordered.
+ bool isOrdered() const { return IsOrdered; };
/// The VPValue of the scalar Chain being accumulated.
VPValue *getChainOp() const { return getOperand(0); }
/// The VPValue of the vector value to be reduced.
@@ -2120,6 +2127,63 @@ class VPReductionRecipe : public VPSingleDefRecipe {
}
};
+/// A recipe to represent in-loop reduction operations with vector-predication
+/// intrinsics, performing a reduction on a vector operand with the explicit
+/// vector length (EVL) into a scalar value, and adding the result to a chain.
+/// The Operands are {ChainOp, VecOp, EVL, [Condition]}.
+class VPReductionEVLRecipe : public VPSingleDefRecipe {
+ /// The recurrence descriptor for the reduction in question.
+ const RecurrenceDescriptor &RdxDesc;
+ bool IsOrdered;
+
+public:
+ VPReductionEVLRecipe(VPReductionRecipe *R, VPValue *EVL)
+ : VPSingleDefRecipe(
+ VPDef::VPReductionEVLSC,
+ ArrayRef<VPValue *>({R->getChainOp(), R->getVecOp(), EVL}),
+ R->getUnderlyingInstr()),
+ RdxDesc(R->getRecurrenceDescriptor()), IsOrdered(R->isOrdered()) {
+ VPValue *CondOp = R->getCondOp();
+ if (CondOp)
+ addOperand(CondOp);
+ };
+
+ ~VPReductionEVLRecipe() override = default;
+
+ VPReductionEVLRecipe *clone() override {
+ llvm_unreachable("cloning not implemented yet");
+ }
+
+ VP_CLASSOF_IMPL(VPDef::VPReductionEVLSC)
+
+ /// Generate the reduction in the loop
+ void execute(VPTransformState &State) override;
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+ /// Print the recipe.
+ void print(raw_ostream &O, const Twine &Indent,
+ VPSlotTracker &SlotTracker) const override;
+#endif
+
+ /// The VPValue of the scalar Chain being accumulated.
+ VPValue *getChainOp() const { return g...
[truncated]
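`VPReductionEVLRecipe` takes over a `VPReductionRecipe`'s chain, vector and optional condition operands and adds the EVL. What the resulting in-loop reduction computes can be sketched as a scalar simulation of the vectorized loop; this is an illustration under assumptions, not the recipe's `execute` implementation. It models the unordered flavour (reduce the active lanes from a neutral start, then add the result to the chain) and assumes each iteration's EVL is `min(VF, remaining)`; `inLoopEVLSum` is a hypothetical name.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar simulation of a loop vectorized with EVL tail folding and an in-loop
// add reduction. Operands map as: Chain -> ChainOp, A[I..I+EVL) -> VecOp,
// EVL -> EVL, Cond (optional; empty or same size as A) -> Condition.
// Assumption: EVL = min(VF, remaining). Real targets may return less, which
// is why the induction variable advances by EVL rather than by VF.
// VF must be non-zero.
std::int64_t inLoopEVLSum(const std::vector<std::int64_t> &A,
                          const std::vector<bool> &Cond, std::size_t VF) {
  std::int64_t Chain = 0; // scalar value carried between iterations
  for (std::size_t I = 0; I < A.size();) {
    std::size_t EVL = std::min(VF, A.size() - I);
    std::int64_t Partial = 0; // neutral start value for add
    for (std::size_t L = 0; L < EVL; ++L) // lanes >= EVL are never read
      if (Cond.empty() || Cond[I + L])
        Partial += A[I + L];
    Chain += Partial; // add the reduced vector to the chain
    I += EVL;
  }
  return Chain;
}
```

Because the final iteration simply runs with a smaller EVL, no scalar epilogue is needed and lanes past EVL never reach the chain.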
✅ With the latest revision this PR passed the C/C++ code formatter.
(force-pushed from df1c995 to af3e8a5)
fhahn left a comment:
The title mentions that this adds support for in-loop reductions, but I wasn't able to find a check to make sure we only vectorize in-loop reductions?
All tests seem to pass flags guiding towards the use of in-loop/ordered reductions, so the case where the regular reduction strategy is chosen may not be tested well.
llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-reduction.ll
llvm/include/llvm/IR/VectorBuilder.h
Independent of this change, but if VectorBuilder only supports generating vector-predication intrinsics, it would be better to call it VectorPredicationBuilder to avoid confusion.
Indeed, this can be adjusted later. (But I would prefer a shorter name, perhaps just VectorPredBuilder would be good enough?)
or VPBuilder, like there's IRBuilder
> or VPBuilder, like there's IRBuilder
That's unfortunate, as there is already a class named VPBuilder in LoopVectorizationPlanner.h.
/// VPlan-based builder utility analogous to IRBuilder.
class VPBuilder {
VPBasicBlock *BB = nullptr;
I think it's worth being more explicit for the name in the utility in llvm/IR, VectorPredBuilder would sound good to me.
nit: use /// for doc-comment
Indeed, we might need to change the title.
@fhahn ping
Sounds good to me, but it would be good to have an upstream buildbot that builds some code with the various options to get some runtime testing.
…ion." This reverts commit 8488520.
********************
Failed Tests (2):
  LLVM-Unit :: Transforms/Vectorize/./VectorizeTests/21/51
  LLVM-Unit :: Transforms/Vectorize/./VectorizeTests/26/51
chapuni left a comment:
llvm/IR should not depend on llvm/Analysis.
#ifndef LLVM_IR_VECTORBUILDER_H
#define LLVM_IR_VECTORBUILDER_H

#include <llvm/Analysis/IVDescriptors.h>
This is a layering violation.
Thanks for pointing out this issue. I opened #99276 to fix it. Please take a look, thanks a lot.
Summary: Following from #87816, add VPReductionEVLRecipe to describe vector predication reduction. Address one of TODOs from #76172.
Differential Revision: https://phabricator.intern.facebook.com/D60251485
… vectorization. (#101641)

Following #90184, this patch emits the vp.merge intrinsic, which is used to set the inactive lanes in a select operation to the RHS instead of undef. Currently, it is applied to out-loop reduction for EVL vectorization.

This patch converts select(header_mask, LHS, RHS) into vp.merge(all-true, LHS, RHS, EVL), and always uses the predicated reduction select to set the incoming value of the reduction phi, to support out-loop reduction when using tail folding with EVL.

TODO: Postpone the adjustment of the predicated reduction select to VPlanTransform. The current adjustment might happen too early, which could lead to a situation where the predicated reduction select is adjusted but the EVL recipes cannot be successfully generated during VPlanTransform.
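For the out-of-loop case described in this follow-up commit, the accumulator is a whole vector, so lanes at or beyond EVL must keep their previous value through the final partial iteration. A scalar sketch of that effect, where `vpMergeAllTrue`, `outLoopEVLSum` and the `kJunk` sentinel are hypothetical and stand in for the IR values (lanes of a VP operation past EVL are not defined by the operation, which the sentinel imitates):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Model of vp.merge(all-true, LHS, RHS, EVL): lanes below EVL take LHS,
// lanes at or beyond EVL take RHS.
std::vector<std::int64_t> vpMergeAllTrue(const std::vector<std::int64_t> &LHS,
                                         const std::vector<std::int64_t> &RHS,
                                         std::size_t EVL) {
  std::vector<std::int64_t> Out(RHS);
  for (std::size_t L = 0; L < EVL && L < LHS.size(); ++L)
    Out[L] = LHS[L];
  return Out;
}

// Out-of-loop add reduction with EVL tail folding: a VF-wide accumulator (the
// reduction phi) is updated every iteration and summed once after the loop.
// UseMerge=false shows what goes wrong without vp.merge. VF must be non-zero.
std::int64_t outLoopEVLSum(const std::vector<std::int64_t> &A, std::size_t VF,
                           bool UseMerge) {
  const std::int64_t kJunk = 999;       // stands in for undefined lanes
  std::vector<std::int64_t> Acc(VF, 0); // neutral start in every lane
  for (std::size_t I = 0; I < A.size();) {
    std::size_t EVL = std::min(VF, A.size() - I);
    std::vector<std::int64_t> Added(VF, kJunk);
    for (std::size_t L = 0; L < EVL; ++L)
      Added[L] = Acc[L] + A[I + L];
    Acc = UseMerge ? vpMergeAllTrue(Added, Acc, EVL) : Added;
    I += EVL;
  }
  std::int64_t Sum = 0; // final horizontal reduction after the loop
  for (std::int64_t V : Acc)
    Sum += V;
  return Sum;
}
```

With `VF = 4` and 10 elements the last iteration has EVL 2; without the merge, lanes 2 and 3 of the accumulator are overwritten and the final sum is wrong, while a trip count divisible by VF hides the problem.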