[VPlan] Extract reverse mask from reverse accesses #155579
Conversation
✅ With the latest revision this PR passed the C/C++ code formatter.
@fhahn This could lead to overestimating the cost.
Multiple VPWidenMemoryRecipes may be able to share a single reverse mask, but if each VPWidenMemoryRecipe computes the cost of the reverse mask separately, the cost would be overestimated.
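To make the concern concrete, here is a toy sketch (hypothetical names and an assumed shuffle cost, not actual VPlan or TTI APIs): if N widened memory recipes share one mask, costing the mask reverse inside each recipe charges N shuffles, while extracting the reverse into a single shared recipe charges it once.

```cpp
// Assumed per-shuffle cost of reversing the mask vector; the real number
// would come from TTI::getShuffleCost(SK_Reverse, ...).
constexpr int ReverseShuffleCost = 4;

// Each memory recipe independently adds the cost of reversing its mask.
int costWithReversePerRecipe(int NumRecipesSharingMask) {
  return NumRecipesSharingMask * ReverseShuffleCost;
}

// After extracting the mask reverse into its own recipe, all memory
// recipes reuse the single reversed mask, so it is costed once.
int costWithExtractedReverse(int NumRecipesSharingMask) {
  return NumRecipesSharingMask > 0 ? ReverseShuffleCost : 0;
}
```

With three recipes sharing one mask and an assumed shuffle cost of 4, the per-recipe variant charges 12 while the extracted variant charges 4.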
@fhahn I think this is the most straightforward approach.
Currently, we could check whether the operand or user is a reverse operation, but reverse operations might be simplified away in the future, so relying on them is not a long-term approach.
The last option is to use the address for the check, since only reverse accesses need to use VPVectorEndPointer.
What do you think?
Force-pushed from 6fd0244 to 53be850
Reverse ping (no pun intended!)

I’d like to understand whether you need the mask reverses to be split out before costing the plan, or whether doing so after costing would also work for you. This issue comes up because the mask reverse has never been accounted for in ::computeCost. If we split out the mask reverse before costing the plan, the current VPlan-based cost model will compute a higher cost than the legacy cost model. There are two ways to split the mask reverse from reverse loads/stores. The first, and faster, approach (I remember Florian mentioned a similar idea earlier) is to lower reverse accesses into mask reverse + normal consecutive accesses in VPlanTransforms::convertToConcreteRecipes. The second approach is to split out the mask reverses directly in VPRecipeBuilder::tryToWidenMemory, and then run an extra CSE in VPlanTransforms::optimize to remove redundant mask reverses and thus minimize overcounting of the reverse cost. Do you have a preference between these two approaches?
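The extra CSE mentioned for the second approach amounts to value-numbering single-operand reverse nodes. A minimal standalone sketch (hypothetical key shape, not the actual VPlanTransforms implementation):

```cpp
#include <map>
#include <utility>
#include <vector>

// Key a node by (opcode, id of its single operand): two Reverse nodes with
// the same operand are redundant, and later ones can reuse the first.
using NodeKey = std::pair<int, int>;

// Returns how many distinct nodes survive CSE over the input sequence.
int cseSurvivors(const std::vector<NodeKey> &Nodes) {
  std::map<NodeKey, bool> Seen;
  int Survivors = 0;
  for (const NodeKey &K : Nodes)
    if (Seen.insert({K, true}).second)
      ++Survivors; // first occurrence kept; duplicates folded into it
  return Survivors;
}
```

For example, two reverses of the same mask plus one reverse of a different value collapse to two surviving nodes, so only two reverse costs would be counted.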
fhahn
left a comment
The second approach is to split out mask reverses directly in VPRecipeBuilder::tryToWidenMemory
Definitely this approach, as I'd really like to see any notion of "reversing" explicitly moved out of VPWidenMemoryRecipe.
As for the costing issue, I think we should start by just aligning the VPlan cost model with legacy cost model:
```diff
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index 5716a6f82259..58e977b30853 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -1226,7 +1226,10 @@ InstructionCost VPInstruction::computeCost(ElementCount VF,
   case VPInstruction::Reverse: {
     assert(VF.isVector() && "Reverse operation must be vector type");
     auto *VectorTy = cast<VectorType>(
-        toVectorTy(Ctx.Types.inferScalarType(getOperand(0)), VF));
+        toVectorTy(Ctx.Types.inferScalarType(getOperand(0)), VF));
+    // TODO: Remove after adding some reverse simplifications
+    if (VectorTy->getElementType()->isIntegerTy(1))
+      return 0;
     return Ctx.TTI.getShuffleCost(TargetTransformInfo::SK_Reverse, VectorTy,
                                   VectorTy, /*Mask=*/{}, Ctx.CostKind,
                                   /*Index=*/0);
```

This seemed to work when I tried this locally last month, and I think this way we won't need to add any VPInstruction::Reverse simplifications in the first PR. We can defer those to a follow-up PR, and hopefully then start costing the mask. Does that sound good?
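Read as a standalone rule, the change makes mask (i1) reverses free until redundant mask reverses can be eliminated before costing. A simplified mirror (hypothetical helper; the real code queries TTI::getShuffleCost):

```cpp
// ElementBits: scalar element width of the reversed vector.
// ShuffleCost: what TTI would charge for an SK_Reverse shuffle.
int reverseCost(int ElementBits, int ShuffleCost) {
  if (ElementBits == 1)
    return 0; // i1 mask reverse: not costed, matching the legacy model
  return ShuffleCost; // all other element types pay the shuffle cost
}
```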
IIRC we already include the cost of some VPInstructions after computing the cost from the legacy cost model; we could do the same for reverse to get more accurate estimates.
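The adjustment described here is purely additive; a toy sketch of the idea (hypothetical helper, not the actual planner code):

```cpp
#include <vector>

// Combine the legacy model's cost with the costs of VPInstructions the
// legacy model never sees (e.g. an extracted mask reverse).
int adjustedCost(int LegacyCost, const std::vector<int> &ExtraVPInstrCosts) {
  int Total = LegacyCost;
  for (int C : ExtraVPInstrCosts)
    Total += C; // each extra VPInstruction cost is added on top
  return Total;
}
```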
Yes, we can do that too, but would it be better to split the cost model change off into its own PR? I think this PR can be NFC-ish otherwise.
Force-pushed from 53be850 to 48fa6f5
@llvm/pr-subscribers-vectorizers @llvm/pr-subscribers-llvm-transforms

Author: Mel Chen (Mel-Chen)

Changes

Following #146525, separate the reverse mask from reverse access recipes. Based on #146525.

Patch is 57.95 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/155579.diff

13 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 3f1e12e5d1cd0..cf8bd87bcb203 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -7097,18 +7097,21 @@ static bool planContainsAdditionalSimplifications(VPlan &Plan,
CostCtx.isLegacyUniformAfterVectorization(AddrI, VF))
return true;
- if (WidenMemR->isReverse()) {
- // If the stored value of a reverse store is invariant, LICM will
- // hoist the reverse operation to the preheader. In this case, the
- // result of the VPlan-based cost model will diverge from that of
- // the legacy model.
+ // If the stored value of a reverse store is invariant, LICM will
+ // hoist the reverse operation to the preheader. In this case, the
+ // result of the VPlan-based cost model will diverge from that of
+ // the legacy model.
+ if (isa<VPWidenStoreRecipe, VPWidenStoreEVLRecipe>(WidenMemR)) {
+ VPValue *StoredVal;
if (auto *StoreR = dyn_cast<VPWidenStoreRecipe>(WidenMemR))
- if (StoreR->getStoredValue()->isDefinedOutsideLoopRegions())
- return true;
+ StoredVal = StoreR->getStoredValue();
+ else
+ StoredVal =
+ cast<VPWidenStoreEVLRecipe>(WidenMemR)->getStoredValue();
- if (auto *StoreR = dyn_cast<VPWidenStoreEVLRecipe>(WidenMemR))
- if (StoreR->getStoredValue()->isDefinedOutsideLoopRegions())
- return true;
+ using namespace VPlanPatternMatch;
+ if (match(StoredVal, m_Reverse(m_VPValue())))
+ return StoredVal->isDefinedOutsideLoopRegions();
}
}
@@ -7747,10 +7750,13 @@ VPRecipeBase *VPRecipeBuilder::tryToWidenMemory(VPInstruction *VPI,
Ptr = VectorPtr;
}
+ if (Reverse && Mask)
+ Mask = Builder.createNaryOp(VPInstruction::Reverse, Mask, I->getDebugLoc());
+
if (VPI->getOpcode() == Instruction::Load) {
auto *Load = cast<LoadInst>(I);
- auto *LoadR = new VPWidenLoadRecipe(*Load, Ptr, Mask, Consecutive, Reverse,
- *VPI, Load->getDebugLoc());
+ auto *LoadR = new VPWidenLoadRecipe(*Load, Ptr, Mask, Consecutive, *VPI,
+ Load->getDebugLoc());
if (Reverse) {
Builder.insert(LoadR);
return new VPInstruction(VPInstruction::Reverse, LoadR, {}, {},
@@ -7764,8 +7770,8 @@ VPRecipeBase *VPRecipeBuilder::tryToWidenMemory(VPInstruction *VPI,
if (Reverse)
StoredVal = Builder.createNaryOp(VPInstruction::Reverse, StoredVal,
Store->getDebugLoc());
- return new VPWidenStoreRecipe(*Store, Ptr, StoredVal, Mask, Consecutive,
- Reverse, *VPI, Store->getDebugLoc());
+ return new VPWidenStoreRecipe(*Store, Ptr, StoredVal, Mask, Consecutive, *VPI,
+ Store->getDebugLoc());
}
VPWidenIntOrFpInductionRecipe *
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index 1d250322ccf63..5354ab266ed83 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -3300,9 +3300,6 @@ class LLVM_ABI_FOR_TEST VPWidenMemoryRecipe : public VPRecipeBase,
/// Whether the accessed addresses are consecutive.
bool Consecutive;
- /// Whether the consecutive accessed addresses are in reverse order.
- bool Reverse;
-
/// Whether the memory access is masked.
bool IsMasked = false;
@@ -3316,15 +3313,10 @@ class LLVM_ABI_FOR_TEST VPWidenMemoryRecipe : public VPRecipeBase,
VPWidenMemoryRecipe(const char unsigned SC, Instruction &I,
std::initializer_list<VPValue *> Operands,
- bool Consecutive, bool Reverse,
- const VPIRMetadata &Metadata, DebugLoc DL)
+ bool Consecutive, const VPIRMetadata &Metadata,
+ DebugLoc DL)
: VPRecipeBase(SC, Operands, DL), VPIRMetadata(Metadata), Ingredient(I),
- Alignment(getLoadStoreAlignment(&I)), Consecutive(Consecutive),
- Reverse(Reverse) {
- assert((Consecutive || !Reverse) && "Reverse implies consecutive");
- assert((isa<VPVectorEndPointerRecipe>(getAddr()) || !Reverse) &&
- "Reversed acccess without VPVectorEndPointerRecipe address?");
- }
+ Alignment(getLoadStoreAlignment(&I)), Consecutive(Consecutive) {}
public:
VPWidenMemoryRecipe *clone() override {
@@ -3346,10 +3338,6 @@ class LLVM_ABI_FOR_TEST VPWidenMemoryRecipe : public VPRecipeBase,
/// Return whether the loaded-from / stored-to addresses are consecutive.
bool isConsecutive() const { return Consecutive; }
- /// Return whether the consecutive loaded/stored addresses are in reverse
- /// order.
- bool isReverse() const { return Reverse; }
-
/// Return the address accessed by this recipe.
VPValue *getAddr() const { return getOperand(0); }
@@ -3383,18 +3371,16 @@ class LLVM_ABI_FOR_TEST VPWidenMemoryRecipe : public VPRecipeBase,
struct LLVM_ABI_FOR_TEST VPWidenLoadRecipe final : public VPWidenMemoryRecipe,
public VPRecipeValue {
VPWidenLoadRecipe(LoadInst &Load, VPValue *Addr, VPValue *Mask,
- bool Consecutive, bool Reverse,
- const VPIRMetadata &Metadata, DebugLoc DL)
+ bool Consecutive, const VPIRMetadata &Metadata, DebugLoc DL)
: VPWidenMemoryRecipe(VPDef::VPWidenLoadSC, Load, {Addr}, Consecutive,
- Reverse, Metadata, DL),
+ Metadata, DL),
VPRecipeValue(this, &Load) {
setMask(Mask);
}
VPWidenLoadRecipe *clone() override {
return new VPWidenLoadRecipe(cast<LoadInst>(Ingredient), getAddr(),
- getMask(), Consecutive, Reverse, *this,
- getDebugLoc());
+ getMask(), Consecutive, *this, getDebugLoc());
}
VP_CLASSOF_IMPL(VPDef::VPWidenLoadSC);
@@ -3427,7 +3413,7 @@ struct VPWidenLoadEVLRecipe final : public VPWidenMemoryRecipe,
VPWidenLoadEVLRecipe(VPWidenLoadRecipe &L, VPValue *Addr, VPValue &EVL,
VPValue *Mask)
: VPWidenMemoryRecipe(VPDef::VPWidenLoadEVLSC, L.getIngredient(),
- {Addr, &EVL}, L.isConsecutive(), L.isReverse(), L,
+ {Addr, &EVL}, L.isConsecutive(), L,
L.getDebugLoc()),
VPRecipeValue(this, &getIngredient()) {
setMask(Mask);
@@ -3466,17 +3452,17 @@ struct VPWidenLoadEVLRecipe final : public VPWidenMemoryRecipe,
/// to store to and an optional mask.
struct LLVM_ABI_FOR_TEST VPWidenStoreRecipe final : public VPWidenMemoryRecipe {
VPWidenStoreRecipe(StoreInst &Store, VPValue *Addr, VPValue *StoredVal,
- VPValue *Mask, bool Consecutive, bool Reverse,
+ VPValue *Mask, bool Consecutive,
const VPIRMetadata &Metadata, DebugLoc DL)
: VPWidenMemoryRecipe(VPDef::VPWidenStoreSC, Store, {Addr, StoredVal},
- Consecutive, Reverse, Metadata, DL) {
+ Consecutive, Metadata, DL) {
setMask(Mask);
}
VPWidenStoreRecipe *clone() override {
return new VPWidenStoreRecipe(cast<StoreInst>(Ingredient), getAddr(),
getStoredValue(), getMask(), Consecutive,
- Reverse, *this, getDebugLoc());
+ *this, getDebugLoc());
}
VP_CLASSOF_IMPL(VPDef::VPWidenStoreSC);
@@ -3511,8 +3497,8 @@ struct VPWidenStoreEVLRecipe final : public VPWidenMemoryRecipe {
VPWidenStoreEVLRecipe(VPWidenStoreRecipe &S, VPValue *Addr,
VPValue *StoredVal, VPValue &EVL, VPValue *Mask)
: VPWidenMemoryRecipe(VPDef::VPWidenStoreEVLSC, S.getIngredient(),
- {Addr, StoredVal, &EVL}, S.isConsecutive(),
- S.isReverse(), S, S.getDebugLoc()) {
+ {Addr, StoredVal, &EVL}, S.isConsecutive(), S,
+ S.getDebugLoc()) {
setMask(Mask);
}
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index 0c6100c04f785..49297bc12cf18 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -1042,8 +1042,6 @@ InstructionCost VPRecipeWithIRFlags::getCostForRecipeWithOpcode(
return TTI::CastContextHint::Normal;
if (!WidenMemoryRecipe->isConsecutive())
return TTI::CastContextHint::GatherScatter;
- if (WidenMemoryRecipe->isReverse())
- return TTI::CastContextHint::Reversed;
if (WidenMemoryRecipe->isMasked())
return TTI::CastContextHint::Masked;
return TTI::CastContextHint::Normal;
@@ -1051,6 +1049,7 @@ InstructionCost VPRecipeWithIRFlags::getCostForRecipeWithOpcode(
VPValue *Operand = getOperand(0);
TTI::CastContextHint CCH = TTI::CastContextHint::None;
+ bool IsReverse = false;
// For Trunc/FPTrunc, get the context from the only user.
if (Opcode == Instruction::Trunc || Opcode == Instruction::FPTrunc) {
auto GetOnlyUser = [](const VPSingleDefRecipe *R) -> VPRecipeBase * {
@@ -1059,8 +1058,10 @@ InstructionCost VPRecipeWithIRFlags::getCostForRecipeWithOpcode(
return dyn_cast<VPRecipeBase>(*R->user_begin());
};
if (VPRecipeBase *Recipe = GetOnlyUser(this)) {
- if (match(Recipe, m_Reverse(m_VPValue())))
+ if (match(Recipe, m_Reverse(m_VPValue()))) {
Recipe = GetOnlyUser(cast<VPInstruction>(Recipe));
+ IsReverse = true;
+ }
if (Recipe)
CCH = ComputeCCH(Recipe);
}
@@ -1070,12 +1071,16 @@ InstructionCost VPRecipeWithIRFlags::getCostForRecipeWithOpcode(
Opcode == Instruction::FPExt) {
if (auto *Recipe = Operand->getDefiningRecipe()) {
VPValue *ReverseOp;
- if (match(Recipe, m_Reverse(m_VPValue(ReverseOp))))
+ if (match(Recipe, m_Reverse(m_VPValue(ReverseOp)))) {
Recipe = ReverseOp->getDefiningRecipe();
+ IsReverse = true;
+ }
if (Recipe)
CCH = ComputeCCH(Recipe);
}
}
+ if (IsReverse && CCH != TTI::CastContextHint::None)
+ CCH = TTI::CastContextHint::Reversed;
auto *ScalarSrcTy = Ctx.Types.inferScalarType(Operand);
Type *SrcTy = VF.isVector() ? toVectorTy(ScalarSrcTy, VF) : ScalarSrcTy;
@@ -1251,8 +1256,13 @@ InstructionCost VPInstruction::computeCost(ElementCount VF,
}
case VPInstruction::Reverse: {
assert(VF.isVector() && "Reverse operation must be vector type");
- auto *VectorTy = cast<VectorType>(
- toVectorTy(Ctx.Types.inferScalarType(getOperand(0)), VF));
+ Type *EltTy = Ctx.Types.inferScalarType(getOperand(0));
+ // Skip the reverse operation cost for the mask.
+ // FIXME: Remove this once redundant mask reverse operations can be
+ // eliminated by VPlanTransforms::cse before cost computation.
+ if (EltTy->isIntegerTy(1))
+ return 0;
+ auto *VectorTy = cast<VectorType>(toVectorTy(EltTy, VF));
return Ctx.TTI.getShuffleCost(TargetTransformInfo::SK_Reverse, VectorTy,
VectorTy, /*Mask=*/{}, Ctx.CostKind,
/*Index=*/0);
@@ -3636,9 +3646,6 @@ InstructionCost VPWidenMemoryRecipe::computeCost(ElementCount VF,
// TODO: Using the original IR may not be accurate.
// Currently, ARM will use the underlying IR to calculate gather/scatter
// instruction cost.
- assert(!Reverse &&
- "Inconsecutive memory access should not have the order.");
-
const Value *Ptr = getLoadStorePointerOperand(&Ingredient);
Type *PtrTy = Ptr->getType();
@@ -3682,13 +3689,8 @@ void VPWidenLoadRecipe::execute(VPTransformState &State) {
auto &Builder = State.Builder;
Value *Mask = nullptr;
- if (auto *VPMask = getMask()) {
- // Mask reversal is only needed for non-all-one (null) masks, as reverse
- // of a null all-one mask is a null mask.
+ if (auto *VPMask = getMask())
Mask = State.get(VPMask);
- if (isReverse())
- Mask = Builder.CreateVectorReverse(Mask, "reverse");
- }
Value *Addr = State.get(getAddr(), /*IsScalar*/ !CreateGather);
Value *NewLI;
@@ -3716,17 +3718,6 @@ void VPWidenLoadRecipe::printRecipe(raw_ostream &O, const Twine &Indent,
}
#endif
-/// Use all-true mask for reverse rather than actual mask, as it avoids a
-/// dependence w/o affecting the result.
-static Instruction *createReverseEVL(IRBuilderBase &Builder, Value *Operand,
- Value *EVL, const Twine &Name) {
- VectorType *ValTy = cast<VectorType>(Operand->getType());
- Value *AllTrueMask =
- Builder.CreateVectorSplat(ValTy->getElementCount(), Builder.getTrue());
- return Builder.CreateIntrinsic(ValTy, Intrinsic::experimental_vp_reverse,
- {Operand, AllTrueMask, EVL}, nullptr, Name);
-}
-
void VPWidenLoadEVLRecipe::execute(VPTransformState &State) {
Type *ScalarDataTy = getLoadStoreType(&Ingredient);
auto *DataTy = VectorType::get(ScalarDataTy, State.VF);
@@ -3737,13 +3728,10 @@ void VPWidenLoadEVLRecipe::execute(VPTransformState &State) {
Value *EVL = State.get(getEVL(), VPLane(0));
Value *Addr = State.get(getAddr(), !CreateGather);
Value *Mask = nullptr;
- if (VPValue *VPMask = getMask()) {
+ if (VPValue *VPMask = getMask())
Mask = State.get(VPMask);
- if (isReverse())
- Mask = createReverseEVL(Builder, Mask, EVL, "vp.reverse.mask");
- } else {
+ else
Mask = Builder.CreateVectorSplat(State.VF, Builder.getTrue());
- }
if (CreateGather) {
NewLI =
@@ -3795,13 +3783,8 @@ void VPWidenStoreRecipe::execute(VPTransformState &State) {
auto &Builder = State.Builder;
Value *Mask = nullptr;
- if (auto *VPMask = getMask()) {
- // Mask reversal is only needed for non-all-one (null) masks, as reverse
- // of a null all-one mask is a null mask.
+ if (auto *VPMask = getMask())
Mask = State.get(VPMask);
- if (isReverse())
- Mask = Builder.CreateVectorReverse(Mask, "reverse");
- }
Value *StoredVal = State.get(StoredVPValue);
Value *Addr = State.get(getAddr(), /*IsScalar*/ !CreateScatter);
@@ -3833,13 +3816,11 @@ void VPWidenStoreEVLRecipe::execute(VPTransformState &State) {
Value *StoredVal = State.get(StoredValue);
Value *EVL = State.get(getEVL(), VPLane(0));
Value *Mask = nullptr;
- if (VPValue *VPMask = getMask()) {
+ if (VPValue *VPMask = getMask())
Mask = State.get(VPMask);
- if (isReverse())
- Mask = createReverseEVL(Builder, Mask, EVL, "vp.reverse.mask");
- } else {
+ else
Mask = Builder.CreateVectorSplat(State.VF, Builder.getTrue());
- }
+
Value *Addr = State.get(getAddr(), !CreateScatter);
if (CreateScatter) {
NewSI = Builder.CreateIntrinsic(Type::getVoidTy(EVL->getContext()),
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index bfef277070db7..2fa575683f88d 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -79,12 +79,11 @@ bool VPlanTransforms::tryToConvertVPInstructionsToVPRecipes(
if (LoadInst *Load = dyn_cast<LoadInst>(Inst)) {
NewRecipe = new VPWidenLoadRecipe(
*Load, Ingredient.getOperand(0), nullptr /*Mask*/,
- false /*Consecutive*/, false /*Reverse*/, *VPI,
- Ingredient.getDebugLoc());
+ false /*Consecutive*/, *VPI, Ingredient.getDebugLoc());
} else if (StoreInst *Store = dyn_cast<StoreInst>(Inst)) {
NewRecipe = new VPWidenStoreRecipe(
*Store, Ingredient.getOperand(1), Ingredient.getOperand(0),
- nullptr /*Mask*/, false /*Consecutive*/, false /*Reverse*/, *VPI,
+ nullptr /*Mask*/, false /*Consecutive*/, *VPI,
Ingredient.getDebugLoc());
} else if (GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(Inst)) {
NewRecipe = new VPWidenGEPRecipe(GEP, Ingredient.operands(), *VPI,
@@ -1608,8 +1607,6 @@ static void narrowToSingleScalarRecipes(VPlan &Plan) {
auto *WidenStoreR = dyn_cast<VPWidenStoreRecipe>(&R);
if (WidenStoreR && vputils::isSingleScalar(WidenStoreR->getAddr()) &&
!WidenStoreR->isConsecutive()) {
- assert(!WidenStoreR->isReverse() &&
- "Not consecutive memory recipes shouldn't be reversed");
VPValue *Mask = WidenStoreR->getMask();
// Only convert the scatter to a scalar store if it is unmasked.
@@ -2918,18 +2915,28 @@ static VPRecipeBase *optimizeMaskToEVL(VPValue *HeaderMask,
return EVLEndPtr;
};
+ auto GetVPReverse = [&CurRecipe, &EVL, &TypeInfo, Plan,
+ DL](VPValue *V) -> VPWidenIntrinsicRecipe * {
+ auto *Reverse = new VPWidenIntrinsicRecipe(
+ Intrinsic::experimental_vp_reverse, {V, Plan->getTrue(), &EVL},
+ TypeInfo.inferScalarType(V), {}, {}, DL);
+ Reverse->insertBefore(&CurRecipe);
+ return Reverse;
+ };
+
if (match(&CurRecipe,
- m_MaskedLoad(m_VPValue(Addr), m_RemoveMask(HeaderMask, Mask))) &&
- !cast<VPWidenLoadRecipe>(CurRecipe).isReverse())
+ m_MaskedLoad(m_VPValue(Addr), m_RemoveMask(HeaderMask, Mask))))
return new VPWidenLoadEVLRecipe(cast<VPWidenLoadRecipe>(CurRecipe), Addr,
EVL, Mask);
VPValue *ReversedVal;
if (match(&CurRecipe, m_Reverse(m_VPValue(ReversedVal))) &&
match(ReversedVal,
- m_MaskedLoad(m_VPValue(EndPtr), m_RemoveMask(HeaderMask, Mask))) &&
- match(EndPtr, m_VecEndPtr(m_VPValue(Addr), m_Specific(&Plan->getVF()))) &&
- cast<VPWidenLoadRecipe>(ReversedVal)->isReverse()) {
+ m_MaskedLoad(m_VPValue(EndPtr),
+ m_Reverse(m_RemoveMask(HeaderMask, Mask)))) &&
+ match(EndPtr, m_VecEndPtr(m_VPValue(Addr), m_Specific(&Plan->getVF())))) {
+ if (Mask)
+ Mask = GetVPReverse(Mask);
auto *LoadR = new VPWidenLoadEVLRecipe(
*cast<VPWidenLoadRecipe>(ReversedVal), AdjustEndPtr(EndPtr), EVL, Mask);
LoadR->insertBefore(&CurRecipe);
@@ -2940,24 +2947,19 @@ static VPRecipeBase *optimizeMaskToEVL(VPValue *HeaderMask,
VPValue *StoredVal;
if (match(&CurRecipe, m_MaskedStore(m_VPValue(Addr), m_VPValue(StoredVal),
- m_RemoveMask(HeaderMask, Mask))) &&
- !cast<VPWidenStoreRecipe>(CurRecipe).isReverse())
+ m_RemoveMask(HeaderMask, Mask))))
return new VPWidenStoreEVLRecipe(cast<VPWidenStoreRecipe>(CurRecipe), Addr,
StoredVal, EVL, Mask);
if (match(&CurRecipe,
m_MaskedStore(m_VPValue(EndPtr), m_Reverse(m_VPValue(ReversedVal)),
- m_RemoveMask(HeaderMask, Mask))) &&
- match(EndPtr, m_VecEndPtr(m_VPValue(Addr), m_Specific(&Plan->getVF()))) &&
- cast<VPWidenStoreRecipe>(CurRecipe).isReverse()) {
- auto *NewReverse = new VPWidenIntrinsicRecipe(
- Intrinsic::experimental_vp_reverse,
- {ReversedVal, Plan->getTrue(), &EVL},
- TypeInfo.inferScalarType(ReversedVal), {}, {}, DL);
- NewReverse->insertBefore(&CurRecipe);
+ m_Reverse(m_RemoveMask(HeaderMask, Mask)))) &&
+ match(EndPtr, m_VecEndPtr(m_VPValue(Addr), m_Specific(&Plan->getVF())))) {
+ if (Mask)
+ Mask = GetVPReverse(Mask);
return new VPWidenStoreEVLRecipe(cast<VPWidenStoreRecipe>(CurRecipe),
- AdjustEndPtr(EndPtr), NewReverse, EVL,
- Mask);
+ AdjustEndPtr(EndPtr),
+ GetVPReverse(ReversedVal), EVL, Mask);
}
if (auto *Rdx = dyn_cast<VPReductionRecipe>(&CurRecipe))
@@ -4953,9 +4955,9 @@ narrowInterleaveGroupOp(VPValue *V, SmallPtrSetImpl<VPValue *> &NarrowedOps) {
// Narrow interleave group to wi...
[truncated]
@llvm/pr-subscribers-backend-risc-v
}
#endif
-/// Use all-true mask for reverse rather than actual mask, as it avoids a
-/// dependence w/o affecting the result.
-static Instruction *createReverseEVL(IRBuilderBase &Builder, Value *Operand,
- Value *EVL, const Twine &Name) {
- VectorType *ValTy = cast<VectorType>(Operand->getType());
- Value *AllTrueMask =
- Builder.CreateVectorSplat(ValTy->getElementCount(), Builder.getTrue());
- return Builder.CreateIntrinsic(ValTy, Intrinsic::experimental_vp_reverse,
- {Operand, AllTrueMask, EVL}, nullptr, Name);
-}
-
void VPWidenLoadEVLRecipe::execute(VPTransformState &State) {
Type *ScalarDataTy = getLoadStoreType(&Ingredient);
auto *DataTy = VectorType::get(ScalarDataTy, State.VF);
@@ -3737,13 +3728,10 @@ void VPWidenLoadEVLRecipe::execute(VPTransformState &State) {
Value *EVL = State.get(getEVL(), VPLane(0));
Value *Addr = State.get(getAddr(), !CreateGather);
Value *Mask = nullptr;
- if (VPValue *VPMask = getMask()) {
+ if (VPValue *VPMask = getMask())
Mask = State.get(VPMask);
- if (isReverse())
- Mask = createReverseEVL(Builder, Mask, EVL, "vp.reverse.mask");
- } else {
+ else
Mask = Builder.CreateVectorSplat(State.VF, Builder.getTrue());
- }
if (CreateGather) {
NewLI =
@@ -3795,13 +3783,8 @@ void VPWidenStoreRecipe::execute(VPTransformState &State) {
auto &Builder = State.Builder;
Value *Mask = nullptr;
- if (auto *VPMask = getMask()) {
- // Mask reversal is only needed for non-all-one (null) masks, as reverse
- // of a null all-one mask is a null mask.
+ if (auto *VPMask = getMask())
Mask = State.get(VPMask);
- if (isReverse())
- Mask = Builder.CreateVectorReverse(Mask, "reverse");
- }
Value *StoredVal = State.get(StoredVPValue);
Value *Addr = State.get(getAddr(), /*IsScalar*/ !CreateScatter);
@@ -3833,13 +3816,11 @@ void VPWidenStoreEVLRecipe::execute(VPTransformState &State) {
Value *StoredVal = State.get(StoredValue);
Value *EVL = State.get(getEVL(), VPLane(0));
Value *Mask = nullptr;
- if (VPValue *VPMask = getMask()) {
+ if (VPValue *VPMask = getMask())
Mask = State.get(VPMask);
- if (isReverse())
- Mask = createReverseEVL(Builder, Mask, EVL, "vp.reverse.mask");
- } else {
+ else
Mask = Builder.CreateVectorSplat(State.VF, Builder.getTrue());
- }
+
Value *Addr = State.get(getAddr(), !CreateScatter);
if (CreateScatter) {
NewSI = Builder.CreateIntrinsic(Type::getVoidTy(EVL->getContext()),
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index bfef277070db7..2fa575683f88d 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -79,12 +79,11 @@ bool VPlanTransforms::tryToConvertVPInstructionsToVPRecipes(
if (LoadInst *Load = dyn_cast<LoadInst>(Inst)) {
NewRecipe = new VPWidenLoadRecipe(
*Load, Ingredient.getOperand(0), nullptr /*Mask*/,
- false /*Consecutive*/, false /*Reverse*/, *VPI,
- Ingredient.getDebugLoc());
+ false /*Consecutive*/, *VPI, Ingredient.getDebugLoc());
} else if (StoreInst *Store = dyn_cast<StoreInst>(Inst)) {
NewRecipe = new VPWidenStoreRecipe(
*Store, Ingredient.getOperand(1), Ingredient.getOperand(0),
- nullptr /*Mask*/, false /*Consecutive*/, false /*Reverse*/, *VPI,
+ nullptr /*Mask*/, false /*Consecutive*/, *VPI,
Ingredient.getDebugLoc());
} else if (GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(Inst)) {
NewRecipe = new VPWidenGEPRecipe(GEP, Ingredient.operands(), *VPI,
@@ -1608,8 +1607,6 @@ static void narrowToSingleScalarRecipes(VPlan &Plan) {
auto *WidenStoreR = dyn_cast<VPWidenStoreRecipe>(&R);
if (WidenStoreR && vputils::isSingleScalar(WidenStoreR->getAddr()) &&
!WidenStoreR->isConsecutive()) {
- assert(!WidenStoreR->isReverse() &&
- "Not consecutive memory recipes shouldn't be reversed");
VPValue *Mask = WidenStoreR->getMask();
// Only convert the scatter to a scalar store if it is unmasked.
@@ -2918,18 +2915,28 @@ static VPRecipeBase *optimizeMaskToEVL(VPValue *HeaderMask,
return EVLEndPtr;
};
+ auto GetVPReverse = [&CurRecipe, &EVL, &TypeInfo, Plan,
+ DL](VPValue *V) -> VPWidenIntrinsicRecipe * {
+ auto *Reverse = new VPWidenIntrinsicRecipe(
+ Intrinsic::experimental_vp_reverse, {V, Plan->getTrue(), &EVL},
+ TypeInfo.inferScalarType(V), {}, {}, DL);
+ Reverse->insertBefore(&CurRecipe);
+ return Reverse;
+ };
+
if (match(&CurRecipe,
- m_MaskedLoad(m_VPValue(Addr), m_RemoveMask(HeaderMask, Mask))) &&
- !cast<VPWidenLoadRecipe>(CurRecipe).isReverse())
+ m_MaskedLoad(m_VPValue(Addr), m_RemoveMask(HeaderMask, Mask))))
return new VPWidenLoadEVLRecipe(cast<VPWidenLoadRecipe>(CurRecipe), Addr,
EVL, Mask);
VPValue *ReversedVal;
if (match(&CurRecipe, m_Reverse(m_VPValue(ReversedVal))) &&
match(ReversedVal,
- m_MaskedLoad(m_VPValue(EndPtr), m_RemoveMask(HeaderMask, Mask))) &&
- match(EndPtr, m_VecEndPtr(m_VPValue(Addr), m_Specific(&Plan->getVF()))) &&
- cast<VPWidenLoadRecipe>(ReversedVal)->isReverse()) {
+ m_MaskedLoad(m_VPValue(EndPtr),
+ m_Reverse(m_RemoveMask(HeaderMask, Mask)))) &&
+ match(EndPtr, m_VecEndPtr(m_VPValue(Addr), m_Specific(&Plan->getVF())))) {
+ if (Mask)
+ Mask = GetVPReverse(Mask);
auto *LoadR = new VPWidenLoadEVLRecipe(
*cast<VPWidenLoadRecipe>(ReversedVal), AdjustEndPtr(EndPtr), EVL, Mask);
LoadR->insertBefore(&CurRecipe);
@@ -2940,24 +2947,19 @@ static VPRecipeBase *optimizeMaskToEVL(VPValue *HeaderMask,
VPValue *StoredVal;
if (match(&CurRecipe, m_MaskedStore(m_VPValue(Addr), m_VPValue(StoredVal),
- m_RemoveMask(HeaderMask, Mask))) &&
- !cast<VPWidenStoreRecipe>(CurRecipe).isReverse())
+ m_RemoveMask(HeaderMask, Mask))))
return new VPWidenStoreEVLRecipe(cast<VPWidenStoreRecipe>(CurRecipe), Addr,
StoredVal, EVL, Mask);
if (match(&CurRecipe,
m_MaskedStore(m_VPValue(EndPtr), m_Reverse(m_VPValue(ReversedVal)),
- m_RemoveMask(HeaderMask, Mask))) &&
- match(EndPtr, m_VecEndPtr(m_VPValue(Addr), m_Specific(&Plan->getVF()))) &&
- cast<VPWidenStoreRecipe>(CurRecipe).isReverse()) {
- auto *NewReverse = new VPWidenIntrinsicRecipe(
- Intrinsic::experimental_vp_reverse,
- {ReversedVal, Plan->getTrue(), &EVL},
- TypeInfo.inferScalarType(ReversedVal), {}, {}, DL);
- NewReverse->insertBefore(&CurRecipe);
+ m_Reverse(m_RemoveMask(HeaderMask, Mask)))) &&
+ match(EndPtr, m_VecEndPtr(m_VPValue(Addr), m_Specific(&Plan->getVF())))) {
+ if (Mask)
+ Mask = GetVPReverse(Mask);
return new VPWidenStoreEVLRecipe(cast<VPWidenStoreRecipe>(CurRecipe),
- AdjustEndPtr(EndPtr), NewReverse, EVL,
- Mask);
+ AdjustEndPtr(EndPtr),
+ GetVPReverse(ReversedVal), EVL, Mask);
}
if (auto *Rdx = dyn_cast<VPReductionRecipe>(&CurRecipe))
@@ -4953,9 +4955,9 @@ narrowInterleaveGroupOp(VPValue *V, SmallPtrSetImpl<VPValue *> &NarrowedOps) {
// Narrow interleave group to wi...
[truncated]
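The diff above moves the mask reverse out of the memory recipes themselves. As a toy model (plain C++, not LLVM code; all names are illustrative) of the semantics being preserved: a reverse masked load reads a contiguous block ending at the last element, applies the *reversed* mask in memory order, then reverses the loaded lanes. Note also that the reverse of an all-true mask is all-true, which is why unmasked reverse accesses never needed a mask reverse in the first place.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Toy model (not LLVM code) of a reverse masked load: read the VF lanes
// ending at EndIdx in memory order, apply the reversed mask, then reverse
// the result. Splitting the mask reverse into its own step, as the diff
// does, leaves these semantics unchanged.
std::vector<std::optional<int>>
reverseMaskedLoad(const std::vector<int> &Mem, size_t EndIdx, size_t VF,
                  std::vector<bool> Mask) {
  std::reverse(Mask.begin(), Mask.end()); // the extracted reverse-mask step
  std::vector<std::optional<int>> Lanes(VF);
  for (size_t I = 0; I < VF; ++I)
    if (Mask[I])
      Lanes[I] = Mem[EndIdx - VF + 1 + I]; // consecutive load, memory order
  std::reverse(Lanes.begin(), Lanes.end()); // reverse the data afterwards
  return Lanes;
}
```

Logical lane `i` of the result reads `Mem[EndIdx - i]` under the original (unreversed) mask, matching a per-lane reverse gather.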
I can understand @lukel97's approach, so I'll go with that method for now, rebase, and move the patch forward.
🐧 Linux x64 Test Results
✅ The build succeeded and all tests passed.
🪟 Windows x64 Test Results
✅ The build succeeded and all tests passed.
e4bd5b8 to cf38142 Compare
    // TODO: Using the original IR may not be accurate.
    // Currently, ARM will use the underlying IR to calculate gather/scatter
    // instruction cost.
    assert(!Reverse &&
Could we retain this by checking that the operand / single user is not a reverse?
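A toy model (plain C++, hypothetical names, not LLVM code) of the structural check suggested here: with the `Reverse` flag gone, the same invariant can be restated by inspecting a recipe's operands and users for a reverse, rather than consulting a stored flag.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy recipe node: only the two properties relevant to the invariant.
struct Node {
  bool IsReverse = false;
  bool IsConsecutive = true;
  std::vector<const Node *> Operands;
  std::vector<const Node *> Users;
};

// True if any operand or user of N is a reverse operation.
bool touchesReverse(const Node &N) {
  auto IsRev = [](const Node *M) { return M->IsReverse; };
  return std::any_of(N.Operands.begin(), N.Operands.end(), IsRev) ||
         std::any_of(N.Users.begin(), N.Users.end(), IsRev);
}
// The retained assertion for a non-consecutive access could then read:
//   assert(N.IsConsecutive || !touchesReverse(N));
```

This only sketches the shape of the check; in VPlan it would presumably be phrased with the existing `m_Reverse` pattern matcher instead.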
artagnon left a comment
Looks good, I'll defer to Florian and Luke's existing approvals.
I think one of the aarch64 buildbots is failing on llvm-test-suite with this: https://lab.llvm.org/buildbot/#/builders/198/builds/12563 Edit: also on the RISC-V buildbots: https://lab.llvm.org/buildbot/#/builders/210/builds/9728
Yes, that looks related. I'll revert for now to get things back to green.
Reverts #155579 Assertion added triggers on some buildbots clang: /home/tcwg-buildbot/worker/clang-aarch64-sve2-vla/llvm/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp:3840: virtual InstructionCost llvm::VPWidenMemoryRecipe::computeCost(ElementCount, VPCostContext &) const: Assertion `!IsReverse() && "Inconsecutive memory access should not have reverse order"' failed. PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script. Stack dump: 0. Program arguments: /home/tcwg-buildbot/worker/clang-aarch64-sve2-vla/stage1.install/bin/clang -DNDEBUG -mcpu=neoverse-v2 -mllvm -scalable-vectorization=preferred -O3 -std=gnu17 -fcommon -Wno-error=incompatible-pointer-types -MD -MT MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/CMakeFiles/timberwolfmc.dir/finalpin.c.o -MF CMakeFiles/timberwolfmc.dir/finalpin.c.o.d -o CMakeFiles/timberwolfmc.dir/finalpin.c.o -c /home/tcwg-buildbot/worker/clang-aarch64-sve2-vla/test/test-suite/MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/finalpin.c
…" (#189637) Reverts llvm/llvm-project#155579 Assertion added triggers on some buildbots clang: /home/tcwg-buildbot/worker/clang-aarch64-sve2-vla/llvm/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp:3840: virtual InstructionCost llvm::VPWidenMemoryRecipe::computeCost(ElementCount, VPCostContext &) const: Assertion `!IsReverse() && "Inconsecutive memory access should not have reverse order"' failed. PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script. Stack dump: 0. Program arguments: /home/tcwg-buildbot/worker/clang-aarch64-sve2-vla/stage1.install/bin/clang -DNDEBUG -mcpu=neoverse-v2 -mllvm -scalable-vectorization=preferred -O3 -std=gnu17 -fcommon -Wno-error=incompatible-pointer-types -MD -MT MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/CMakeFiles/timberwolfmc.dir/finalpin.c.o -MF CMakeFiles/timberwolfmc.dir/finalpin.c.o.d -o CMakeFiles/timberwolfmc.dir/finalpin.c.o -c /home/tcwg-buildbot/worker/clang-aarch64-sve2-vla/test/test-suite/MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/finalpin.c
Following #146525, separate the reverse mask from reverse access recipes.
At the same time, remove the now-unused member variable `Reverse` from `VPWidenMemoryRecipe`. This will help reduce redundant reverse mask computations via VPlan-based common subexpression elimination.
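The CSE motivation can be sketched with a toy model (plain C++, illustrative names, not LLVM code): once the mask reverse is a separate value keyed by its operand, several reverse loads/stores under the same mask can share a single reversed mask, so the reverse is materialised — and costed — only once.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy value-numbering cache: one reversed-mask value per distinct mask.
struct ReverseCache {
  std::map<std::string, int> Memo; // mask name -> id of its reverse
  int NextId = 0;
  int Created = 0; // how many reverse operations were actually emitted

  int getReversedMask(const std::string &Mask) {
    auto [It, Inserted] = Memo.try_emplace(Mask, NextId);
    if (Inserted) {
      ++NextId;
      ++Created; // only the first user pays for the reverse
    }
    return It->second;
  }
};
```

In the real pipeline the sharing would come from `VPlanTransforms::cse` deduplicating identical `Reverse` VPInstructions, not from an explicit cache like this.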