
[VPlan] Extract reverse mask from reverse accesses #155579

Merged
Mel-Chen merged 9 commits into llvm:main from Mel-Chen:reverse-mask
Mar 31, 2026

Conversation

@Mel-Chen
Contributor

@Mel-Chen Mel-Chen commented Aug 27, 2025

Following #146525, separate the reverse mask from reverse access recipes.
At the same time, remove the unused member variable Reverse from VPWidenMemoryRecipe.
This will help to reduce redundant reverse mask computations by VPlan-based common subexpression elimination.

@github-actions

github-actions bot commented Aug 27, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.



@Mel-Chen Mel-Chen Aug 27, 2025


@fhahn This could lead to overestimating the cost.
Multiple VPWidenMemoryRecipes may be able to share a single reverse mask, but if each recipe costs the reverse mask separately, the total cost is overestimated.

@Mel-Chen Mel-Chen (Contributor Author)

@fhahn I think this is the most straightforward approach.
Currently, we could check whether the operand or user is a reverse operation, but in the future reverse operations might be simplified away, so relying on the reverse operation is not a long-term approach.
A final option is to use the address for the check, since only reverse operations need to use VPVectorEndPointer.
What do you think?

@lukel97
Contributor

lukel97 commented Jan 8, 2026

Reverse ping (no pun intended!)

@Mel-Chen
Contributor Author

Mel-Chen commented Jan 12, 2026

Reverse ping (no pun intended!)

I’d like to understand whether you need the mask reverses to be split out before costing the plan, or whether doing it after costing would also work for you. This issue comes up because the mask reverse has never been accounted for in ::computeCost. If we split out the mask reverse before costing the plan, the current VPlan-based cost model will compute a higher cost than the legacy cost model.

There are two ways to split the mask reverse from reverse load/stores.

The first and faster approach (I remember Florian mentioned a similar way earlier) is to lower reverse accesses into mask reverse + normal consecutive accesses in VPlanTransforms::convertToConcreteRecipes.

The second approach is to split out mask reverses directly in VPRecipeBuilder::tryToWidenMemory, and then run an extra CSE in VPlanTransforms::optimize to reduce redundant mask reverses and thus minimize overcounting of reverse cost.

Do you have a preference for which of these approaches to go with first?

@lukel97
Contributor

lukel97 commented Jan 12, 2026

The second approach is to split out mask reverses directly in VPRecipeBuilder::tryToWidenMemory

Definitely this approach, as I'd really like to see any notion of "reversing" explicitly moved out of VPWidenMemoryRecipe.

As for the costing issue, I think we should start by just aligning the VPlan cost model with legacy cost model:

diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index 5716a6f82259..58e977b30853 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -1226,7 +1226,10 @@ InstructionCost VPInstruction::computeCost(ElementCount VF,
   case VPInstruction::Reverse: {
     assert(VF.isVector() && "Reverse operation must be vector type");
     auto *VectorTy = cast<VectorType>(
-        toVectorTy(Ctx.Types.inferScalarType(getOperand(0)), VF));
+				      toVectorTy(Ctx.Types.inferScalarType(getOperand(0)), VF));
+    // TODO: Remove after adding some reverse simplifications
+    if (VectorTy->getElementType()->isIntegerTy(1))
+      return 0;
     return Ctx.TTI.getShuffleCost(TargetTransformInfo::SK_Reverse, VectorTy,
                                   VectorTy, /*Mask=*/{}, Ctx.CostKind,
                                   /*Index=*/0);

This seemed to work when I tried this locally last month, and I think this way we won't need to add any VPInstruction::Reverse simplifications in the first PR. We can defer those to a follow up PR, and hopefully then start costing the mask. Does that sound good?

Contributor

@fhahn fhahn left a comment


The second approach is to split out mask reverses directly in VPRecipeBuilder::tryToWidenMemory

Definitely this approach, as I'd really like to see any notion of "reversing" explicitly moved out of VPWidenMemoryRecipe.

As for the costing issue, I think we should start by just aligning the VPlan cost model with the legacy cost model.

IIRC we already include the cost of some VPInstructions after computing the cost from the legacy cost model; we could do the same for reverse to get more accurate estimates.

@lukel97
Contributor

lukel97 commented Jan 12, 2026

IIRC we already include the cost of some VPInstructions after computing the cost from the legacy cost model; we could do the same for reverse to get more accurate estimates.

Yes, we can do that too, but would it be better to split the cost model change off into its own PR? I think this PR can be NFC-ish otherwise.

@llvmbot
Member

llvmbot commented Jan 15, 2026

@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-llvm-transforms

Author: Mel Chen (Mel-Chen)

Changes

Following #146525, separate the reverse mask from reverse access recipes.
At the same time, remove the unused member variable Reverse from VPWidenMemoryRecipe.
This will help to reduce redundant reverse mask computations once VPlan-based CSE is supported.

Based on #146525


Patch is 57.95 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/155579.diff

13 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+20-14)
  • (modified) llvm/lib/Transforms/Vectorize/VPlan.h (+12-26)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+23-42)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+30-28)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-vector-reverse-mask4.ll (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/vector-reverse-mask4.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/dbg-tail-folding-by-evl.ll (+8-8)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll (+8-8)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-reverse-load-store.ll (+12-14)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-uniform-store.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vplan-riscv-vector-reverse.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/masked_load_store.ll (+6-6)
  • (modified) llvm/unittests/Transforms/Vectorize/VPlanTest.cpp (+3-4)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 3f1e12e5d1cd0..cf8bd87bcb203 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -7097,18 +7097,21 @@ static bool planContainsAdditionalSimplifications(VPlan &Plan,
                          CostCtx.isLegacyUniformAfterVectorization(AddrI, VF))
           return true;
 
-        if (WidenMemR->isReverse()) {
-          // If the stored value of a reverse store is invariant, LICM will
-          // hoist the reverse operation to the preheader. In this case, the
-          // result of the VPlan-based cost model will diverge from that of
-          // the legacy model.
+        // If the stored value of a reverse store is invariant, LICM will
+        // hoist the reverse operation to the preheader. In this case, the
+        // result of the VPlan-based cost model will diverge from that of
+        // the legacy model.
+        if (isa<VPWidenStoreRecipe, VPWidenStoreEVLRecipe>(WidenMemR)) {
+          VPValue *StoredVal;
           if (auto *StoreR = dyn_cast<VPWidenStoreRecipe>(WidenMemR))
-            if (StoreR->getStoredValue()->isDefinedOutsideLoopRegions())
-              return true;
+            StoredVal = StoreR->getStoredValue();
+          else
+            StoredVal =
+                cast<VPWidenStoreEVLRecipe>(WidenMemR)->getStoredValue();
 
-          if (auto *StoreR = dyn_cast<VPWidenStoreEVLRecipe>(WidenMemR))
-            if (StoreR->getStoredValue()->isDefinedOutsideLoopRegions())
-              return true;
+          using namespace VPlanPatternMatch;
+          if (match(StoredVal, m_Reverse(m_VPValue())))
+            return StoredVal->isDefinedOutsideLoopRegions();
         }
       }
 
@@ -7747,10 +7750,13 @@ VPRecipeBase *VPRecipeBuilder::tryToWidenMemory(VPInstruction *VPI,
     Ptr = VectorPtr;
   }
 
+  if (Reverse && Mask)
+    Mask = Builder.createNaryOp(VPInstruction::Reverse, Mask, I->getDebugLoc());
+
   if (VPI->getOpcode() == Instruction::Load) {
     auto *Load = cast<LoadInst>(I);
-    auto *LoadR = new VPWidenLoadRecipe(*Load, Ptr, Mask, Consecutive, Reverse,
-                                        *VPI, Load->getDebugLoc());
+    auto *LoadR = new VPWidenLoadRecipe(*Load, Ptr, Mask, Consecutive, *VPI,
+                                        Load->getDebugLoc());
     if (Reverse) {
       Builder.insert(LoadR);
       return new VPInstruction(VPInstruction::Reverse, LoadR, {}, {},
@@ -7764,8 +7770,8 @@ VPRecipeBase *VPRecipeBuilder::tryToWidenMemory(VPInstruction *VPI,
   if (Reverse)
     StoredVal = Builder.createNaryOp(VPInstruction::Reverse, StoredVal,
                                      Store->getDebugLoc());
-  return new VPWidenStoreRecipe(*Store, Ptr, StoredVal, Mask, Consecutive,
-                                Reverse, *VPI, Store->getDebugLoc());
+  return new VPWidenStoreRecipe(*Store, Ptr, StoredVal, Mask, Consecutive, *VPI,
+                                Store->getDebugLoc());
 }
 
 VPWidenIntOrFpInductionRecipe *
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index 1d250322ccf63..5354ab266ed83 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -3300,9 +3300,6 @@ class LLVM_ABI_FOR_TEST VPWidenMemoryRecipe : public VPRecipeBase,
   /// Whether the accessed addresses are consecutive.
   bool Consecutive;
 
-  /// Whether the consecutive accessed addresses are in reverse order.
-  bool Reverse;
-
   /// Whether the memory access is masked.
   bool IsMasked = false;
 
@@ -3316,15 +3313,10 @@ class LLVM_ABI_FOR_TEST VPWidenMemoryRecipe : public VPRecipeBase,
 
   VPWidenMemoryRecipe(const char unsigned SC, Instruction &I,
                       std::initializer_list<VPValue *> Operands,
-                      bool Consecutive, bool Reverse,
-                      const VPIRMetadata &Metadata, DebugLoc DL)
+                      bool Consecutive, const VPIRMetadata &Metadata,
+                      DebugLoc DL)
       : VPRecipeBase(SC, Operands, DL), VPIRMetadata(Metadata), Ingredient(I),
-        Alignment(getLoadStoreAlignment(&I)), Consecutive(Consecutive),
-        Reverse(Reverse) {
-    assert((Consecutive || !Reverse) && "Reverse implies consecutive");
-    assert((isa<VPVectorEndPointerRecipe>(getAddr()) || !Reverse) &&
-           "Reversed acccess without VPVectorEndPointerRecipe address?");
-  }
+        Alignment(getLoadStoreAlignment(&I)), Consecutive(Consecutive) {}
 
 public:
   VPWidenMemoryRecipe *clone() override {
@@ -3346,10 +3338,6 @@ class LLVM_ABI_FOR_TEST VPWidenMemoryRecipe : public VPRecipeBase,
   /// Return whether the loaded-from / stored-to addresses are consecutive.
   bool isConsecutive() const { return Consecutive; }
 
-  /// Return whether the consecutive loaded/stored addresses are in reverse
-  /// order.
-  bool isReverse() const { return Reverse; }
-
   /// Return the address accessed by this recipe.
   VPValue *getAddr() const { return getOperand(0); }
 
@@ -3383,18 +3371,16 @@ class LLVM_ABI_FOR_TEST VPWidenMemoryRecipe : public VPRecipeBase,
 struct LLVM_ABI_FOR_TEST VPWidenLoadRecipe final : public VPWidenMemoryRecipe,
                                                    public VPRecipeValue {
   VPWidenLoadRecipe(LoadInst &Load, VPValue *Addr, VPValue *Mask,
-                    bool Consecutive, bool Reverse,
-                    const VPIRMetadata &Metadata, DebugLoc DL)
+                    bool Consecutive, const VPIRMetadata &Metadata, DebugLoc DL)
       : VPWidenMemoryRecipe(VPDef::VPWidenLoadSC, Load, {Addr}, Consecutive,
-                            Reverse, Metadata, DL),
+                            Metadata, DL),
         VPRecipeValue(this, &Load) {
     setMask(Mask);
   }
 
   VPWidenLoadRecipe *clone() override {
     return new VPWidenLoadRecipe(cast<LoadInst>(Ingredient), getAddr(),
-                                 getMask(), Consecutive, Reverse, *this,
-                                 getDebugLoc());
+                                 getMask(), Consecutive, *this, getDebugLoc());
   }
 
   VP_CLASSOF_IMPL(VPDef::VPWidenLoadSC);
@@ -3427,7 +3413,7 @@ struct VPWidenLoadEVLRecipe final : public VPWidenMemoryRecipe,
   VPWidenLoadEVLRecipe(VPWidenLoadRecipe &L, VPValue *Addr, VPValue &EVL,
                        VPValue *Mask)
       : VPWidenMemoryRecipe(VPDef::VPWidenLoadEVLSC, L.getIngredient(),
-                            {Addr, &EVL}, L.isConsecutive(), L.isReverse(), L,
+                            {Addr, &EVL}, L.isConsecutive(), L,
                             L.getDebugLoc()),
         VPRecipeValue(this, &getIngredient()) {
     setMask(Mask);
@@ -3466,17 +3452,17 @@ struct VPWidenLoadEVLRecipe final : public VPWidenMemoryRecipe,
 /// to store to and an optional mask.
 struct LLVM_ABI_FOR_TEST VPWidenStoreRecipe final : public VPWidenMemoryRecipe {
   VPWidenStoreRecipe(StoreInst &Store, VPValue *Addr, VPValue *StoredVal,
-                     VPValue *Mask, bool Consecutive, bool Reverse,
+                     VPValue *Mask, bool Consecutive,
                      const VPIRMetadata &Metadata, DebugLoc DL)
       : VPWidenMemoryRecipe(VPDef::VPWidenStoreSC, Store, {Addr, StoredVal},
-                            Consecutive, Reverse, Metadata, DL) {
+                            Consecutive, Metadata, DL) {
     setMask(Mask);
   }
 
   VPWidenStoreRecipe *clone() override {
     return new VPWidenStoreRecipe(cast<StoreInst>(Ingredient), getAddr(),
                                   getStoredValue(), getMask(), Consecutive,
-                                  Reverse, *this, getDebugLoc());
+                                  *this, getDebugLoc());
   }
 
   VP_CLASSOF_IMPL(VPDef::VPWidenStoreSC);
@@ -3511,8 +3497,8 @@ struct VPWidenStoreEVLRecipe final : public VPWidenMemoryRecipe {
   VPWidenStoreEVLRecipe(VPWidenStoreRecipe &S, VPValue *Addr,
                         VPValue *StoredVal, VPValue &EVL, VPValue *Mask)
       : VPWidenMemoryRecipe(VPDef::VPWidenStoreEVLSC, S.getIngredient(),
-                            {Addr, StoredVal, &EVL}, S.isConsecutive(),
-                            S.isReverse(), S, S.getDebugLoc()) {
+                            {Addr, StoredVal, &EVL}, S.isConsecutive(), S,
+                            S.getDebugLoc()) {
     setMask(Mask);
   }
 
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index 0c6100c04f785..49297bc12cf18 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -1042,8 +1042,6 @@ InstructionCost VPRecipeWithIRFlags::getCostForRecipeWithOpcode(
         return TTI::CastContextHint::Normal;
       if (!WidenMemoryRecipe->isConsecutive())
         return TTI::CastContextHint::GatherScatter;
-      if (WidenMemoryRecipe->isReverse())
-        return TTI::CastContextHint::Reversed;
       if (WidenMemoryRecipe->isMasked())
         return TTI::CastContextHint::Masked;
       return TTI::CastContextHint::Normal;
@@ -1051,6 +1049,7 @@ InstructionCost VPRecipeWithIRFlags::getCostForRecipeWithOpcode(
 
     VPValue *Operand = getOperand(0);
     TTI::CastContextHint CCH = TTI::CastContextHint::None;
+    bool IsReverse = false;
     // For Trunc/FPTrunc, get the context from the only user.
     if (Opcode == Instruction::Trunc || Opcode == Instruction::FPTrunc) {
       auto GetOnlyUser = [](const VPSingleDefRecipe *R) -> VPRecipeBase * {
@@ -1059,8 +1058,10 @@ InstructionCost VPRecipeWithIRFlags::getCostForRecipeWithOpcode(
         return dyn_cast<VPRecipeBase>(*R->user_begin());
       };
       if (VPRecipeBase *Recipe = GetOnlyUser(this)) {
-        if (match(Recipe, m_Reverse(m_VPValue())))
+        if (match(Recipe, m_Reverse(m_VPValue()))) {
           Recipe = GetOnlyUser(cast<VPInstruction>(Recipe));
+          IsReverse = true;
+        }
         if (Recipe)
           CCH = ComputeCCH(Recipe);
       }
@@ -1070,12 +1071,16 @@ InstructionCost VPRecipeWithIRFlags::getCostForRecipeWithOpcode(
              Opcode == Instruction::FPExt) {
       if (auto *Recipe = Operand->getDefiningRecipe()) {
         VPValue *ReverseOp;
-        if (match(Recipe, m_Reverse(m_VPValue(ReverseOp))))
+        if (match(Recipe, m_Reverse(m_VPValue(ReverseOp)))) {
           Recipe = ReverseOp->getDefiningRecipe();
+	  IsReverse = true;
+	}
         if (Recipe)
           CCH = ComputeCCH(Recipe);
       }
     }
+    if (IsReverse && CCH != TTI::CastContextHint::None)
+      CCH = TTI::CastContextHint::Reversed;
 
     auto *ScalarSrcTy = Ctx.Types.inferScalarType(Operand);
     Type *SrcTy = VF.isVector() ? toVectorTy(ScalarSrcTy, VF) : ScalarSrcTy;
@@ -1251,8 +1256,13 @@ InstructionCost VPInstruction::computeCost(ElementCount VF,
   }
   case VPInstruction::Reverse: {
     assert(VF.isVector() && "Reverse operation must be vector type");
-    auto *VectorTy = cast<VectorType>(
-        toVectorTy(Ctx.Types.inferScalarType(getOperand(0)), VF));
+    Type *EltTy = Ctx.Types.inferScalarType(getOperand(0));
+    // Skip the reverse operation cost for the mask.
+    // FIXME: Remove this once redundant mask reverse operations can be
+    // eliminated by VPlanTransforms::cse before cost computation.
+    if (EltTy->isIntegerTy(1))
+      return 0;
+    auto *VectorTy = cast<VectorType>(toVectorTy(EltTy, VF));
     return Ctx.TTI.getShuffleCost(TargetTransformInfo::SK_Reverse, VectorTy,
                                   VectorTy, /*Mask=*/{}, Ctx.CostKind,
                                   /*Index=*/0);
@@ -3636,9 +3646,6 @@ InstructionCost VPWidenMemoryRecipe::computeCost(ElementCount VF,
     // TODO: Using the original IR may not be accurate.
     // Currently, ARM will use the underlying IR to calculate gather/scatter
     // instruction cost.
-    assert(!Reverse &&
-           "Inconsecutive memory access should not have the order.");
-
     const Value *Ptr = getLoadStorePointerOperand(&Ingredient);
     Type *PtrTy = Ptr->getType();
 
@@ -3682,13 +3689,8 @@ void VPWidenLoadRecipe::execute(VPTransformState &State) {
 
   auto &Builder = State.Builder;
   Value *Mask = nullptr;
-  if (auto *VPMask = getMask()) {
-    // Mask reversal is only needed for non-all-one (null) masks, as reverse
-    // of a null all-one mask is a null mask.
+  if (auto *VPMask = getMask())
     Mask = State.get(VPMask);
-    if (isReverse())
-      Mask = Builder.CreateVectorReverse(Mask, "reverse");
-  }
 
   Value *Addr = State.get(getAddr(), /*IsScalar*/ !CreateGather);
   Value *NewLI;
@@ -3716,17 +3718,6 @@ void VPWidenLoadRecipe::printRecipe(raw_ostream &O, const Twine &Indent,
 }
 #endif
 
-/// Use all-true mask for reverse rather than actual mask, as it avoids a
-/// dependence w/o affecting the result.
-static Instruction *createReverseEVL(IRBuilderBase &Builder, Value *Operand,
-                                     Value *EVL, const Twine &Name) {
-  VectorType *ValTy = cast<VectorType>(Operand->getType());
-  Value *AllTrueMask =
-      Builder.CreateVectorSplat(ValTy->getElementCount(), Builder.getTrue());
-  return Builder.CreateIntrinsic(ValTy, Intrinsic::experimental_vp_reverse,
-                                 {Operand, AllTrueMask, EVL}, nullptr, Name);
-}
-
 void VPWidenLoadEVLRecipe::execute(VPTransformState &State) {
   Type *ScalarDataTy = getLoadStoreType(&Ingredient);
   auto *DataTy = VectorType::get(ScalarDataTy, State.VF);
@@ -3737,13 +3728,10 @@ void VPWidenLoadEVLRecipe::execute(VPTransformState &State) {
   Value *EVL = State.get(getEVL(), VPLane(0));
   Value *Addr = State.get(getAddr(), !CreateGather);
   Value *Mask = nullptr;
-  if (VPValue *VPMask = getMask()) {
+  if (VPValue *VPMask = getMask())
     Mask = State.get(VPMask);
-    if (isReverse())
-      Mask = createReverseEVL(Builder, Mask, EVL, "vp.reverse.mask");
-  } else {
+  else
     Mask = Builder.CreateVectorSplat(State.VF, Builder.getTrue());
-  }
 
   if (CreateGather) {
     NewLI =
@@ -3795,13 +3783,8 @@ void VPWidenStoreRecipe::execute(VPTransformState &State) {
   auto &Builder = State.Builder;
 
   Value *Mask = nullptr;
-  if (auto *VPMask = getMask()) {
-    // Mask reversal is only needed for non-all-one (null) masks, as reverse
-    // of a null all-one mask is a null mask.
+  if (auto *VPMask = getMask())
     Mask = State.get(VPMask);
-    if (isReverse())
-      Mask = Builder.CreateVectorReverse(Mask, "reverse");
-  }
 
   Value *StoredVal = State.get(StoredVPValue);
   Value *Addr = State.get(getAddr(), /*IsScalar*/ !CreateScatter);
@@ -3833,13 +3816,11 @@ void VPWidenStoreEVLRecipe::execute(VPTransformState &State) {
   Value *StoredVal = State.get(StoredValue);
   Value *EVL = State.get(getEVL(), VPLane(0));
   Value *Mask = nullptr;
-  if (VPValue *VPMask = getMask()) {
+  if (VPValue *VPMask = getMask())
     Mask = State.get(VPMask);
-    if (isReverse())
-      Mask = createReverseEVL(Builder, Mask, EVL, "vp.reverse.mask");
-  } else {
+  else
     Mask = Builder.CreateVectorSplat(State.VF, Builder.getTrue());
-  }
+
   Value *Addr = State.get(getAddr(), !CreateScatter);
   if (CreateScatter) {
     NewSI = Builder.CreateIntrinsic(Type::getVoidTy(EVL->getContext()),
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index bfef277070db7..2fa575683f88d 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -79,12 +79,11 @@ bool VPlanTransforms::tryToConvertVPInstructionsToVPRecipes(
         if (LoadInst *Load = dyn_cast<LoadInst>(Inst)) {
           NewRecipe = new VPWidenLoadRecipe(
               *Load, Ingredient.getOperand(0), nullptr /*Mask*/,
-              false /*Consecutive*/, false /*Reverse*/, *VPI,
-              Ingredient.getDebugLoc());
+              false /*Consecutive*/, *VPI, Ingredient.getDebugLoc());
         } else if (StoreInst *Store = dyn_cast<StoreInst>(Inst)) {
           NewRecipe = new VPWidenStoreRecipe(
               *Store, Ingredient.getOperand(1), Ingredient.getOperand(0),
-              nullptr /*Mask*/, false /*Consecutive*/, false /*Reverse*/, *VPI,
+              nullptr /*Mask*/, false /*Consecutive*/, *VPI,
               Ingredient.getDebugLoc());
         } else if (GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(Inst)) {
           NewRecipe = new VPWidenGEPRecipe(GEP, Ingredient.operands(), *VPI,
@@ -1608,8 +1607,6 @@ static void narrowToSingleScalarRecipes(VPlan &Plan) {
       auto *WidenStoreR = dyn_cast<VPWidenStoreRecipe>(&R);
       if (WidenStoreR && vputils::isSingleScalar(WidenStoreR->getAddr()) &&
           !WidenStoreR->isConsecutive()) {
-        assert(!WidenStoreR->isReverse() &&
-               "Not consecutive memory recipes shouldn't be reversed");
         VPValue *Mask = WidenStoreR->getMask();
 
         // Only convert the scatter to a scalar store if it is unmasked.
@@ -2918,18 +2915,28 @@ static VPRecipeBase *optimizeMaskToEVL(VPValue *HeaderMask,
     return EVLEndPtr;
   };
 
+  auto GetVPReverse = [&CurRecipe, &EVL, &TypeInfo, Plan,
+                       DL](VPValue *V) -> VPWidenIntrinsicRecipe * {
+    auto *Reverse = new VPWidenIntrinsicRecipe(
+        Intrinsic::experimental_vp_reverse, {V, Plan->getTrue(), &EVL},
+        TypeInfo.inferScalarType(V), {}, {}, DL);
+    Reverse->insertBefore(&CurRecipe);
+    return Reverse;
+  };
+
   if (match(&CurRecipe,
-            m_MaskedLoad(m_VPValue(Addr), m_RemoveMask(HeaderMask, Mask))) &&
-      !cast<VPWidenLoadRecipe>(CurRecipe).isReverse())
+            m_MaskedLoad(m_VPValue(Addr), m_RemoveMask(HeaderMask, Mask))))
     return new VPWidenLoadEVLRecipe(cast<VPWidenLoadRecipe>(CurRecipe), Addr,
                                     EVL, Mask);
 
   VPValue *ReversedVal;
   if (match(&CurRecipe, m_Reverse(m_VPValue(ReversedVal))) &&
       match(ReversedVal,
-            m_MaskedLoad(m_VPValue(EndPtr), m_RemoveMask(HeaderMask, Mask))) &&
-      match(EndPtr, m_VecEndPtr(m_VPValue(Addr), m_Specific(&Plan->getVF()))) &&
-      cast<VPWidenLoadRecipe>(ReversedVal)->isReverse()) {
+            m_MaskedLoad(m_VPValue(EndPtr),
+                         m_Reverse(m_RemoveMask(HeaderMask, Mask)))) &&
+      match(EndPtr, m_VecEndPtr(m_VPValue(Addr), m_Specific(&Plan->getVF())))) {
+    if (Mask)
+      Mask = GetVPReverse(Mask);
     auto *LoadR = new VPWidenLoadEVLRecipe(
         *cast<VPWidenLoadRecipe>(ReversedVal), AdjustEndPtr(EndPtr), EVL, Mask);
     LoadR->insertBefore(&CurRecipe);
@@ -2940,24 +2947,19 @@ static VPRecipeBase *optimizeMaskToEVL(VPValue *HeaderMask,
 
   VPValue *StoredVal;
   if (match(&CurRecipe, m_MaskedStore(m_VPValue(Addr), m_VPValue(StoredVal),
-                                      m_RemoveMask(HeaderMask, Mask))) &&
-      !cast<VPWidenStoreRecipe>(CurRecipe).isReverse())
+                                      m_RemoveMask(HeaderMask, Mask))))
     return new VPWidenStoreEVLRecipe(cast<VPWidenStoreRecipe>(CurRecipe), Addr,
                                      StoredVal, EVL, Mask);
 
   if (match(&CurRecipe,
             m_MaskedStore(m_VPValue(EndPtr), m_Reverse(m_VPValue(ReversedVal)),
-                          m_RemoveMask(HeaderMask, Mask))) &&
-      match(EndPtr, m_VecEndPtr(m_VPValue(Addr), m_Specific(&Plan->getVF()))) &&
-      cast<VPWidenStoreRecipe>(CurRecipe).isReverse()) {
-    auto *NewReverse = new VPWidenIntrinsicRecipe(
-        Intrinsic::experimental_vp_reverse,
-        {ReversedVal, Plan->getTrue(), &EVL},
-        TypeInfo.inferScalarType(ReversedVal), {}, {}, DL);
-    NewReverse->insertBefore(&CurRecipe);
+                          m_Reverse(m_RemoveMask(HeaderMask, Mask)))) &&
+      match(EndPtr, m_VecEndPtr(m_VPValue(Addr), m_Specific(&Plan->getVF())))) {
+    if (Mask)
+      Mask = GetVPReverse(Mask);
     return new VPWidenStoreEVLRecipe(cast<VPWidenStoreRecipe>(CurRecipe),
-                                     AdjustEndPtr(EndPtr), NewReverse, EVL,
-                                     Mask);
+                                     AdjustEndPtr(EndPtr),
+                                     GetVPReverse(ReversedVal), EVL, Mask);
   }
 
   if (auto *Rdx = dyn_cast<VPReductionRecipe>(&CurRecipe))
@@ -4953,9 +4955,9 @@ narrowInterleaveGroupOp(VPValue *V, SmallPtrSetImpl<VPValue *> &NarrowedOps) {
     // Narrow interleave group to wi...
[truncated]

@llvmbot
Member

llvmbot commented Jan 15, 2026

@llvm/pr-subscribers-backend-risc-v

     StoredVal = Builder.createNaryOp(VPInstruction::Reverse, StoredVal,
                                      Store->getDebugLoc());
-  return new VPWidenStoreRecipe(*Store, Ptr, StoredVal, Mask, Consecutive,
-                                Reverse, *VPI, Store->getDebugLoc());
+  return new VPWidenStoreRecipe(*Store, Ptr, StoredVal, Mask, Consecutive, *VPI,
+                                Store->getDebugLoc());
 }
 
 VPWidenIntOrFpInductionRecipe *
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index 1d250322ccf63..5354ab266ed83 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -3300,9 +3300,6 @@ class LLVM_ABI_FOR_TEST VPWidenMemoryRecipe : public VPRecipeBase,
   /// Whether the accessed addresses are consecutive.
   bool Consecutive;
 
-  /// Whether the consecutive accessed addresses are in reverse order.
-  bool Reverse;
-
   /// Whether the memory access is masked.
   bool IsMasked = false;
 
@@ -3316,15 +3313,10 @@ class LLVM_ABI_FOR_TEST VPWidenMemoryRecipe : public VPRecipeBase,
 
   VPWidenMemoryRecipe(const char unsigned SC, Instruction &I,
                       std::initializer_list<VPValue *> Operands,
-                      bool Consecutive, bool Reverse,
-                      const VPIRMetadata &Metadata, DebugLoc DL)
+                      bool Consecutive, const VPIRMetadata &Metadata,
+                      DebugLoc DL)
       : VPRecipeBase(SC, Operands, DL), VPIRMetadata(Metadata), Ingredient(I),
-        Alignment(getLoadStoreAlignment(&I)), Consecutive(Consecutive),
-        Reverse(Reverse) {
-    assert((Consecutive || !Reverse) && "Reverse implies consecutive");
-    assert((isa<VPVectorEndPointerRecipe>(getAddr()) || !Reverse) &&
-           "Reversed acccess without VPVectorEndPointerRecipe address?");
-  }
+        Alignment(getLoadStoreAlignment(&I)), Consecutive(Consecutive) {}
 
 public:
   VPWidenMemoryRecipe *clone() override {
@@ -3346,10 +3338,6 @@ class LLVM_ABI_FOR_TEST VPWidenMemoryRecipe : public VPRecipeBase,
   /// Return whether the loaded-from / stored-to addresses are consecutive.
   bool isConsecutive() const { return Consecutive; }
 
-  /// Return whether the consecutive loaded/stored addresses are in reverse
-  /// order.
-  bool isReverse() const { return Reverse; }
-
   /// Return the address accessed by this recipe.
   VPValue *getAddr() const { return getOperand(0); }
 
@@ -3383,18 +3371,16 @@ class LLVM_ABI_FOR_TEST VPWidenMemoryRecipe : public VPRecipeBase,
 struct LLVM_ABI_FOR_TEST VPWidenLoadRecipe final : public VPWidenMemoryRecipe,
                                                    public VPRecipeValue {
   VPWidenLoadRecipe(LoadInst &Load, VPValue *Addr, VPValue *Mask,
-                    bool Consecutive, bool Reverse,
-                    const VPIRMetadata &Metadata, DebugLoc DL)
+                    bool Consecutive, const VPIRMetadata &Metadata, DebugLoc DL)
       : VPWidenMemoryRecipe(VPDef::VPWidenLoadSC, Load, {Addr}, Consecutive,
-                            Reverse, Metadata, DL),
+                            Metadata, DL),
         VPRecipeValue(this, &Load) {
     setMask(Mask);
   }
 
   VPWidenLoadRecipe *clone() override {
     return new VPWidenLoadRecipe(cast<LoadInst>(Ingredient), getAddr(),
-                                 getMask(), Consecutive, Reverse, *this,
-                                 getDebugLoc());
+                                 getMask(), Consecutive, *this, getDebugLoc());
   }
 
   VP_CLASSOF_IMPL(VPDef::VPWidenLoadSC);
@@ -3427,7 +3413,7 @@ struct VPWidenLoadEVLRecipe final : public VPWidenMemoryRecipe,
   VPWidenLoadEVLRecipe(VPWidenLoadRecipe &L, VPValue *Addr, VPValue &EVL,
                        VPValue *Mask)
       : VPWidenMemoryRecipe(VPDef::VPWidenLoadEVLSC, L.getIngredient(),
-                            {Addr, &EVL}, L.isConsecutive(), L.isReverse(), L,
+                            {Addr, &EVL}, L.isConsecutive(), L,
                             L.getDebugLoc()),
         VPRecipeValue(this, &getIngredient()) {
     setMask(Mask);
@@ -3466,17 +3452,17 @@ struct VPWidenLoadEVLRecipe final : public VPWidenMemoryRecipe,
 /// to store to and an optional mask.
 struct LLVM_ABI_FOR_TEST VPWidenStoreRecipe final : public VPWidenMemoryRecipe {
   VPWidenStoreRecipe(StoreInst &Store, VPValue *Addr, VPValue *StoredVal,
-                     VPValue *Mask, bool Consecutive, bool Reverse,
+                     VPValue *Mask, bool Consecutive,
                      const VPIRMetadata &Metadata, DebugLoc DL)
       : VPWidenMemoryRecipe(VPDef::VPWidenStoreSC, Store, {Addr, StoredVal},
-                            Consecutive, Reverse, Metadata, DL) {
+                            Consecutive, Metadata, DL) {
     setMask(Mask);
   }
 
   VPWidenStoreRecipe *clone() override {
     return new VPWidenStoreRecipe(cast<StoreInst>(Ingredient), getAddr(),
                                   getStoredValue(), getMask(), Consecutive,
-                                  Reverse, *this, getDebugLoc());
+                                  *this, getDebugLoc());
   }
 
   VP_CLASSOF_IMPL(VPDef::VPWidenStoreSC);
@@ -3511,8 +3497,8 @@ struct VPWidenStoreEVLRecipe final : public VPWidenMemoryRecipe {
   VPWidenStoreEVLRecipe(VPWidenStoreRecipe &S, VPValue *Addr,
                         VPValue *StoredVal, VPValue &EVL, VPValue *Mask)
       : VPWidenMemoryRecipe(VPDef::VPWidenStoreEVLSC, S.getIngredient(),
-                            {Addr, StoredVal, &EVL}, S.isConsecutive(),
-                            S.isReverse(), S, S.getDebugLoc()) {
+                            {Addr, StoredVal, &EVL}, S.isConsecutive(), S,
+                            S.getDebugLoc()) {
     setMask(Mask);
   }
 
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index 0c6100c04f785..49297bc12cf18 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -1042,8 +1042,6 @@ InstructionCost VPRecipeWithIRFlags::getCostForRecipeWithOpcode(
         return TTI::CastContextHint::Normal;
       if (!WidenMemoryRecipe->isConsecutive())
         return TTI::CastContextHint::GatherScatter;
-      if (WidenMemoryRecipe->isReverse())
-        return TTI::CastContextHint::Reversed;
       if (WidenMemoryRecipe->isMasked())
         return TTI::CastContextHint::Masked;
       return TTI::CastContextHint::Normal;
@@ -1051,6 +1049,7 @@ InstructionCost VPRecipeWithIRFlags::getCostForRecipeWithOpcode(
 
     VPValue *Operand = getOperand(0);
     TTI::CastContextHint CCH = TTI::CastContextHint::None;
+    bool IsReverse = false;
     // For Trunc/FPTrunc, get the context from the only user.
     if (Opcode == Instruction::Trunc || Opcode == Instruction::FPTrunc) {
       auto GetOnlyUser = [](const VPSingleDefRecipe *R) -> VPRecipeBase * {
@@ -1059,8 +1058,10 @@ InstructionCost VPRecipeWithIRFlags::getCostForRecipeWithOpcode(
         return dyn_cast<VPRecipeBase>(*R->user_begin());
       };
       if (VPRecipeBase *Recipe = GetOnlyUser(this)) {
-        if (match(Recipe, m_Reverse(m_VPValue())))
+        if (match(Recipe, m_Reverse(m_VPValue()))) {
           Recipe = GetOnlyUser(cast<VPInstruction>(Recipe));
+          IsReverse = true;
+        }
         if (Recipe)
           CCH = ComputeCCH(Recipe);
       }
@@ -1070,12 +1071,16 @@ InstructionCost VPRecipeWithIRFlags::getCostForRecipeWithOpcode(
              Opcode == Instruction::FPExt) {
       if (auto *Recipe = Operand->getDefiningRecipe()) {
         VPValue *ReverseOp;
-        if (match(Recipe, m_Reverse(m_VPValue(ReverseOp))))
+        if (match(Recipe, m_Reverse(m_VPValue(ReverseOp)))) {
           Recipe = ReverseOp->getDefiningRecipe();
+	  IsReverse = true;
+	}
         if (Recipe)
           CCH = ComputeCCH(Recipe);
       }
     }
+    if (IsReverse && CCH != TTI::CastContextHint::None)
+      CCH = TTI::CastContextHint::Reversed;
 
     auto *ScalarSrcTy = Ctx.Types.inferScalarType(Operand);
     Type *SrcTy = VF.isVector() ? toVectorTy(ScalarSrcTy, VF) : ScalarSrcTy;
@@ -1251,8 +1256,13 @@ InstructionCost VPInstruction::computeCost(ElementCount VF,
   }
   case VPInstruction::Reverse: {
     assert(VF.isVector() && "Reverse operation must be vector type");
-    auto *VectorTy = cast<VectorType>(
-        toVectorTy(Ctx.Types.inferScalarType(getOperand(0)), VF));
+    Type *EltTy = Ctx.Types.inferScalarType(getOperand(0));
+    // Skip the reverse operation cost for the mask.
+    // FIXME: Remove this once redundant mask reverse operations can be
+    // eliminated by VPlanTransforms::cse before cost computation.
+    if (EltTy->isIntegerTy(1))
+      return 0;
+    auto *VectorTy = cast<VectorType>(toVectorTy(EltTy, VF));
     return Ctx.TTI.getShuffleCost(TargetTransformInfo::SK_Reverse, VectorTy,
                                   VectorTy, /*Mask=*/{}, Ctx.CostKind,
                                   /*Index=*/0);
@@ -3636,9 +3646,6 @@ InstructionCost VPWidenMemoryRecipe::computeCost(ElementCount VF,
     // TODO: Using the original IR may not be accurate.
     // Currently, ARM will use the underlying IR to calculate gather/scatter
     // instruction cost.
-    assert(!Reverse &&
-           "Inconsecutive memory access should not have the order.");
-
     const Value *Ptr = getLoadStorePointerOperand(&Ingredient);
     Type *PtrTy = Ptr->getType();
 
@@ -3682,13 +3689,8 @@ void VPWidenLoadRecipe::execute(VPTransformState &State) {
 
   auto &Builder = State.Builder;
   Value *Mask = nullptr;
-  if (auto *VPMask = getMask()) {
-    // Mask reversal is only needed for non-all-one (null) masks, as reverse
-    // of a null all-one mask is a null mask.
+  if (auto *VPMask = getMask())
     Mask = State.get(VPMask);
-    if (isReverse())
-      Mask = Builder.CreateVectorReverse(Mask, "reverse");
-  }
 
   Value *Addr = State.get(getAddr(), /*IsScalar*/ !CreateGather);
   Value *NewLI;
@@ -3716,17 +3718,6 @@ void VPWidenLoadRecipe::printRecipe(raw_ostream &O, const Twine &Indent,
 }
 #endif
 
-/// Use all-true mask for reverse rather than actual mask, as it avoids a
-/// dependence w/o affecting the result.
-static Instruction *createReverseEVL(IRBuilderBase &Builder, Value *Operand,
-                                     Value *EVL, const Twine &Name) {
-  VectorType *ValTy = cast<VectorType>(Operand->getType());
-  Value *AllTrueMask =
-      Builder.CreateVectorSplat(ValTy->getElementCount(), Builder.getTrue());
-  return Builder.CreateIntrinsic(ValTy, Intrinsic::experimental_vp_reverse,
-                                 {Operand, AllTrueMask, EVL}, nullptr, Name);
-}
-
 void VPWidenLoadEVLRecipe::execute(VPTransformState &State) {
   Type *ScalarDataTy = getLoadStoreType(&Ingredient);
   auto *DataTy = VectorType::get(ScalarDataTy, State.VF);
@@ -3737,13 +3728,10 @@ void VPWidenLoadEVLRecipe::execute(VPTransformState &State) {
   Value *EVL = State.get(getEVL(), VPLane(0));
   Value *Addr = State.get(getAddr(), !CreateGather);
   Value *Mask = nullptr;
-  if (VPValue *VPMask = getMask()) {
+  if (VPValue *VPMask = getMask())
     Mask = State.get(VPMask);
-    if (isReverse())
-      Mask = createReverseEVL(Builder, Mask, EVL, "vp.reverse.mask");
-  } else {
+  else
     Mask = Builder.CreateVectorSplat(State.VF, Builder.getTrue());
-  }
 
   if (CreateGather) {
     NewLI =
@@ -3795,13 +3783,8 @@ void VPWidenStoreRecipe::execute(VPTransformState &State) {
   auto &Builder = State.Builder;
 
   Value *Mask = nullptr;
-  if (auto *VPMask = getMask()) {
-    // Mask reversal is only needed for non-all-one (null) masks, as reverse
-    // of a null all-one mask is a null mask.
+  if (auto *VPMask = getMask())
     Mask = State.get(VPMask);
-    if (isReverse())
-      Mask = Builder.CreateVectorReverse(Mask, "reverse");
-  }
 
   Value *StoredVal = State.get(StoredVPValue);
   Value *Addr = State.get(getAddr(), /*IsScalar*/ !CreateScatter);
@@ -3833,13 +3816,11 @@ void VPWidenStoreEVLRecipe::execute(VPTransformState &State) {
   Value *StoredVal = State.get(StoredValue);
   Value *EVL = State.get(getEVL(), VPLane(0));
   Value *Mask = nullptr;
-  if (VPValue *VPMask = getMask()) {
+  if (VPValue *VPMask = getMask())
     Mask = State.get(VPMask);
-    if (isReverse())
-      Mask = createReverseEVL(Builder, Mask, EVL, "vp.reverse.mask");
-  } else {
+  else
     Mask = Builder.CreateVectorSplat(State.VF, Builder.getTrue());
-  }
+
   Value *Addr = State.get(getAddr(), !CreateScatter);
   if (CreateScatter) {
     NewSI = Builder.CreateIntrinsic(Type::getVoidTy(EVL->getContext()),
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index bfef277070db7..2fa575683f88d 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -79,12 +79,11 @@ bool VPlanTransforms::tryToConvertVPInstructionsToVPRecipes(
         if (LoadInst *Load = dyn_cast<LoadInst>(Inst)) {
           NewRecipe = new VPWidenLoadRecipe(
               *Load, Ingredient.getOperand(0), nullptr /*Mask*/,
-              false /*Consecutive*/, false /*Reverse*/, *VPI,
-              Ingredient.getDebugLoc());
+              false /*Consecutive*/, *VPI, Ingredient.getDebugLoc());
         } else if (StoreInst *Store = dyn_cast<StoreInst>(Inst)) {
           NewRecipe = new VPWidenStoreRecipe(
               *Store, Ingredient.getOperand(1), Ingredient.getOperand(0),
-              nullptr /*Mask*/, false /*Consecutive*/, false /*Reverse*/, *VPI,
+              nullptr /*Mask*/, false /*Consecutive*/, *VPI,
               Ingredient.getDebugLoc());
         } else if (GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(Inst)) {
           NewRecipe = new VPWidenGEPRecipe(GEP, Ingredient.operands(), *VPI,
@@ -1608,8 +1607,6 @@ static void narrowToSingleScalarRecipes(VPlan &Plan) {
       auto *WidenStoreR = dyn_cast<VPWidenStoreRecipe>(&R);
       if (WidenStoreR && vputils::isSingleScalar(WidenStoreR->getAddr()) &&
           !WidenStoreR->isConsecutive()) {
-        assert(!WidenStoreR->isReverse() &&
-               "Not consecutive memory recipes shouldn't be reversed");
         VPValue *Mask = WidenStoreR->getMask();
 
         // Only convert the scatter to a scalar store if it is unmasked.
@@ -2918,18 +2915,28 @@ static VPRecipeBase *optimizeMaskToEVL(VPValue *HeaderMask,
     return EVLEndPtr;
   };
 
+  auto GetVPReverse = [&CurRecipe, &EVL, &TypeInfo, Plan,
+                       DL](VPValue *V) -> VPWidenIntrinsicRecipe * {
+    auto *Reverse = new VPWidenIntrinsicRecipe(
+        Intrinsic::experimental_vp_reverse, {V, Plan->getTrue(), &EVL},
+        TypeInfo.inferScalarType(V), {}, {}, DL);
+    Reverse->insertBefore(&CurRecipe);
+    return Reverse;
+  };
+
   if (match(&CurRecipe,
-            m_MaskedLoad(m_VPValue(Addr), m_RemoveMask(HeaderMask, Mask))) &&
-      !cast<VPWidenLoadRecipe>(CurRecipe).isReverse())
+            m_MaskedLoad(m_VPValue(Addr), m_RemoveMask(HeaderMask, Mask))))
     return new VPWidenLoadEVLRecipe(cast<VPWidenLoadRecipe>(CurRecipe), Addr,
                                     EVL, Mask);
 
   VPValue *ReversedVal;
   if (match(&CurRecipe, m_Reverse(m_VPValue(ReversedVal))) &&
       match(ReversedVal,
-            m_MaskedLoad(m_VPValue(EndPtr), m_RemoveMask(HeaderMask, Mask))) &&
-      match(EndPtr, m_VecEndPtr(m_VPValue(Addr), m_Specific(&Plan->getVF()))) &&
-      cast<VPWidenLoadRecipe>(ReversedVal)->isReverse()) {
+            m_MaskedLoad(m_VPValue(EndPtr),
+                         m_Reverse(m_RemoveMask(HeaderMask, Mask)))) &&
+      match(EndPtr, m_VecEndPtr(m_VPValue(Addr), m_Specific(&Plan->getVF())))) {
+    if (Mask)
+      Mask = GetVPReverse(Mask);
     auto *LoadR = new VPWidenLoadEVLRecipe(
         *cast<VPWidenLoadRecipe>(ReversedVal), AdjustEndPtr(EndPtr), EVL, Mask);
     LoadR->insertBefore(&CurRecipe);
@@ -2940,24 +2947,19 @@ static VPRecipeBase *optimizeMaskToEVL(VPValue *HeaderMask,
 
   VPValue *StoredVal;
   if (match(&CurRecipe, m_MaskedStore(m_VPValue(Addr), m_VPValue(StoredVal),
-                                      m_RemoveMask(HeaderMask, Mask))) &&
-      !cast<VPWidenStoreRecipe>(CurRecipe).isReverse())
+                                      m_RemoveMask(HeaderMask, Mask))))
     return new VPWidenStoreEVLRecipe(cast<VPWidenStoreRecipe>(CurRecipe), Addr,
                                      StoredVal, EVL, Mask);
 
   if (match(&CurRecipe,
             m_MaskedStore(m_VPValue(EndPtr), m_Reverse(m_VPValue(ReversedVal)),
-                          m_RemoveMask(HeaderMask, Mask))) &&
-      match(EndPtr, m_VecEndPtr(m_VPValue(Addr), m_Specific(&Plan->getVF()))) &&
-      cast<VPWidenStoreRecipe>(CurRecipe).isReverse()) {
-    auto *NewReverse = new VPWidenIntrinsicRecipe(
-        Intrinsic::experimental_vp_reverse,
-        {ReversedVal, Plan->getTrue(), &EVL},
-        TypeInfo.inferScalarType(ReversedVal), {}, {}, DL);
-    NewReverse->insertBefore(&CurRecipe);
+                          m_Reverse(m_RemoveMask(HeaderMask, Mask)))) &&
+      match(EndPtr, m_VecEndPtr(m_VPValue(Addr), m_Specific(&Plan->getVF())))) {
+    if (Mask)
+      Mask = GetVPReverse(Mask);
     return new VPWidenStoreEVLRecipe(cast<VPWidenStoreRecipe>(CurRecipe),
-                                     AdjustEndPtr(EndPtr), NewReverse, EVL,
-                                     Mask);
+                                     AdjustEndPtr(EndPtr),
+                                     GetVPReverse(ReversedVal), EVL, Mask);
   }
 
   if (auto *Rdx = dyn_cast<VPReductionRecipe>(&CurRecipe))
@@ -4953,9 +4955,9 @@ narrowInterleaveGroupOp(VPValue *V, SmallPtrSetImpl<VPValue *> &NarrowedOps) {
     // Narrow interleave group to wi...
[truncated]

@Mel-Chen
Contributor Author

> We already include the cost of some VPInstructions after computing cost from the legacy cost model IIRC, we could do the same for reverse to get the more accurate estimates

I can understand @lukel97's approach, so I’ll go with this method for now to rebase and move the patch forward.
However, I’m not entirely clear on @fhahn's suggestion. Could you please elaborate a bit more on your thoughts?

@github-actions

github-actions bot commented Jan 15, 2026

🐧 Linux x64 Test Results

  • 192817 tests passed
  • 4967 tests skipped

✅ The build succeeded and all tests passed.

@github-actions

github-actions bot commented Jan 15, 2026

🪟 Windows x64 Test Results

  • 132804 tests passed
  • 3036 tests skipped

✅ The build succeeded and all tests passed.

@Mel-Chen Mel-Chen force-pushed the reverse-mask branch 2 times, most recently from e4bd5b8 to cf38142 on March 26, 2026 09:03
@Mel-Chen
Contributor Author

> LGTM w/ florian's comment. Also it looks like the Windows CI tests are failing but I'm not sure why. I've restarted the CI job hoping it's just a dodgy cache, but if it fails again we might need to look into it

cf38142
Fixed the unspecified evaluation order. Hopefully, this resolves the Windows CI failures.

// TODO: Using the original IR may not be accurate.
// Currently, ARM will use the underlying IR to calculate gather/scatter
// instruction cost.
assert(!Reverse &&
Contributor


Could we retain this, by checking that the operand / single user is not reverse?

Contributor Author


Sure.
f3f4df1

Contributor

@artagnon artagnon left a comment


Looks good, I'll defer to Florian and Luke's existing approvals.

Contributor

@fhahn fhahn left a comment


LGTM, thanks!

@Mel-Chen Mel-Chen enabled auto-merge (squash) March 31, 2026 08:21
@Mel-Chen Mel-Chen merged commit f76f41f into llvm:main Mar 31, 2026
9 of 10 checks passed
@lukel97
Contributor

lukel97 commented Mar 31, 2026

I think one of the aarch64 buildbots is failing on llvm-test-suite with this: https://lab.llvm.org/buildbot/#/builders/198/builds/12563

Edit, also on the RISC-V buildbots: https://lab.llvm.org/buildbot/#/builders/210/builds/9728

clang: /home/tcwg-buildbot/worker/clang-aarch64-sve2-vla/llvm/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp:3840: virtual InstructionCost llvm::VPWidenMemoryRecipe::computeCost(ElementCount, VPCostContext &) const: Assertion `!IsReverse() && "Inconsecutive memory access should not have reverse order"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.      Program arguments: /home/tcwg-buildbot/worker/clang-aarch64-sve2-vla/stage1.install/bin/clang -DNDEBUG -mcpu=neoverse-v2 -mllvm -scalable-vectorization=preferred -O3 -std=gnu17 -fcommon -Wno-error=incompatible-pointer-types -MD -MT MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/CMakeFiles/timberwolfmc.dir/finalpin.c.o -MF CMakeFiles/timberwolfmc.dir/finalpin.c.o.d -o CMakeFiles/timberwolfmc.dir/finalpin.c.o -c /home/tcwg-buildbot/worker/clang-aarch64-sve2-vla/test/test-suite/MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/finalpin.c

@fhahn
Contributor

fhahn commented Mar 31, 2026

Yes that looks related. I'll revert for now to get things back to green

fhahn added a commit that referenced this pull request Mar 31, 2026
Reverts #155579

Assertion added triggers on some buildbots
clang:
/home/tcwg-buildbot/worker/clang-aarch64-sve2-vla/llvm/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp:3840:
virtual InstructionCost
llvm::VPWidenMemoryRecipe::computeCost(ElementCount, VPCostContext &)
const: Assertion `!IsReverse() && "Inconsecutive memory access should
not have reverse order"' failed.
PLEASE submit a bug report to
https://github.com/llvm/llvm-project/issues/ and include the crash
backtrace, preprocessed source, and associated run script.
Stack dump:
0. Program arguments:
/home/tcwg-buildbot/worker/clang-aarch64-sve2-vla/stage1.install/bin/clang
-DNDEBUG -mcpu=neoverse-v2 -mllvm -scalable-vectorization=preferred -O3
-std=gnu17 -fcommon -Wno-error=incompatible-pointer-types -MD -MT
MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/CMakeFiles/timberwolfmc.dir/finalpin.c.o
-MF CMakeFiles/timberwolfmc.dir/finalpin.c.o.d -o
CMakeFiles/timberwolfmc.dir/finalpin.c.o -c
/home/tcwg-buildbot/worker/clang-aarch64-sve2-vla/test/test-suite/MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/finalpin.c
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Mar 31, 2026
…" (#189637)

Reverts llvm/llvm-project#155579

Assertion added triggers on some buildbots
clang:
/home/tcwg-buildbot/worker/clang-aarch64-sve2-vla/llvm/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp:3840:
virtual InstructionCost
llvm::VPWidenMemoryRecipe::computeCost(ElementCount, VPCostContext &)
const: Assertion `!IsReverse() && "Inconsecutive memory access should
not have reverse order"' failed.
PLEASE submit a bug report to
https://github.com/llvm/llvm-project/issues/ and include the crash
backtrace, preprocessed source, and associated run script.
Stack dump:
0. Program arguments:
/home/tcwg-buildbot/worker/clang-aarch64-sve2-vla/stage1.install/bin/clang
-DNDEBUG -mcpu=neoverse-v2 -mllvm -scalable-vectorization=preferred -O3
-std=gnu17 -fcommon -Wno-error=incompatible-pointer-types -MD -MT
MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/CMakeFiles/timberwolfmc.dir/finalpin.c.o
-MF CMakeFiles/timberwolfmc.dir/finalpin.c.o.d -o
CMakeFiles/timberwolfmc.dir/finalpin.c.o -c
/home/tcwg-buildbot/worker/clang-aarch64-sve2-vla/test/test-suite/MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/finalpin.c
@Mel-Chen
Contributor Author

Mel-Chen commented Apr 1, 2026

#189928
#189930
Trying to reapply this; please help take a look, thanks.
