[LoongArch] Enable tail calls for sret and byval functions #168506

Merged
heiher merged 6 commits into main from users/hev/issue-168152
Jan 13, 2026

Conversation

@heiher
Member

@heiher heiher commented Nov 18, 2025

Allow tail calls for functions returning via sret when the caller's sret pointer can be reused. Also support tail calls for byval arguments.

The previous restriction requiring exact match of caller and callee arguments is relaxed: tail calls are allowed as long as the callee does not use more stack space than the caller.

Fixes #168152
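
The relaxed rule in this description can be sketched in isolation. This is a minimal model, assuming only what the description and the patch state: the earlier check rejected any call that passed arguments on the stack, and the new one compares the callee's outgoing argument bytes with the bytes the caller itself received on the stack. `CallSite`, `eligibleBefore`, and `eligibleAfter` are hypothetical names for illustration, not LLVM APIs.

```cpp
#include <cstdint>

// Hypothetical, simplified view of one call site (not an LLVM type).
struct CallSite {
  // Bytes of stack the caller received its own arguments in.
  std::uint64_t CallerIncomingStackBytes;
  // Bytes of stack the callee's arguments would occupy.
  std::uint64_t CalleeOutgoingStackBytes;
};

// Earlier rule: any stack-passed argument blocks the tail call.
constexpr bool eligibleBefore(const CallSite &CS) {
  return CS.CalleeOutgoingStackBytes == 0;
}

// Relaxed rule: the outgoing arguments only have to fit into the
// argument area the caller already owns.
constexpr bool eligibleAfter(const CallSite &CS) {
  return CS.CalleeOutgoingStackBytes <= CS.CallerIncomingStackBytes;
}

// Same stack usage on both sides: newly eligible.
static_assert(!eligibleBefore(CallSite{8, 8}) && eligibleAfter(CallSite{8, 8}),
              "equal stack usage becomes eligible");
// Callee needs more stack than the caller received: still rejected.
static_assert(!eligibleAfter(CallSite{0, 8}),
              "a larger callee argument area is rejected");
```

This covers only the stack-size condition. The other eligibility checks in the patch (indirect arguments, sret agreement, preserved registers) still apply on top of it.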

@llvmbot
Member

llvmbot commented Nov 18, 2025

@llvm/pr-subscribers-backend-loongarch

Author: hev (heiher)

Changes

Allow tail-calling functions that return via sret when the caller has an incoming sret pointer that can be forwarded.

Remove the overly strict requirement that tail-call argument values must exactly match the caller's incoming arguments. The real constraint is only that the callee uses no more argument stack space than the caller.

This fixes musttail codegen and enables significantly more tail-call optimizations.

Fixes #168152
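
The sret part of the change amounts to swapping one boolean condition in the eligibility check, from `IsCallerStructRet || IsCalleeStructRet` to `IsCallerStructRet != IsCalleeStructRet`. A small sketch of the two predicates side by side; the function names are hypothetical and model only this one condition, not the full check:

```cpp
// Before the patch: sret on either side blocked the tail call.
constexpr bool sretBlocksBefore(bool CallerSRet, bool CalleeSRet) {
  return CallerSRet || CalleeSRet;
}

// After the patch: only a mismatch blocks it. When both sides use sret,
// the caller's incoming sret pointer can be forwarded to the callee.
constexpr bool sretBlocksAfter(bool CallerSRet, bool CalleeSRet) {
  return CallerSRet != CalleeSRet;
}

// The only input where the two rules disagree is sret on both sides.
static_assert(sretBlocksBefore(true, true) && !sretBlocksAfter(true, true),
              "sret-to-sret calls are no longer blocked");
static_assert(sretBlocksAfter(true, false) && sretBlocksAfter(false, true),
              "a mismatch still blocks the tail call");
static_assert(!sretBlocksBefore(false, false) && !sretBlocksAfter(false, false),
              "calls without sret are unaffected");
```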


Patch is 24.66 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/168506.diff

5 Files Affected:

  • (modified) llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp (+65-10)
  • (modified) llvm/lib/Target/LoongArch/LoongArchISelLowering.h (+6)
  • (modified) llvm/lib/Target/LoongArch/LoongArchMachineFunctionInfo.h (+7)
  • (added) llvm/test/CodeGen/LoongArch/musttail.ll (+397)
  • (modified) llvm/test/CodeGen/LoongArch/tail-calls.ll (+4-9)
diff --git a/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp b/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
index cf4ffc82f6009..2a55558e00e78 100644
--- a/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
+++ b/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
@@ -8069,6 +8069,7 @@ SDValue LoongArchTargetLowering::LowerFormalArguments(
     SelectionDAG &DAG, SmallVectorImpl<SDValue> &InVals) const {
 
   MachineFunction &MF = DAG.getMachineFunction();
+  auto *LoongArchFI = MF.getInfo<LoongArchMachineFunctionInfo>();
 
   switch (CallConv) {
   default:
@@ -8140,7 +8141,6 @@ SDValue LoongArchTargetLowering::LowerFormalArguments(
     const TargetRegisterClass *RC = &LoongArch::GPRRegClass;
     MachineFrameInfo &MFI = MF.getFrameInfo();
     MachineRegisterInfo &RegInfo = MF.getRegInfo();
-    auto *LoongArchFI = MF.getInfo<LoongArchMachineFunctionInfo>();
 
     // Offset of the first variable argument from stack pointer, and size of
     // the vararg save area. For now, the varargs save area is either zero or
@@ -8190,6 +8190,8 @@ SDValue LoongArchTargetLowering::LowerFormalArguments(
     LoongArchFI->setVarArgsSaveSize(VarArgsSaveSize);
   }
 
+  LoongArchFI->setArgumentStackSize(CCInfo.getStackSize());
+
   // All stores are grouped in one node to allow the matching between
   // the size of Ins and InVals. This only happens for vararg functions.
   if (!OutChains.empty()) {
@@ -8246,9 +8248,11 @@ bool LoongArchTargetLowering::isEligibleForTailCallOptimization(
   auto &Outs = CLI.Outs;
   auto &Caller = MF.getFunction();
   auto CallerCC = Caller.getCallingConv();
+  auto *LoongArchFI = MF.getInfo<LoongArchMachineFunctionInfo>();
 
-  // Do not tail call opt if the stack is used to pass parameters.
-  if (CCInfo.getStackSize() != 0)
+  // If the stack arguments for this call do not fit into our own save area then
+  // the call cannot be made tail.
+  if (CCInfo.getStackSize() > LoongArchFI->getArgumentStackSize())
     return false;
 
   // Do not tail call opt if any parameters need to be passed indirectly.
@@ -8260,7 +8264,7 @@ bool LoongArchTargetLowering::isEligibleForTailCallOptimization(
   // semantics.
   auto IsCallerStructRet = Caller.hasStructRetAttr();
   auto IsCalleeStructRet = Outs.empty() ? false : Outs[0].Flags.isSRet();
-  if (IsCallerStructRet || IsCalleeStructRet)
+  if (IsCallerStructRet != IsCalleeStructRet)
     return false;
 
   // Do not tail call opt if either the callee or caller has a byval argument.
@@ -8276,9 +8280,47 @@ bool LoongArchTargetLowering::isEligibleForTailCallOptimization(
     if (!TRI->regmaskSubsetEqual(CallerPreserved, CalleePreserved))
       return false;
   }
+
+  // If the callee takes no arguments then go on to check the results of the
+  // call.
+  const MachineRegisterInfo &MRI = MF.getRegInfo();
+  const SmallVectorImpl<SDValue> &OutVals = CLI.OutVals;
+  if (!parametersInCSRMatch(MRI, CallerPreserved, ArgLocs, OutVals))
+    return false;
+
   return true;
 }
 
+SDValue LoongArchTargetLowering::addTokenForArgument(SDValue Chain,
+                                                     SelectionDAG &DAG,
+                                                     MachineFrameInfo &MFI,
+                                                     int ClobberedFI) const {
+  SmallVector<SDValue, 8> ArgChains;
+  int64_t FirstByte = MFI.getObjectOffset(ClobberedFI);
+  int64_t LastByte = FirstByte + MFI.getObjectSize(ClobberedFI) - 1;
+
+  // Include the original chain at the beginning of the list. When this is
+  // used by target LowerCall hooks, this helps legalize find the
+  // CALLSEQ_BEGIN node.
+  ArgChains.push_back(Chain);
+
+  // Add a chain value for each stack argument corresponding
+  for (SDNode *U : DAG.getEntryNode().getNode()->users())
+    if (LoadSDNode *L = dyn_cast<LoadSDNode>(U))
+      if (FrameIndexSDNode *FI = dyn_cast<FrameIndexSDNode>(L->getBasePtr()))
+        if (FI->getIndex() < 0) {
+          int64_t InFirstByte = MFI.getObjectOffset(FI->getIndex());
+          int64_t InLastByte = InFirstByte;
+          InLastByte += MFI.getObjectSize(FI->getIndex()) - 1;
+
+          if ((InFirstByte <= FirstByte && FirstByte <= InLastByte) ||
+              (FirstByte <= InFirstByte && InFirstByte <= LastByte))
+            ArgChains.push_back(SDValue(L, 1));
+        }
+
+  // Build a tokenfactor for all the chains.
+  return DAG.getNode(ISD::TokenFactor, SDLoc(Chain), MVT::Other, ArgChains);
+}
 static Align getPrefTypeAlign(EVT VT, SelectionDAG &DAG) {
   return DAG.getDataLayout().getPrefTypeAlign(
       VT.getTypeForEVT(*DAG.getContext()));
@@ -8454,19 +8496,32 @@ LoongArchTargetLowering::LowerCall(CallLoweringInfo &CLI,
       RegsToPass.push_back(std::make_pair(VA.getLocReg(), ArgValue));
     } else {
       assert(VA.isMemLoc() && "Argument not register or memory");
-      assert(!IsTailCall && "Tail call not allowed if stack is used "
-                            "for passing parameters");
+      SDValue DstAddr;
+      MachinePointerInfo DstInfo;
+      int32_t Offset = VA.getLocMemOffset();
 
       // Work out the address of the stack slot.
       if (!StackPtr.getNode())
         StackPtr = DAG.getCopyFromReg(Chain, DL, LoongArch::R3, PtrVT);
-      SDValue Address =
-          DAG.getNode(ISD::ADD, DL, PtrVT, StackPtr,
-                      DAG.getIntPtrConstant(VA.getLocMemOffset(), DL));
+
+      if (IsTailCall) {
+        unsigned OpSize = (VA.getValVT().getSizeInBits() + 7) / 8;
+        int FI = MF.getFrameInfo().CreateFixedObject(OpSize, Offset, true);
+        DstAddr = DAG.getFrameIndex(FI, PtrVT);
+        DstInfo = MachinePointerInfo::getFixedStack(MF, FI);
+        // Make sure any stack arguments overlapping with where we're storing
+        // are loaded before this eventual operation. Otherwise they'll be
+        // clobbered.
+        Chain = addTokenForArgument(Chain, DAG, MF.getFrameInfo(), FI);
+      } else {
+        SDValue PtrOff = DAG.getIntPtrConstant(Offset, DL);
+        DstAddr = DAG.getNode(ISD::ADD, DL, PtrVT, StackPtr, PtrOff);
+        DstInfo = MachinePointerInfo::getStack(MF, Offset);
+      }
 
       // Emit the store.
       MemOpChains.push_back(
-          DAG.getStore(Chain, DL, ArgValue, Address, MachinePointerInfo()));
+          DAG.getStore(Chain, DL, ArgValue, DstAddr, DstInfo));
     }
   }
 
diff --git a/llvm/lib/Target/LoongArch/LoongArchISelLowering.h b/llvm/lib/Target/LoongArch/LoongArchISelLowering.h
index 8a4d7748467c7..e95f70f06cc7b 100644
--- a/llvm/lib/Target/LoongArch/LoongArchISelLowering.h
+++ b/llvm/lib/Target/LoongArch/LoongArchISelLowering.h
@@ -438,6 +438,12 @@ class LoongArchTargetLowering : public TargetLowering {
       CCState &CCInfo, CallLoweringInfo &CLI, MachineFunction &MF,
       const SmallVectorImpl<CCValAssign> &ArgLocs) const;
 
+  /// Finds the incoming stack arguments which overlap the given fixed stack
+  /// object and incorporates their load into the current chain. This prevents
+  /// an upcoming store from clobbering the stack argument before it's used.
+  SDValue addTokenForArgument(SDValue Chain, SelectionDAG &DAG,
+                              MachineFrameInfo &MFI, int ClobberedFI) const;
+
   bool softPromoteHalfType() const override { return true; }
 
   bool
diff --git a/llvm/lib/Target/LoongArch/LoongArchMachineFunctionInfo.h b/llvm/lib/Target/LoongArch/LoongArchMachineFunctionInfo.h
index 904985c189dba..cf0837cbf09c7 100644
--- a/llvm/lib/Target/LoongArch/LoongArchMachineFunctionInfo.h
+++ b/llvm/lib/Target/LoongArch/LoongArchMachineFunctionInfo.h
@@ -32,6 +32,10 @@ class LoongArchMachineFunctionInfo : public MachineFunctionInfo {
   /// Size of stack frame to save callee saved registers
   unsigned CalleeSavedStackSize = 0;
 
+  /// ArgumentStackSize - amount of bytes on stack consumed by the arguments
+  /// being passed on the stack
+  unsigned ArgumentStackSize = 0;
+
   /// FrameIndex of the spill slot when there is no scavenged register in
   /// insertIndirectBranch.
   int BranchRelaxationSpillFrameIndex = -1;
@@ -63,6 +67,9 @@ class LoongArchMachineFunctionInfo : public MachineFunctionInfo {
   unsigned getCalleeSavedStackSize() const { return CalleeSavedStackSize; }
   void setCalleeSavedStackSize(unsigned Size) { CalleeSavedStackSize = Size; }
 
+  unsigned getArgumentStackSize() const { return ArgumentStackSize; }
+  void setArgumentStackSize(unsigned size) { ArgumentStackSize = size; }
+
   int getBranchRelaxationSpillFrameIndex() {
     return BranchRelaxationSpillFrameIndex;
   }
diff --git a/llvm/test/CodeGen/LoongArch/musttail.ll b/llvm/test/CodeGen/LoongArch/musttail.ll
new file mode 100644
index 0000000000000..cf436e0505ad4
--- /dev/null
+++ b/llvm/test/CodeGen/LoongArch/musttail.ll
@@ -0,0 +1,397 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=loongarch32 %s -o - | FileCheck %s --check-prefix=LA32
+; RUN: llc -mtriple=loongarch64 %s -o - | FileCheck %s --check-prefix=LA64
+
+declare i32 @many_args_callee(i32 %0, i32 %1, i32 %2, i32 %3, i32 %4, i32 %5, i32 %6, i32 %7, i32 %8, i32 %9)
+
+define i32 @many_args_tail(i32 %0, i32 %1, i32 %2, i32 %3, i32 %4, i32 %5, i32 %6, i32 %7, i32 %8, i32 %9) {
+; LA32-LABEL: many_args_tail:
+; LA32:       # %bb.0:
+; LA32-NEXT:    ori $a0, $zero, 9
+; LA32-NEXT:    st.w $a0, $sp, 4
+; LA32-NEXT:    ori $a0, $zero, 8
+; LA32-NEXT:    ori $a1, $zero, 1
+; LA32-NEXT:    ori $a2, $zero, 2
+; LA32-NEXT:    ori $a3, $zero, 3
+; LA32-NEXT:    ori $a4, $zero, 4
+; LA32-NEXT:    ori $a5, $zero, 5
+; LA32-NEXT:    ori $a6, $zero, 6
+; LA32-NEXT:    ori $a7, $zero, 7
+; LA32-NEXT:    st.w $a0, $sp, 0
+; LA32-NEXT:    move $a0, $zero
+; LA32-NEXT:    b many_args_callee
+;
+; LA64-LABEL: many_args_tail:
+; LA64:       # %bb.0:
+; LA64-NEXT:    ori $a0, $zero, 9
+; LA64-NEXT:    st.d $a0, $sp, 8
+; LA64-NEXT:    ori $a0, $zero, 8
+; LA64-NEXT:    ori $a1, $zero, 1
+; LA64-NEXT:    ori $a2, $zero, 2
+; LA64-NEXT:    ori $a3, $zero, 3
+; LA64-NEXT:    ori $a4, $zero, 4
+; LA64-NEXT:    ori $a5, $zero, 5
+; LA64-NEXT:    ori $a6, $zero, 6
+; LA64-NEXT:    ori $a7, $zero, 7
+; LA64-NEXT:    st.d $a0, $sp, 0
+; LA64-NEXT:    move $a0, $zero
+; LA64-NEXT:    pcaddu18i $t8, %call36(many_args_callee)
+; LA64-NEXT:    jr $t8
+  %ret = tail call i32 @many_args_callee(i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9)
+  ret i32 %ret
+}
+
+define i32 @many_args_musttail(i32 %0, i32 %1, i32 %2, i32 %3, i32 %4, i32 %5, i32 %6, i32 %7, i32 %8, i32 %9) {
+; LA32-LABEL: many_args_musttail:
+; LA32:       # %bb.0:
+; LA32-NEXT:    ori $a0, $zero, 9
+; LA32-NEXT:    st.w $a0, $sp, 4
+; LA32-NEXT:    ori $a0, $zero, 8
+; LA32-NEXT:    ori $a1, $zero, 1
+; LA32-NEXT:    ori $a2, $zero, 2
+; LA32-NEXT:    ori $a3, $zero, 3
+; LA32-NEXT:    ori $a4, $zero, 4
+; LA32-NEXT:    ori $a5, $zero, 5
+; LA32-NEXT:    ori $a6, $zero, 6
+; LA32-NEXT:    ori $a7, $zero, 7
+; LA32-NEXT:    st.w $a0, $sp, 0
+; LA32-NEXT:    move $a0, $zero
+; LA32-NEXT:    b many_args_callee
+;
+; LA64-LABEL: many_args_musttail:
+; LA64:       # %bb.0:
+; LA64-NEXT:    ori $a0, $zero, 9
+; LA64-NEXT:    st.d $a0, $sp, 8
+; LA64-NEXT:    ori $a0, $zero, 8
+; LA64-NEXT:    ori $a1, $zero, 1
+; LA64-NEXT:    ori $a2, $zero, 2
+; LA64-NEXT:    ori $a3, $zero, 3
+; LA64-NEXT:    ori $a4, $zero, 4
+; LA64-NEXT:    ori $a5, $zero, 5
+; LA64-NEXT:    ori $a6, $zero, 6
+; LA64-NEXT:    ori $a7, $zero, 7
+; LA64-NEXT:    st.d $a0, $sp, 0
+; LA64-NEXT:    move $a0, $zero
+; LA64-NEXT:    pcaddu18i $t8, %call36(many_args_callee)
+; LA64-NEXT:    jr $t8
+  %ret = musttail call i32 @many_args_callee(i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9)
+  ret i32 %ret
+}
+
+; This function has more arguments than it's tail-callee. This isn't valid for
+; the musttail attribute, but can still be tail-called as a non-guaranteed
+; optimisation, because the outgoing arguments to @many_args_callee fit in the
+; stack space allocated by the caller of @more_args_tail.
+define i32 @more_args_tail(i32 %0, i32 %1, i32 %2, i32 %3, i32 %4, i32 %5, i32 %6, i32 %7, i32 %8, i32 %9) {
+; LA32-LABEL: more_args_tail:
+; LA32:       # %bb.0:
+; LA32-NEXT:    ori $a0, $zero, 9
+; LA32-NEXT:    st.w $a0, $sp, 4
+; LA32-NEXT:    ori $a0, $zero, 8
+; LA32-NEXT:    ori $a1, $zero, 1
+; LA32-NEXT:    ori $a2, $zero, 2
+; LA32-NEXT:    ori $a3, $zero, 3
+; LA32-NEXT:    ori $a4, $zero, 4
+; LA32-NEXT:    ori $a5, $zero, 5
+; LA32-NEXT:    ori $a6, $zero, 6
+; LA32-NEXT:    ori $a7, $zero, 7
+; LA32-NEXT:    st.w $a0, $sp, 0
+; LA32-NEXT:    move $a0, $zero
+; LA32-NEXT:    b many_args_callee
+;
+; LA64-LABEL: more_args_tail:
+; LA64:       # %bb.0:
+; LA64-NEXT:    ori $a0, $zero, 9
+; LA64-NEXT:    st.d $a0, $sp, 8
+; LA64-NEXT:    ori $a0, $zero, 8
+; LA64-NEXT:    ori $a1, $zero, 1
+; LA64-NEXT:    ori $a2, $zero, 2
+; LA64-NEXT:    ori $a3, $zero, 3
+; LA64-NEXT:    ori $a4, $zero, 4
+; LA64-NEXT:    ori $a5, $zero, 5
+; LA64-NEXT:    ori $a6, $zero, 6
+; LA64-NEXT:    ori $a7, $zero, 7
+; LA64-NEXT:    st.d $a0, $sp, 0
+; LA64-NEXT:    move $a0, $zero
+; LA64-NEXT:    pcaddu18i $t8, %call36(many_args_callee)
+; LA64-NEXT:    jr $t8
+  %ret = tail call i32 @many_args_callee(i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9)
+  ret i32 %ret
+}
+
+; Again, this isn't valid for musttail, but can be tail-called in practice
+; because the stack size if the same.
+define i32 @different_args_tail_32bit(i64 %0, i64 %1, i64 %2, i64 %3, i64 %4) {
+; LA32-LABEL: different_args_tail_32bit:
+; LA32:       # %bb.0:
+; LA32-NEXT:    ori $a0, $zero, 9
+; LA32-NEXT:    st.w $a0, $sp, 4
+; LA32-NEXT:    ori $a0, $zero, 8
+; LA32-NEXT:    ori $a1, $zero, 1
+; LA32-NEXT:    ori $a2, $zero, 2
+; LA32-NEXT:    ori $a3, $zero, 3
+; LA32-NEXT:    ori $a4, $zero, 4
+; LA32-NEXT:    ori $a5, $zero, 5
+; LA32-NEXT:    ori $a6, $zero, 6
+; LA32-NEXT:    ori $a7, $zero, 7
+; LA32-NEXT:    st.w $a0, $sp, 0
+; LA32-NEXT:    move $a0, $zero
+; LA32-NEXT:    b many_args_callee
+;
+; LA64-LABEL: different_args_tail_32bit:
+; LA64:       # %bb.0:
+; LA64-NEXT:    addi.d $sp, $sp, -32
+; LA64-NEXT:    .cfi_def_cfa_offset 32
+; LA64-NEXT:    st.d $ra, $sp, 24 # 8-byte Folded Spill
+; LA64-NEXT:    .cfi_offset 1, -8
+; LA64-NEXT:    ori $a0, $zero, 9
+; LA64-NEXT:    st.d $a0, $sp, 8
+; LA64-NEXT:    ori $a0, $zero, 8
+; LA64-NEXT:    ori $a1, $zero, 1
+; LA64-NEXT:    ori $a2, $zero, 2
+; LA64-NEXT:    ori $a3, $zero, 3
+; LA64-NEXT:    ori $a4, $zero, 4
+; LA64-NEXT:    ori $a5, $zero, 5
+; LA64-NEXT:    ori $a6, $zero, 6
+; LA64-NEXT:    ori $a7, $zero, 7
+; LA64-NEXT:    st.d $a0, $sp, 0
+; LA64-NEXT:    move $a0, $zero
+; LA64-NEXT:    pcaddu18i $ra, %call36(many_args_callee)
+; LA64-NEXT:    jirl $ra, $ra, 0
+; LA64-NEXT:    ld.d $ra, $sp, 24 # 8-byte Folded Reload
+; LA64-NEXT:    addi.d $sp, $sp, 32
+; LA64-NEXT:    ret
+  %ret = tail call i32 @many_args_callee(i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9)
+  ret i32 %ret
+}
+
+define i32 @different_args_tail_64bit(i128 %0, i128 %1, i128 %2, i128 %3, i128 %4) {
+; LA32-LABEL: different_args_tail_64bit:
+; LA32:       # %bb.0:
+; LA32-NEXT:    addi.w $sp, $sp, -16
+; LA32-NEXT:    .cfi_def_cfa_offset 16
+; LA32-NEXT:    st.w $ra, $sp, 12 # 4-byte Folded Spill
+; LA32-NEXT:    .cfi_offset 1, -4
+; LA32-NEXT:    ori $a0, $zero, 9
+; LA32-NEXT:    st.w $a0, $sp, 4
+; LA32-NEXT:    ori $a0, $zero, 8
+; LA32-NEXT:    ori $a1, $zero, 1
+; LA32-NEXT:    ori $a2, $zero, 2
+; LA32-NEXT:    ori $a3, $zero, 3
+; LA32-NEXT:    ori $a4, $zero, 4
+; LA32-NEXT:    ori $a5, $zero, 5
+; LA32-NEXT:    ori $a6, $zero, 6
+; LA32-NEXT:    ori $a7, $zero, 7
+; LA32-NEXT:    st.w $a0, $sp, 0
+; LA32-NEXT:    move $a0, $zero
+; LA32-NEXT:    bl many_args_callee
+; LA32-NEXT:    ld.w $ra, $sp, 12 # 4-byte Folded Reload
+; LA32-NEXT:    addi.w $sp, $sp, 16
+; LA32-NEXT:    ret
+;
+; LA64-LABEL: different_args_tail_64bit:
+; LA64:       # %bb.0:
+; LA64-NEXT:    ori $a0, $zero, 9
+; LA64-NEXT:    st.d $a0, $sp, 8
+; LA64-NEXT:    ori $a0, $zero, 8
+; LA64-NEXT:    ori $a1, $zero, 1
+; LA64-NEXT:    ori $a2, $zero, 2
+; LA64-NEXT:    ori $a3, $zero, 3
+; LA64-NEXT:    ori $a4, $zero, 4
+; LA64-NEXT:    ori $a5, $zero, 5
+; LA64-NEXT:    ori $a6, $zero, 6
+; LA64-NEXT:    ori $a7, $zero, 7
+; LA64-NEXT:    st.d $a0, $sp, 0
+; LA64-NEXT:    move $a0, $zero
+; LA64-NEXT:    pcaddu18i $t8, %call36(many_args_callee)
+; LA64-NEXT:    jr $t8
+  %ret = tail call i32 @many_args_callee(i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9)
+  ret i32 %ret
+}
+
+; Here, the caller requires less stack space for it's arguments than the
+; callee, so it would not ba valid to do a tail-call.
+define i32 @fewer_args_tail(i32 %0, i32 %1, i32 %2, i32 %3, i32 %4) {
+; LA32-LABEL: fewer_args_tail:
+; LA32:       # %bb.0:
+; LA32-NEXT:    addi.w $sp, $sp, -16
+; LA32-NEXT:    .cfi_def_cfa_offset 16
+; LA32-NEXT:    st.w $ra, $sp, 12 # 4-byte Folded Spill
+; LA32-NEXT:    .cfi_offset 1, -4
+; LA32-NEXT:    ori $a0, $zero, 9
+; LA32-NEXT:    st.w $a0, $sp, 4
+; LA32-NEXT:    ori $a0, $zero, 8
+; LA32-NEXT:    ori $a1, $zero, 1
+; LA32-NEXT:    ori $a2, $zero, 2
+; LA32-NEXT:    ori $a3, $zero, 3
+; LA32-NEXT:    ori $a4, $zero, 4
+; LA32-NEXT:    ori $a5, $zero, 5
+; LA32-NEXT:    ori $a6, $zero, 6
+; LA32-NEXT:    ori $a7, $zero, 7
+; LA32-NEXT:    st.w $a0, $sp, 0
+; LA32-NEXT:    move $a0, $zero
+; LA32-NEXT:    bl many_args_callee
+; LA32-NEXT:    ld.w $ra, $sp, 12 # 4-byte Folded Reload
+; LA32-NEXT:    addi.w $sp, $sp, 16
+; LA32-NEXT:    ret
+;
+; LA64-LABEL: fewer_args_tail:
+; LA64:       # %bb.0:
+; LA64-NEXT:    addi.d $sp, $sp, -32
+; LA64-NEXT:    .cfi_def_cfa_offset 32
+; LA64-NEXT:    st.d $ra, $sp, 24 # 8-byte Folded Spill
+; LA64-NEXT:    .cfi_offset 1, -8
+; LA64-NEXT:    ori $a0, $zero, 9
+; LA64-NEXT:    st.d $a0, $sp, 8
+; LA64-NEXT:    ori $a0, $zero, 8
+; LA64-NEXT:    ori $a1, $zero, 1
+; LA64-NEXT:    ori $a2, $zero, 2
+; LA64-NEXT:    ori $a3, $zero, 3
+; LA64-NEXT:    ori $a4, $zero, 4
+; LA64-NEXT:    ori $a5, $zero, 5
+; LA64-NEXT:    ori $a6, $zero, 6
+; LA64-NEXT:    ori $a7, $zero, 7
+; LA64-NEXT:    st.d $a0, $sp, 0
+; LA64-NEXT:    move $a0, $zero
+; LA64-NEXT:    pcaddu18i $ra, %call36(many_args_callee)
+; LA64-NEXT:    jirl $ra, $ra, 0
+; LA64-NEXT:    ld.d $ra, $sp, 24 # 8-byte Folded Reload
+; LA64-NEXT:    addi.d $sp, $sp, 32
+; LA64-NEXT:    ret
+  %ret = tail call i32 @many_args_callee(i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9)
+  ret i32 %ret
+}
+
+declare void @foo(i32, i32, i32, i32, i32, i32, i32, i32, i32)
+
+define void @bar(i32 %0, i32 %1, i32 %2, i32 %3, i32 %4, i32 %5, i32 %6, i32 %7, i32 %8) nounwind {
+; LA32-LABEL: bar:
+; LA32:       # %bb.0: # %entry
+; LA32-NEXT:    addi.w $sp, $sp, -48
+; LA32-NEXT:    st.w $ra, $sp, 44 # 4-byte Folded Spill
+; LA32-NEXT:    st.w $fp, $sp, 40 # 4-byte Folded Spill
+; LA32-NEXT:    st.w $s0, $sp, 36 # 4-byte Folded Spill
+; LA32-NEXT:    st.w $s1, $sp, 32 # 4-byte Folded Spill
+; LA32-NEXT:    st.w $s2, $sp, 28 # 4-byte Folded Spill
+; LA32-NEXT:    st.w $s3, $sp, 24 # 4-byte Folded Spill
+; LA32-NEXT:    st.w $s4, $sp, 20 # 4-byte Folded Spill
+; LA32-NEXT:    st.w $s5, $sp, 16 # 4-byte Folded Spill
+; LA32-NEXT:    st.w $s6, $sp, 12 # 4-byte Folded Spill
+; LA32-NEXT:    move $fp, $a7
+; LA32-NEXT:    move $s0, $a6
+; LA32-NEXT:    move $s1, $a5
+; LA32-NEXT:    move $s2, $a4
+; LA32-NEXT:    move $s3, $a3
+; LA32-NEXT:    move $s4, $a2
+; LA32-NEXT:    move $s5, $a1
+; LA32-NEXT:    move $s6, $a0
+; LA32-NEXT:    ori $a0, $zero, 1
+; LA32-NEXT:    st.w $a0, $sp, 0
+; LA32-NEXT:    move $a0, $s6
+; LA32-NEXT:    bl foo
+; LA32-NEXT:    ori $a0, $zero, 2
+; LA32-NEXT:    st.w $a0, $sp, 48
+; LA32-NEXT:    move $a0, $s6
+; LA32-NEXT:    move $a1, $s5
+; LA32-NEXT:    move $a2, $s4
+; LA32-NEXT:    move $a3, $s3
+; LA32-NEXT:    move $a4, $s2
+; LA32-NEXT:    move $a5, $s1
+; LA32-NEXT:    move $a6, $s0
+; LA32-NEXT:    move $a7, $fp
+; LA32-NEXT:    ld.w $s6, $sp, 12 # 4-byte Folded Reload
+; LA32-NEXT:    ld.w $s5, $sp, 16 # 4-byte Folded Reload
+; LA32-NEXT:    ld.w $s4, $sp, 20 # 4-byte Folded Reload
+; LA32-NEXT:    ld.w $s3, $sp, 24 # 4-byte Folded Reload
+; LA32-NEXT:    ld.w $s2, $sp, 28 # 4-byte Folded Reload
+; LA32-NEXT:    ld.w $s1, $sp, 32 # 4-byte Folded Reload
+; LA32-NEXT:    ...
[truncated]
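
The split between tail calls (`b` / `jr $t8`) and ordinary calls (`bl` / `jirl`) in the CHECK lines above follows from how many stack bytes each signature needs. Below is a rough sketch of that arithmetic. It assumes a simplified model of the LoongArch integer calling convention: eight argument registers (a0 to a7), GRLEN-sized slots, and scalars wider than 2×GRLEN passed by reference in a single slot. `argStackBytes` and `canTailCall` are hypothetical helpers, not LLVM functions, and the model ignores floating-point, varargs, and alignment details.

```cpp
#include <cstdint>
#include <initializer_list>

constexpr unsigned NumArgGPRs = 8; // a0..a7 (assumed by this model)

// Stack bytes needed for a list of integer arguments, given the register
// width GRLen (32 or 64) and each argument's width in bits.
constexpr std::uint64_t argStackBytes(unsigned GRLen,
                                      std::initializer_list<unsigned> ArgBits) {
  unsigned Slots = 0;
  for (unsigned Bits : ArgBits)
    // Wider than 2*GRLen: passed by reference, so one pointer-sized slot.
    Slots += Bits > 2 * GRLen ? 1 : (Bits + GRLen - 1) / GRLen;
  unsigned StackSlots = Slots > NumArgGPRs ? Slots - NumArgGPRs : 0;
  return static_cast<std::uint64_t>(StackSlots) * (GRLen / 8);
}

// The relaxed rule from the patch: the callee's area must fit the caller's.
constexpr bool canTailCall(std::uint64_t CallerBytes, std::uint64_t CalleeBytes) {
  return CalleeBytes <= CallerBytes;
}

// many_args_callee takes ten i32 values: two of them land on the stack.
constexpr std::uint64_t CalleeLA32 =
    argStackBytes(32, {32, 32, 32, 32, 32, 32, 32, 32, 32, 32});
constexpr std::uint64_t CalleeLA64 =
    argStackBytes(64, {32, 32, 32, 32, 32, 32, 32, 32, 32, 32});
static_assert(CalleeLA32 == 8 && CalleeLA64 == 16, "two stack slots each");

// different_args_tail_32bit (five i64): tail call on LA32 only.
static_assert(canTailCall(argStackBytes(32, {64, 64, 64, 64, 64}), CalleeLA32),
              "LA32: 8 bytes available, 8 needed");
static_assert(!canTailCall(argStackBytes(64, {64, 64, 64, 64, 64}), CalleeLA64),
              "LA64: 0 bytes available, 16 needed");

// different_args_tail_64bit (five i128): tail call on LA64 only.
static_assert(!canTailCall(argStackBytes(32, {128, 128, 128, 128, 128}), CalleeLA32),
              "LA32: i128 goes by reference, 0 bytes available");
static_assert(canTailCall(argStackBytes(64, {128, 128, 128, 128, 128}), CalleeLA64),
              "LA64: 16 bytes available, 16 needed");

// fewer_args_tail (five i32): no stack area at all, so no tail call.
static_assert(!canTailCall(argStackBytes(32, {32, 32, 32, 32, 32}), CalleeLA32),
              "LA32: 0 bytes available");
```

Under these assumptions the model agrees with the generated code shown for `different_args_tail_32bit`, `different_args_tail_64bit`, and `fewer_args_tail` on both targets.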

@github-actions

github-actions Bot commented Nov 18, 2025

🐧 Linux x64 Test Results

  • 188341 tests passed
  • 5000 tests skipped

✅ The build succeeded and all tests passed.

@folkertdev
Contributor

folkertdev commented Nov 20, 2025

Thanks so much for this! It looks like the riscv code is very similar to loongarch, so this approach should also work there, covering all targets that rustc would reasonably care about for an MVP.

It might be worthwhile to also test this case, https://godbolt.org/z/5evjsn1zs, which x86_64 miscompiles, causing a segmentation fault. (aarch64 seems to have fixed this in LLVM 20; it also miscompiled before.)

@heiher
Member Author

heiher commented Nov 21, 2025

> It looks like the riscv code is very similar to loongarch, so this approach should also work there

I agree with you. It should also work there.

> this case https://godbolt.org/z/5evjsn1zs

It looks like LoongArch is already generating correct code for this case. I also noticed that the byval argument test cases are still failing on LoongArch (and AArch64). Does Rust's become strictly require support for byval arguments?

define dso_local i32 @callee_byval(ptr byval(i32) %0) nounwind {
  ret i32 0
}

define dso_local i32 @caller_byval(ptr byval(i32) %0) nounwind {
  %r = musttail call i32 @callee_byval(ptr byval(i32) %0)
  ret i32 %r
}

@folkertdev
Contributor

> I also noticed that the byval argument test cases are still failing on LoongArch (and AArch64).

I'm not sure what you mean here.

Rust currently disallows a become on a function call that uses any PassMode::Indirect arguments. I want to relax that restriction to accept (what LLVM calls) byval arguments.

However it turns out that many LLVM backends also do not support byval arguments correctly: the arm backends do support it now (I suspect since #109943), and I'll try to add support for x86_64 in #168956 by basically emulating the aarch64 approach. With loongarch (this PR), riscv (where we can basically copy this PR) and maybe powerpc/s390x (who have been quite responsive in fixing issues we run into) that should cover all of the major architectures I think.


What rust requires is a subset of musttail: we currently only allow sibcalls (so, ABI between caller and callee is a perfect match), but we want to support an arbitrary number of arguments, and want any rust type (that is FFI-safe/has a stable layout) to work as an argument.

@folkertdev
Contributor

I believe I understand your question better now. The answer is yes, rust would need e.g. the following to work correctly:

%struct.5xi32 = type { [5 x i32] }

declare dso_local i32 @FuncFlip(ptr byval(%struct.5xi32) %0, ptr byval(%struct.5xi32) %1)

define dso_local i32 @testFlip(ptr byval(%struct.5xi32) %0, ptr byval(%struct.5xi32) %1) {
  %r = musttail call i32 @FuncFlip(ptr byval(%struct.5xi32) %1, ptr byval(%struct.5xi32) %0)
  ret i32 %r
}
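
Permuted arguments like these are hard for a tail call when the outgoing values are written into memory that still holds incoming values. How LoongArch lays out byval copies is not shown in this thread, so the sketch below uses a deliberately hypothetical layout: two 5-word structs sitting back to back in one argument area. `overlaps` mirrors the byte-range test in the patch's `addTokenForArgument`. `ArgArea`, `flipNaive`, and `flipLoadFirst` are illustrative names only.

```cpp
#include <cstdint>

// Byte-range overlap test, same shape as the condition in addTokenForArgument:
// [FirstByte, LastByte] is the slot being stored to, [InFirstByte, InLastByte]
// is an incoming argument slot.
constexpr bool overlaps(std::int64_t FirstByte, std::int64_t LastByte,
                        std::int64_t InFirstByte, std::int64_t InLastByte) {
  return (InFirstByte <= FirstByte && FirstByte <= InLastByte) ||
         (FirstByte <= InFirstByte && InFirstByte <= LastByte);
}

// Hypothetical argument area: words 0..4 hold the first struct,
// words 5..9 hold the second.
struct ArgArea {
  std::int32_t Words[10];
};

// Storing in place without ordering: the first struct is overwritten
// before it has been read, so its contents are lost.
constexpr ArgArea flipNaive(ArgArea A) {
  for (int I = 0; I < 5; ++I) A.Words[I] = A.Words[5 + I];
  for (int I = 0; I < 5; ++I) A.Words[5 + I] = A.Words[I];
  return A;
}

// Loading every incoming value before any store keeps both structs intact.
// Chaining the loads ahead of the stores enforces this ordering.
constexpr ArgArea flipLoadFirst(ArgArea A) {
  ArgArea In = A; // all loads happen here
  for (int I = 0; I < 5; ++I) A.Words[I] = In.Words[5 + I];
  for (int I = 0; I < 5; ++I) A.Words[5 + I] = In.Words[I];
  return A;
}

constexpr ArgArea Incoming{{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}};

// Adjacent 20-byte slots do not overlap; a slot shifted by 16 bytes does.
static_assert(!overlaps(0, 19, 20, 39), "adjacent slots are disjoint");
static_assert(overlaps(0, 19, 16, 35), "partially shared bytes overlap");

// The naive order duplicates the second struct and loses the first.
static_assert(flipNaive(Incoming).Words[5] == 6, "first struct was clobbered");
// Loading first yields a correct swap.
static_assert(flipLoadFirst(Incoming).Words[0] == 6 &&
                  flipLoadFirst(Incoming).Words[5] == 1,
              "both structs survive the swap");
```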

@heiher heiher force-pushed the users/hev/issue-168152 branch from 5fefddd to 7fa089a December 3, 2025 11:30
@heiher heiher changed the title from "[LoongArch] Enable tail calls for sret functions and relax argument matching" to "[LoongArch] Enable tail calls for sret and byval functions" Dec 3, 2025
@heiher
Member Author

heiher commented Dec 3, 2025

@folkertdev Thanks for pointing that out. We now support tail calls for functions with byval arguments. Let me know if I missed anything.

@folkertdev
Contributor

I'm not fluent in the loongarch assembly, but your tests seem to cover everything. Thanks!

Comment thread llvm/lib/Target/LoongArch/LoongArchMachineFunctionInfo.h Outdated
@heiher heiher merged commit 2b839f6 into main Jan 13, 2026
10 checks passed
@heiher heiher deleted the users/hev/issue-168152 branch January 13, 2026 06:02
@llvm-ci

llvm-ci commented Jan 13, 2026

LLVM Buildbot has detected a new failure on builder sanitizer-aarch64-linux-bootstrap-msan running on sanitizer-buildbot10 while building llvm at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/94/builds/14335

Here is the relevant piece of the build log for the reference
Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure)
...
[1654/5716] Building CXX object tools/clang/lib/AST/CMakeFiles/obj.clangAST.dir/StmtObjC.cpp.o
[1655/5716] Building CXX object tools/clang/lib/AST/CMakeFiles/obj.clangAST.dir/ScanfFormatString.cpp.o
[1656/5716] Building CXX object tools/clang/lib/AST/CMakeFiles/obj.clangAST.dir/OpenACCClause.cpp.o
[1657/5716] Building CXX object tools/clang/lib/AST/CMakeFiles/obj.clangAST.dir/DeclPrinter.cpp.o
[1658/5716] Building CXX object tools/clang/lib/Format/CMakeFiles/obj.clangFormat.dir/NumericLiteralInfo.cpp.o
[1659/5716] Building CXX object tools/clang/lib/Tooling/Inclusions/CMakeFiles/obj.clangToolingInclusions.dir/HeaderAnalysis.cpp.o
[1660/5716] Building CXX object tools/clang/lib/AST/CMakeFiles/obj.clangAST.dir/Randstruct.cpp.o
[1661/5716] Building CXX object tools/clang/lib/AST/CMakeFiles/obj.clangAST.dir/OSLog.cpp.o
[1662/5716] Building CXX object tools/clang/lib/Format/CMakeFiles/obj.clangFormat.dir/AffectedRangeManager.cpp.o
[1663/5716] Building CXX object tools/clang/lib/AST/CMakeFiles/obj.clangAST.dir/StmtOpenACC.cpp.o
FAILED: tools/clang/lib/AST/CMakeFiles/obj.clangAST.dir/StmtOpenACC.cpp.o 
/home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm_build_msan/bin/clang++ -DCLANG_EXPORTS -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GLIBCXX_USE_CXX11_ABI=1 -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm_build2_msan/tools/clang/lib/AST -I/home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm-project/clang/lib/AST -I/home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm-project/clang/include -I/home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm_build2_msan/tools/clang/include -I/home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm_build2_msan/include -I/home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wno-pass-failed -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -fno-common -Woverloaded-virtual -Wno-nested-anon-types -O3 -DNDEBUG -std=c++17 -UNDEBUG -fno-exceptions -funwind-tables -fno-rtti -MD -MT tools/clang/lib/AST/CMakeFiles/obj.clangAST.dir/StmtOpenACC.cpp.o -MF tools/clang/lib/AST/CMakeFiles/obj.clangAST.dir/StmtOpenACC.cpp.o.d -o tools/clang/lib/AST/CMakeFiles/obj.clangAST.dir/StmtOpenACC.cpp.o -c /home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm-project/clang/lib/AST/StmtOpenACC.cpp
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.	Program arguments: /home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm_build_msan/bin/clang++ -DCLANG_EXPORTS -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GLIBCXX_USE_CXX11_ABI=1 -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm_build2_msan/tools/clang/lib/AST -I/home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm-project/clang/lib/AST -I/home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm-project/clang/include -I/home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm_build2_msan/tools/clang/include -I/home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm_build2_msan/include -I/home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wno-pass-failed -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -fno-common -Woverloaded-virtual -Wno-nested-anon-types -O3 -DNDEBUG -std=c++17 -UNDEBUG -fno-exceptions -funwind-tables -fno-rtti -MD -MT tools/clang/lib/AST/CMakeFiles/obj.clangAST.dir/StmtOpenACC.cpp.o -MF tools/clang/lib/AST/CMakeFiles/obj.clangAST.dir/StmtOpenACC.cpp.o.d -o tools/clang/lib/AST/CMakeFiles/obj.clangAST.dir/StmtOpenACC.cpp.o -c /home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm-project/clang/lib/AST/StmtOpenACC.cpp
1.	/home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm-project/clang/lib/AST/StmtOpenACC.cpp:369:73: current parser token '{'
2.	/usr/lib/gcc/aarch64-linux-gnu/14/../../../../aarch64-linux-gnu/include/c++/14/optional:703:11: instantiating class definition 'std::optional<clang::OpenACCAtomicConstruct::SingleStmtInfo>'
3.	/usr/lib/gcc/aarch64-linux-gnu/14/../../../../aarch64-linux-gnu/include/c++/14/type_traits:3405:25: instantiating variable definition 'std::is_move_constructible_v<clang::OpenACCAtomicConstruct::SingleStmtInfo>'
 #0 0x0000afccc5fca234 ___interceptor_backtrace /home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm-project/compiler-rt/lib/msan/../sanitizer_common/sanitizer_common_interceptors.inc:4556:13
 #1 0x0000afccccd99a64 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm-project/llvm/lib/Support/Unix/Signals.inc:846:7
 #2 0x0000afccccd94050 llvm::sys::RunSignalHandlers() /home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm-project/llvm/lib/Support/Signals.cpp:109:18
 #3 0x0000afccccc1358c HandleCrash /home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm-project/llvm/lib/Support/CrashRecoveryContext.cpp:73:5
 #4 0x0000afccccc1358c CrashRecoverySignalHandler(int) /home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm-project/llvm/lib/Support/CrashRecoveryContext.cpp:390:51
 #5 0x0000afccc5ffc140 ~ScopedThreadLocalStateBackup /home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm-project/compiler-rt/lib/msan/msan.h:353:37
 #6 0x0000afccc5ffc140 SignalHandler(int) /home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm-project/compiler-rt/lib/msan/msan_interceptors.cpp:1154:1
 #7 0x0000f000084f99c0 (linux-vdso.so.1+0x9c0)
 #8 0x0000f00007fb7294 (/lib/aarch64-linux-gnu/libc.so.6+0xa7294)
 #9 0x0000afccc5f9f9ec MsanAllocate(__sanitizer::BufferedStackTrace*, unsigned long, unsigned long, bool) /home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm-project/compiler-rt/lib/msan/msan_allocator.cpp:256:9
#10 0x0000afccc5fa01f8 SetErrnoOnNull /home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm-project/compiler-rt/lib/msan/../sanitizer_common/sanitizer_allocator_checks.h:31:7
#11 0x0000afccc5fa01f8 __msan::msan_memalign(unsigned long, unsigned long, __sanitizer::BufferedStackTrace*) /home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm-project/compiler-rt/lib/msan/msan_allocator.cpp:432:10
#12 0x0000afccc600a148 operator new(unsigned long, std::align_val_t, std::nothrow_t const&) /home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm-project/compiler-rt/lib/msan/msan_new_delete.cpp:70:3
#13 0x0000afccccc788fc llvm::allocate_buffer(unsigned long, unsigned long) /home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm-project/llvm/lib/Support/MemAlloc.cpp:21:14
#14 0x0000afccc60e83e8 capacity /home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm-project/llvm/include/llvm/ADT/SmallVector.h:81:36
#15 0x0000afccc60e83e8 reserveForParamAndGetAddressImpl<llvm::SmallVectorTemplateBase<void *, true> > /home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm-project/llvm/include/llvm/ADT/SmallVector.h:238:38
#16 0x0000afccc60e83e8 reserveForParamAndGetAddress /home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm-project/llvm/include/llvm/ADT/SmallVector.h:540:9
#17 0x0000afccc60e83e8 push_back /home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm-project/llvm/include/llvm/ADT/SmallVector.h:565:23
#18 0x0000afccc60e83e8 StartNewSlab /home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm-project/llvm/include/llvm/Support/Allocator.h:353:11
#19 0x0000afccc60e83e8 llvm::BumpPtrAllocatorImpl<llvm::MallocAllocator, 4096ul, 4096ul, 128ul>::AllocateSlow(unsigned long, unsigned long, llvm::Align) /home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm-project/llvm/include/llvm/Support/Allocator.h:203:5
#20 0x0000afccd4637c30 Allocate /home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm-project/llvm/include/llvm/Support/Allocator.h:179:12
#21 0x0000afccd4637c30 Allocate /home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm-project/llvm/include/llvm/Support/Allocator.h:217:12
#22 0x0000afccd4637c30 Allocate /home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm-project/clang/include/clang/AST/ASTContext.h:865:22
#23 0x0000afccd4637c30 clang::TypeTraitExpr::Create(clang::ASTContext const&, clang::QualType, clang::SourceLocation, clang::TypeTrait, llvm::ArrayRef<clang::TypeSourceInfo*>, clang::SourceLocation, bool) /home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm-project/clang/lib/AST/ExprCXX.cpp:1921:9
#24 0x0000afccd378844c operator= /home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm-project/clang/include/clang/Sema/Ownership.h:213:5
#25 0x0000afccd378844c ActionResult /home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm-project/clang/include/clang/Sema/Ownership.h:193:33
#26 0x0000afccd378844c clang::Sema::BuildTypeTrait(clang::TypeTrait, clang::SourceLocation, llvm::ArrayRef<clang::TypeSourceInfo*>, clang::SourceLocation) /home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm-project/clang/lib/Sema/SemaTypeTraits.cpp:1474:12
#27 0x0000afccd35a9ffc clang::TreeTransform<(anonymous namespace)::TemplateInstantiator>::TransformTypeTraitExpr(clang::TypeTraitExpr*) /home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm-project/clang/lib/Sema/TreeTransform.h:15159:3
#28 0x0000afccd350a71c clang::TreeTransform<(anonymous namespace)::TemplateInstantiator>::TransformExpr(clang::Expr*) /home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm_build_msan/tools/clang/include/clang/AST/StmtNodes.inc:864:1
#29 0x0000afccd350c3c8 clang::TreeTransform<(anonymous namespace)::TemplateInstantiator>::TransformInitializer(clang::Expr*, bool) /home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm-project/clang/lib/Sema/TreeTransform.h:0:7
#30 0x0000afccd3503070 ~TemplateInstantiator /home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm-project/clang/lib/Sema/SemaTemplateInstantiate.cpp:1273:43
Step 13 (build stage3/msan build) failure: build stage3/msan build (failure)
...
[1654/5716] Building CXX object tools/clang/lib/AST/CMakeFiles/obj.clangAST.dir/StmtObjC.cpp.o
[1655/5716] Building CXX object tools/clang/lib/AST/CMakeFiles/obj.clangAST.dir/ScanfFormatString.cpp.o
[1656/5716] Building CXX object tools/clang/lib/AST/CMakeFiles/obj.clangAST.dir/OpenACCClause.cpp.o
[1657/5716] Building CXX object tools/clang/lib/AST/CMakeFiles/obj.clangAST.dir/DeclPrinter.cpp.o
[1658/5716] Building CXX object tools/clang/lib/Format/CMakeFiles/obj.clangFormat.dir/NumericLiteralInfo.cpp.o
[1659/5716] Building CXX object tools/clang/lib/Tooling/Inclusions/CMakeFiles/obj.clangToolingInclusions.dir/HeaderAnalysis.cpp.o
[1660/5716] Building CXX object tools/clang/lib/AST/CMakeFiles/obj.clangAST.dir/Randstruct.cpp.o
[1661/5716] Building CXX object tools/clang/lib/AST/CMakeFiles/obj.clangAST.dir/OSLog.cpp.o
[1662/5716] Building CXX object tools/clang/lib/Format/CMakeFiles/obj.clangFormat.dir/AffectedRangeManager.cpp.o
[1663/5716] Building CXX object tools/clang/lib/AST/CMakeFiles/obj.clangAST.dir/StmtOpenACC.cpp.o
FAILED: tools/clang/lib/AST/CMakeFiles/obj.clangAST.dir/StmtOpenACC.cpp.o 
/home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm_build_msan/bin/clang++ -DCLANG_EXPORTS -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GLIBCXX_USE_CXX11_ABI=1 -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm_build2_msan/tools/clang/lib/AST -I/home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm-project/clang/lib/AST -I/home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm-project/clang/include -I/home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm_build2_msan/tools/clang/include -I/home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm_build2_msan/include -I/home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wno-pass-failed -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -fno-common -Woverloaded-virtual -Wno-nested-anon-types -O3 -DNDEBUG -std=c++17 -UNDEBUG -fno-exceptions -funwind-tables -fno-rtti -MD -MT tools/clang/lib/AST/CMakeFiles/obj.clangAST.dir/StmtOpenACC.cpp.o -MF tools/clang/lib/AST/CMakeFiles/obj.clangAST.dir/StmtOpenACC.cpp.o.d -o tools/clang/lib/AST/CMakeFiles/obj.clangAST.dir/StmtOpenACC.cpp.o -c /home/b/sanitizer-aarch64-linux-bootstrap-msan/build/llvm-project/clang/lib/AST/StmtOpenACC.cpp
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.

Priyanshu3820 pushed a commit to Priyanshu3820/llvm-project that referenced this pull request Jan 18, 2026
Allow tail calls for functions returning via sret when the caller's sret
pointer can be reused. Also support tail calls for byval arguments.
    
The previous restriction requiring exact match of caller and callee
arguments is relaxed: tail calls are allowed as long as the callee does
not use more stack space than the caller.

Fixes llvm#168152
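
The relaxed rule in the commit message above can be modelled roughly as follows. This is a minimal sketch under stated assumptions, not the code from `LoongArchISelLowering.cpp`: the struct `CallInfo`, its fields, and the function `mayTailCall` are hypothetical names chosen for illustration, and the real eligibility check considers further conditions that are omitted here.

```cpp
#include <cstdint>

// Hypothetical summary of what a calling convention assigns to one signature.
struct CallInfo {
  std::uint64_t StackArgBytes; // bytes of arguments passed on the stack
  bool UsesSRet;               // result returned through a hidden sret pointer
};

// Sketch of the relaxed rule: the callee's stack arguments must fit in the
// area the caller itself received, and an sret callee needs an incoming sret
// pointer in the caller that can be forwarded.
bool mayTailCall(const CallInfo &Caller, const CallInfo &Callee) {
  if (Callee.StackArgBytes > Caller.StackArgBytes)
    return false; // callee would write beyond the caller's incoming area
  if (Callee.UsesSRet && !Caller.UsesSRet)
    return false; // no incoming sret pointer to reuse
  return true;
}
```

Under this model a caller that received 16 bytes of stack arguments may tail-call a callee needing 8 bytes, but not one needing 32, regardless of whether the argument values themselves match.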
folkertdev added a commit that referenced this pull request Feb 11, 2026
Basically #168506 but for
riscv; to be clear, the hard work here is @heiher's. I figured we may
as well get some extra eyeballs on this from riscv too.

Previously the riscv backend could not handle `musttail` calls with more
arguments than fit in registers, or any explicit `byval` or `sret`
parameters/return values. Those have now been implemented.

This is part of my push to get more LLVM backends to support `byval` and
`sret` parameters so that rust can stabilize guaranteed tail call
support. See also:

- #168956
- rust-lang/rust#148748

---------

Co-authored-by: WANG Rui <[email protected]>
llvm-sync Bot pushed a commit to arm/arm-toolchain that referenced this pull request Feb 11, 2026
kevinwkt pushed a commit to kevinwkt/llvm-project that referenced this pull request Feb 16, 2026
Member

@lenary lenary left a comment

I will be reverting this. The inline comment describes the problem that made me want to revert on RISC-V. However, there is also a follow-on commit (ab17b54) that touches both RISC-V and LoongArch to fix a different bug, so I think it's much cleaner to revert all three patches entirely and start again.

We're in the middle of the release cycle, so I think there is ample time to fix this and re-land it before the branch for LLVM 23.

  }
  InVals.push_back(ArgValue);
  if (Ins[InsIdx].Flags.isByVal())
    LoongArchFI->addIncomingByValArgs(ArgValue);

This has a significant lifetime issue, one that we feel is serious enough to warrant a revert on RISC-V. The SDValue has a lifetime that matches the current block, not the whole function, whereas the LoongArchFI has function lifetime.

In short, if you have a musttail call in any block except the entry block, this code is not going to work.
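
The lifetime mismatch described above can be illustrated with a toy model that does not use any LLVM headers. Everything here is an assumption made for illustration: `BlockDAG`, `FunctionInfo`, `ByValSlots`, and `materialize` are hypothetical names, and the "stable slot" alternative is only one conceivable direction, not something the thread states the re-landed patch will do.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a per-block selection DAG: nodes live only until clear().
class BlockDAG {
public:
  // Handle to a node; valid only for the DAG epoch in which it was created.
  struct Handle {
    std::size_t Index;
    unsigned Epoch;
  };

  Handle addNode(std::string Name) {
    Nodes.push_back(std::move(Name));
    return {Nodes.size() - 1, Epoch};
  }

  // Called when moving on to the next basic block.
  void clear() {
    Nodes.clear();
    ++Epoch;
  }

  // Returns the node name, or nullopt if the handle is stale.
  std::optional<std::string> lookup(Handle H) const {
    if (H.Epoch != Epoch || H.Index >= Nodes.size())
      return std::nullopt;
    return Nodes[H.Index];
  }

private:
  std::vector<std::string> Nodes;
  unsigned Epoch = 0;
};

// Function-lifetime state. Storing handles here mirrors the problem;
// storing a stable slot number is one possible alternative.
struct FunctionInfo {
  std::vector<BlockDAG::Handle> ByValHandles; // problematic: block-scoped
  std::vector<int> ByValSlots;                // stable across blocks
};

// Recreate a node for a stable slot inside whichever block is current.
BlockDAG::Handle materialize(BlockDAG &DAG, int Slot) {
  return DAG.addNode("slot" + std::to_string(Slot));
}
```

In this model, a handle recorded while processing the entry block stops resolving once `clear()` runs for the next block, which corresponds to a `musttail` call outside the entry block. The toy detects the stale handle explicitly; a real stale SDValue offers no such check. A slot number stored in `ByValSlots` stays meaningful and can be turned into a fresh node in any block.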

lenary added a commit to lenary/llvm-project that referenced this pull request Apr 10, 2026
This reverts:
- 2b839f6 (llvm#168506)
- 6a81656 (llvm#170547)
- ab17b54 (llvm#188006)
- e65dd1f (llvm#191093)

The changes in llvm#168506 and llvm#170547 both have a lifetime issue where an
SDValue is kept for the duration of a function, despite being valid only
when processing the same basic block.

Reverting both on LoongArch and RISC-V as the implementations are
identical and one of the fix commits touches both targets, rather than
doing only a RISC-V revert. I also think this more cleanly shows what is
being undone when starting again with the changes.
lenary added a commit that referenced this pull request Apr 10, 2026
lenary added a commit to lenary/llvm-project that referenced this pull request Apr 10, 2026
This reverts:
- 2b839f6 (llvm#168506)
- d40e607 (llvm#188006)

There's a lifetime issue in the implementation, where an SDValue is
saved and may be used outside the current basic block.

The corresponding revert on `main` is
501417b (llvm#191508) - in this case only
the LoongArch changes made it to the 22.x branches, so this commit only
affects that architecture.
c-rhodes pushed a commit to lenary/llvm-project that referenced this pull request Apr 17, 2026

Development

Successfully merging this pull request may close these issues.

loongarch64: failed to perform tail call elimination on a call site marked musttail

6 participants