Skip to content

[RISCV] Fix musttail with indirect arguments by forwarding incoming pointers#185094

Open
xroche wants to merge 16 commits intollvm:mainfrom
xroche:riscv-musttail-indirect-fix
Open

[RISCV] Fix musttail with indirect arguments by forwarding incoming pointers#185094
xroche wants to merge 16 commits intollvm:mainfrom
xroche:riscv-musttail-indirect-fix

Conversation

@xroche
Copy link
Copy Markdown
Contributor

@xroche xroche commented Mar 6, 2026

When a musttail call passes arguments indirectly (fp128 on RV32, i128 on RV32), the backend allocates a stack temporary and hands the callee a pointer. The tail call deallocates the caller's frame, and the pointer dangles.

Fix by forwarding the incoming indirect pointers instead. They point to the caller's caller's frame, which stays valid after the tail call. Forwarded formal parameters reuse the pointer directly; computed values get stored into the incoming buffer first.

The pointers are saved in virtual registers (CopyToReg/CopyFromReg) rather than SDValues. The SelectionDAG is cleared between basic blocks and musttail calls can appear in non-entry blocks, so storing raw SDValues across BBs is unsound (this was the bug that led to the revert in 501417b). The vreg save only fires when the function has musttail calls; other functions see no codegen change.

Non-musttail tail calls with indirect args are still rejected.

Fixes #185089.

AI Disclosure

This patch was developed with assistance from Claude (Anthropic). All code has been reviewed and validated by the author.

…ointers

When a musttail call has arguments passed indirectly
(CCValAssign::Indirect), the current code creates a new stack
temporary and copies the data there. The tail call then deallocates
the caller's stack frame, leaving the pointer dangling.

Fix this by forwarding the original incoming indirect pointer
instead of re-spilling. Since musttail guarantees matching
prototypes, incoming and outgoing indirect args have a 1:1
correspondence, and the incoming pointer (from the caller's caller's
frame) remains valid after the tail call.

This also subsumes the non-musttail indirect args fix: non-musttail
calls with indirect args are rejected from tail call optimization
(the pointer would dangle), while musttail calls forward the
incoming pointer.

Fixes llvm#185089.

Assisted-by: Claude (Anthropic)
Co-Authored-By: Claude Opus 4.6 <[email protected]>
@llvmbot
Copy link
Copy Markdown
Member

llvmbot commented Mar 6, 2026

@llvm/pr-subscribers-backend-risc-v

Author: Xavier Roche (xroche)

Changes

When a musttail call has arguments passed indirectly (CCValAssign::Indirect), the current code creates a new stack temporary and copies the data there. The tail call then deallocates the caller's stack frame, leaving the pointer dangling (use-after-free).

Fix this by forwarding the original incoming indirect pointer instead of re-spilling. Since musttail guarantees matching prototypes, incoming and outgoing indirect args have a 1:1 correspondence, and the incoming pointer (from the caller's caller's frame) remains valid after the tail call.

This patch also includes the non-musttail fix from PR #184972 (reject tail calls with indirect args), since both changes are in the same function and the musttail exception depends on the indirect args check being present. PR #184972 can be closed in favor of this one.

Changes

  1. RISCVMachineFunctionInfo.h: Add IncomingIndirectArgs storage to save incoming indirect pointers during LowerFormalArguments.

  2. RISCVISelLowering.cpp (3 sites):

    • LowerFormalArguments: Save the raw pointer (before loading) for indirect args.
    • isEligibleForTailCallOptimization: Reject indirect args for non-musttail; allow for musttail.
    • LowerCall: For musttail + indirect, forward the incoming pointer and skip split-arg parts.
  3. Tests:

    • musttail-indirect-args.ll: New test with 3 functions (musttail fp128, non-musttail fp128, musttail i128).
    • tail-calls.ll: CHECK lines updated for the indirect args rejection (same as PR #184972).

Fixes #185089.

AI Disclosure

This patch was developed with assistance from Claude (Anthropic). All code has been reviewed and validated by the author.


Full diff: https://github.com/llvm/llvm-project/pull/185094.diff

4 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (+76-42)
  • (modified) llvm/lib/Target/RISCV/RISCVMachineFunctionInfo.h (+13)
  • (added) llvm/test/CodeGen/RISCV/musttail-indirect-args.ll (+66)
  • (modified) llvm/test/CodeGen/RISCV/tail-calls.ll (+12-6)
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 17d7db95886ab..68cff3332b873 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -24454,6 +24454,8 @@ SDValue RISCVTargetLowering::LowerFormalArguments(
       ArgValue = unpackFromMemLoc(DAG, Chain, VA, DL, *this);
 
     if (VA.getLocInfo() == CCValAssign::Indirect) {
+      // Save the incoming indirect pointer for musttail forwarding.
+      RVFI->addIncomingIndirectArg(ArgValue);
       // If the original argument was split and passed by reference (e.g. i128
       // on RV32), we need to load all parts of it here (using the same
       // address). Vectors may be partly split to registers and partly to the
@@ -24578,6 +24580,19 @@ bool RISCVTargetLowering::isEligibleForTailCallOptimization(
   if (CCInfo.getStackSize() > RVFI->getArgumentStackSize())
     return false;
 
+  // Do not tail call optimize if any argument needs to be passed indirectly.
+  // The caller allocates stack space and passes a pointer to the callee. On a
+  // tail call the caller's stack frame is deallocated before the callee
+  // executes, invalidating the pointer (use-after-free).
+  // musttail is excluded: callers forward incoming indirect pointers that
+  // point to the caller's caller's frame, which remains valid.
+  if (!CLI.CB || !CLI.CB->isMustTailCall()) {
+    for (const auto &VA : ArgLocs) {
+      if (VA.getLocInfo() == CCValAssign::Indirect)
+        return false;
+    }
+  }
+
   // Do not tail call opt if either caller or callee uses struct return
   // semantics.
   auto IsCallerStructRet = Caller.hasStructRetAttr();
@@ -24765,51 +24780,70 @@ SDValue RISCVTargetLowering::LowerCall(CallLoweringInfo &CLI,
     // Promote the value if needed.
     // For now, only handle fully promoted and indirect arguments.
     if (VA.getLocInfo() == CCValAssign::Indirect) {
-      // Store the argument in a stack slot and pass its address.
-      Align StackAlign =
-          std::max(getPrefTypeAlign(Outs[OutIdx].ArgVT, DAG),
-                   getPrefTypeAlign(ArgValue.getValueType(), DAG));
-      TypeSize StoredSize = ArgValue.getValueType().getStoreSize();
-      // If the original argument was split (e.g. i128), we need
-      // to store the required parts of it here (and pass just one address).
-      // Vectors may be partly split to registers and partly to the stack, in
-      // which case the base address is partly offset and subsequent stores are
-      // relative to that.
-      unsigned ArgIndex = Outs[OutIdx].OrigArgIndex;
-      unsigned ArgPartOffset = Outs[OutIdx].PartOffset;
-      assert(VA.getValVT().isVector() || ArgPartOffset == 0);
-      // Calculate the total size to store. We don't have access to what we're
-      // actually storing other than performing the loop and collecting the
-      // info.
-      SmallVector<std::pair<SDValue, SDValue>> Parts;
-      while (i + 1 != e && Outs[OutIdx + 1].OrigArgIndex == ArgIndex) {
-        SDValue PartValue = OutVals[OutIdx + 1];
-        unsigned PartOffset = Outs[OutIdx + 1].PartOffset - ArgPartOffset;
-        SDValue Offset = DAG.getIntPtrConstant(PartOffset, DL);
-        EVT PartVT = PartValue.getValueType();
-        if (PartVT.isScalableVector())
-          Offset = DAG.getNode(ISD::VSCALE, DL, XLenVT, Offset);
-        StoredSize += PartVT.getStoreSize();
-        StackAlign = std::max(StackAlign, getPrefTypeAlign(PartVT, DAG));
-        Parts.push_back(std::make_pair(PartValue, Offset));
-        ++i;
-        ++OutIdx;
-      }
-      SDValue SpillSlot = DAG.CreateStackTemporary(StoredSize, StackAlign);
-      int FI = cast<FrameIndexSDNode>(SpillSlot)->getIndex();
-      MemOpChains.push_back(
-          DAG.getStore(Chain, DL, ArgValue, SpillSlot,
-                       MachinePointerInfo::getFixedStack(MF, FI)));
-      for (const auto &Part : Parts) {
-        SDValue PartValue = Part.first;
-        SDValue PartOffset = Part.second;
-        SDValue Address =
-            DAG.getNode(ISD::ADD, DL, PtrVT, SpillSlot, PartOffset);
+      // For musttail calls, forward the incoming indirect pointer instead
+      // of creating a new stack temporary. The incoming pointer points to
+      // the caller's caller's frame, which remains valid after a tail call.
+      if (IsTailCall && CLI.CB && CLI.CB->isMustTailCall()) {
+        unsigned IndirectIdx = 0;
+        for (unsigned k = 0; k < OutIdx; ++k) {
+          if (ArgLocs[k].getLocInfo() == CCValAssign::Indirect)
+            ++IndirectIdx;
+        }
+        ArgValue = RVFI->getIncomingIndirectArg(IndirectIdx);
+        // Skip any split parts of this argument (they are covered by the
+        // forwarded pointer).
+        unsigned ArgIndex = Outs[OutIdx].OrigArgIndex;
+        while (i + 1 != e && Outs[OutIdx + 1].OrigArgIndex == ArgIndex) {
+          ++i;
+          ++OutIdx;
+        }
+      } else {
+        // Store the argument in a stack slot and pass its address.
+        Align StackAlign =
+            std::max(getPrefTypeAlign(Outs[OutIdx].ArgVT, DAG),
+                     getPrefTypeAlign(ArgValue.getValueType(), DAG));
+        TypeSize StoredSize = ArgValue.getValueType().getStoreSize();
+        // If the original argument was split (e.g. i128), we need
+        // to store the required parts of it here (and pass just one address).
+        // Vectors may be partly split to registers and partly to the stack, in
+        // which case the base address is partly offset and subsequent stores
+        // are relative to that.
+        unsigned ArgIndex = Outs[OutIdx].OrigArgIndex;
+        unsigned ArgPartOffset = Outs[OutIdx].PartOffset;
+        assert(VA.getValVT().isVector() || ArgPartOffset == 0);
+        // Calculate the total size to store. We don't have access to what
+        // we're actually storing other than performing the loop and collecting
+        // the info.
+        SmallVector<std::pair<SDValue, SDValue>> Parts;
+        while (i + 1 != e && Outs[OutIdx + 1].OrigArgIndex == ArgIndex) {
+          SDValue PartValue = OutVals[OutIdx + 1];
+          unsigned PartOffset = Outs[OutIdx + 1].PartOffset - ArgPartOffset;
+          SDValue Offset = DAG.getIntPtrConstant(PartOffset, DL);
+          EVT PartVT = PartValue.getValueType();
+          if (PartVT.isScalableVector())
+            Offset = DAG.getNode(ISD::VSCALE, DL, XLenVT, Offset);
+          StoredSize += PartVT.getStoreSize();
+          StackAlign = std::max(StackAlign, getPrefTypeAlign(PartVT, DAG));
+          Parts.push_back(std::make_pair(PartValue, Offset));
+          ++i;
+          ++OutIdx;
+        }
+        SDValue SpillSlot = DAG.CreateStackTemporary(StoredSize, StackAlign);
+        int FI = cast<FrameIndexSDNode>(SpillSlot)->getIndex();
         MemOpChains.push_back(
-            DAG.getStore(Chain, DL, PartValue, Address,
+            DAG.getStore(Chain, DL, ArgValue, SpillSlot,
                          MachinePointerInfo::getFixedStack(MF, FI)));
+        for (const auto &Part : Parts) {
+          SDValue PartValue = Part.first;
+          SDValue PartOffset = Part.second;
+          SDValue Address =
+              DAG.getNode(ISD::ADD, DL, PtrVT, SpillSlot, PartOffset);
+          MemOpChains.push_back(
+              DAG.getStore(Chain, DL, PartValue, Address,
+                           MachinePointerInfo::getFixedStack(MF, FI)));
+        }
+        ArgValue = SpillSlot;
       }
-      ArgValue = SpillSlot;
     } else {
       ArgValue = convertValVTToLocVT(DAG, ArgValue, VA, DL, Subtarget);
     }
diff --git a/llvm/lib/Target/RISCV/RISCVMachineFunctionInfo.h b/llvm/lib/Target/RISCV/RISCVMachineFunctionInfo.h
index e23f162a317ef..65b7226025da5 100644
--- a/llvm/lib/Target/RISCV/RISCVMachineFunctionInfo.h
+++ b/llvm/lib/Target/RISCV/RISCVMachineFunctionInfo.h
@@ -73,6 +73,9 @@ class RISCVMachineFunctionInfo : public MachineFunctionInfo {
   /// Incoming ByVal arguments
   SmallVector<SDValue, 8> IncomingByValArgs;
 
+  /// Incoming indirect argument pointers (for musttail forwarding)
+  SmallVector<SDValue, 4> IncomingIndirectArgs;
+
   /// Is there any vector argument or return?
   bool IsVectorCall = false;
 
@@ -157,6 +160,16 @@ class RISCVMachineFunctionInfo : public MachineFunctionInfo {
   SDValue getIncomingByValArgs(unsigned Idx) { return IncomingByValArgs[Idx]; }
   unsigned getIncomingByValArgsSize() const { return IncomingByValArgs.size(); }
 
+  void addIncomingIndirectArg(SDValue Val) {
+    IncomingIndirectArgs.push_back(Val);
+  }
+  SDValue getIncomingIndirectArg(unsigned Idx) {
+    return IncomingIndirectArgs[Idx];
+  }
+  unsigned getIncomingIndirectArgsSize() const {
+    return IncomingIndirectArgs.size();
+  }
+
   enum class PushPopKind { None = 0, StdExtZcmp, VendorXqccmp };
 
   PushPopKind getPushPopKind(const MachineFunction &MF) const;
diff --git a/llvm/test/CodeGen/RISCV/musttail-indirect-args.ll b/llvm/test/CodeGen/RISCV/musttail-indirect-args.ll
new file mode 100644
index 0000000000000..447cacf13e2b8
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/musttail-indirect-args.ll
@@ -0,0 +1,66 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=riscv32 %s -o - | FileCheck %s --check-prefix=RV32
+; RUN: llc -mtriple=riscv64 %s -o - | FileCheck %s --check-prefix=RV64
+
+; Test that musttail with indirect args (fp128 on RV32) forwards the incoming
+; pointer instead of creating a new stack temporary. Without this fix, the
+; pointer would dangle after the tail call deallocates the caller's frame.
+
+declare i32 @callee_musttail_indirect(fp128 %a)
+
+; fp128 is indirect on RV32 (too large for registers), direct on RV64.
+; On RV32, musttail must forward the incoming indirect pointer (a0) directly.
+define i32 @caller_musttail_indirect(fp128 %a) nounwind {
+; RV32-LABEL: caller_musttail_indirect:
+; RV32:       # %bb.0:
+; RV32-NEXT:    tail callee_musttail_indirect
+;
+; RV64-LABEL: caller_musttail_indirect:
+; RV64:       # %bb.0:
+; RV64-NEXT:    tail callee_musttail_indirect
+  %call = musttail call i32 @callee_musttail_indirect(fp128 %a)
+  ret i32 %call
+}
+
+; Verify that non-musttail tail call with indirect args does NOT tail call
+; (this is the PR #184972 fix - indirect args are unsafe for regular tail calls).
+define void @caller_no_musttail_indirect() nounwind {
+; RV32-LABEL: caller_no_musttail_indirect:
+; RV32:       # %bb.0:
+; RV32-NEXT:    addi sp, sp, -32
+; RV32-NEXT:    sw ra, 28(sp) # 4-byte Folded Spill
+; RV32-NEXT:    lui a1, 262128
+; RV32-NEXT:    mv a0, sp
+; RV32-NEXT:    sw zero, 0(sp)
+; RV32-NEXT:    sw zero, 4(sp)
+; RV32-NEXT:    sw zero, 8(sp)
+; RV32-NEXT:    sw a1, 12(sp)
+; RV32-NEXT:    call callee_musttail_indirect
+; RV32-NEXT:    lw ra, 28(sp) # 4-byte Folded Reload
+; RV32-NEXT:    addi sp, sp, 32
+; RV32-NEXT:    ret
+;
+; RV64-LABEL: caller_no_musttail_indirect:
+; RV64:       # %bb.0:
+; RV64-NEXT:    lui a1, 16383
+; RV64-NEXT:    slli a1, a1, 36
+; RV64-NEXT:    li a0, 0
+; RV64-NEXT:    tail callee_musttail_indirect
+  %call = tail call i32 @callee_musttail_indirect(fp128 0xL00000000000000003FFF000000000000)
+  ret void
+}
+
+; Test musttail with i128 on RV32 (indirect, split into 4 x i32 parts).
+declare i64 @callee_musttail_i128(i128 %a)
+
+define i64 @caller_musttail_i128(i128 %a) nounwind {
+; RV32-LABEL: caller_musttail_i128:
+; RV32:       # %bb.0:
+; RV32-NEXT:    tail callee_musttail_i128
+;
+; RV64-LABEL: caller_musttail_i128:
+; RV64:       # %bb.0:
+; RV64-NEXT:    tail callee_musttail_i128
+  %call = musttail call i64 @callee_musttail_i128(i128 %a)
+  ret i64 %call
+}
diff --git a/llvm/test/CodeGen/RISCV/tail-calls.ll b/llvm/test/CodeGen/RISCV/tail-calls.ll
index 33feba3c6fba1..79855aa03adcf 100644
--- a/llvm/test/CodeGen/RISCV/tail-calls.ll
+++ b/llvm/test/CodeGen/RISCV/tail-calls.ll
@@ -247,20 +247,24 @@ declare i32 @callee_indirect_args(fp128 %a)
 define void @caller_indirect_args() nounwind {
 ; CHECK-LABEL: caller_indirect_args:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    addi sp, sp, -16
+; CHECK-NEXT:    addi sp, sp, -32
+; CHECK-NEXT:    sw ra, 28(sp) # 4-byte Folded Spill
 ; CHECK-NEXT:    lui a1, 262128
 ; CHECK-NEXT:    mv a0, sp
 ; CHECK-NEXT:    sw zero, 0(sp)
 ; CHECK-NEXT:    sw zero, 4(sp)
 ; CHECK-NEXT:    sw zero, 8(sp)
 ; CHECK-NEXT:    sw a1, 12(sp)
-; CHECK-NEXT:    addi sp, sp, 16
-; CHECK-NEXT:    tail callee_indirect_args
+; CHECK-NEXT:    call callee_indirect_args
+; CHECK-NEXT:    lw ra, 28(sp) # 4-byte Folded Reload
+; CHECK-NEXT:    addi sp, sp, 32
+; CHECK-NEXT:    ret
 ;
 ; CHECK-LARGE-ZICFILP-LABEL: caller_indirect_args:
 ; CHECK-LARGE-ZICFILP:       # %bb.0: # %entry
 ; CHECK-LARGE-ZICFILP-NEXT:    lpad 0
-; CHECK-LARGE-ZICFILP-NEXT:    addi sp, sp, -16
+; CHECK-LARGE-ZICFILP-NEXT:    addi sp, sp, -32
+; CHECK-LARGE-ZICFILP-NEXT:    sw ra, 28(sp) # 4-byte Folded Spill
 ; CHECK-LARGE-ZICFILP-NEXT:    lui a1, 262128
 ; CHECK-LARGE-ZICFILP-NEXT:  .Lpcrel_hi9:
 ; CHECK-LARGE-ZICFILP-NEXT:    auipc a0, %pcrel_hi(.LCPI7_0)
@@ -270,8 +274,10 @@ define void @caller_indirect_args() nounwind {
 ; CHECK-LARGE-ZICFILP-NEXT:    sw zero, 4(sp)
 ; CHECK-LARGE-ZICFILP-NEXT:    sw zero, 8(sp)
 ; CHECK-LARGE-ZICFILP-NEXT:    sw a1, 12(sp)
-; CHECK-LARGE-ZICFILP-NEXT:    addi sp, sp, 16
-; CHECK-LARGE-ZICFILP-NEXT:    jr t2
+; CHECK-LARGE-ZICFILP-NEXT:    jalr t2
+; CHECK-LARGE-ZICFILP-NEXT:    lw ra, 28(sp) # 4-byte Folded Reload
+; CHECK-LARGE-ZICFILP-NEXT:    addi sp, sp, 32
+; CHECK-LARGE-ZICFILP-NEXT:    ret
 entry:
   %call = tail call i32 @callee_indirect_args(fp128 0xL00000000000000003FFF000000000000)
   ret void

Comment thread llvm/lib/Target/RISCV/RISCVISelLowering.cpp
Comment thread llvm/lib/Target/RISCV/RISCVISelLowering.cpp Outdated
@efriedma-quic efriedma-quic requested a review from folkertdev March 7, 2026 01:27
- Use DenseMap<unsigned, SDValue> keyed by OrigArgIndex instead of
  SmallVector for incoming indirect arg pointers. This avoids a fragile
  O(n) counting loop in LowerCall and directly maps argument indices.
- Add test cases for: two indirect args (DenseMap multi-key), mixed
  direct+indirect, and i128 split+trailing direct arg.
- Add test for non-musttail forwarding of indirect arg (shows it
  correctly falls back to call).

Assisted-by: Claude (Anthropic)
Co-Authored-By: Claude Opus 4.6 <[email protected]>
@xroche xroche requested a review from lenary March 7, 2026 07:33
Copy link
Copy Markdown
Contributor

@folkertdev folkertdev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @heiher you probably want to update the loongarch backend in a similar way.

Comment thread llvm/test/CodeGen/RISCV/musttail-indirect-args.ll
…ered

Outs[].OrigArgIndex is the position in the call's argument list (callee
perspective), but the incoming indirect arg map is keyed by the caller's
formal parameter index. When musttail reorders arguments (e.g.,
`musttail call @f(fp128 %b, fp128 %a)`), these indices diverge, causing
the wrong pointers to be forwarded.

Fix by resolving the caller's formal parameter index via the IR: walk
CLI.CB->args() to find the Argument at the matching call position and
use its getArgNo() as the DenseMap lookup key.

Add comprehensive tests for swapped, rotated, duplicated, and
stack-spilled indirect args.

Assisted-by: Claude (Anthropic)
Co-Authored-By: Claude Opus 4.6 <[email protected]>
@xroche
Copy link
Copy Markdown
Contributor Author

xroche commented Mar 7, 2026

cc @heiher you probably want to update the loongarch backend in a similar way.

It seems to have the same non-musttail bug at LoongArchISelLowering.cpp:8427-8428 (unconditionally rejecting tail calls when any arg is indirect, with no musttail exception at all)

I could do a follow-up fix for loongarch, but considering I missed the swapped-args case, this might suggest there may be other subtle scenarios I missed...

@xroche xroche requested a review from folkertdev March 7, 2026 12:08
Comment thread llvm/lib/Target/RISCV/RISCVISelLowering.cpp Outdated
@folkertdev
Copy link
Copy Markdown
Contributor

I could do a follow-up fix for loongarch, but considering I missed the swapped-args case, this might suggest there may be other subtle scenarios I missed...

I tagged the loongarch maintainer because my riscv patch was heavily based on their patch for loongarch. The two backends are fairly similar.

When a musttail call passes a computed value (not a forwarded formal
parameter) as an indirect argument, we must store the value into the
incoming indirect pointer rather than blindly forwarding it. The
incoming pointer points to the caller's caller's frame, which remains
valid after the tail call and is writable per the RISC-V ABI.

Previously, computed values were silently dropped: the incoming pointer
was forwarded without modification, passing the original (stale) data.

The fix distinguishes two cases via dyn_cast<Argument> on the call arg:
- Forwarded formal parameter: zero-copy pointer forwarding (unchanged)
- Computed value: store into the incoming pointer, then forward it

Add tests for computed fp128, computed i128 (split indirect on RV32),
mixed computed+forwarded, and both-computed scenarios.

Assisted-by: Claude (Anthropic)
Co-Authored-By: Claude Opus 4.6 <[email protected]>
@xroche xroche requested a review from folkertdev March 9, 2026 12:26
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Mar 9, 2026

✅ With the latest revision this PR passed the C/C++ code formatter.

Assisted-by: Claude (Anthropic)
Co-Authored-By: Claude Opus 4.6 <[email protected]>
Comment thread llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@xroche xroche requested a review from lenary March 12, 2026 08:37
@folkertdev
Copy link
Copy Markdown
Contributor

I think this tests the cases that it should, however I'd like someone more familiar with the LLVM/the riscv backend to sign off on the implementation.

Comment thread llvm/lib/Target/RISCV/RISCVISelLowering.cpp Outdated
@xroche xroche requested a review from lenary April 10, 2026 07:42
@xroche xroche force-pushed the riscv-musttail-indirect-fix branch 3 times, most recently from 7450ab9 to 349163e Compare April 19, 2026 11:11
@folkertdev
Copy link
Copy Markdown
Contributor

For easier reviewing it might be better to first cherry-pick the reverted commits, then add your changes on top? Now it's hard to see what you changed.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 19, 2026

🐧 Linux x64 Test Results

  • 194913 tests passed
  • 5174 tests skipped

✅ The build succeeded and all tests passed.

Comment thread llvm/lib/Target/RISCV/RISCVMachineFunctionInfo.h
@xroche xroche force-pushed the riscv-musttail-indirect-fix branch 3 times, most recently from 7067a9e to a07b4c8 Compare April 19, 2026 14:18
@xroche
Copy link
Copy Markdown
Contributor Author

xroche commented Apr 19, 2026

Yeah, I think that would help

Dang, I can't make it through. The revert on main conflicts with too many files and can't be cherry-picked as it, and if I revert first conflicting areas, the cherry-pick make the PR un-buildable (due to LoongArch modified files I believe). There are 5k commit behind the branch already, which makes the changes quite hard.

Edit: Finally managed to merge main after last reviewed commit, then apply the changes. But the final commit is a tad long

xroche and others added 2 commits April 19, 2026 17:35
…ct-fix

# Conflicts:
#	llvm/lib/Target/RISCV/RISCVISelLowering.cpp
#	llvm/lib/Target/RISCV/RISCVMachineFunctionInfo.h
#	llvm/test/CodeGen/RISCV/tail-calls.ll
The SDValue stored during LowerFormalArguments points into the entry
block's DAG, which is cleared at the end of CodeGenAndEmitDAG(). If the
musttail call is in a non-entry basic block, the stored SDValue is
stale.

Fix by saving the indirect pointer in a virtual register (CopyToReg
in LowerFormalArguments, CopyFromReg in LowerCall). The vreg survives
across basic blocks. The DenseMap in RVFI now stores Register instead
of SDValue, and the header does not include SelectionDAGNodes.h.

The vreg save is guarded behind a HasMusttail check so functions
without musttail calls see no codegen change.

Also adds musttail early return in isEligibleForTailCallOptimization
(musttail must always be tail-called), cross-BB test cases, and
converts musttail-call.ll from a crash test to a positive test.

Fixes llvm#185089.

Assisted-by: Claude (Anthropic)
Co-Authored-By: Claude Opus 4.6 <[email protected]>
@xroche xroche force-pushed the riscv-musttail-indirect-fix branch from a07b4c8 to 76df154 Compare April 19, 2026 15:42
Comment thread llvm/lib/Target/RISCV/RISCVISelLowering.cpp Outdated
Comment thread llvm/lib/Target/RISCV/RISCVISelLowering.cpp Outdated
Comment thread llvm/lib/Target/RISCV/RISCVISelLowering.cpp Outdated
Comment thread llvm/lib/Target/RISCV/RISCVISelLowering.cpp Outdated
Comment thread llvm/test/CodeGen/RISCV/musttail-indirect-args.ll
xroche and others added 3 commits April 21, 2026 15:11
The previous commit in this series made musttail take an early return true
from isEligibleForTailCallOptimization so indirect-arg forwarding could
kick in. That bypassed the pre-existing byval check as well, and byval
tail-call support was reverted in 501417b (llvm#191508) pending a
virtual-register-based re-implementation. The result was a silent
miscompile: for musttail with a byval arg the caller allocates a local
copy, writes a pointer into it, restores sp, and then tail calls with a
dangling pointer.

Move the byval rejection before the musttail bypass so it applies to
both regular and must-tail calls. A musttail site with a byval arg will
now hit reportFatalInternalError in LowerCall, matching the behavior on
main before the indirect-arg fix landed, until byval tail-call support
is reinstated on top of the new vreg-based forwarding path.

Assisted-by: Claude (Anthropic)
Co-Authored-By: Claude Opus 4.6 <[email protected]>
…nownStack

Three small fixes from lenary's review of PR llvm#185094:

- Thread the output chain from getCopyFromReg through MemOpChains so the
  TokenFactor that feeds the call sequences the copy with any stores
  emitted for the argument. The previous code dropped the chain output,
  which is belt-and-suspenders safe here thanks to SSA vregs but is
  inconsistent with how the rest of LowerCall threads chains.

- Use MachinePointerInfo::getUnknownStack(MF) for the stores into the
  forwarded incoming indirect pointer, instead of the default-constructed
  MachinePointerInfo(). The destination is a stack slot in the caller's
  caller's frame, so the alloca address space is the right metadata
  until we can track the actual frame index.

- Add a FIXME explaining why LowerCall has to walk the caller's formals
  again: InputArg::OrigArgIndex is Argument::getArgNo() (unfiltered) but
  OutputArg::OrigArgIndex is an index into a filtered arg list (empty
  types removed). The asymmetry is target-independent and should be
  normalized; for now backends have to re-derive the mapping. Switch
  the walk to MF.getFunction() since that is the caller's IR Function
  directly (equivalent to CLI.CB->getFunction() but clearer).

Folds the forwarded-formal and computed-value paths together since both
need the CopyFromReg now; only the computed-value path still emits
stores.

Assisted-by: Claude (Anthropic)
Co-Authored-By: Claude Opus 4.6 <[email protected]>
…t args

Cover three cases that the prior tests did not exercise:

- caller_musttail_stack_spill: 10 x i32 args on RV32/RV64. Args 9 and 10
  spill to the caller's incoming stack slots; musttail reuses the same
  layout for the outgoing args. This exercises the isEligibleFor
  TailCallOptimization stack-size bypass that kicks in for musttail
  (requested by lenary in review of PR llvm#185094).

- caller_musttail_sret: sret + musttail. The sret pointer is just a
  regular pointer arg in a0 and is forwarded unchanged. Confirms that
  the sret bypass in isEligibleForTailCallOptimization does not
  miscompile.

- caller_musttail_indirect_and_spill: mix of indirect fp128 and stack-
  spilled i32 args. Exercises the interaction between indirect arg
  forwarding and the stack-spill re-use path on the same call.

Also add a short note about byval + musttail: isEligibleFor
TailCallOptimization rejects byval, which causes the musttail site to
hit reportFatalInternalError. Tail-call support for byval was reverted
in 501417b (RISC-V/LoongArch) pending a vreg-based
re-implementation; once that lands, musttail + byval can be added here.

Assisted-by: Claude (Anthropic)
Co-Authored-By: Claude Opus 4.6 <[email protected]>
@xroche
Copy link
Copy Markdown
Contributor Author

xroche commented Apr 21, 2026

While addressing this round of feedback I noticed a regression I introduced in this PR: the musttail early return true in isEligibleForTailCallOptimization bypassed the pre-existing isByVal() rejection as well, so for musttail + byval we silently produced code that allocates a local copy, stores a pointer to it, restores sp, and then tail-calls with a dangling pointer.

This was fixed in 782cd6b

Tests in this PR checks asm, not execution; a runtime QEMU smoke test would have surfaced the dangling pointer early; do you think this could be needed here ?

The complexity of the fix makes me a bit nervous, especially since I am not fully grasping the potential important bits.

@xroche xroche requested a review from lenary April 21, 2026 15:12
Copy link
Copy Markdown
Member

@lenary lenary left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking better. Sorry for taking another week to get back to you, some of these comments are from earlier in that time, and I found them today.

I think if you would feel more confident testing this in Qemu, then you should do what you need to feel confident :) We don't have qemu in in-tree upstream tests because we don't want that kind of dependency, but we do expect to have a full system that can execute programs in llvm-test-suite, so maybe running that would help your confidence? I don't know how many musttail tests we have, because that's what you're looking for coverage of.

Comment thread llvm/lib/Target/RISCV/RISCVISelLowering.cpp Outdated
Comment thread llvm/lib/Target/RISCV/RISCVISelLowering.cpp Outdated
Comment thread llvm/lib/Target/RISCV/RISCVISelLowering.cpp Outdated
Comment thread llvm/test/CodeGen/RISCV/musttail-indirect-args.ll
xroche and others added 2 commits April 28, 2026 13:22
In the computed-value branch of the musttail indirect-arg path, the two
stores into the incoming indirect pointer used the pre-copy Chain as
their input chain. Using CopyOp.getValue(1) makes the dependency on the
CopyFromReg explicit so the SDAG sequences each store after the copy
that produced its destination pointer.

Suggested by lenary in review of PR llvm#185094.

Assisted-by: Claude (Anthropic)
Co-Authored-By: Claude Opus 4.6 <[email protected]>
Side effect of sequencing musttail computed-arg stores after the
CopyFromReg in the previous commit: the scheduler emits the stores
into the two incoming pointers in the opposite order, and the
register allocator picks different callee-saved registers in
caller_musttail_both_computed. The generated asm is equivalent.

Regenerated with utils/update_llc_test_checks.py.

Assisted-by: Claude (Anthropic)
Co-Authored-By: Claude Opus 4.6 <[email protected]>
@xroche
Copy link
Copy Markdown
Contributor Author

xroche commented Apr 28, 2026

I think if you would feel more confident testing this in Qemu, then you should do what you need to feel confident :)

I crafted (still with the help of our Claude friend) locally a qemu set of tests:

  • forwarded indirect arg (the original PR fix)
  • computed indirect arg (the case you caught
  • swapped indirect args (the OrigArgIndex asymmetry in OutputArg vs InputArg
  • musttail behind a branch (the SDValue lifetime regression you flagged)
  • 10-arg stack spill (your earlier ask)
  • sret + musttail
  • indirect + spill on the same call

I'd be happy to push it on a public repo if this can be of any value to anyone!

Copy link
Copy Markdown
Member

@lenary lenary left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Please wait for Craig to review as well.

@lenary lenary requested a review from topperc May 5, 2026 17:48
// same-position formal parameter (musttail guarantees matching
// prototypes, so types match). The pointer survives the tail call
// since it points to the caller's caller's frame. Use the
// CopyFromReg output chain so the store is sequenced after the
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this sequencing already guaranteed by the store using IncomingPtr as the pointer?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked for this, as I think it makes the side effects clearer.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with the explicit chain, but maybe the comment should be improved about why it's there?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rewrote the comment to make the dependency a bit more explicit - does it make more sense now ?

// Store any split parts at their respective offsets.
unsigned ArgPartOffset = Outs[OutIdx].PartOffset;
while (i + 1 != e && Outs[OutIdx + 1].OrigArgIndex == CallArgIdx) {
SDValue PartValue = OutVals[OutIdx + 1];
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible for PartValue to a be a scalable vector?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you've updated the code, but I don't see a test.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, done in f888fe4 ; I didn't add a regression test though - this seems a tag complicated to do - and the existing non-musttail spill path does not seem to have one either.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I applied this patch to my repo and 2 tests failed. So I think that means there is test coverage?

diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index d12661631274..54474ee2b841 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -24391,8 +24391,10 @@ SDValue RISCVTargetLowering::LowerCall(CallLoweringInfo &CLI,
         unsigned PartOffset = Outs[OutIdx + 1].PartOffset - ArgPartOffset;
         SDValue Offset = DAG.getIntPtrConstant(PartOffset, DL);
         EVT PartVT = PartValue.getValueType();
-        if (PartVT.isScalableVector())
+        if (PartVT.isScalableVector()){
+          assert(0);
           Offset = DAG.getNode(ISD::VSCALE, DL, XLenVT, Offset);
+        }
         StoredSize += PartVT.getStoreSize();
         StackAlign = std::max(StackAlign, getPrefTypeAlign(PartVT, DAG));
         Parts.push_back(std::make_pair(PartValue, Offset));

Failed Tests (2):
LLVM :: CodeGen/RISCV/rvv/calling-conv-fastcc.ll
LLVM :: CodeGen/RISCV/rvv/calling-conv.ll

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, indeed, I missed caller_scalable_vector_split_indirect! Added rvv/musttail-indirect-args.ll with three musttail variants

Comment thread llvm/test/CodeGen/RISCV/musttail-indirect-args.ll Outdated
xroche and others added 3 commits May 6, 2026 18:47
The non-musttail spill path multiplies the part offset by VSCALE for
scalable vector parts. The musttail computed-value path was missing
that, so scalable vector args computed in the caller and forwarded via
musttail would land at the wrong offset relative to the incoming
indirect pointer.

Mirror the non-musttail spelling: build the offset with
DAG.getIntPtrConstant, then wrap with ISD::VSCALE when the part type
is a scalable vector. Reported by topperc on PR llvm#185094.

Assisted-by: Claude (Anthropic)
Co-Authored-By: Claude Opus 4.6 <[email protected]>
…romReg

topperc pointed out on PR llvm#185094 that the data-flow edge through
IncomingPtr already prevents the store from being scheduled before the
CopyFromReg, so the chain edge is not strictly required for ordering.
The chain edge is there to make the side-effect dependency explicit on
the chain side as well, matching the convention for memory ops chaining
off their producers (lenary's earlier review).

Rewrite the comment to describe this accurately. Code unchanged.

Assisted-by: Claude (Anthropic)
Co-Authored-By: Claude Opus 4.6 <[email protected]>
…st comments

topperc noted on PR llvm#185094 that the function name was wrapped across
two comment lines, which makes it hard to grep. Keep the name on one
line; the lines exceed 80 columns by a few chars but that matches the
file-wide style for long identifiers.

Assisted-by: Claude (Anthropic)
Co-Authored-By: Claude Opus 4.6 <[email protected]>
@xroche xroche requested a review from topperc May 6, 2026 18:37
Add a sibling to musttail-indirect-args.ll under rvv/ for the scalable
vector path that topperc demonstrated is exercised by upstream tests
(calling-conv.ll's caller_scalable_vector_split_indirect on PR llvm#185094).

<vscale x 32 x i32> is too large for v8-v23, so RISCVCallingConv classifies
it as CCValAssign::Indirect and the type is split into multiple parts.
Three test functions cover the three branches of the musttail indirect-arg
path:

- caller_musttail_scalable_forwarded: both args are caller formals,
  zero-copy pointer forwarding, no per-part offset computation.
- caller_musttail_scalable_computed: both args are arithmetic on the
  formals; dyn_cast<Argument> fails and we store into the incoming
  pointer with VSCALE-scaled per-part offsets. The asm shows the second
  store landing at IncomingPtr + VSCALE * 8 (vlenb << 3), confirming
  the VSCALE multiplication added in f888fe4.
- caller_musttail_scalable_mixed: first arg forwarded, second computed.

Assisted-by: Claude (Anthropic)
Co-Authored-By: Claude Opus 4.6 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[RISCV] musttail with indirect arguments causes use-after-free

5 participants