[RISCV] Fix musttail with indirect arguments by forwarding incoming pointers #185094
Conversation
…ointers

When a musttail call has arguments passed indirectly (CCValAssign::Indirect), the current code creates a new stack temporary and copies the data there. The tail call then deallocates the caller's stack frame, leaving the pointer dangling.

Fix this by forwarding the original incoming indirect pointer instead of re-spilling. Since musttail guarantees matching prototypes, incoming and outgoing indirect args have a 1:1 correspondence, and the incoming pointer (from the caller's caller's frame) remains valid after the tail call.

This also subsumes the non-musttail indirect args fix: non-musttail calls with indirect args are rejected from tail call optimization (the pointer would dangle), while musttail calls forward the incoming pointer.

Fixes llvm#185089.

Assisted-by: Claude (Anthropic)
Co-Authored-By: Claude Opus 4.6 <[email protected]>
@llvm/pr-subscribers-backend-risc-v

Author: Xavier Roche (xroche)

Changes

When a musttail call has arguments passed indirectly (CCValAssign::Indirect), the current code creates a new stack temporary and copies the data there. The tail call then deallocates the caller's stack frame, leaving the pointer dangling.

Fix this by forwarding the original incoming indirect pointer instead of re-spilling. Since musttail guarantees matching prototypes, incoming and outgoing indirect args have a 1:1 correspondence, and the incoming pointer (from the caller's caller's frame) remains valid after the tail call.

This patch also includes the non-musttail indirect args fix: non-musttail calls with indirect args are rejected from tail call optimization, while musttail calls forward the incoming pointer.
Fixes #185089.

AI Disclosure: This patch was developed with assistance from Claude (Anthropic). All code has been reviewed and validated by the author.

4 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 17d7db95886ab..68cff3332b873 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -24454,6 +24454,8 @@ SDValue RISCVTargetLowering::LowerFormalArguments(
ArgValue = unpackFromMemLoc(DAG, Chain, VA, DL, *this);
if (VA.getLocInfo() == CCValAssign::Indirect) {
+ // Save the incoming indirect pointer for musttail forwarding.
+ RVFI->addIncomingIndirectArg(ArgValue);
// If the original argument was split and passed by reference (e.g. i128
// on RV32), we need to load all parts of it here (using the same
// address). Vectors may be partly split to registers and partly to the
@@ -24578,6 +24580,19 @@ bool RISCVTargetLowering::isEligibleForTailCallOptimization(
if (CCInfo.getStackSize() > RVFI->getArgumentStackSize())
return false;
+ // Do not tail call optimize if any argument needs to be passed indirectly.
+ // The caller allocates stack space and passes a pointer to the callee. On a
+ // tail call the caller's stack frame is deallocated before the callee
+ // executes, invalidating the pointer (use-after-free).
+ // musttail is excluded: callers forward incoming indirect pointers that
+ // point to the caller's caller's frame, which remains valid.
+ if (!CLI.CB || !CLI.CB->isMustTailCall()) {
+ for (const auto &VA : ArgLocs) {
+ if (VA.getLocInfo() == CCValAssign::Indirect)
+ return false;
+ }
+ }
+
// Do not tail call opt if either caller or callee uses struct return
// semantics.
auto IsCallerStructRet = Caller.hasStructRetAttr();
@@ -24765,51 +24780,70 @@ SDValue RISCVTargetLowering::LowerCall(CallLoweringInfo &CLI,
// Promote the value if needed.
// For now, only handle fully promoted and indirect arguments.
if (VA.getLocInfo() == CCValAssign::Indirect) {
- // Store the argument in a stack slot and pass its address.
- Align StackAlign =
- std::max(getPrefTypeAlign(Outs[OutIdx].ArgVT, DAG),
- getPrefTypeAlign(ArgValue.getValueType(), DAG));
- TypeSize StoredSize = ArgValue.getValueType().getStoreSize();
- // If the original argument was split (e.g. i128), we need
- // to store the required parts of it here (and pass just one address).
- // Vectors may be partly split to registers and partly to the stack, in
- // which case the base address is partly offset and subsequent stores are
- // relative to that.
- unsigned ArgIndex = Outs[OutIdx].OrigArgIndex;
- unsigned ArgPartOffset = Outs[OutIdx].PartOffset;
- assert(VA.getValVT().isVector() || ArgPartOffset == 0);
- // Calculate the total size to store. We don't have access to what we're
- // actually storing other than performing the loop and collecting the
- // info.
- SmallVector<std::pair<SDValue, SDValue>> Parts;
- while (i + 1 != e && Outs[OutIdx + 1].OrigArgIndex == ArgIndex) {
- SDValue PartValue = OutVals[OutIdx + 1];
- unsigned PartOffset = Outs[OutIdx + 1].PartOffset - ArgPartOffset;
- SDValue Offset = DAG.getIntPtrConstant(PartOffset, DL);
- EVT PartVT = PartValue.getValueType();
- if (PartVT.isScalableVector())
- Offset = DAG.getNode(ISD::VSCALE, DL, XLenVT, Offset);
- StoredSize += PartVT.getStoreSize();
- StackAlign = std::max(StackAlign, getPrefTypeAlign(PartVT, DAG));
- Parts.push_back(std::make_pair(PartValue, Offset));
- ++i;
- ++OutIdx;
- }
- SDValue SpillSlot = DAG.CreateStackTemporary(StoredSize, StackAlign);
- int FI = cast<FrameIndexSDNode>(SpillSlot)->getIndex();
- MemOpChains.push_back(
- DAG.getStore(Chain, DL, ArgValue, SpillSlot,
- MachinePointerInfo::getFixedStack(MF, FI)));
- for (const auto &Part : Parts) {
- SDValue PartValue = Part.first;
- SDValue PartOffset = Part.second;
- SDValue Address =
- DAG.getNode(ISD::ADD, DL, PtrVT, SpillSlot, PartOffset);
+ // For musttail calls, forward the incoming indirect pointer instead
+ // of creating a new stack temporary. The incoming pointer points to
+ // the caller's caller's frame, which remains valid after a tail call.
+ if (IsTailCall && CLI.CB && CLI.CB->isMustTailCall()) {
+ unsigned IndirectIdx = 0;
+ for (unsigned k = 0; k < OutIdx; ++k) {
+ if (ArgLocs[k].getLocInfo() == CCValAssign::Indirect)
+ ++IndirectIdx;
+ }
+ ArgValue = RVFI->getIncomingIndirectArg(IndirectIdx);
+ // Skip any split parts of this argument (they are covered by the
+ // forwarded pointer).
+ unsigned ArgIndex = Outs[OutIdx].OrigArgIndex;
+ while (i + 1 != e && Outs[OutIdx + 1].OrigArgIndex == ArgIndex) {
+ ++i;
+ ++OutIdx;
+ }
+ } else {
+ // Store the argument in a stack slot and pass its address.
+ Align StackAlign =
+ std::max(getPrefTypeAlign(Outs[OutIdx].ArgVT, DAG),
+ getPrefTypeAlign(ArgValue.getValueType(), DAG));
+ TypeSize StoredSize = ArgValue.getValueType().getStoreSize();
+ // If the original argument was split (e.g. i128), we need
+ // to store the required parts of it here (and pass just one address).
+ // Vectors may be partly split to registers and partly to the stack, in
+ // which case the base address is partly offset and subsequent stores
+ // are relative to that.
+ unsigned ArgIndex = Outs[OutIdx].OrigArgIndex;
+ unsigned ArgPartOffset = Outs[OutIdx].PartOffset;
+ assert(VA.getValVT().isVector() || ArgPartOffset == 0);
+ // Calculate the total size to store. We don't have access to what
+ // we're actually storing other than performing the loop and collecting
+ // the info.
+ SmallVector<std::pair<SDValue, SDValue>> Parts;
+ while (i + 1 != e && Outs[OutIdx + 1].OrigArgIndex == ArgIndex) {
+ SDValue PartValue = OutVals[OutIdx + 1];
+ unsigned PartOffset = Outs[OutIdx + 1].PartOffset - ArgPartOffset;
+ SDValue Offset = DAG.getIntPtrConstant(PartOffset, DL);
+ EVT PartVT = PartValue.getValueType();
+ if (PartVT.isScalableVector())
+ Offset = DAG.getNode(ISD::VSCALE, DL, XLenVT, Offset);
+ StoredSize += PartVT.getStoreSize();
+ StackAlign = std::max(StackAlign, getPrefTypeAlign(PartVT, DAG));
+ Parts.push_back(std::make_pair(PartValue, Offset));
+ ++i;
+ ++OutIdx;
+ }
+ SDValue SpillSlot = DAG.CreateStackTemporary(StoredSize, StackAlign);
+ int FI = cast<FrameIndexSDNode>(SpillSlot)->getIndex();
MemOpChains.push_back(
- DAG.getStore(Chain, DL, PartValue, Address,
+ DAG.getStore(Chain, DL, ArgValue, SpillSlot,
MachinePointerInfo::getFixedStack(MF, FI)));
+ for (const auto &Part : Parts) {
+ SDValue PartValue = Part.first;
+ SDValue PartOffset = Part.second;
+ SDValue Address =
+ DAG.getNode(ISD::ADD, DL, PtrVT, SpillSlot, PartOffset);
+ MemOpChains.push_back(
+ DAG.getStore(Chain, DL, PartValue, Address,
+ MachinePointerInfo::getFixedStack(MF, FI)));
+ }
+ ArgValue = SpillSlot;
}
- ArgValue = SpillSlot;
} else {
ArgValue = convertValVTToLocVT(DAG, ArgValue, VA, DL, Subtarget);
}
diff --git a/llvm/lib/Target/RISCV/RISCVMachineFunctionInfo.h b/llvm/lib/Target/RISCV/RISCVMachineFunctionInfo.h
index e23f162a317ef..65b7226025da5 100644
--- a/llvm/lib/Target/RISCV/RISCVMachineFunctionInfo.h
+++ b/llvm/lib/Target/RISCV/RISCVMachineFunctionInfo.h
@@ -73,6 +73,9 @@ class RISCVMachineFunctionInfo : public MachineFunctionInfo {
/// Incoming ByVal arguments
SmallVector<SDValue, 8> IncomingByValArgs;
+ /// Incoming indirect argument pointers (for musttail forwarding)
+ SmallVector<SDValue, 4> IncomingIndirectArgs;
+
/// Is there any vector argument or return?
bool IsVectorCall = false;
@@ -157,6 +160,16 @@ class RISCVMachineFunctionInfo : public MachineFunctionInfo {
SDValue getIncomingByValArgs(unsigned Idx) { return IncomingByValArgs[Idx]; }
unsigned getIncomingByValArgsSize() const { return IncomingByValArgs.size(); }
+ void addIncomingIndirectArg(SDValue Val) {
+ IncomingIndirectArgs.push_back(Val);
+ }
+ SDValue getIncomingIndirectArg(unsigned Idx) {
+ return IncomingIndirectArgs[Idx];
+ }
+ unsigned getIncomingIndirectArgsSize() const {
+ return IncomingIndirectArgs.size();
+ }
+
enum class PushPopKind { None = 0, StdExtZcmp, VendorXqccmp };
PushPopKind getPushPopKind(const MachineFunction &MF) const;
diff --git a/llvm/test/CodeGen/RISCV/musttail-indirect-args.ll b/llvm/test/CodeGen/RISCV/musttail-indirect-args.ll
new file mode 100644
index 0000000000000..447cacf13e2b8
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/musttail-indirect-args.ll
@@ -0,0 +1,66 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=riscv32 %s -o - | FileCheck %s --check-prefix=RV32
+; RUN: llc -mtriple=riscv64 %s -o - | FileCheck %s --check-prefix=RV64
+
+; Test that musttail with indirect args (fp128 on RV32) forwards the incoming
+; pointer instead of creating a new stack temporary. Without this fix, the
+; pointer would dangle after the tail call deallocates the caller's frame.
+
+declare i32 @callee_musttail_indirect(fp128 %a)
+
+; fp128 is indirect on RV32 (too large for registers), direct on RV64.
+; On RV32, musttail must forward the incoming indirect pointer (a0) directly.
+define i32 @caller_musttail_indirect(fp128 %a) nounwind {
+; RV32-LABEL: caller_musttail_indirect:
+; RV32: # %bb.0:
+; RV32-NEXT: tail callee_musttail_indirect
+;
+; RV64-LABEL: caller_musttail_indirect:
+; RV64: # %bb.0:
+; RV64-NEXT: tail callee_musttail_indirect
+ %call = musttail call i32 @callee_musttail_indirect(fp128 %a)
+ ret i32 %call
+}
+
+; Verify that non-musttail tail call with indirect args does NOT tail call
+; (this is the PR #184972 fix - indirect args are unsafe for regular tail calls).
+define void @caller_no_musttail_indirect() nounwind {
+; RV32-LABEL: caller_no_musttail_indirect:
+; RV32: # %bb.0:
+; RV32-NEXT: addi sp, sp, -32
+; RV32-NEXT: sw ra, 28(sp) # 4-byte Folded Spill
+; RV32-NEXT: lui a1, 262128
+; RV32-NEXT: mv a0, sp
+; RV32-NEXT: sw zero, 0(sp)
+; RV32-NEXT: sw zero, 4(sp)
+; RV32-NEXT: sw zero, 8(sp)
+; RV32-NEXT: sw a1, 12(sp)
+; RV32-NEXT: call callee_musttail_indirect
+; RV32-NEXT: lw ra, 28(sp) # 4-byte Folded Reload
+; RV32-NEXT: addi sp, sp, 32
+; RV32-NEXT: ret
+;
+; RV64-LABEL: caller_no_musttail_indirect:
+; RV64: # %bb.0:
+; RV64-NEXT: lui a1, 16383
+; RV64-NEXT: slli a1, a1, 36
+; RV64-NEXT: li a0, 0
+; RV64-NEXT: tail callee_musttail_indirect
+ %call = tail call i32 @callee_musttail_indirect(fp128 0xL00000000000000003FFF000000000000)
+ ret void
+}
+
+; Test musttail with i128 on RV32 (indirect, split into 4 x i32 parts).
+declare i64 @callee_musttail_i128(i128 %a)
+
+define i64 @caller_musttail_i128(i128 %a) nounwind {
+; RV32-LABEL: caller_musttail_i128:
+; RV32: # %bb.0:
+; RV32-NEXT: tail callee_musttail_i128
+;
+; RV64-LABEL: caller_musttail_i128:
+; RV64: # %bb.0:
+; RV64-NEXT: tail callee_musttail_i128
+ %call = musttail call i64 @callee_musttail_i128(i128 %a)
+ ret i64 %call
+}
diff --git a/llvm/test/CodeGen/RISCV/tail-calls.ll b/llvm/test/CodeGen/RISCV/tail-calls.ll
index 33feba3c6fba1..79855aa03adcf 100644
--- a/llvm/test/CodeGen/RISCV/tail-calls.ll
+++ b/llvm/test/CodeGen/RISCV/tail-calls.ll
@@ -247,20 +247,24 @@ declare i32 @callee_indirect_args(fp128 %a)
define void @caller_indirect_args() nounwind {
; CHECK-LABEL: caller_indirect_args:
; CHECK: # %bb.0: # %entry
-; CHECK-NEXT: addi sp, sp, -16
+; CHECK-NEXT: addi sp, sp, -32
+; CHECK-NEXT: sw ra, 28(sp) # 4-byte Folded Spill
; CHECK-NEXT: lui a1, 262128
; CHECK-NEXT: mv a0, sp
; CHECK-NEXT: sw zero, 0(sp)
; CHECK-NEXT: sw zero, 4(sp)
; CHECK-NEXT: sw zero, 8(sp)
; CHECK-NEXT: sw a1, 12(sp)
-; CHECK-NEXT: addi sp, sp, 16
-; CHECK-NEXT: tail callee_indirect_args
+; CHECK-NEXT: call callee_indirect_args
+; CHECK-NEXT: lw ra, 28(sp) # 4-byte Folded Reload
+; CHECK-NEXT: addi sp, sp, 32
+; CHECK-NEXT: ret
;
; CHECK-LARGE-ZICFILP-LABEL: caller_indirect_args:
; CHECK-LARGE-ZICFILP: # %bb.0: # %entry
; CHECK-LARGE-ZICFILP-NEXT: lpad 0
-; CHECK-LARGE-ZICFILP-NEXT: addi sp, sp, -16
+; CHECK-LARGE-ZICFILP-NEXT: addi sp, sp, -32
+; CHECK-LARGE-ZICFILP-NEXT: sw ra, 28(sp) # 4-byte Folded Spill
; CHECK-LARGE-ZICFILP-NEXT: lui a1, 262128
; CHECK-LARGE-ZICFILP-NEXT: .Lpcrel_hi9:
; CHECK-LARGE-ZICFILP-NEXT: auipc a0, %pcrel_hi(.LCPI7_0)
@@ -270,8 +274,10 @@ define void @caller_indirect_args() nounwind {
; CHECK-LARGE-ZICFILP-NEXT: sw zero, 4(sp)
; CHECK-LARGE-ZICFILP-NEXT: sw zero, 8(sp)
; CHECK-LARGE-ZICFILP-NEXT: sw a1, 12(sp)
-; CHECK-LARGE-ZICFILP-NEXT: addi sp, sp, 16
-; CHECK-LARGE-ZICFILP-NEXT: jr t2
+; CHECK-LARGE-ZICFILP-NEXT: jalr t2
+; CHECK-LARGE-ZICFILP-NEXT: lw ra, 28(sp) # 4-byte Folded Reload
+; CHECK-LARGE-ZICFILP-NEXT: addi sp, sp, 32
+; CHECK-LARGE-ZICFILP-NEXT: ret
entry:
%call = tail call i32 @callee_indirect_args(fp128 0xL00000000000000003FFF000000000000)
ret void
- Use DenseMap<unsigned, SDValue> keyed by OrigArgIndex instead of SmallVector for incoming indirect arg pointers. This avoids a fragile O(n) counting loop in LowerCall and directly maps argument indices.
- Add test cases for: two indirect args (DenseMap multi-key), mixed direct+indirect, and i128 split+trailing direct arg.
- Add test for non-musttail forwarding of indirect arg (shows it correctly falls back to call).

Assisted-by: Claude (Anthropic)
Co-Authored-By: Claude Opus 4.6 <[email protected]>
folkertdev left a comment:
cc @heiher you probably want to update the loongarch backend in a similar way.
…ered

Outs[].OrigArgIndex is the position in the call's argument list (callee perspective), but the incoming indirect arg map is keyed by the caller's formal parameter index. When musttail reorders arguments (e.g., `musttail call @f(fp128 %b, fp128 %a)`), these indices diverge, causing the wrong pointers to be forwarded.

Fix by resolving the caller's formal parameter index via the IR: walk CLI.CB->args() to find the Argument at the matching call position and use its getArgNo() as the DenseMap lookup key.

Add comprehensive tests for swapped, rotated, duplicated, and stack-spilled indirect args.

Assisted-by: Claude (Anthropic)
Co-Authored-By: Claude Opus 4.6 <[email protected]>
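A reduced IR sketch of the reordering hazard described above (function and value names here are illustrative, not taken from the test file):

```llvm
; On RV32 both fp128 args are CCValAssign::Indirect. The call swaps them,
; so the pointer forwarded for call position 0 must be the one for formal
; %b (getArgNo() == 1), not the incoming pointer at the same position.
declare i32 @f(fp128, fp128)

define i32 @swap_then_musttail(fp128 %a, fp128 %b) {
  %r = musttail call i32 @f(fp128 %b, fp128 %a)
  ret i32 %r
}
```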
LoongArch seems to have the same non-musttail bug at LoongArchISelLowering.cpp:8427-8428 (it unconditionally rejects tail calls when any arg is indirect, with no musttail exception at all). I could do a follow-up fix for LoongArch, but given that I missed the swapped-args case, there may be other subtle scenarios I missed as well...
I tagged the LoongArch maintainer because my RISC-V patch was heavily based on their patch for LoongArch. The two backends are fairly similar.
When a musttail call passes a computed value (not a forwarded formal parameter) as an indirect argument, we must store the value into the incoming indirect pointer rather than blindly forwarding it. The incoming pointer points to the caller's caller's frame, which remains valid after the tail call and is writable per the RISC-V ABI. Previously, computed values were silently dropped: the incoming pointer was forwarded without modification, passing the original (stale) data.

The fix distinguishes two cases via dyn_cast<Argument> on the call arg:

- Forwarded formal parameter: zero-copy pointer forwarding (unchanged)
- Computed value: store into the incoming pointer, then forward it

Add tests for computed fp128, computed i128 (split indirect on RV32), mixed computed+forwarded, and both-computed scenarios.

Assisted-by: Claude (Anthropic)
Co-Authored-By: Claude Opus 4.6 <[email protected]>
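The two cases can be seen in a small IR sketch (hypothetical names; the first argument is computed, the second is a forwarded formal):

```llvm
declare i32 @g(fp128, fp128)

define i32 @compute_then_musttail(fp128 %a, fp128 %b) {
  ; %sum is not an Argument, so dyn_cast<Argument> fails and the value must
  ; be stored through incoming indirect pointer 0 before it is forwarded.
  %sum = fadd fp128 %a, %b
  ; %b is a formal parameter: its incoming pointer is forwarded zero-copy.
  %r = musttail call i32 @g(fp128 %sum, fp128 %b)
  ret i32 %r
}
```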
✅ With the latest revision this PR passed the C/C++ code formatter.
Assisted-by: Claude (Anthropic) Co-Authored-By: Claude Opus 4.6 <[email protected]>
I think this tests the cases that it should; however, I'd like someone more familiar with LLVM/the RISC-V backend to sign off on the implementation.
Force-pushed 7450ab9 to 349163e
For easier reviewing it might be better to first cherry-pick the reverted commits, then add your changes on top? Right now it's hard to see what you changed.
🐧 Linux x64 Test Results
✅ The build succeeded and all tests passed.
Force-pushed 7067a9e to a07b4c8
Dang, I can't make it through. The revert on main conflicts with too many files and can't be cherry-picked as is, and if I revert the conflicting areas first, the cherry-pick makes the PR un-buildable (due to the modified LoongArch files, I believe). The branch is already 5k commits behind, which makes the changes quite hard.

Edit: I finally managed to merge main after the last reviewed commit, then apply the changes. But the final commit is a tad long.
…ct-fix

# Conflicts:
#	llvm/lib/Target/RISCV/RISCVISelLowering.cpp
#	llvm/lib/Target/RISCV/RISCVMachineFunctionInfo.h
#	llvm/test/CodeGen/RISCV/tail-calls.ll
The SDValue stored during LowerFormalArguments points into the entry block's DAG, which is cleared at the end of CodeGenAndEmitDAG(). If the musttail call is in a non-entry basic block, the stored SDValue is stale.

Fix by saving the indirect pointer in a virtual register (CopyToReg in LowerFormalArguments, CopyFromReg in LowerCall). The vreg survives across basic blocks. The DenseMap in RVFI now stores Register instead of SDValue, and the header does not include SelectionDAGNodes.h.

The vreg save is guarded behind a HasMusttail check so functions without musttail calls see no codegen change.

Also adds a musttail early return in isEligibleForTailCallOptimization (musttail must always be tail-called), cross-BB test cases, and converts musttail-call.ll from a crash test to a positive test.

Fixes llvm#185089.

Assisted-by: Claude (Anthropic)
Co-Authored-By: Claude Opus 4.6 <[email protected]>
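A minimal shape of the cross-BB hazard (illustrative names): the musttail call sits in a non-entry block, so any SDValue captured during LowerFormalArguments belongs to the entry block's DAG and is gone by the time LowerCall runs for %do_call.

```llvm
declare i32 @h(fp128, i1)

define i32 @musttail_in_later_block(fp128 %a, i1 %cond) {
entry:
  br i1 %cond, label %do_call, label %out
do_call:
  ; Lowered in a later DAG; the indirect pointer must come from a vreg,
  ; not from an SDValue created while lowering the entry block.
  %r = musttail call i32 @h(fp128 %a, i1 %cond)
  ret i32 %r
out:
  ret i32 0
}
```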
Force-pushed a07b4c8 to 76df154
The previous commit in this series made musttail take an early return true from isEligibleForTailCallOptimization so indirect-arg forwarding could kick in. That bypassed the pre-existing byval check as well, and byval tail-call support was reverted in 501417b (llvm#191508) pending a virtual-register-based re-implementation. The result was a silent miscompile: for musttail with a byval arg the caller allocates a local copy, writes a pointer into it, restores sp, and then tail calls with a dangling pointer.

Move the byval rejection before the musttail bypass so it applies to both regular and must-tail calls. A musttail site with a byval arg will now hit reportFatalInternalError in LowerCall, matching the behavior on main before the indirect-arg fix landed, until byval tail-call support is reinstated on top of the new vreg-based forwarding path.

Assisted-by: Claude (Anthropic)
Co-Authored-By: Claude Opus 4.6 <[email protected]>
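The rejected shape, sketched in IR (hypothetical names): a musttail call with a byval argument now fails fast instead of miscompiling.

```llvm
%struct.S = type { i32, i32, i32, i32 }

declare i32 @takes_byval(ptr byval(%struct.S))

; Hits reportFatalInternalError in LowerCall until byval tail-call
; support is reinstated on the vreg-based forwarding path.
define i32 @musttail_byval(ptr byval(%struct.S) %p) {
  %r = musttail call i32 @takes_byval(ptr byval(%struct.S) %p)
  ret i32 %r
}
```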
…nownStack

Three small fixes from lenary's review of PR llvm#185094:

- Thread the output chain from getCopyFromReg through MemOpChains so the TokenFactor that feeds the call sequences the copy with any stores emitted for the argument. The previous code dropped the chain output, which is belt-and-suspenders safe here thanks to SSA vregs but is inconsistent with how the rest of LowerCall threads chains.
- Use MachinePointerInfo::getUnknownStack(MF) for the stores into the forwarded incoming indirect pointer, instead of the default-constructed MachinePointerInfo(). The destination is a stack slot in the caller's caller's frame, so the alloca address space is the right metadata until we can track the actual frame index.
- Add a FIXME explaining why LowerCall has to walk the caller's formals again: InputArg::OrigArgIndex is Argument::getArgNo() (unfiltered) but OutputArg::OrigArgIndex is an index into a filtered arg list (empty types removed). The asymmetry is target-independent and should be normalized; for now backends have to re-derive the mapping. Switch the walk to MF.getFunction() since that is the caller's IR Function directly (equivalent to CLI.CB->getFunction() but clearer).

Folds the forwarded-formal and computed-value paths together since both need the CopyFromReg now; only the computed-value path still emits stores.

Assisted-by: Claude (Anthropic)
Co-Authored-By: Claude Opus 4.6 <[email protected]>
…t args Cover three cases that the prior tests did not exercise: - caller_musttail_stack_spill: 10 x i32 args on RV32/RV64. Args 9 and 10 spill to the caller's incoming stack slots; musttail reuses the same layout for the outgoing args. This exercises the isEligibleFor TailCallOptimization stack-size bypass that kicks in for musttail (requested by lenary in review of PR llvm#185094). - caller_musttail_sret: sret + musttail. The sret pointer is just a regular pointer arg in a0 and is forwarded unchanged. Confirms that the sret bypass in isEligibleForTailCallOptimization does not miscompile. - caller_musttail_indirect_and_spill: mix of indirect fp128 and stack- spilled i32 args. Exercises the interaction between indirect arg forwarding and the stack-spill re-use path on the same call. Also add a short note about byval + musttail: isEligibleFor TailCallOptimization rejects byval, which causes the musttail site to hit reportFatalInternalError. Tail-call support for byval was reverted in 501417b (RISC-V/LoongArch) pending a vreg-based re-implementation; once that lands, musttail + byval can be added here. Assisted-by: Claude (Anthropic) Co-Authored-By: Claude Opus 4.6 <[email protected]>
While addressing this round of feedback I noticed a regression I introduced in this PR: the musttail early return bypassed the byval check. This was fixed in 782cd6b.

Tests in this PR check asm, not execution; a runtime QEMU smoke test would have surfaced the dangling pointer early. Do you think this could be needed here? The complexity of the fix makes me a bit nervous, especially since I am not fully grasping the potentially important bits.
lenary left a comment:
This is looking better. Sorry for taking another week to get back to you; some of these comments are from earlier in that time, and I found them today.
I think if you would feel more confident testing this in QEMU, then you should do what you need to feel confident :) We don't have QEMU in the in-tree upstream tests because we don't want that kind of dependency, but we do expect to have a full system that can execute programs in llvm-test-suite, so maybe running that would help your confidence? I don't know how many musttail tests we have, because that's what you're looking for coverage of.
In the computed-value branch of the musttail indirect-arg path, the two stores into the incoming indirect pointer used the pre-copy Chain as their input chain. Using CopyOp.getValue(1) makes the dependency on the CopyFromReg explicit so the SDAG sequences each store after the copy that produced its destination pointer.

Suggested by lenary in review of PR llvm#185094.

Assisted-by: Claude (Anthropic)
Co-Authored-By: Claude Opus 4.6 <[email protected]>
Side effect of sequencing musttail computed-arg stores after the CopyFromReg in the previous commit: the scheduler emits the stores into the two incoming pointers in the opposite order, and the register allocator picks different callee-saved registers in caller_musttail_both_computed. The generated asm is equivalent.

Regenerated with utils/update_llc_test_checks.py.

Assisted-by: Claude (Anthropic)
Co-Authored-By: Claude Opus 4.6 <[email protected]>
I crafted locally (still with the help of our Claude friend) a set of QEMU tests:
I'd be happy to push it to a public repo if this can be of any value to anyone!
lenary left a comment:
LGTM. Please wait for Craig to review as well.
// same-position formal parameter (musttail guarantees matching
// prototypes, so types match). The pointer survives the tail call
// since it points to the caller's caller's frame. Use the
// CopyFromReg output chain so the store is sequenced after the
Isn't this sequencing already guaranteed by the store using IncomingPtr as the pointer?
I asked for this, as I think it makes the side effects clearer.
I'm fine with the explicit chain, but maybe the comment should be improved about why it's there?
I rewrote the comment to make the dependency a bit more explicit; does it make more sense now?
// Store any split parts at their respective offsets.
unsigned ArgPartOffset = Outs[OutIdx].PartOffset;
while (i + 1 != e && Outs[OutIdx + 1].OrigArgIndex == CallArgIdx) {
  SDValue PartValue = OutVals[OutIdx + 1];
Is it possible for PartValue to be a scalable vector?
I see you've updated the code, but I don't see a test.
Yes, done in f888fe4; I didn't add a regression test though, as this seems a tad complicated to do, and the existing non-musttail spill path does not seem to have one either.
I applied this patch to my repo and 2 tests failed. So I think that means there is test coverage?
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index d12661631274..54474ee2b841 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -24391,8 +24391,10 @@ SDValue RISCVTargetLowering::LowerCall(CallLoweringInfo &CLI,
unsigned PartOffset = Outs[OutIdx + 1].PartOffset - ArgPartOffset;
SDValue Offset = DAG.getIntPtrConstant(PartOffset, DL);
EVT PartVT = PartValue.getValueType();
- if (PartVT.isScalableVector())
+ if (PartVT.isScalableVector()){
+ assert(0);
Offset = DAG.getNode(ISD::VSCALE, DL, XLenVT, Offset);
+ }
StoredSize += PartVT.getStoreSize();
StackAlign = std::max(StackAlign, getPrefTypeAlign(PartVT, DAG));
Parts.push_back(std::make_pair(PartValue, Offset));
Failed Tests (2):
LLVM :: CodeGen/RISCV/rvv/calling-conv-fastcc.ll
LLVM :: CodeGen/RISCV/rvv/calling-conv.ll
Oops, indeed, I missed caller_scalable_vector_split_indirect! Added rvv/musttail-indirect-args.ll with three musttail variants.
The non-musttail spill path multiplies the part offset by VSCALE for scalable vector parts. The musttail computed-value path was missing that, so scalable vector args computed in the caller and forwarded via musttail would land at the wrong offset relative to the incoming indirect pointer.

Mirror the non-musttail spelling: build the offset with DAG.getIntPtrConstant, then wrap with ISD::VSCALE when the part type is a scalable vector.

Reported by topperc on PR llvm#185094.

Assisted-by: Claude (Anthropic)
Co-Authored-By: Claude Opus 4.6 <[email protected]>
…romReg

topperc pointed out on PR llvm#185094 that the data-flow edge through IncomingPtr already prevents the store from being scheduled before the CopyFromReg, so the chain edge is not strictly required for ordering. The chain edge is there to make the side-effect dependency explicit on the chain side as well, matching the convention for memory ops chaining off their producers (lenary's earlier review).

Rewrite the comment to describe this accurately. Code unchanged.

Assisted-by: Claude (Anthropic)
Co-Authored-By: Claude Opus 4.6 <[email protected]>
…st comments

topperc noted on PR llvm#185094 that the function name was wrapped across two comment lines, which makes it hard to grep. Keep the name on one line; the lines exceed 80 columns by a few chars but that matches the file-wide style for long identifiers.

Assisted-by: Claude (Anthropic)
Co-Authored-By: Claude Opus 4.6 <[email protected]>
Add a sibling to musttail-indirect-args.ll under rvv/ for the scalable vector path that topperc demonstrated is exercised by upstream tests (calling-conv.ll's caller_scalable_vector_split_indirect on PR llvm#185094). <vscale x 32 x i32> is too large for v8-v23, so RISCVCallingConv classifies it as CCValAssign::Indirect and the type is split into multiple parts.

Three test functions cover the three branches of the musttail indirect-arg path:

- caller_musttail_scalable_forwarded: both args are caller formals, zero-copy pointer forwarding, no per-part offset computation.
- caller_musttail_scalable_computed: both args are arithmetic on the formals; dyn_cast<Argument> fails and we store into the incoming pointer with VSCALE-scaled per-part offsets. The asm shows the second store landing at IncomingPtr + VSCALE * 8 (vlenb << 3), confirming the VSCALE multiplication added in f888fe4.
- caller_musttail_scalable_mixed: first arg forwarded, second computed.

Assisted-by: Claude (Anthropic)
Co-Authored-By: Claude Opus 4.6 <[email protected]>
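The computed-scalable case from the list above reduces to roughly this shape (illustrative body, not the exact test file contents):

```llvm
declare <vscale x 32 x i32> @scalable_callee(<vscale x 32 x i32>, <vscale x 32 x i32>)

define <vscale x 32 x i32> @caller_musttail_scalable_computed(<vscale x 32 x i32> %x, <vscale x 32 x i32> %y) {
  ; Both sums are computed values, so each is stored back through its
  ; incoming indirect pointer; split-part offsets are scaled by VSCALE.
  %sx = add <vscale x 32 x i32> %x, %x
  %sy = add <vscale x 32 x i32> %y, %y
  %r = musttail call <vscale x 32 x i32> @scalable_callee(<vscale x 32 x i32> %sx, <vscale x 32 x i32> %sy)
  ret <vscale x 32 x i32> %r
}
```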
When a musttail call passes arguments indirectly (fp128 on RV32, i128 on RV32), the backend allocates a stack temporary and hands the callee a pointer. The tail call deallocates the caller's frame, and the pointer dangles.

Fix by forwarding the incoming indirect pointers instead. They point to the caller's caller's frame, which stays valid after the tail call. Forwarded formal parameters reuse the pointer directly; computed values get stored into the incoming buffer first.

The pointers are saved in virtual registers (CopyToReg/CopyFromReg) rather than SDValues. The SelectionDAG is cleared between basic blocks and musttail calls can appear in non-entry blocks, so storing raw SDValues across BBs is unsound (this was the bug that led to the revert in 501417b). The vreg save only fires when the function has musttail calls; other functions see no codegen change.

Non-musttail tail calls with indirect args are still rejected.
Fixes #185089.
AI Disclosure
This patch was developed with assistance from Claude (Anthropic). All code has been reviewed and validated by the author.