X86: Fix win64 tail call regression for tail call to loaded pointer #158055
Merged
Fix regression after 62f2641. The previous patch handled the register case, but the memory case snuck in another use of ptr_rc_tailcall, hidden inside i64mem_TC.
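To make the failure mode concrete, here is a minimal C++ sketch (a toy model, not LLVM code). It assumes the trouble with reusing ptr_rc_tailcall in a Win64 function is that the address base can land in a register the epilogue restores before the jump. The description does not spell this out, but the `movq %rsi, %rax` ahead of `popq %rsi` in the new test is consistent with it. The helper name `usableAfterWin64Epilogue` and the lowercase register spellings are hypothetical; the callee-saved list follows the Microsoft x64 calling convention.

```cpp
#include <cassert>
#include <string_view>

// Nonvolatile (callee-saved) general-purpose registers in the Microsoft x64
// calling convention. The epilogue restores these before a tail jump.
constexpr std::string_view kWin64CalleeSaved[] = {
    "rbx", "rbp", "rdi", "rsi", "rsp", "r12", "r13", "r14", "r15"};

// Hypothetical helper: returns true if a value placed in `reg` is still intact
// after the epilogue has restored the callee-saved registers, i.e. `reg` could
// serve as the base of a `jmpq *(%reg)` tail call. This only models register
// preservation; it ignores argument registers and every other constraint.
constexpr bool usableAfterWin64Epilogue(std::string_view reg) {
  for (std::string_view saved : kWin64CalleeSaved)
    if (reg == saved)
      return false;
  return true;
}

// rsi is restored by `popq %rsi`, so a pointer kept there is gone by the jump.
static_assert(!usableAfterWin64Epilogue("rsi"), "rsi is callee-saved on Win64");
// rax is volatile, which matches the `movq %rsi, %rax` seen in the test.
static_assert(usableAfterWin64Epilogue("rax"), "rax is not callee-saved");
```

The exact membership of ptr_rc_tailcall and GR64_TCW64 is not shown in this PR, so the sketch deliberately models only the ABI rule rather than either register class.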
@llvm/pr-subscribers-backend-x86
Author: Matt Arsenault (arsenm)
Changes: Fix regression after 62f2641.
9 Files Affected:
diff --git a/llvm/lib/Target/X86/X86AsmPrinter.cpp b/llvm/lib/Target/X86/X86AsmPrinter.cpp
index ff22ee8c86fac..a7734e9200a19 100644
--- a/llvm/lib/Target/X86/X86AsmPrinter.cpp
+++ b/llvm/lib/Target/X86/X86AsmPrinter.cpp
@@ -478,9 +478,9 @@ static bool isIndirectBranchOrTailCall(const MachineInstr &MI) {
Opc == X86::TAILJMPr64 || Opc == X86::TAILJMPm64 ||
Opc == X86::TCRETURNri || Opc == X86::TCRETURN_WIN64ri ||
Opc == X86::TCRETURN_HIPE32ri || Opc == X86::TCRETURNmi ||
- Opc == X86::TCRETURNri64 || Opc == X86::TCRETURNmi64 ||
- Opc == X86::TCRETURNri64_ImpCall || Opc == X86::TAILJMPr64_REX ||
- Opc == X86::TAILJMPm64_REX;
+ Opc == X86::TCRETURN_WINmi64 || Opc == X86::TCRETURNri64 ||
+ Opc == X86::TCRETURNmi64 || Opc == X86::TCRETURNri64_ImpCall ||
+ Opc == X86::TAILJMPr64_REX || Opc == X86::TAILJMPm64_REX;
}
void X86AsmPrinter::emitBasicBlockEnd(const MachineBasicBlock &MBB) {
diff --git a/llvm/lib/Target/X86/X86ExpandPseudo.cpp b/llvm/lib/Target/X86/X86ExpandPseudo.cpp
index 9457e718de699..4a9b824b0db14 100644
--- a/llvm/lib/Target/X86/X86ExpandPseudo.cpp
+++ b/llvm/lib/Target/X86/X86ExpandPseudo.cpp
@@ -276,8 +276,10 @@ bool X86ExpandPseudo::expandMI(MachineBasicBlock &MBB,
case X86::TCRETURNdi64cc:
case X86::TCRETURNri64:
case X86::TCRETURNri64_ImpCall:
- case X86::TCRETURNmi64: {
- bool isMem = Opcode == X86::TCRETURNmi || Opcode == X86::TCRETURNmi64;
+ case X86::TCRETURNmi64:
+ case X86::TCRETURN_WINmi64: {
+ bool isMem = Opcode == X86::TCRETURNmi || Opcode == X86::TCRETURNmi64 ||
+ Opcode == X86::TCRETURN_WINmi64;
MachineOperand &JumpTarget = MBBI->getOperand(0);
MachineOperand &StackAdjust = MBBI->getOperand(isMem ? X86::AddrNumOperands
: 1);
@@ -341,7 +343,8 @@ bool X86ExpandPseudo::expandMI(MachineBasicBlock &MBB,
MIB.addImm(MBBI->getOperand(2).getImm());
}
- } else if (Opcode == X86::TCRETURNmi || Opcode == X86::TCRETURNmi64) {
+ } else if (Opcode == X86::TCRETURNmi || Opcode == X86::TCRETURNmi64 ||
+ Opcode == X86::TCRETURN_WINmi64) {
unsigned Op = (Opcode == X86::TCRETURNmi)
? X86::TAILJMPm
: (IsX64 ? X86::TAILJMPm64_REX : X86::TAILJMPm64);
diff --git a/llvm/lib/Target/X86/X86FrameLowering.cpp b/llvm/lib/Target/X86/X86FrameLowering.cpp
index a293b4c87cfe4..08c9d738baceb 100644
--- a/llvm/lib/Target/X86/X86FrameLowering.cpp
+++ b/llvm/lib/Target/X86/X86FrameLowering.cpp
@@ -2402,7 +2402,7 @@ static bool isTailCallOpcode(unsigned Opc) {
Opc == X86::TCRETURN_HIPE32ri || Opc == X86::TCRETURNdi ||
Opc == X86::TCRETURNmi || Opc == X86::TCRETURNri64 ||
Opc == X86::TCRETURNri64_ImpCall || Opc == X86::TCRETURNdi64 ||
- Opc == X86::TCRETURNmi64;
+ Opc == X86::TCRETURNmi64 || Opc == X86::TCRETURN_WINmi64;
}
void X86FrameLowering::emitEpilogue(MachineFunction &MF,
diff --git a/llvm/lib/Target/X86/X86InstrCompiler.td b/llvm/lib/Target/X86/X86InstrCompiler.td
index 5a0df058b27f6..af7a33abaf758 100644
--- a/llvm/lib/Target/X86/X86InstrCompiler.td
+++ b/llvm/lib/Target/X86/X86InstrCompiler.td
@@ -1364,15 +1364,19 @@ def : Pat<(X86tcret ptr_rc_tailcall:$dst, timm:$off),
// There wouldn't be enough scratch registers for base+index.
def : Pat<(X86tcret_6regs (load addr:$dst), timm:$off),
(TCRETURNmi64 addr:$dst, timm:$off)>,
- Requires<[In64BitMode, NotUseIndirectThunkCalls]>;
+ Requires<[In64BitMode, IsNotWin64CCFunc, NotUseIndirectThunkCalls]>;
+
+def : Pat<(X86tcret_6regs (load addr:$dst), timm:$off),
+ (TCRETURN_WINmi64 addr:$dst, timm:$off)>,
+ Requires<[IsWin64CCFunc, NotUseIndirectThunkCalls]>;
def : Pat<(X86tcret ptr_rc_tailcall:$dst, timm:$off),
(INDIRECT_THUNK_TCRETURN64 ptr_rc_tailcall:$dst, timm:$off)>,
- Requires<[In64BitMode, UseIndirectThunkCalls]>;
+ Requires<[In64BitMode, IsNotWin64CCFunc, UseIndirectThunkCalls]>;
def : Pat<(X86tcret ptr_rc_tailcall:$dst, timm:$off),
(INDIRECT_THUNK_TCRETURN32 ptr_rc_tailcall:$dst, timm:$off)>,
- Requires<[Not64BitMode, UseIndirectThunkCalls]>;
+ Requires<[Not64BitMode, IsNotWin64CCFunc, UseIndirectThunkCalls]>;
def : Pat<(X86tcret (i64 tglobaladdr:$dst), timm:$off),
(TCRETURNdi64 tglobaladdr:$dst, timm:$off)>,
@@ -2215,7 +2219,7 @@ let Predicates = [HasZU] in {
def : Pat<(i64 (zext (i16 (mul (loadi16 addr:$src1), imm:$src2)))),
(SUBREG_TO_REG (i64 0), (IMULZU16rmi addr:$src1, imm:$src2), sub_16bit)>;
}
-
+
// mul reg, imm
def : Pat<(mul GR16:$src1, imm:$src2),
(IMUL16rri GR16:$src1, imm:$src2)>;
diff --git a/llvm/lib/Target/X86/X86InstrControl.td b/llvm/lib/Target/X86/X86InstrControl.td
index 139aedd473ebc..d962bfff1444d 100644
--- a/llvm/lib/Target/X86/X86InstrControl.td
+++ b/llvm/lib/Target/X86/X86InstrControl.td
@@ -372,6 +372,9 @@ let isCall = 1, isTerminator = 1, isReturn = 1, isBarrier = 1,
def TCRETURNmi64 : PseudoI<(outs),
(ins i64mem_TC:$dst, i32imm:$offset),
[]>, Sched<[WriteJumpLd]>;
+ def TCRETURN_WINmi64 : PseudoI<(outs),
+ (ins i64mem_w64TC:$dst, i32imm:$offset),
+ []>, Sched<[WriteJumpLd]>;
def TAILJMPd64 : PseudoI<(outs), (ins i64i32imm_brtarget:$dst),
[]>, Sched<[WriteJump]>;
diff --git a/llvm/lib/Target/X86/X86InstrOperands.td b/llvm/lib/Target/X86/X86InstrOperands.td
index 53a6b7c4c4c92..80843f6bb80e6 100644
--- a/llvm/lib/Target/X86/X86InstrOperands.td
+++ b/llvm/lib/Target/X86/X86InstrOperands.td
@@ -141,6 +141,11 @@ def i64mem_TC : X86MemOperand<"printqwordmem", X86Mem64AsmOperand, 64> {
ptr_rc_tailcall, i32imm, SEGMENT_REG);
}
+def i64mem_w64TC : X86MemOperand<"printqwordmem", X86Mem64AsmOperand, 64> {
+ let MIOperandInfo = (ops GR64_TCW64, i8imm,
+ GR64_TCW64, i32imm, SEGMENT_REG);
+}
+
// Special parser to detect 16-bit mode to select 16-bit displacement.
def X86AbsMemMode16AsmOperand : AsmOperandClass {
let Name = "AbsMemMode16";
diff --git a/llvm/lib/Target/X86/X86RegisterInfo.cpp b/llvm/lib/Target/X86/X86RegisterInfo.cpp
index 9ec04e740a08b..7963dc1b755c9 100644
--- a/llvm/lib/Target/X86/X86RegisterInfo.cpp
+++ b/llvm/lib/Target/X86/X86RegisterInfo.cpp
@@ -1010,6 +1010,7 @@ unsigned X86RegisterInfo::findDeadCallerSavedReg(
case X86::TCRETURNri64:
case X86::TCRETURNri64_ImpCall:
case X86::TCRETURNmi64:
+ case X86::TCRETURN_WINmi64:
case X86::EH_RETURN:
case X86::EH_RETURN64: {
LiveRegUnits LRU(*this);
diff --git a/llvm/lib/Target/X86/X86SpeculativeLoadHardening.cpp b/llvm/lib/Target/X86/X86SpeculativeLoadHardening.cpp
index 4cc456ece77e0..c28de14a97874 100644
--- a/llvm/lib/Target/X86/X86SpeculativeLoadHardening.cpp
+++ b/llvm/lib/Target/X86/X86SpeculativeLoadHardening.cpp
@@ -893,6 +893,7 @@ void X86SpeculativeLoadHardeningPass::unfoldCallAndJumpLoads(
case X86::TAILJMPm64_REX:
case X86::TAILJMPm:
case X86::TCRETURNmi64:
+ case X86::TCRETURN_WINmi64:
case X86::TCRETURNmi: {
// Use the generic unfold logic now that we know we're dealing with
// expected instructions.
diff --git a/llvm/test/CodeGen/X86/win64-tailcall-memory.ll b/llvm/test/CodeGen/X86/win64-tailcall-memory.ll
new file mode 100644
index 0000000000000..568f4fe04fea9
--- /dev/null
+++ b/llvm/test/CodeGen/X86/win64-tailcall-memory.ll
@@ -0,0 +1,48 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
+; RUN: llc -mtriple=x86_64-unknown-windows-gnu < %s | FileCheck %s
+
+; Check calling convention is correct for win64 when doing a tailcall
+; for a pointer loaded from memory.
+
+declare void @foo(i64, ptr)
+
+define void @do_tailcall(ptr %objp) nounwind {
+; CHECK-LABEL: do_tailcall:
+; CHECK: # %bb.0:
+; CHECK-NEXT: pushq %rsi
+; CHECK-NEXT: subq $32, %rsp
+; CHECK-NEXT: movq %rcx, %rsi
+; CHECK-NEXT: xorl %ecx, %ecx
+; CHECK-NEXT: xorl %edx, %edx
+; CHECK-NEXT: callq foo
+; CHECK-NEXT: xorl %ecx, %ecx
+; CHECK-NEXT: movq %rsi, %rax
+; CHECK-NEXT: addq $32, %rsp
+; CHECK-NEXT: popq %rsi
+; CHECK-NEXT: rex64 jmpq *(%rax) # TAILCALL
+ tail call void @foo(i64 0, ptr null)
+ %fptr = load ptr, ptr %objp, align 8
+ tail call void %fptr(ptr null)
+ ret void
+}
+
+; Make sure aliases of ccc are also treated as win64 functions
+define fastcc void @do_tailcall_fastcc(ptr %objp) nounwind {
+; CHECK-LABEL: do_tailcall_fastcc:
+; CHECK: # %bb.0:
+; CHECK-NEXT: pushq %rsi
+; CHECK-NEXT: subq $32, %rsp
+; CHECK-NEXT: movq %rcx, %rsi
+; CHECK-NEXT: xorl %ecx, %ecx
+; CHECK-NEXT: xorl %edx, %edx
+; CHECK-NEXT: callq foo
+; CHECK-NEXT: xorl %ecx, %ecx
+; CHECK-NEXT: movq %rsi, %rax
+; CHECK-NEXT: addq $32, %rsp
+; CHECK-NEXT: popq %rsi
+; CHECK-NEXT: rex64 jmpq *(%rax) # TAILCALL
+ tail call void @foo(i64 0, ptr null)
+ %fptr = load ptr, ptr %objp, align 8
+ tail call fastcc void %fptr(ptr null)
+ ret void
+}
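The X86ExpandPseudo.cpp hunk above picks the stack-adjustment operand by position, which is why the new opcode has to be added to the `isMem` test. A minimal sketch of that indexing, as a toy model rather than LLVM code: the enum and function names are hypothetical, and the constant 5 mirrors the five-part memory operand (base, scale, index, displacement, segment) visible in the `MIOperandInfo` of `i64mem_w64TC`.

```cpp
#include <cassert>

// Hypothetical stand-in for a few of the X86 tail-call pseudo opcodes.
enum class TailCallOpcode {
  TCRETURNri64,
  TCRETURNmi,
  TCRETURNmi64,
  TCRETURN_WINmi64
};

// A memory reference occupies five machine operands:
// base, scale, index, displacement, segment.
constexpr unsigned kAddrNumOperands = 5;

// Mirrors the updated `isMem` condition: all three memory forms qualify.
constexpr bool isMemForm(TailCallOpcode opc) {
  return opc == TailCallOpcode::TCRETURNmi ||
         opc == TailCallOpcode::TCRETURNmi64 ||
         opc == TailCallOpcode::TCRETURN_WINmi64;
}

// For a memory form the stack adjustment follows the five address operands;
// for a register form it follows the single target operand.
constexpr unsigned stackAdjustOperandIndex(TailCallOpcode opc) {
  return isMemForm(opc) ? kAddrNumOperands : 1;
}

static_assert(stackAdjustOperandIndex(TailCallOpcode::TCRETURN_WINmi64) == 5,
              "memory form: adjustment comes after the address operands");
static_assert(stackAdjustOperandIndex(TailCallOpcode::TCRETURNri64) == 1,
              "register form: adjustment comes after the target register");
```

If `TCRETURN_WINmi64` were left out of `isMemForm`, the model would read index 1, which in the memory form is the scale operand rather than the stack adjustment.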
Thanks, I can confirm that this at least fixes the one case of build breakage from #156880 that I had narrowed down. (I ran into breakage in a couple more places as well, but reproducing those cases takes a notable amount of time.)
If we can’t get this one landed soon, I’d like to revert #156880 to unbreak things.
rnk approved these changes Sep 11, 2025
nikic added a commit to nikic/llvm-project that referenced this pull request Jan 14, 2026
llvm#158055 added a IsNotWin64CCFunc predicate to these cases for reasons that are not super clear to me, which causes selection failures as this combination is not covered elsewhere.
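The selection failure described in that commit message can be illustrated with a small C++ sketch. It is a toy model, not how instruction selection is implemented: it covers only the two `INDIRECT_THUNK_TCRETURN` patterns whose `Requires` lists appear in this PR's X86InstrCompiler.td hunk, and the `Mode` struct and function name are hypothetical. Other patterns in the file are not modeled.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical bundle of the three predicates that appear in the diff.
struct Mode {
  bool In64BitMode;
  bool IsWin64CCFunc;
  bool UseIndirectThunkCalls;
};

// Returns the pseudo chosen for a register tail call through an indirect
// thunk, or nothing if neither of the two modeled patterns applies. Each
// condition copies the Requires<[...]> list after this PR.
constexpr std::optional<std::string_view> selectThunkTailCall(Mode m) {
  if (m.In64BitMode && !m.IsWin64CCFunc && m.UseIndirectThunkCalls)
    return "INDIRECT_THUNK_TCRETURN64";
  if (!m.In64BitMode && !m.IsWin64CCFunc && m.UseIndirectThunkCalls)
    return "INDIRECT_THUNK_TCRETURN32";
  return std::nullopt;
}

// A non-Win64 64-bit function with thunks enabled still selects.
static_assert(selectThunkTailCall(Mode{true, false, true}).has_value(),
              "non-Win64 case is covered");
// A Win64 function with thunks enabled matches neither modeled pattern.
static_assert(!selectThunkTailCall(Mode{true, true, true}).has_value(),
              "Win64 + indirect thunks is left uncovered");
```

In this model the Win64 case with `UseIndirectThunkCalls` falls through both conditions, which matches the "combination is not covered" wording; how the follow-up commit resolved it is not shown here.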
llvm-sync Bot pushed a commit to arm/arm-toolchain that referenced this pull request Jan 15, 2026
…ation (#175977) llvm/llvm-project#158055 added a IsNotWin64CCFunc predicate to these cases for reasons that are not super clear to me, which causes selection failures as this combination is not covered elsewhere. Fixes llvm/llvm-project#175965.

c-rhodes pushed a commit to llvmbot/llvm-project that referenced this pull request Jan 15, 2026
…#175977) llvm#158055 added a IsNotWin64CCFunc predicate to these cases for reasons that are not super clear to me, which causes selection failures as this combination is not covered elsewhere. Fixes llvm#175965. (cherry picked from commit 2789ad2)

llvm-sync Bot pushed a commit to arm/arm-toolchain that referenced this pull request Jan 15, 2026
…ation (#175977) llvm/llvm-project#158055 added a IsNotWin64CCFunc predicate to these cases for reasons that are not super clear to me, which causes selection failures as this combination is not covered elsewhere. Fixes llvm/llvm-project#175965. (cherry picked from commit 2789ad2)

Priyanshu3820 pushed a commit to Priyanshu3820/llvm-project that referenced this pull request Jan 18, 2026
…#175977) llvm#158055 added a IsNotWin64CCFunc predicate to these cases for reasons that are not super clear to me, which causes selection failures as this combination is not covered elsewhere. Fixes llvm#175965.

navaneethshan pushed a commit to qualcomm/cpullvm-toolchain that referenced this pull request Jan 19, 2026
…977) llvm/llvm-project#158055 added a IsNotWin64CCFunc predicate to these cases for reasons that are not super clear to me, which causes selection failures as this combination is not covered elsewhere. Fixes llvm/llvm-project#175965. (cherry picked from commit 2789ad2) (cherry picked from commit 04c715a9701bbfec1b08cd0fce8e49edfcc0fc8f)