[AArch64][SME] Disable tail calls in new ZA/ZT0 functions #177152
Merged
Allowing tail calls from functions that create new ZA/ZT0 state can result in invalid tail calls to shared ZA functions. It may be possible to limit the restriction to the case where the caller is private ZA and the callee shares ZA, but for now tail calls from new ZA/ZT0 functions are disabled in all cases.
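A minimal sketch of the reasoning, not LLVM's actual code: the `FuncAttrs` and `CallAttrs` structs and the `smeAllowsTailCall` and `callFrom` helpers below are hypothetical stand-ins for the real SME attribute queries. A function that creates new ZA or ZT0 state has to tear that state down after its callee returns (the `smstop za` that follows the `bl` in the test output), so the call cannot be the function's last instruction and is rejected as a tail call.

```cpp
// Hypothetical, simplified model of the SME tail-call check.
// These types are illustrative only; they are not LLVM's classes.
struct FuncAttrs {
  bool NewZA;         // models the "aarch64_new_za" attribute
  bool NewZT0;        // models the "aarch64_new_zt0" attribute
  bool StreamingBody; // models a locally streaming function body
};

struct CallAttrs {
  FuncAttrs Caller;
  bool RequiresSMChange;
  bool RequiresLazySave;
  bool RequiresPreservingAllZAState;
  bool RequiresPreservingZT0;
};

// Returns true only if none of the modelled SME conditions forbid a tail call.
// The last two terms correspond to the checks added by this patch.
constexpr bool smeAllowsTailCall(CallAttrs CA) {
  return !(CA.RequiresSMChange || CA.RequiresLazySave ||
           CA.RequiresPreservingAllZAState || CA.RequiresPreservingZT0 ||
           CA.Caller.StreamingBody || CA.Caller.NewZA || CA.Caller.NewZT0);
}

// Builds a call site whose only interesting property is the caller.
constexpr CallAttrs callFrom(FuncAttrs Caller) {
  return CallAttrs{Caller, false, false, false, false};
}

constexpr FuncAttrs PlainCaller{false, false, false};
constexpr FuncAttrs NewZACaller{true, false, false};
constexpr FuncAttrs NewZT0Caller{false, true, false};

// A caller without new state may still tail call in this model...
static_assert(smeAllowsTailCall(callFrom(PlainCaller)), "plain caller");
// ...but callers that create new ZA or ZT0 state may not.
static_assert(!smeAllowsTailCall(callFrom(NewZACaller)), "new ZA caller");
static_assert(!smeAllowsTailCall(callFrom(NewZT0Caller)), "new ZT0 caller");
```

The `NewZA` and `NewZT0` terms mirror the two checks this patch adds; the remaining terms model the conditions that were already rejected before it.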
@llvm/pr-subscribers-backend-aarch64
Author: Benjamin Maxwell (MacDue)
2 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 2e96abfce72df..f16aa0188f9d5 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -9359,7 +9359,8 @@ bool AArch64TargetLowering::isEligibleForTailCallOptimization(
   if (CallAttrs.requiresSMChange() || CallAttrs.requiresLazySave() ||
       CallAttrs.requiresPreservingAllZAState() ||
       CallAttrs.requiresPreservingZT0() ||
-      CallAttrs.caller().hasStreamingBody())
+      CallAttrs.caller().hasStreamingBody() || CallAttrs.caller().isNewZA() ||
+      CallAttrs.caller().isNewZT0())
     return false;
 
   // Functions using the C or Fast calling convention that have an SVE signature
diff --git a/llvm/test/CodeGen/AArch64/sme-new-za-zt0-no-tail-call.ll b/llvm/test/CodeGen/AArch64/sme-new-za-zt0-no-tail-call.ll
new file mode 100644
index 0000000000000..3c76132556600
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/sme-new-za-zt0-no-tail-call.ll
@@ -0,0 +1,78 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sme2 -O3 -verify-machineinstrs < %s | FileCheck %s
+
+declare void @inout_za_zt0() "aarch64_inout_za" "aarch64_inout_zt0"
+
+define void @new_za_zt0() "aarch64_new_za" "aarch64_new_zt0" {
+; CHECK-LABEL: new_za_zt0:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: str x30, [sp, #-16]! // 8-byte Folded Spill
+; CHECK-NEXT: .cfi_def_cfa_offset 16
+; CHECK-NEXT: .cfi_offset w30, -16
+; CHECK-NEXT: mrs x8, TPIDR2_EL0
+; CHECK-NEXT: cbz x8, .LBB0_2
+; CHECK-NEXT: // %bb.1: // %entry
+; CHECK-NEXT: bl __arm_tpidr2_save
+; CHECK-NEXT: msr TPIDR2_EL0, xzr
+; CHECK-NEXT: zero {za}
+; CHECK-NEXT: zero { zt0 }
+; CHECK-NEXT: .LBB0_2: // %entry
+; CHECK-NEXT: smstart za
+; CHECK-NEXT: bl inout_za_zt0
+; CHECK-NEXT: smstop za
+; CHECK-NEXT: ldr x30, [sp], #16 // 8-byte Folded Reload
+; CHECK-NEXT: ret
+entry:
+ tail call void @inout_za_zt0()
+ ret void
+}
+
+declare void @inout_za() "aarch64_inout_za"
+
+define void @new_za() "aarch64_new_za" {
+; CHECK-LABEL: new_za:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: str x30, [sp, #-16]! // 8-byte Folded Spill
+; CHECK-NEXT: .cfi_def_cfa_offset 16
+; CHECK-NEXT: .cfi_offset w30, -16
+; CHECK-NEXT: mrs x8, TPIDR2_EL0
+; CHECK-NEXT: cbz x8, .LBB1_2
+; CHECK-NEXT: // %bb.1: // %entry
+; CHECK-NEXT: bl __arm_tpidr2_save
+; CHECK-NEXT: msr TPIDR2_EL0, xzr
+; CHECK-NEXT: zero {za}
+; CHECK-NEXT: .LBB1_2: // %entry
+; CHECK-NEXT: smstart za
+; CHECK-NEXT: bl inout_za
+; CHECK-NEXT: smstop za
+; CHECK-NEXT: ldr x30, [sp], #16 // 8-byte Folded Reload
+; CHECK-NEXT: ret
+entry:
+ tail call void @inout_za()
+ ret void
+}
+
+declare void @inout_zt0() "aarch64_inout_zt0"
+
+define void @new_zt0() "aarch64_new_zt0" {
+; CHECK-LABEL: new_zt0:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: str x30, [sp, #-16]! // 8-byte Folded Spill
+; CHECK-NEXT: .cfi_def_cfa_offset 16
+; CHECK-NEXT: .cfi_offset w30, -16
+; CHECK-NEXT: mrs x8, TPIDR2_EL0
+; CHECK-NEXT: cbz x8, .LBB2_2
+; CHECK-NEXT: // %bb.1: // %entry
+; CHECK-NEXT: bl __arm_tpidr2_save
+; CHECK-NEXT: msr TPIDR2_EL0, xzr
+; CHECK-NEXT: zero { zt0 }
+; CHECK-NEXT: .LBB2_2: // %entry
+; CHECK-NEXT: smstart za
+; CHECK-NEXT: bl inout_zt0
+; CHECK-NEXT: smstop za
+; CHECK-NEXT: ldr x30, [sp], #16 // 8-byte Folded Reload
+; CHECK-NEXT: ret
+entry:
+ tail call void @inout_zt0()
+ ret void
+}
sdesmalen-arm (Contributor) approved these changes on Jan 21, 2026:
Good find! I hope this was the last of them.
Member, Author:
/cherry-pick 10aca26
Member:
/pull-request #177169
LLVM Buildbot has detected a new failure on builder
Full details are available at: https://lab.llvm.org/buildbot/#/builders/59/builds/30443
c-rhodes pushed a commit to llvmbot/llvm-project that referenced this pull request on Jan 22, 2026:
Allowing this can result in invalid tail calls to shared ZA functions. It may be possible to limit this to the case where the caller is private ZA and the callee shares ZA, but for now it is generally disabled.
(cherry picked from commit 10aca26)