[PowerPC] Fix instruction sizes / branch relaxation#175556
Conversation
|
@llvm/pr-subscribers-backend-powerpc Author: Nikita Popov (nikic) ChangesFor PowerPC, having accurate (or at least not too small) instruction sizes is critical, because the PPCBranchSelector pass relies on them. Underestimating the size of an instruction can result in the wrong branch kind being chosen, which will result in an MC error. This patch introduces validation that the instruction size reported by TII matches the actually emitted instruction size, and fixes various cases where this was not the case. Fixes #175190. 4 Files Affected:
diff --git a/llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp b/llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
index 718d51c5a0673..863f35e6699b3 100644
--- a/llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
+++ b/llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
@@ -32,6 +32,7 @@
#include "llvm/ADT/StringExtras.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/ADT/Twine.h"
+#include "llvm/ADT/ScopeExit.h"
#include "llvm/BinaryFormat/ELF.h"
#include "llvm/CodeGen/AsmPrinter.h"
#include "llvm/CodeGen/MachineBasicBlock.h"
@@ -916,6 +917,32 @@ void PPCAsmPrinter::emitInstruction(const MachineInstr *MI) {
return PPC::S_None;
};
+#ifndef NDEBUG
+ // Instruction sizes must be correct for PPCBranchSelector to pick the
+ // right branch kind. Verify that the reported sizes and the actually
+ // emitted sizes match.
+ unsigned ExpectedSize = Subtarget->getInstrInfo()->getInstSizeInBytes(*MI);
+ MCFragment *OldFragment = OutStreamer->getCurrentFragment();
+ size_t OldFragSize = OldFragment->getFixedSize();
+ scope_exit VerifyInstSize([&]() {
+ if (!OutStreamer->isObj())
+ return; // Can only verify size when streaming to object.
+ MCFragment *NewFragment = OutStreamer->getCurrentFragment();
+ if (NewFragment != OldFragment)
+ return; // Don't try to handle fragment splitting cases.
+ unsigned ActualSize = NewFragment->getFixedSize() - OldFragSize;
+ // For now, allow over-estimates here. Some pseudos expand to a variable
+ // number of instructions, and for correctness using the upper bound size
+ // is fine.
+ if (ActualSize > ExpectedSize) {
+ dbgs() << "Size mismatch for: " << *MI << "\n";
+ dbgs() << "Expected size: " << ExpectedSize << "\n";
+ dbgs() << "Actual size: " << ActualSize << "\n";
+ abort();
+ }
+ });
+#endif
+
// Lower multi-instruction pseudo operations.
switch (MI->getOpcode()) {
default: break;
diff --git a/llvm/lib/Target/PowerPC/PPCInstr64Bit.td b/llvm/lib/Target/PowerPC/PPCInstr64Bit.td
index 620dfd4738226..16d24f05525da 100644
--- a/llvm/lib/Target/PowerPC/PPCInstr64Bit.td
+++ b/llvm/lib/Target/PowerPC/PPCInstr64Bit.td
@@ -1507,21 +1507,25 @@ class GETtlsADDRPseudo <string asmstr> : PPCEmitTimePseudo<(outs g8rc:$rD), (ins
asmstr,
[(set i64:$rD,
(PPCgetTlsAddr i64:$reg, tglobaltlsaddr:$sym))]>,
- isPPC64;
+ isPPC64 {
+ // May be either 4 or 8 bytes, specify upper bound.
+ let Size = 8;
+}
class GETtlsldADDRPseudo <string asmstr> : PPCEmitTimePseudo<(outs g8rc:$rD), (ins g8rc:$reg, tlsgd:$sym),
asmstr,
[(set i64:$rD,
(PPCgetTlsldAddr i64:$reg, tglobaltlsaddr:$sym))]>,
- isPPC64;
+ isPPC64 {
+ // May be either 4 or 8 bytes, specify upper bound.
+ let Size = 8;
+}
let hasExtraSrcRegAllocReq = 1, hasExtraDefRegAllocReq = 1 in {
// LR8 is a true define, while the rest of the Defs are clobbers. X3 is
// explicitly defined when this op is created, so not mentioned here.
-// This is lowered to BL8_NOP_TLS by the assembly printer, so the size must be
-// correct because the branch select pass is relying on it.
-let Defs = [X0,X4,X5,X6,X7,X8,X9,X10,X11,X12,LR8,CTR8,CR0,CR1,CR5,CR6,CR7], Size = 8 in
+let Defs = [X0,X4,X5,X6,X7,X8,X9,X10,X11,X12,LR8,CTR8,CR0,CR1,CR5,CR6,CR7] in
def GETtlsADDR : GETtlsADDRPseudo <"#GETtlsADDR">;
-let Defs = [X0,X2,X4,X5,X6,X7,X8,X9,X10,X11,X12,LR8,CTR8,CR0,CR1,CR5,CR6,CR7], Size = 8 in
+let Defs = [X0,X2,X4,X5,X6,X7,X8,X9,X10,X11,X12,LR8,CTR8,CR0,CR1,CR5,CR6,CR7] in
def GETtlsADDRPCREL : GETtlsADDRPseudo <"#GETtlsADDRPCREL">;
// LR8 is a true define, while the rest of the Defs are clobbers. X3 is
diff --git a/llvm/lib/Target/PowerPC/PPCInstrInfo.cpp b/llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
index 58ad12e2ce65a..d6fc9cdfab5b0 100644
--- a/llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
+++ b/llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
@@ -3013,7 +3013,18 @@ unsigned PPCInstrInfo::getInstSizeInBytes(const MachineInstr &MI) const {
return Opers.getNumPatchBytes();
} else if (Opcode == TargetOpcode::PATCHPOINT) {
PatchPointOpers Opers(&MI);
- return Opers.getNumPatchBytes();
+ // The call sequence is up to 44 bytes large.
+ // TODO: Per LangRef the client must ensure that the number of bytes is
+ // large enough, but many tests use patchpoint with 40 bytes.
+ return std::max(Opers.getNumPatchBytes(), 44u);
+ } else if (Opcode == TargetOpcode::PATCHABLE_FUNCTION_ENTER) {
+ const MachineFunction *MF = MI.getParent()->getParent();
+ const Function &F = MF->getFunction();
+ unsigned Num = 0;
+ (void)F.getFnAttribute("patchable-function-entry")
+ .getValueAsString()
+ .getAsInteger(10, Num);
+ return Num * 4;
} else {
return get(Opcode).getSize();
}
diff --git a/llvm/lib/Target/PowerPC/PPCInstrInfo.td b/llvm/lib/Target/PowerPC/PPCInstrInfo.td
index fdccddd86b9b7..130ee2a7c4a97 100644
--- a/llvm/lib/Target/PowerPC/PPCInstrInfo.td
+++ b/llvm/lib/Target/PowerPC/PPCInstrInfo.td
@@ -2489,7 +2489,9 @@ def EnforceIEIO : XForm_24_eieio<31, 854, (outs), (ins),
"eieio", IIC_LdStLoad, []>;
def PseudoEIEIO : PPCEmitTimePseudo<(outs), (ins), "#PPCEIEIO",
- [(int_ppc_eieio)]>;
+ [(int_ppc_eieio)]> {
+ let Size = 12;
+}
def : Pat<(int_ppc_sync), (SYNC 0)>, Requires<[HasSYNC]>;
def : Pat<(int_ppc_iospace_sync), (SYNC 0)>, Requires<[HasSYNC]>;
@@ -3448,14 +3450,18 @@ def : Pat<(add i32:$in, (PPChi tblockaddress:$g, 0)),
// Support for thread-local storage.
def PPC32GOT: PPCEmitTimePseudo<(outs gprc:$rD), (ins), "#PPC32GOT",
- [(set i32:$rD, (PPCppc32GOT))]>;
+ [(set i32:$rD, (PPCppc32GOT))]> {
+ let Size = 8;
+}
// Get the _GLOBAL_OFFSET_TABLE_ in PIC mode.
// This uses two output registers, the first as the real output, the second as a
// temporary register, used internally in code generation. A "bl" also clobbers LR.
let Defs = [LR] in
def PPC32PICGOT: PPCEmitTimePseudo<(outs gprc:$rD, gprc:$rT), (ins), "#PPC32PICGOT",
- []>;
+ []> {
+ let Size = 20;
+}
def LDgotTprelL32: PPCEmitTimePseudo<(outs gprc_nor0:$rD), (ins s16imm:$disp, gprc_nor0:$reg),
"#LDgotTprelL32",
@@ -3581,7 +3587,9 @@ def ADDItocL : PPCEmitTimePseudo<(outs gprc:$rD), (ins gprc_nor0:$reg, tocentry3
// Get Global (GOT) Base Register offset, from the word immediately preceding
// the function label.
-def UpdateGBR : PPCEmitTimePseudo<(outs gprc:$rD, gprc:$rT), (ins gprc:$rI), "#UpdateGBR", []>;
+def UpdateGBR : PPCEmitTimePseudo<(outs gprc:$rD, gprc:$rT), (ins gprc:$rI), "#UpdateGBR", []> {
+ let Size = 8;
+}
// Pseudo-instruction marked for deletion. When deleting the instruction would
// cause iterator invalidation in MIR transformation passes, this pseudo can be
|
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
cf1875b to
5059f6b
Compare
|
Probably worth noting that because most tests don't use |
|
Ping. I'd appreciate a review on this one as it's a blocker for LLVM 22. |
5059f6b to
56840e1
Compare
lei137
left a comment
There was a problem hiding this comment.
Would you be able to provide some reasonable sized IR tests for this patch? The test in the original issue is way too big. I tried running the original IR through bug-point but it wasn't able to reduce it.
| case PPC::GETtlsldADDRPCREL: | ||
| if (MI.getOperand(2).getTargetFlags() == PPCII::MO_GOT_TLSGD_PCREL_FLAG || | ||
| MI.getOperand(2).getTargetFlags() == PPCII::MO_GOT_TLSLD_PCREL_FLAG) | ||
| return 4; |
There was a problem hiding this comment.
These opcodes are named so that they indicate whether the pcrel flag is set or not. If you feel the need to double check here I don't have an issue with it. However, I don't think we should remove explicitly setting the size for them in the td definition file.
I find the original test case passes even if I remove these case block and apply the following diff to the td file:
diff --git a/llvm/lib/Target/PowerPC/PPCInstr64Bit.td b/llvm/lib/Target/PowerPC/PPCInstr64Bit.td
index 10d090480604..0e075a3415a8 100644
--- a/llvm/lib/Target/PowerPC/PPCInstr64Bit.td
+++ b/llvm/lib/Target/PowerPC/PPCInstr64Bit.td
@@ -1524,14 +1524,14 @@ class GETtlsldADDRPseudo <string asmstr> : PPCEmitTimePseudo<(outs g8rc:$rD), (i
let hasExtraSrcRegAllocReq = 1, hasExtraDefRegAllocReq = 1 in {
// LR8 is a true define, while the rest of the Defs are clobbers. X3 is
// explicitly defined when this op is created, so not mentioned here.
-let Defs = [X0,X4,X5,X6,X7,X8,X9,X10,X11,X12,LR8,CTR8,CR0,CR1,CR5,CR6,CR7] in
+let Defs = [X0,X4,X5,X6,X7,X8,X9,X10,X11,X12,LR8,CTR8,CR0,CR1,CR5,CR6,CR7], Size = 8 in
def GETtlsADDR : GETtlsADDRPseudo <"#GETtlsADDR">;
let Defs = [X0,X2,X4,X5,X6,X7,X8,X9,X10,X11,X12,LR8,CTR8,CR0,CR1,CR5,CR6,CR7] in
def GETtlsADDRPCREL : GETtlsADDRPseudo <"#GETtlsADDRPCREL">;
// LR8 is a true define, while the rest of the Defs are clobbers. X3 is
// explicitly defined when this op is created, so not mentioned here.
-let Defs = [X0,X4,X5,X6,X7,X8,X9,X10,X11,X12,LR8,CTR8,CR0,CR1,CR5,CR6,CR7] in
+let Defs = [X0,X4,X5,X6,X7,X8,X9,X10,X11,X12,LR8,CTR8,CR0,CR1,CR5,CR6,CR7], Size = 8 in
def GETtlsldADDR : GETtlsldADDRPseudo <"#GETtlsldADDR">;
let Defs = [X0,X2,X4,X5,X6,X7,X8,X9,X10,X11,X12,LR8,CTR8,CR0,CR1,CR5,CR6,CR7] in
def GETtlsldADDRPCREL : GETtlsldADDRPseudo <"#GETtlsldADDRPCREL">;
There was a problem hiding this comment.
Hm... I'm not sure what I was doing wrong before, I think I just got confused by different GETtls opcodes.
As you say, defining the sizes in TableGen does work fine. The only thing that was wrong with the original definitions is that the Size = 8 was on GETtlsADDRPCREL instead of GETtlsldADDR.
I've restored the size setting in TableGen and dropped the corresponding code in InstrInfo.
| // The call sequence is up to 44 bytes large. | ||
| // TODO: Per LangRef the client must ensure that the number of bytes is | ||
| // large enough, but many tests use patchpoint with 40 bytes. | ||
| return std::max(Opers.getNumPatchBytes(), 44u); |
There was a problem hiding this comment.
I posted PR #177453 to change the assertion in LowerPATCHPOINT to a fatal error when the bytes requested aren't enough to contain the call. That way we can use the bytes size reported by the operands here without worry its incorrect. FWIW I think the longest call sequence is indeed 40 bytes, however the assertion wouldn't trigger as we were miss-counting the instructions we emitted.
There was a problem hiding this comment.
@mandlebug what do you mean by "miss-counting". Does that need to be fixed?
There was a problem hiding this comment.
Thanks! I've rebased on top of your change, which allowed dropping the adjustment for PATCHPOINT.
mandlebug
left a comment
There was a problem hiding this comment.
Other then Lei's outstanding comments LGTM.
For PowerPC, having accurate (or at least not too small) instruction sizes is critical, because the PPCBranchSelector pass relies on them. Underestimating the size of an instruction can result in the wrong branch kind being chosen, which will result in an MC error. This patch introduces validation that the instruction size reported by TII matches the actually emitted instruction size, and fixes various cases where this was not the case.
I must have gotten confused with the different opcodes in previous testing. These do have fixed sizes, one of the "Size = 8" ones was just on the wrong opcode.
56840e1 to
854bb06
Compare
lei137
left a comment
There was a problem hiding this comment.
This LGTM.
Thanks for fixes!
|
A test case if you can manage it would be good though. |
|
The problem with the test case is that it needs a jump across 2^16 bytes (~2^14 instructions), so it ends up being very large. That's why I didn't include it. |
Yes, it's difficult. I spoke to @mandlebug about this too and we couldn't think of a way to generate one either. |
For PowerPC, having accurate (or at least not too small) instruction sizes is critical, because the PPCBranchSelector pass relies on them. Underestimating the size of an instruction can result in the wrong branch kind being chosen, which will result in an MC error. This patch introduces validation that the instruction size reported by TII matches the actually emitted instruction size, and fixes various cases where this was not the case. Fixes llvm#175190. (cherry picked from commit 45abb30)
For PowerPC, having accurate (or at least not too small) instruction sizes is critical, because the PPCBranchSelector pass relies on them. Underestimating the size of an instruction can result in the wrong branch kind being chosen, which will result in an MC error.
This patch introduces validation that the instruction size reported by TII matches the actually emitted instruction size, and fixes various cases where this was not the case.
Fixes #175190.