[RISC-V] Optimize switch table #117048
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
No regressions: 2 fewer instructions and 1 fewer register per switch site.
Diffs are based on 36,155 contexts (12,724 MinOpts, 23,431 FullOpts).
Overall (-800 bytes)
MinOpts (-80 bytes)
FullOpts (-720 bytes)
Example diffs
linux.riscv64.Checked.3.mch
-8 (-3.08%) : 13600.dasm - JIT.HardwareIntrinsics.Arm.Helpers:TrigonometricMultiplyAddCoefficient(float,float,byte):float (FullOpts)
@@ -45,20 +45,18 @@ G_M14747_IG05: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
bltu ra, t6, G_M14747_IG07
;; size=12 bbWeight=1 PerfScore 4.50
G_M14747_IG06: ; bbWeight=0.94, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
- zext.w a0, a0
auipc t6, 0xD1FFAB1E
addi a1, t6, 0xD1FFAB1E
- slli a2, a0, 2
- add a1, a1, a2
+ sh2add.uw a1, a0, a1
lw a1, 0xD1FFAB1E(a1)
lui t6, 0xD1FFAB1E
addi t6, t6, 0xD1FFAB1E
- lui a2, 0xD1FFAB1E
- slli a2, a2, 20
- add a2, a2, t6
- add a1, a1, a2
+ lui a0, 0xD1FFAB1E
+ slli a0, a0, 20
+ add a0, a0, t6
+ add a1, a1, a0
jr a1
- ;; size=52 bbWeight=0.94 PerfScore 13.65
+ ;; size=44 bbWeight=0.94 PerfScore 12.71
G_M14747_IG07: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
sext.w a0, zero
;; size=4 bbWeight=0.50 PerfScore 0.25
@@ -138,7 +136,7 @@ RWD00 dd G_M14747_IG18 - G_M14747_IG02
dd G_M14747_IG07 - G_M14747_IG02
-; Total bytes of code 260, prolog size 16, PerfScore 66.15, instruction count 49, allocated bytes for code 260 (MethodHash=8f51c664) for method JIT.HardwareIntrinsics.Arm.Helpers:TrigonometricMultiplyAddCoefficient(float,float,byte):float (FullOpts)
+; Total bytes of code 252, prolog size 16, PerfScore 65.21, instruction count 47, allocated bytes for code 252 (MethodHash=8f51c664) for method JIT.HardwareIntrinsics.Arm.Helpers:TrigonometricMultiplyAddCoefficient(float,float,byte):float (FullOpts)
; ============================================================
Unwind Info:
@@ -149,7 +147,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 65 (0x00041) Actual length = 260 (0x000104)
+ Function Length : 63 (0x0003f) Actual length = 252 (0x0000fc)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-8 (-2.53%) : 13619.dasm - JIT.HardwareIntrinsics.Arm.Helpers:TrigonometricMultiplyAddCoefficient(double,double,byte):double (FullOpts)
@@ -45,20 +45,18 @@ G_M2782_IG05: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
bltu ra, t6, G_M2782_IG08
;; size=12 bbWeight=1 PerfScore 4.50
G_M2782_IG06: ; bbWeight=0.94, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
- zext.w a0, a0
auipc t6, 0xD1FFAB1E
addi a1, t6, 0xD1FFAB1E
- slli a2, a0, 2
- add a1, a1, a2
+ sh2add.uw a1, a0, a1
lw a1, 0xD1FFAB1E(a1)
lui t6, 0xD1FFAB1E
addi t6, t6, 0xD1FFAB1E
- lui a2, 0xD1FFAB1E
- slli a2, a2, 20
- add a2, a2, t6
- add a1, a1, a2
+ lui a0, 0xD1FFAB1E
+ slli a0, a0, 20
+ add a0, a0, t6
+ add a1, a1, a0
jr a1
- ;; size=52 bbWeight=0.94 PerfScore 13.65
+ ;; size=44 bbWeight=0.94 PerfScore 12.71
G_M2782_IG07: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
addiw a0, zero, 0xD1FFAB1E
slli a0, a0, 52
@@ -175,7 +173,7 @@ RWD152 dq 3F8111111110F30Ch
RWD160 dq BFC5555555555543h
-; Total bytes of code 316, prolog size 16, PerfScore 80.90, instruction count 59, allocated bytes for code 316 (MethodHash=0386f521) for method JIT.HardwareIntrinsics.Arm.Helpers:TrigonometricMultiplyAddCoefficient(double,double,byte):double (FullOpts)
+; Total bytes of code 308, prolog size 16, PerfScore 79.96, instruction count 57, allocated bytes for code 308 (MethodHash=0386f521) for method JIT.HardwareIntrinsics.Arm.Helpers:TrigonometricMultiplyAddCoefficient(double,double,byte):double (FullOpts)
; ============================================================
Unwind Info:
@@ -186,7 +184,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 79 (0x0004f) Actual length = 316 (0x00013c)
+ Function Length : 77 (0x0004d) Actual length = 308 (0x000134)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-8 (-2.33%) : 23707.dasm - System.Xml.XPath.XNodeNavigator:get_NodeType():int:this (FullOpts)
@@ -47,20 +47,18 @@ G_M63141_IG03: ; bbWeight=0.50, gcrefRegs=0600 {s1 a0}, byrefRegs=0000 {}
bltu ra, t6, G_M63141_IG05
;; size=32 bbWeight=0.50 PerfScore 7.00
G_M63141_IG04: ; bbWeight=0.45, gcrefRegs=0200 {s1}, byrefRegs=0000 {}, byref
- zext.w a1, a1
auipc t6, 0xD1FFAB1E
addi a0, t6, 0xD1FFAB1E
- slli a2, a1, 2
- add a0, a0, a2
+ sh2add.uw a0, a1, a0
lw a0, 0xD1FFAB1E(a0)
lui t6, 0xD1FFAB1E
addi t6, t6, 0xD1FFAB1E
- lui a2, 0xD1FFAB1E
- slli a2, a2, 20
- add a2, a2, t6
- add a0, a0, a2
+ lui a1, 0xD1FFAB1E
+ slli a1, a1, 20
+ add a1, a1, t6
+ add a0, a0, a1
jr a0
- ;; size=52 bbWeight=0.45 PerfScore 6.53
+ ;; size=44 bbWeight=0.45 PerfScore 6.07
G_M63141_IG05: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
; gcrRegs -[s1]
addi a0, zero, 0xD1FFAB1E
@@ -161,7 +159,7 @@ RWD00 dd G_M63141_IG17 - G_M63141_IG02
dd G_M63141_IG07 - G_M63141_IG02
-; Total bytes of code 344, prolog size 20, PerfScore 70.21, instruction count 69, allocated bytes for code 344 (MethodHash=33a2095a) for method System.Xml.XPath.XNodeNavigator:get_NodeType():int:this (FullOpts)
+; Total bytes of code 336, prolog size 20, PerfScore 69.76, instruction count 67, allocated bytes for code 336 (MethodHash=33a2095a) for method System.Xml.XPath.XNodeNavigator:get_NodeType():int:this (FullOpts)
; ============================================================
Unwind Info:
@@ -172,7 +170,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 86 (0x00056) Actual length = 344 (0x000158)
+ Function Length : 84 (0x00054) Actual length = 336 (0x000150)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
+0 (0.00%) : 36016.dasm - (dynamicClass):IL_STUB_PInvoke(ptr,byref):int (FullOpts)
No diffs found?
+0 (0.00%) : 36000.dasm - (dynamicClass):IL_STUB_PInvoke(nint,int,byref):int (FullOpts)
No diffs found?
+0 (0.00%) : 35936.dasm - Microsoft.Diagnostics.Tracing.TraceEventDispatcher:Dispose(bool):this (FullOpts)
No diffs found?
Details
Size improvements/regressions per collection
PerfScore improvements/regressions per collection
Context information
jit-analyze output
RISC-V Release-CLR-QEMU: 9085 / 9116 (99.66%)
report.xml, report.md, failures.xml, testclr_details.tar.zst
RISC-V Release-CLR-VF2: 9086 / 9116 (99.67%)
report.xml, report.md, failures.xml, testclr_details.tar.zst
RISC-V Release-FX-QEMU: 283633 / 284704 (99.62%)
report.xml, report.md, failures.xml, testclr_details.tar.zst
RISC-V Release-FX-VF2: 510214 / 511957 (99.66%)
report.xml, report.md, failures.xml, testclr_details.tar.zst
Build information and commands
GIT:

RISC-V Release-CLR-QEMU: 9078 / 9108 (99.67%)
report.xml, report.md, failures.xml, testclr_details.tar.zst
RISC-V Release-CLR-VF2: 9079 / 9109 (99.67%)
report.xml, report.md, failures.xml, testclr_details.tar.zst
RISC-V Release-FX-QEMU: 283909 / 284982 (99.62%)
report.xml, report.md, failures.xml, testclr_details.tar.zst
RISC-V Release-FX-VF2: 509103 / 510859 (99.66%)
report.xml, report.md, failures.xml, testclr_details.tar.zst
Build information and commands
GIT:
@jakobbotsch, PTAL and let us know whether we need to take this for .NET 10 or can push it out to .NET 11.
Regression after #117048: using `idxReg` as a temp clobbered its value early, which broke any later use of it.
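To make the clobber concrete, here is a small register-level simulation. It is only a sketch: the `emit_switch` helper, the register dictionary, and the table and base values are hypothetical and are not taken from the JIT sources. It mirrors the shape of the sequence in the diffs above, where the index register (`a0`) is reused as the temp that holds the base address. It does not describe how the follow-up change actually fixed the problem.

```python
MASK64 = (1 << 64) - 1

def emit_switch(regs, table, base, temp):
    """Simulate the dispatch sequence; `temp` names the register that holds the base address.

    Hypothetical model: a0 is the switch index, a1 ends up holding the jump target.
    """
    r = dict(regs)
    entry = r["a0"] & 0xFFFFFFFF            # sh2add.uw + lw: the index selects a table entry
    r["a1"] = table[entry]                  # 32-bit offset loaded from the jump table
    r[temp] = base                          # lui/slli/add: materialize the base in `temp`
    r["a1"] = (r["a1"] + r[temp]) & MASK64  # add a1, a1, temp ; jr a1
    return r

# Made-up inputs for illustration only.
regs = {"a0": 2, "a1": 0, "a2": 0}
table = [0x10, 0x24, 0x38]
base = 0x4000

clobbered = emit_switch(regs, table, base, temp="a0")  # index register reused as temp
preserved = emit_switch(regs, table, base, temp="a2")  # separate temp register

print(hex(clobbered["a1"]), clobbered["a0"])
print(hex(preserved["a1"]), preserved["a0"])
```

Both variants compute the same jump target, but only the second still holds the original index in `a0` afterwards. That difference matters when the index is live in a successor block.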
Don't allocate temp register, use sh2add.
Part of #84834, cc @dotnet/samsung
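As a rough illustration of why the rewrite is safe, the sketch below models both address computations on 64-bit values. The diffs show the `.uw` form, `sh2add.uw`, which per the Zba extension zero-extends the low 32 bits of the index, shifts it left by 2, and adds the table address. The function names here are illustrative only and do not come from the JIT.

```python
MASK64 = (1 << 64) - 1

def old_sequence(idx, table_addr):
    """Models: zext.w a0, a0 ; slli a2, a0, 2 ; add a1, a1, a2 (uses a2 as a temp)."""
    a0 = idx & 0xFFFFFFFF          # zext.w: keep the low 32 bits
    a2 = (a0 << 2) & MASK64        # slli: scale by the 4-byte table entry size
    return (table_addr + a2) & MASK64

def sh2add_uw(rs1, rs2):
    """Models Zba sh2add.uw rd, rs1, rs2: rd = rs2 + (zext32(rs1) << 2)."""
    return (rs2 + ((rs1 & 0xFFFFFFFF) << 2)) & MASK64

# The two forms agree, including when the upper 32 bits of the index hold garbage.
for idx in (0, 3, 0xFFFFFFFF, 0xFFFFFFFF00000003):
    assert old_sequence(idx, 0x1000) == sh2add_uw(idx, 0x1000)

print(hex(sh2add_uw(3, 0x1000)))
```

One instruction replaces three and no scratch register is needed for the scaled index, which is consistent with the 8-byte reduction per switch site seen in the diffs.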