JIT: Straighten out flow during early jump threading #104603
jakobbotsch merged 3 commits into dotnet:main
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
Once early jump threading has kicked in, we can frequently prove that a branch will go in one direction. This was previously left up to RBO (redundant branch optimization); instead, try to fold the branch immediately and straighten out the flow before we build SSA, so that we can refine the phis. Fix dotnet#101176
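To make the idea concrete, here is a rough sketch on a toy CFG, not the JIT's actual code. It assumes jump threading has already replaced a branch condition with a known constant; the block layout and the helper names (`fold_constant_branches`, `remove_unreachable`, `straighten`, `simplify`) are made up for illustration. Blocks carry only their terminators here, whereas a real merge would also concatenate statements.

```python
# Toy CFG: each block maps to a terminator tuple:
#   ("cond", condition, true_target, false_target)
#   ("always", target)
#   ("return",)
# A condition is a variable name (unknown) or a bool (proven constant).

def successors(term):
    if term[0] == "cond":
        return [term[2], term[3]]
    if term[0] == "always":
        return [term[1]]
    return []


def fold_constant_branches(cfg):
    """Turn conditional jumps on a proven constant into unconditional jumps."""
    for name, term in list(cfg.items()):
        if term[0] == "cond" and isinstance(term[1], bool):
            _, cond, if_true, if_false = term
            cfg[name] = ("always", if_true if cond else if_false)


def remove_unreachable(cfg, entry):
    """Drop blocks that can no longer be reached from the entry block."""
    seen, work = set(), [entry]
    while work:
        block = work.pop()
        if block in seen:
            continue
        seen.add(block)
        work.extend(successors(cfg[block]))
    for name in list(cfg):
        if name not in seen:
            del cfg[name]


def straighten(cfg, entry):
    """Merge B into its successor S when B always jumps to S and S has no other pred."""
    changed = True
    while changed:
        changed = False
        preds = {name: [] for name in cfg}
        for name, term in cfg.items():
            for succ in successors(term):
                preds[succ].append(name)
        for name, term in list(cfg.items()):
            if term[0] != "always":
                continue
            succ = term[1]
            if succ != name and succ != entry and preds[succ] == [name]:
                cfg[name] = cfg.pop(succ)  # B takes over S's terminator
                changed = True
                break  # predecessor lists are stale, recompute them


def simplify(cfg, entry):
    fold_constant_branches(cfg)
    remove_unreachable(cfg, entry)
    straighten(cfg, entry)
    return cfg


example = {
    "BB01": ("cond", False, "BB02", "BB03"),  # condition proven false
    "BB02": ("return",),
    "BB03": ("always", "BB04"),
    "BB04": ("cond", "x", "BB05", "BB06"),
    "BB05": ("return",),
    "BB06": ("return",),
}
result = simplify(dict(example), "BB01")
print(result)
```

In this toy, BB02 becomes unreachable and BB03/BB04 collapse into BB01, which is the kind of simpler flow that leaves fewer join points, and so fewer phis, once SSA is built.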
Perfscore regressions seem to come just from us recognizing more loops and applying the questionable "loop-scaling" logic to them. For example, here's a case with no code diffs yet a massive perfscore regression:
Example
@@ -8,58 +8,58 @@
; 0 inlinees with PGO data; 22 single block inlinees; 8 inlinees without PGO data
; Final local variable assignments
;
-;* V00 arg0 [V00 ] ( 0, 0 ) struct ( 8) zero-ref ld-addr-op single-def <Microsoft.CodeAnalysis.SyntaxList`1[Microsoft.CodeAnalysis.CSharp.Syntax.AttributeListSyntax]>
-; V01 loc0 [V01,T13] ( 4, 18 ) ubyte -> rbx
-;* V02 loc1 [V02 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op <Microsoft.CodeAnalysis.SyntaxList`1+Enumerator[Microsoft.CodeAnalysis.CSharp.Syntax.AttributeListSyntax]>
-; V03 loc2 [V03 ] ( 7, 60 ) struct (32) [rsp+0x78] do-not-enreg[XSF] must-init addr-exposed ld-addr-op <Microsoft.CodeAnalysis.SeparatedSyntaxList`1+Enumerator[Microsoft.CodeAnalysis.CSharp.Syntax.AttributeSyntax]>
-; V04 loc3 [V04 ] ( 2, 4 ) struct (24) [rsp+0x60] do-not-enreg[XS] must-init addr-exposed ld-addr-op <Microsoft.CodeAnalysis.SeparatedSyntaxList`1[Microsoft.CodeAnalysis.CSharp.Syntax.AttributeSyntax]>
-; V05 loc4 [V05,T02] ( 6, 48 ) ref -> rbp class-hnd exact <Microsoft.CodeAnalysis.CSharp.Syntax.AttributeSyntax>
-; V06 loc5 [V06 ] ( 2, 16 ) struct (24) [rsp+0x48] do-not-enreg[XS] must-init addr-exposed ld-addr-op <Microsoft.CodeAnalysis.SyntaxToken>
-; V07 OutArgs [V07 ] ( 1, 1 ) struct (32) [rsp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
-;* V08 tmp1 [V08 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op "NewObj constructor temp" <Microsoft.CodeAnalysis.SyntaxList`1+Enumerator[Microsoft.CodeAnalysis.CSharp.Syntax.AttributeListSyntax]>
-;* V09 tmp2 [V09 ] ( 0, 0 ) struct ( 8) zero-ref "Inlining Arg" <Microsoft.CodeAnalysis.SyntaxList`1[Microsoft.CodeAnalysis.CSharp.Syntax.AttributeListSyntax]>
-; V10 tmp3 [V10,T25] ( 4, 6 ) ref -> rdx class-hnd "Inline return value spill temp" <Microsoft.CodeAnalysis.SyntaxNode>
-;* V11 tmp4 [V11 ] ( 0, 0 ) ref -> zero-ref class-hnd "dup spill" <Microsoft.CodeAnalysis.SyntaxNode>
-;* V12 tmp5 [V12 ] ( 0, 0 ) ref -> zero-ref
-;* V13 tmp6 [V13,T31] ( 0, 0 ) int -> zero-ref
-;* V14 tmp7 [V14 ] ( 0, 0 ) int -> zero-ref "Inlining Arg"
-; V15 tmp8 [V15,T21] ( 2, 8 ) ref -> rcx class-hnd "Inlining Arg" <Microsoft.CodeAnalysis.GreenNode>
-; V16 tmp9 [V16,T24] ( 2, 8 ) struct (32) [rsp+0x28] do-not-enreg[SF] must-init ld-addr-op "NewObj constructor temp" <Microsoft.CodeAnalysis.SeparatedSyntaxList`1+Enumerator[Microsoft.CodeAnalysis.CSharp.Syntax.AttributeSyntax]>
-;* V17 tmp10 [V17 ] ( 0, 0 ) ref -> zero-ref class-hnd "Inline return value spill temp" <Microsoft.CodeAnalysis.CSharp.Syntax.NameSyntax>
-; V18 tmp11 [V18,T00] ( 4, 64 ) byref -> r14 "Inlining Arg"
-; V19 tmp12 [V19,T05] ( 5, 40 ) ref -> r15 class-hnd "Inline stloc first use temp" <Microsoft.CodeAnalysis.CSharp.Syntax.NameSyntax>
-; V20 tmp13 [V20,T07] ( 4, 32 ) ref -> rax class-hnd "Inline stloc first use temp" <Microsoft.CodeAnalysis.GreenNode>
-;* V21 tmp14 [V21 ] ( 0, 0 ) ref -> zero-ref ld-addr-op class-hnd "Inline ldloca(s) first use temp" <Microsoft.CodeAnalysis.CSharp.Syntax.NameSyntax>
-; V22 tmp15 [V22,T09] ( 2, 32 ) ref -> rdx class-hnd "impAppendStmt" <Microsoft.CodeAnalysis.CSharp.Syntax.NameSyntax>
-;* V23 tmp16 [V23 ] ( 0, 0 ) ref -> zero-ref class-hnd "Inlining Arg" <Microsoft.CodeAnalysis.CSharp.Syntax.NameSyntax>
-; V24 tmp17 [V24,T01] ( 4, 64 ) ref -> rcx class-hnd "dup spill" <Microsoft.CodeAnalysis.GreenNode>
-;* V25 tmp18 [V25 ] ( 0, 0 ) ref -> zero-ref
-; V26 tmp19 [V26,T11] ( 3, 24 ) ref -> rcx
-; V27 tmp20 [V27,T08] ( 4, 32 ) ref -> rcx
-;* V28 tmp21 [V28,T19] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
-; V29 tmp22 [V29,T06] ( 3, 40 ) int -> rcx "Inline stloc first use temp"
-;* V30 tmp23 [V30,T29] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
-; V31 tmp24 [V31,T14] ( 3, 18 ) int -> rdi "Inline stloc first use temp"
-; V32 tmp25 [V32,T17] ( 4, 14 ) int -> rax "Inline return value spill temp"
-;* V33 tmp26 [V33 ] ( 0, 0 ) ref -> zero-ref class-hnd "Inlining Arg" <Microsoft.CodeAnalysis.SyntaxNode>
-; V34 tmp27 [V34,T22] ( 2, 8 ) ref -> rax class-hnd "Inlining Arg" <Microsoft.CodeAnalysis.GreenNode>
-;* V35 tmp28 [V35 ] ( 0, 0 ) ref -> zero-ref class-hnd "Inlining Arg" <Microsoft.CodeAnalysis.SyntaxNode>
-; V36 tmp29 [V36,T15] ( 4, 16 ) ref -> rcx class-hnd "Inlining Arg" <Microsoft.CodeAnalysis.GreenNode>
-; V37 tmp30 [V37,T20] ( 4, 8 ) int -> rax "Inline stloc first use temp"
-; V38 tmp31 [V38,T28] ( 2, 2 ) ref -> rcx single-def "field V00._node (fldOffset=0x0)" P-INDEP
-; V39 tmp32 [V39,T18] ( 4, 13 ) int -> rdi "field V02._index (fldOffset=0x0)" P-INDEP
-; V40 tmp33 [V40,T12] ( 8, 21 ) ref -> rsi single-def "field V02._list (fldOffset=0x8)" P-INDEP
-;* V41 tmp34 [V41,T32] ( 0, 0 ) int -> zero-ref single-def "field V08._index (fldOffset=0x0)" P-INDEP
-; V42 tmp35 [V42,T30] ( 2, 2 ) ref -> rsi single-def "field V08._list (fldOffset=0x8)" P-INDEP
-;* V43 tmp36 [V43 ] ( 0, 0 ) ref -> zero-ref "field V09._node (fldOffset=0x0)" P-INDEP
-;* V44 tmp37 [V44 ] ( 0, 0 ) int -> zero-ref "V16.[000..004)"
-; V45 tmp38 [V45,T23] ( 2, 8 ) ref -> rcx "argument with side effect"
-; V46 tmp39 [V46,T03] ( 3, 48 ) ref -> rcx "argument with side effect"
-; V47 tmp40 [V47,T10] ( 2, 32 ) ref -> rdx "argument with side effect"
-; V48 tmp41 [V48,T04] ( 3, 48 ) ref -> rax "argument with side effect"
-; V49 cse0 [V49,T26] ( 3, 6 ) ref -> rcx "CSE #01: moderate"
-; V50 rat0 [V50,T16] ( 4, 14 ) ref -> rcx "replacement local"
-; V51 rat1 [V51,T27] ( 3, 4 ) long -> rax "CSE for expectedClsNode"
+;* V00 arg0 [V00 ] ( 0, 0 ) struct ( 8) zero-ref ld-addr-op single-def <Microsoft.CodeAnalysis.SyntaxList`1[Microsoft.CodeAnalysis.CSharp.Syntax.AttributeListSyntax]>
+; V01 loc0 [V01,T08] ( 4, 258 ) ubyte -> rbx
+;* V02 loc1 [V02 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op <Microsoft.CodeAnalysis.SyntaxList`1+Enumerator[Microsoft.CodeAnalysis.CSharp.Syntax.AttributeListSyntax]>
+; V03 loc2 [V03 ] ( 7,2576 ) struct (32) [rsp+0x78] do-not-enreg[XSF] must-init addr-exposed ld-addr-op <Microsoft.CodeAnalysis.SeparatedSyntaxList`1+Enumerator[Microsoft.CodeAnalysis.CSharp.Syntax.AttributeSyntax]>
+; V04 loc3 [V04 ] ( 2, 16 ) struct (24) [rsp+0x60] do-not-enreg[XS] must-init addr-exposed ld-addr-op <Microsoft.CodeAnalysis.SeparatedSyntaxList`1[Microsoft.CodeAnalysis.CSharp.Syntax.AttributeSyntax]>
+; V05 loc4 [V05,T01] ( 6,1632 ) ref -> rbp class-hnd exact <Microsoft.CodeAnalysis.CSharp.Syntax.AttributeSyntax>
+; V06 loc5 [V06 ] ( 2, 256 ) struct (24) [rsp+0x48] do-not-enreg[XS] must-init addr-exposed ld-addr-op <Microsoft.CodeAnalysis.SyntaxToken>
+; V07 OutArgs [V07 ] ( 1, 1 ) struct (32) [rsp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
+;* V08 tmp1 [V08 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op "NewObj constructor temp" <Microsoft.CodeAnalysis.SyntaxList`1+Enumerator[Microsoft.CodeAnalysis.CSharp.Syntax.AttributeListSyntax]>
+;* V09 tmp2 [V09 ] ( 0, 0 ) struct ( 8) zero-ref "Inlining Arg" <Microsoft.CodeAnalysis.SyntaxList`1[Microsoft.CodeAnalysis.CSharp.Syntax.AttributeListSyntax]>
+; V10 tmp3 [V10,T24] ( 4, 24 ) ref -> rdx class-hnd "Inline return value spill temp" <Microsoft.CodeAnalysis.SyntaxNode>
+;* V11 tmp4 [V11 ] ( 0, 0 ) ref -> zero-ref class-hnd "dup spill" <Microsoft.CodeAnalysis.SyntaxNode>
+;* V12 tmp5 [V12 ] ( 0, 0 ) ref -> zero-ref
+;* V13 tmp6 [V13 ] ( 0, 0 ) int -> zero-ref
+;* V14 tmp7 [V14 ] ( 0, 0 ) int -> zero-ref "Inlining Arg"
+; V15 tmp8 [V15,T20] ( 2, 32 ) ref -> rcx class-hnd "Inlining Arg" <Microsoft.CodeAnalysis.GreenNode>
+; V16 tmp9 [V16,T23] ( 2, 32 ) struct (32) [rsp+0x28] do-not-enreg[SF] must-init ld-addr-op "NewObj constructor temp" <Microsoft.CodeAnalysis.SeparatedSyntaxList`1+Enumerator[Microsoft.CodeAnalysis.CSharp.Syntax.AttributeSyntax]>
+;* V17 tmp10 [V17 ] ( 0, 0 ) ref -> zero-ref class-hnd "Inline return value spill temp" <Microsoft.CodeAnalysis.CSharp.Syntax.NameSyntax>
+; V18 tmp11 [V18,T00] ( 4,2176 ) byref -> r14 "Inlining Arg"
+; V19 tmp12 [V19,T03] ( 5,1312 ) ref -> r15 class-hnd "Inline stloc first use temp" <Microsoft.CodeAnalysis.CSharp.Syntax.NameSyntax>
+; V20 tmp13 [V20,T12] ( 4, 128 ) ref -> rax class-hnd "Inline stloc first use temp" <Microsoft.CodeAnalysis.GreenNode>
+;* V21 tmp14 [V21 ] ( 0, 0 ) ref -> zero-ref ld-addr-op class-hnd "Inline ldloca(s) first use temp" <Microsoft.CodeAnalysis.CSharp.Syntax.NameSyntax>
+; V22 tmp15 [V22,T13] ( 2, 128 ) ref -> rdx class-hnd "impAppendStmt" <Microsoft.CodeAnalysis.CSharp.Syntax.NameSyntax>
+;* V23 tmp16 [V23 ] ( 0, 0 ) ref -> zero-ref class-hnd "Inlining Arg" <Microsoft.CodeAnalysis.CSharp.Syntax.NameSyntax>
+; V24 tmp17 [V24,T04] ( 4,1024 ) ref -> rcx class-hnd "dup spill" <Microsoft.CodeAnalysis.GreenNode>
+;* V25 tmp18 [V25 ] ( 0, 0 ) ref -> zero-ref
+; V26 tmp19 [V26,T07] ( 3, 384 ) ref -> rcx
+; V27 tmp20 [V27,T06] ( 4, 512 ) ref -> rcx
+;* V28 tmp21 [V28 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
+; V29 tmp22 [V29,T02] ( 3,1536 ) int -> rcx "Inline stloc first use temp"
+;* V30 tmp23 [V30 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
+; V31 tmp24 [V31,T10] ( 3, 160 ) int -> rdi "Inline stloc first use temp"
+; V32 tmp25 [V32,T16] ( 4, 88 ) int -> rax "Inline return value spill temp"
+;* V33 tmp26 [V33 ] ( 0, 0 ) ref -> zero-ref class-hnd "Inlining Arg" <Microsoft.CodeAnalysis.SyntaxNode>
+; V34 tmp27 [V34,T21] ( 2, 32 ) ref -> rax class-hnd "Inlining Arg" <Microsoft.CodeAnalysis.GreenNode>
+;* V35 tmp28 [V35 ] ( 0, 0 ) ref -> zero-ref class-hnd "Inlining Arg" <Microsoft.CodeAnalysis.SyntaxNode>
+; V36 tmp29 [V36,T17] ( 4, 64 ) ref -> rcx class-hnd "Inlining Arg" <Microsoft.CodeAnalysis.GreenNode>
+; V37 tmp30 [V37,T19] ( 4, 32 ) int -> rax "Inline stloc first use temp"
+; V38 tmp31 [V38,T27] ( 2, 2 ) ref -> rcx single-def "field V00._node (fldOffset=0x0)" P-INDEP
+; V39 tmp32 [V39,T15] ( 4, 105 ) int -> rdi "field V02._index (fldOffset=0x0)" P-INDEP
+; V40 tmp33 [V40,T11] ( 8, 137 ) ref -> rsi single-def "field V02._list (fldOffset=0x8)" P-INDEP
+;* V41 tmp34 [V41,T29] ( 0, 0 ) int -> zero-ref single-def "field V08._index (fldOffset=0x0)" P-INDEP
+; V42 tmp35 [V42,T28] ( 2, 2 ) ref -> rsi single-def "field V08._list (fldOffset=0x8)" P-INDEP
+;* V43 tmp36 [V43 ] ( 0, 0 ) ref -> zero-ref "field V09._node (fldOffset=0x0)" P-INDEP
+;* V44 tmp37 [V44 ] ( 0, 0 ) int -> zero-ref "V16.[000..004)"
+; V45 tmp38 [V45,T22] ( 2, 32 ) ref -> rcx "argument with side effect"
+; V46 tmp39 [V46,T09] ( 3, 192 ) ref -> rcx "argument with side effect"
+; V47 tmp40 [V47,T14] ( 2, 128 ) ref -> rdx "argument with side effect"
+; V48 tmp41 [V48,T05] ( 3, 768 ) ref -> rax "argument with side effect"
+; V49 cse0 [V49,T25] ( 3, 24 ) ref -> rcx "CSE #01: moderate"
+; V50 rat0 [V50,T18] ( 4, 56 ) ref -> rcx "replacement local"
+; V51 rat1 [V51,T26] ( 3, 16 ) long -> rax "CSE for expectedClsNode"
;
; Lcl frame size = 152
@@ -86,13 +86,13 @@ G_M20888_IG02: ; bbWeight=1, gcrefRegs=0002 {rcx}, byrefRegs=0000 {}, byr
; gcrRegs +[rsi]
mov edi, -1
;; size=10 bbWeight=1 PerfScore 0.75
-G_M20888_IG03: ; bbWeight=8, gcrefRegs=0040 {rsi}, byrefRegs=0000 {}, byref, isz
+G_M20888_IG03: ; bbWeight=64, gcrefRegs=0040 {rsi}, byrefRegs=0000 {}, byref, isz
; gcrRegs -[rcx]
inc edi
test rsi, rsi
je SHORT G_M20888_IG05
- ;; size=7 bbWeight=8 PerfScore 12.00
-G_M20888_IG04: ; bbWeight=2, gcrefRegs=0040 {rsi}, byrefRegs=0000 {}, byref, isz
+ ;; size=7 bbWeight=64 PerfScore 96.00
+G_M20888_IG04: ; bbWeight=8, gcrefRegs=0040 {rsi}, byrefRegs=0000 {}, byref, isz
mov rcx, gword ptr [rsi+0x18]
; gcrRegs +[rcx]
mov rax, rcx
@@ -102,13 +102,13 @@ G_M20888_IG04: ; bbWeight=2, gcrefRegs=0040 {rsi}, byrefRegs=0000 {}, byr
mov eax, 1
; gcrRegs -[rax]
jmp SHORT G_M20888_IG07
- ;; size=21 bbWeight=2 PerfScore 17.00
-G_M20888_IG05: ; bbWeight=2, gcrefRegs=0040 {rsi}, byrefRegs=0000 {}, byref, isz
+ ;; size=21 bbWeight=8 PerfScore 68.00
+G_M20888_IG05: ; bbWeight=8, gcrefRegs=0040 {rsi}, byrefRegs=0000 {}, byref, isz
; gcrRegs -[rcx]
xor eax, eax
jmp SHORT G_M20888_IG07
- ;; size=4 bbWeight=2 PerfScore 4.50
-G_M20888_IG06: ; bbWeight=2, gcrefRegs=0042 {rcx rsi}, byrefRegs=0000 {}, byref, isz
+ ;; size=4 bbWeight=8 PerfScore 18.00
+G_M20888_IG06: ; bbWeight=8, gcrefRegs=0042 {rcx rsi}, byrefRegs=0000 {}, byref, isz
; gcrRegs +[rcx]
movzx rax, byte ptr [rcx+0x0F]
cmp eax, 255
@@ -118,20 +118,20 @@ G_M20888_IG06: ; bbWeight=2, gcrefRegs=0042 {rcx rsi}, byrefRegs=0000 {},
call [rax+0x28]<unknown method>
; gcrRegs -[rcx]
; gcr arg pop 0
- ;; size=21 bbWeight=2 PerfScore 20.50
-G_M20888_IG07: ; bbWeight=8, gcrefRegs=0040 {rsi}, byrefRegs=0000 {}, byref, isz
+ ;; size=21 bbWeight=8 PerfScore 82.00
+G_M20888_IG07: ; bbWeight=64, gcrefRegs=0040 {rsi}, byrefRegs=0000 {}, byref, isz
cmp edi, eax
jge SHORT G_M20888_IG10
- ;; size=4 bbWeight=8 PerfScore 10.00
-G_M20888_IG08: ; bbWeight=2, gcrefRegs=0040 {rsi}, byrefRegs=0000 {}, byref, isz
+ ;; size=4 bbWeight=64 PerfScore 80.00
+G_M20888_IG08: ; bbWeight=32, gcrefRegs=0040 {rsi}, byrefRegs=0000 {}, byref, isz
test rsi, rsi
jne SHORT G_M20888_IG12
- ;; size=5 bbWeight=2 PerfScore 2.50
-G_M20888_IG09: ; bbWeight=2, gcrefRegs=0040 {rsi}, byrefRegs=0000 {}, byref, isz
+ ;; size=5 bbWeight=32 PerfScore 40.00
+G_M20888_IG09: ; bbWeight=8, gcrefRegs=0040 {rsi}, byrefRegs=0000 {}, byref, isz
mov rdx, rsi
; gcrRegs +[rdx]
jmp SHORT G_M20888_IG13
- ;; size=5 bbWeight=2 PerfScore 4.50
+ ;; size=5 bbWeight=8 PerfScore 18.00
G_M20888_IG10: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
; gcrRegs -[rdx rsi]
mov eax, ebx
@@ -147,7 +147,7 @@ G_M20888_IG11: ; bbWeight=1, epilog, nogc, extend
pop r15
ret
;; size=19 bbWeight=1 PerfScore 5.25
-G_M20888_IG12: ; bbWeight=2, gcVars=0000000000000000 {}, gcrefRegs=0040 {rsi}, byrefRegs=0000 {}, gcvars, byref, isz
+G_M20888_IG12: ; bbWeight=8, gcVars=0000000000000000 {}, gcrefRegs=0040 {rsi}, byrefRegs=0000 {}, gcvars, byref, isz
; gcrRegs +[rsi]
mov rcx, gword ptr [rsi+0x18]
; gcrRegs +[rcx]
@@ -162,42 +162,42 @@ G_M20888_IG12: ; bbWeight=2, gcVars=0000000000000000 {}, gcrefRegs=0040 {
; gcr arg pop 0
mov rdx, rax
; gcrRegs +[rdx]
- ;; size=29 bbWeight=2 PerfScore 27.50
-G_M20888_IG13: ; bbWeight=2, gcrefRegs=0044 {rdx rsi}, byrefRegs=0000 {}, byref, isz
+ ;; size=29 bbWeight=8 PerfScore 110.00
+G_M20888_IG13: ; bbWeight=8, gcrefRegs=0044 {rdx rsi}, byrefRegs=0000 {}, byref, isz
; gcrRegs -[rax]
mov rcx, rdx
; gcrRegs +[rcx]
test rcx, rcx
je SHORT G_M20888_IG15
- ;; size=8 bbWeight=2 PerfScore 3.00
-G_M20888_IG14: ; bbWeight=1, gcrefRegs=0046 {rcx rdx rsi}, byrefRegs=0000 {}, byref
+ ;; size=8 bbWeight=8 PerfScore 12.00
+G_M20888_IG14: ; bbWeight=4, gcrefRegs=0046 {rcx rdx rsi}, byrefRegs=0000 {}, byref
mov rax, 0xD1FFAB1E ; Microsoft.CodeAnalysis.CSharp.Syntax.AttributeListSyntax
cmp qword ptr [rcx], rax
jne G_M20888_IG25
- ;; size=19 bbWeight=1 PerfScore 4.25
-G_M20888_IG15: ; bbWeight=2, gcrefRegs=0042 {rcx rsi}, byrefRegs=0000 {}, byref
+ ;; size=19 bbWeight=4 PerfScore 17.00
+G_M20888_IG15: ; bbWeight=8, gcrefRegs=0042 {rcx rsi}, byrefRegs=0000 {}, byref
; gcrRegs -[rdx]
lea rdx, [rsp+0x60]
cmp dword ptr [rcx], ecx
call [Microsoft.CodeAnalysis.CSharp.Syntax.AttributeListSyntax:get_Attributes():Microsoft.CodeAnalysis.SeparatedSyntaxList`1[Microsoft.CodeAnalysis.CSharp.Syntax.AttributeSyntax]:this]
; gcrRegs -[rcx]
; gcr arg pop 0
- ;; size=13 bbWeight=2 PerfScore 13.00
-G_M20888_IG16: ; bbWeight=2, nogc, extend
+ ;; size=13 bbWeight=8 PerfScore 52.00
+G_M20888_IG16: ; bbWeight=8, nogc, extend
vmovdqu xmm0, xmmword ptr [rsp+0x60]
vmovdqu xmmword ptr [rsp+0x30], xmm0
mov rcx, qword ptr [rsp+0x70]
mov qword ptr [rsp+0x40], rcx
- ;; size=22 bbWeight=2 PerfScore 12.00
-G_M20888_IG17: ; bbWeight=2, nogc, extend
+ ;; size=22 bbWeight=8 PerfScore 48.00
+G_M20888_IG17: ; bbWeight=8, nogc, extend
vmovdqu ymm0, ymmword ptr [rsp+0x28]
vmovdqu ymmword ptr [rsp+0x78], ymm0
- ;; size=12 bbWeight=2 PerfScore 10.00
-G_M20888_IG18: ; bbWeight=2, isz, extend
+ ;; size=12 bbWeight=8 PerfScore 40.00
+G_M20888_IG18: ; bbWeight=8, isz, extend
mov dword ptr [rsp+0x78], -1
jmp SHORT G_M20888_IG20
- ;; size=10 bbWeight=2 PerfScore 6.00
-G_M20888_IG19: ; bbWeight=8, gcrefRegs=0042 {rcx rsi}, byrefRegs=0000 {}, byref
+ ;; size=10 bbWeight=8 PerfScore 24.00
+G_M20888_IG19: ; bbWeight=128, gcrefRegs=0042 {rcx rsi}, byrefRegs=0000 {}, byref
; gcrRegs +[rcx]
mov edx, 1
call [Microsoft.CodeAnalysis.CSharp.Symbols.QuickAttributeHelpers:GetQuickAttributes(System.String,ubyte):ubyte]
@@ -205,14 +205,12 @@ G_M20888_IG19: ; bbWeight=8, gcrefRegs=0042 {rcx rsi}, byrefRegs=0000 {},
; gcr arg pop 0
or eax, ebx
movzx rbx, al
- ;; size=16 bbWeight=8 PerfScore 30.00
-G_M20888_IG20: ; bbWeight=16, gcrefRegs=0040 {rsi}, byrefRegs=0000 {}, byref
+ ;; size=16 bbWeight=128 PerfScore 480.00
+G_M20888_IG20: ; bbWeight=512, gcrefRegs=0040 {rsi}, byrefRegs=0000 {}, byref, isz
mov ecx, dword ptr [rsp+0x78]
inc ecx
cmp ecx, dword ptr [rsp+0x80]
jge G_M20888_IG03
- ;; size=19 bbWeight=16 PerfScore 68.00
-G_M20888_IG21: ; bbWeight=8, gcrefRegs=0040 {rsi}, byrefRegs=0000 {}, byref, isz
mov dword ptr [rsp+0x78], ecx
lea rcx, [rsp+0x80]
mov r8d, dword ptr [rsp+0x78]
@@ -229,11 +227,13 @@ G_M20888_IG21: ; bbWeight=8, gcrefRegs=0040 {rsi}, byrefRegs=0000 {}, byr
; gcrRegs +[r15]
test r15, r15
jne SHORT G_M20888_IG22
+ ;; size=71 bbWeight=512 PerfScore 8704.00
+G_M20888_IG21: ; bbWeight=32, gcrefRegs=8060 {rbp rsi r15}, byrefRegs=4000 {r14}, byref, isz
+ ; gcrRegs -[rax]
mov rcx, gword ptr [rbp+0x18]
; gcrRegs +[rcx]
xor edx, edx
mov rax, qword ptr [rcx]
- ; gcrRegs -[rax]
mov rax, qword ptr [rax+0x50]
call [rax+0x20]<unknown method>
; gcrRegs -[rcx] +[rax]
@@ -269,8 +269,8 @@ G_M20888_IG21: ; bbWeight=8, gcrefRegs=0040 {rsi}, byrefRegs=0000 {}, byr
; gcr arg pop 0
mov r15, gword ptr [r14]
; gcrRegs +[r15]
- ;; size=128 bbWeight=8 PerfScore 304.00
-G_M20888_IG22: ; bbWeight=8, gcrefRegs=8040 {rsi r15}, byrefRegs=0000 {}, byref, isz
+ ;; size=76 bbWeight=32 PerfScore 808.00
+G_M20888_IG22: ; bbWeight=128, gcrefRegs=8040 {rsi r15}, byrefRegs=0000 {}, byref, isz
; byrRegs -[r14]
mov rcx, r15
; gcrRegs +[rcx]
@@ -294,8 +294,8 @@ G_M20888_IG22: ; bbWeight=8, gcrefRegs=8040 {rsi r15}, byrefRegs=0000 {},
jne SHORT G_M20888_IG23
xor rcx, rcx
jmp SHORT G_M20888_IG24
- ;; size=44 bbWeight=8 PerfScore 156.00
-G_M20888_IG23: ; bbWeight=8, gcrefRegs=0042 {rcx rsi}, byrefRegs=0000 {}, byref
+ ;; size=44 bbWeight=128 PerfScore 2496.00
+G_M20888_IG23: ; bbWeight=128, gcrefRegs=0042 {rcx rsi}, byrefRegs=0000 {}, byref
mov rax, qword ptr [rcx]
mov rax, qword ptr [rax+0x60]
call [rax+0x30]<unknown method>
@@ -303,14 +303,14 @@ G_M20888_IG23: ; bbWeight=8, gcrefRegs=0042 {rcx rsi}, byrefRegs=0000 {},
; gcr arg pop 0
mov rcx, rax
; gcrRegs +[rcx]
- ;; size=13 bbWeight=8 PerfScore 58.00
-G_M20888_IG24: ; bbWeight=8, gcrefRegs=0042 {rcx rsi}, byrefRegs=0000 {}, byref
+ ;; size=13 bbWeight=128 PerfScore 928.00
+G_M20888_IG24: ; bbWeight=128, gcrefRegs=0042 {rcx rsi}, byrefRegs=0000 {}, byref
; gcrRegs -[rax]
test rcx, rcx
jne G_M20888_IG19
mov rcx, 0xD1FFAB1E
jmp G_M20888_IG19
- ;; size=24 bbWeight=8 PerfScore 28.00
+ ;; size=24 bbWeight=128 PerfScore 448.00
G_M20888_IG25: ; bbWeight=0, gcrefRegs=0004 {rdx}, byrefRegs=0000 {}, byref
; gcrRegs -[rcx rsi] +[rdx]
mov rcx, rax
@@ -320,7 +320,7 @@ G_M20888_IG25: ; bbWeight=0, gcrefRegs=0004 {rdx}, byrefRegs=0000 {}, byr
int3
;; size=9 bbWeight=0 PerfScore 0.00
-; Total bytes of code 516, prolog size 52, PerfScore 811.83, instruction count 141, allocated bytes for code 516 (MethodHash=453aae67) for method Microsoft.CodeAnalysis.CSharp.DeclarationTreeBuilder:GetQuickAttributes(Microsoft.CodeAnalysis.SyntaxList`1[Microsoft.CodeAnalysis.CSharp.Syntax.AttributeListSyntax]):ubyte (FullOpts)
+; Total bytes of code 516, prolog size 52, PerfScore 14590.08, instruction count 141, allocated bytes for code 516 (MethodHash=453aae67) for method Microsoft.CodeAnalysis.CSharp.DeclarationTreeBuilder:GetQuickAttributes(Microsoft.CodeAnalysis.SyntaxList`1[Microsoft.CodeAnalysis.CSharp.Syntax.AttributeListSyntax]):ubyte (FullOpts)
Diff in the jitdump looks like
@@ -1,8 +1,8 @@
*************** In optMarkLoopHeads()
-2 loop heads marked
+4 loop heads marked
*************** In optFindAndScaleGeneralLoopBlocks()
-Marking a loop from BB02 to BB30
+Marking a loop from BB02 to BB26
BB02(wt=400)
BB03(wt=400)
BB04(wt=400)
@@ -19,20 +19,21 @@ Marking a loop from BB02 to BB30
BB15(wt=400)
BB16(wt=400)
BB17(wt=400)
- BB18(wt=400)
+ BB18(wt=800)
BB19(wt=400)
BB20(wt=400)
- BB21(wt=800)
+ BB21(wt=400)
BB22(wt=400)
BB23(wt=400)
BB24(wt=400)
- BB25(wt=400)
- BB26(wt=400)
- BB27(wt=400)
- BB28(wt=800)
- BB29(wt=800)
- BB30(wt=800)
-Marking a loop from BB08 to BB19
+ BB25(wt=800)
+ BB26(wt=800)
+Marking a loop from BB03 to BB26
+ BB03(wt=1600)
+ BB04(wt=1600)
+ BB05(wt=1600)
+ BB06(wt=1600)
+ BB07(wt=1600)
BB08(wt=1600)
BB09(wt=1600)
BB10(wt=1600)
@@ -42,7 +43,35 @@ Marking a loop from BB08 to BB19
BB14(wt=1600)
BB15(wt=1600)
BB16(wt=1600)
- BB17(wt=3200)
- BB18(wt=3200)
- BB19(wt=3200)
-Found a total of 2 general loops.
+ BB17(wt=1600)
+ BB18(wt=6400)
+ BB19(wt=1600)
+ BB20(wt=1600)
+ BB21(wt=1600)
+ BB22(wt=1600)
+ BB23(wt=1600)
+ BB24(wt=1600)
+ BB25(wt=6400)
+ BB26(wt=6400)
+Marking a loop from BB07 to BB16
+ BB07(wt=6400)
+ BB08(wt=6400)
+ BB09(wt=6400)
+ BB10(wt=6400)
+ BB11(wt=6400)
+ BB12(wt=6400)
+ BB13(wt=6400)
+ BB14(wt=6400)
+ BB15(wt=12800)
+ BB16(wt=12800)
+Marking a loop from BB09 to BB16
+ BB09(wt=25600)
+ BB10(wt=25600)
+ BB11(wt=25600)
+ BB12(wt=25600)
+ BB13(wt=25600)
+ BB14(wt=25600)
+ BB15(wt=102400)
+ BB16(wt=102400)
+Found a total of 4 general loops.
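The weights in the dump above compound multiplicatively, which is why two extra recognized loops blow up the perfscore even though the code is identical. A small arithmetic sketch, reading the per-loop factors (4 and 8) off the dump; the starting weight of 100 is an assumption, and `scale_through_loops` is a made-up helper, not a JIT function:

```python
UNIT_WEIGHT = 100  # assumed starting weight; not shown in the dump

def scale_through_loops(factors, start=UNIT_WEIGHT):
    """Return the block weight after each enclosing loop has been marked."""
    weights = []
    weight = start
    for factor in factors:
        weight *= factor
        weights.append(weight)
    return weights

# BB16 after the change: scaled by 4, 4, 8, 8 across the four loops.
bb16_after = scale_through_loops([4, 4, 8, 8])
print(bb16_after)  # [400, 1600, 12800, 102400]

# BB17 before the change: only two loops, scaled by 4 then 8.
bb17_before = scale_through_loops([4, 8])
print(bb17_before)  # [400, 3200]
```

Since each instruction group's PerfScore scales linearly with its block weight (IG03 goes from 12.00 at bbWeight=8 to 96.00 at bbWeight=64), the total moves from 811.83 to 14590.08 without a single instruction changing.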
Looking at some regressions...
libraries_tests.run.windows.x64.Release.mch
System.Xml.Schema.Preprocessor:GetParentSchema(System.Xml.Schema.XmlSchemaObject):System.Xml.Schema.XmlSchema (Tier1)
@@ -9,58 +9,65 @@
; 0 inlinees with PGO data; 1 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
-; V00 arg0 [V00,T00] ( 7, 6 ) ref -> rbx class-hnd <System.Xml.Schema.XmlSchemaObject>
-; V01 loc0 [V01,T02] ( 4, 3.50) ref -> rax class-hnd <System.Xml.Schema.XmlSchema>
+; V00 arg0 [V00,T00] ( 8, 7.37) ref -> rbx class-hnd <System.Xml.Schema.XmlSchemaObject>
+; V01 loc0 [V01,T02] ( 4, 4 ) ref -> rax class-hnd <System.Xml.Schema.XmlSchema>
; V02 OutArgs [V02 ] ( 1, 1 ) struct (32) [rsp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
; V03 rat0 [V03,T01] ( 5, 7 ) ref -> rax "replacement local"
; V04 rat1 [V04,T03] ( 3, 2 ) long -> rcx "CSE for expectedClsNode"
;
; Lcl frame size = 32
-G_M65415_IG01: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
+G_M65415_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
push rbx
sub rsp, 32
mov rbx, rcx
; gcrRegs +[rbx]
- ;; size=8 bbWeight=0.50 PerfScore 0.75
-G_M65415_IG02: ; bbWeight=0.50, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, byref
+ ;; size=8 bbWeight=1 PerfScore 1.50
+G_M65415_IG02: ; bbWeight=1, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, byref, isz
xor rax, rax
; gcrRegs +[rax]
- ;; size=2 bbWeight=0.50 PerfScore 0.12
-G_M65415_IG03: ; bbWeight=1, gcrefRegs=0009 {rax rbx}, byrefRegs=0000 {}, byref, isz
test rbx, rbx
- je SHORT G_M65415_IG06
- mov rbx, gword ptr [rbx+0x18]
- mov rax, rbx
- test rax, rax
- je SHORT G_M65415_IG05
- ;; size=17 bbWeight=1 PerfScore 4.75
-G_M65415_IG04: ; bbWeight=0.50, gcrefRegs=0009 {rax rbx}, byrefRegs=0000 {}, byref, isz
- mov rcx, 0xD1FFAB1E ; System.Xml.Schema.XmlSchema
- cmp qword ptr [rax], rcx
- jne SHORT G_M65415_IG07
- ;; size=15 bbWeight=0.50 PerfScore 2.12
-G_M65415_IG05: ; bbWeight=1, gcrefRegs=0009 {rax rbx}, byrefRegs=0000 {}, byref, isz
- test rax, rax
- je SHORT G_M65415_IG03
- ;; size=5 bbWeight=1 PerfScore 1.25
-G_M65415_IG06: ; bbWeight=1, gcrefRegs=0001 {rax}, byrefRegs=0000 {}, byref, epilog, nogc
+ jne SHORT G_M65415_IG04
+ ;; size=7 bbWeight=1 PerfScore 1.50
+G_M65415_IG03: ; bbWeight=1, gcrefRegs=0001 {rax}, byrefRegs=0000 {}, byref, epilog, nogc
; gcrRegs -[rbx]
add rsp, 32
pop rbx
ret
;; size=6 bbWeight=1 PerfScore 1.75
-G_M65415_IG07: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, gcvars, byref, isz
+G_M65415_IG04: ; bbWeight=1, gcVars=0000000000000000 {}, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, gcvars, byref, isz
; gcrRegs -[rax] +[rbx]
+ mov rbx, gword ptr [rbx+0x18]
+ mov rax, rbx
+ ; gcrRegs +[rax]
+ test rax, rax
+ je SHORT G_M65415_IG06
+ ;; size=12 bbWeight=1 PerfScore 3.50
+G_M65415_IG05: ; bbWeight=0.50, gcrefRegs=0009 {rax rbx}, byrefRegs=0000 {}, byref, isz
+ mov rcx, 0xD1FFAB1E ; System.Xml.Schema.XmlSchema
+ cmp qword ptr [rax], rcx
+ jne SHORT G_M65415_IG08
+ ;; size=15 bbWeight=0.50 PerfScore 2.12
+G_M65415_IG06: ; bbWeight=1, gcrefRegs=0009 {rax rbx}, byrefRegs=0000 {}, byref, isz
+ test rax, rax
+ jne SHORT G_M65415_IG03
+ ;; size=5 bbWeight=1 PerfScore 1.25
+G_M65415_IG07: ; bbWeight=1.37, gcrefRegs=0009 {rax rbx}, byrefRegs=0000 {}, byref, isz
+ test rbx, rbx
+ jne SHORT G_M65415_IG04
+ jmp SHORT G_M65415_IG03
+ ;; size=7 bbWeight=1.37 PerfScore 4.45
+G_M65415_IG08: ; bbWeight=0, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, byref, isz
+ ; gcrRegs -[rax]
mov rdx, rbx
; gcrRegs +[rdx]
call CORINFO_HELP_ISINSTANCEOFCLASS
; gcrRegs -[rdx] +[rax]
; gcr arg pop 0
- jmp SHORT G_M65415_IG05
+ jmp SHORT G_M65415_IG06
;; size=10 bbWeight=0 PerfScore 0.00
-; Total bytes of code 63, prolog size 8, PerfScore 10.75, instruction count 21, allocated bytes for code 63 (MethodHash=360f0078) for method System.Xml.Schema.Preprocessor:GetParentSchema(System.Xml.Schema.XmlSchemaObject):System.Xml.Schema.XmlSchema (Tier1)
+; Total bytes of code 70, prolog size 8, PerfScore 16.08, instruction count 24, allocated bytes for code 70 (MethodHash=360f0078) for method System.Xml.Schema.Preprocessor:GetParentSchema(System.Xml.Schema.XmlSchemaObject):System.Xml.Schema.XmlSchema (Tier1)
Looks like we straighten out the flow, and then immediately some other "duplicate tail condition" logic kicks in:
@@ -1,119 +1,178 @@
*************** In fgUpdateFlowGraph()
Before updating the flow graph:
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds weight IBC [IL range] [jump] [EH region] [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000] 1 1 73 [000..004)-> BB03(1) (always) i IBC
BB02 [0001] 1 BB04 1 73 [004..013)-> BB03(1) (always) i IBC hascall bwd bwd-target
BB03 [0002] 2 BB01,BB02 2 146 [013..016)-> BB05(0.5),BB04(0.5) ( cond ) i IBC bwd
BB04 [0003] 1 BB03 1 73 [016..019)-> BB02(1),BB05(0) ( cond ) i IBC bwd bwd-src
BB05 [0004] 2 BB03,BB04 1 73 [019..01B) (return) i IBC
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
Considering uncond to cond BB01 -> BB03
setting likelihood of BB01 -> BB05 from 1 to 0.5
setting likelihood of BB01 -> BB04 to 0.5
fgOptimizeUncondBranchToSimpleCond(from BB01 to cond BB03), modified BB01
expecting opts to key off V01 in BB01
Decreased BB03 profile weight from 146 to 73
-Considering uncond to cond BB02 -> BB03
+Forward substituting local after jump threading. Before:
+STMT00007 ( ??? ... ??? )
+ [000025] ----------- ▌ JTRUE void
+ [000026] J------N--- └──▌ NE int
+ [000027] ----------- ├──▌ LCL_VAR ref V01 loc0
+ [000028] ----------- └──▌ CNS_INT ref null
+
+After:
+STMT00007 ( ??? ... ??? )
+ [000025] ----------- ▌ JTRUE void
+ [000026] J------N--- └──▌ NE int
+ [000029] ----------- ├──▌ CNS_INT ref null
+ [000028] ----------- └──▌ CNS_INT ref null
+
+Now trying to fold...
+
+Folding operator with constant nodes into a constant:
+ [000026] J------N--- ▌ NE int
+ [000029] ----------- ├──▌ CNS_INT ref null
+ [000028] ----------- └──▌ CNS_INT ref null
+Bashed to int constant:
+ [000026] ----------- ▌ CNS_INT int 0
+STMT00007 ( ??? ... ??? )
+ [000025] ----------- ▌ JTRUE void
+ [000026] ----------- └──▌ CNS_INT int 0
+
+removing useless STMT00007 ( ??? ... ??? )
+ [000025] ----------- ▌ JTRUE void
+ [000026] ----------- └──▌ CNS_INT int 0
+ from BB01
+setting likelihood of BB01 -> BB04 from 0.5 to 1
+
+Conditional folded at BB01
+BB01 becomes a BBJ_ALWAYS to BB04
+Trying to compact last pred BB02 of BB03 that we now bypass
Compacting BB03 into BB02:
*************** In fgDebugCheckBBlist
+Considering uncond to cond BB01 -> BB04
After updating the flow graph:
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds weight IBC [IL range] [jump] [EH region] [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
-BB01 [0000] 1 1 73 [000..004)-> BB05(0.5),BB04(0.5) ( cond ) i IBC
+BB01 [0000] 1 1 73 [000..004)-> BB04(1) (always) i IBC
BB02 [0001] 1 BB04 1 73 [004..016)-> BB05(0.5),BB04(0.5) ( cond ) i IBC hascall bwd bwd-target
BB04 [0003] 2 BB01,BB02 1 73 [016..019)-> BB02(1),BB05(0) ( cond ) i IBC bwd bwd-src
-BB05 [0004] 3 BB01,BB02,BB04 1 73 [019..01B) (return) i IBC
+BB05 [0004] 2 BB02,BB04 1 73 [019..01B) (return) i IBC
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
*************** Exception Handling table is empty
*************** In fgDebugCheckBBlist
*************** In fgExpandRarelyRunBlocks()
*************** In fgReorderBlocks()
Initial BasicBlocks
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds weight IBC [IL range] [jump] [EH region] [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
-BB01 [0000] 1 1 73 [000..004)-> BB05(0.5),BB04(0.5) ( cond ) i IBC
+BB01 [0000] 1 1 73 [000..004)-> BB04(1) (always) i IBC
BB02 [0001] 1 BB04 1 73 [004..016)-> BB05(0.5),BB04(0.5) ( cond ) i IBC hascall bwd bwd-target
BB04 [0003] 2 BB01,BB02 1 73 [016..019)-> BB02(1),BB05(0) ( cond ) i IBC bwd bwd-src
+BB05 [0004] 2 BB02,BB04 1 73 [019..01B) (return) i IBC
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------
+
+
+Duplication of the conditional block BB04 (always branch from BB01) performed, because the cost of duplication (6) is less or equal than 6, validProfileWeights = true
+setting likelihood of BB01 -> BB02 to 0
+setting likelihood of BB01 -> BB05 from 1 to 1
+
+fgOptimizeBranch added these statements(s) at the end of BB01:
+STMT00008 ( 0x016[E-] ... ??? )
+ ( 7, 6) [000030] ----------- ▌ JTRUE void
+ ( 5, 4) [000031] J------N--- └──▌ EQ int
+ ( 3, 2) [000032] ----------- ├──▌ LCL_VAR ref V00 arg0
+ ( 1, 1) [000033] ----------- └──▌ CNS_INT ref null
+
+fgOptimizeBranch changed block BB01 from BBJ_ALWAYS to BBJ_COND.
+
+After this change in fgOptimizeBranch the BB graph is:
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------
+BBnum BBid ref try hnd preds weight IBC [IL range] [jump] [EH region] [flags]
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------
+BB01 [0000] 1 1 73 [000..004)-> BB05(1),BB02(0) ( cond ) i IBC
+BB02 [0001] 2 BB01,BB04 1 73 [004..016)-> BB05(0.5),BB04(0.5) ( cond ) i IBC hascall bwd bwd-target
+BB04 [0003] 1 BB02 1.37 100 [016..019)-> BB02(1),BB05(0) ( cond ) i IBC bwd bwd-src
BB05 [0004] 3 BB01,BB02,BB04 1 73 [019..01B) (return) i IBC
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
*************** Finishing PHASE Optimize control flow
Trees after Optimize control flow
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds weight IBC [IL range] [jump] [EH region] [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
-BB01 [0000] 1 1 73 [000..004)-> BB05(0.5),BB04(0.5) ( cond ) i IBC
-BB02 [0001] 1 BB04 1 73 [004..016)-> BB05(0.5),BB04(0.5) ( cond ) i IBC hascall bwd bwd-target
-BB04 [0003] 2 BB01,BB02 1 73 [016..019)-> BB02(1),BB05(0) ( cond ) i IBC bwd bwd-src
+BB01 [0000] 1 1 73 [000..004)-> BB05(1),BB02(0) ( cond ) i IBC
+BB02 [0001] 2 BB01,BB04 1 73 [004..016)-> BB05(0.5),BB04(0.5) ( cond ) i IBC hascall bwd bwd-target
+BB04 [0003] 1 BB02 1.37 100 [016..019)-> BB02(1),BB05(0) ( cond ) i IBC bwd bwd-src
BB05 [0004] 3 BB01,BB02,BB04 1 73 [019..01B) (return) i IBC
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------- BB01 [0000] [000..004) -> BB05(0.5),BB04(0.5) (cond), preds={} succs={BB04,BB05}
+------------ BB01 [0000] [000..004) -> BB05(1),BB02(0) (cond), preds={} succs={BB02,BB05}
***** BB01 [0000]
STMT00000 ( 0x000[E-] ... 0x001 )
[000001] DA---+----- ▌ STORE_LCL_VAR ref V01 loc0
[000000] -----+----- └──▌ CNS_INT ref null
***** BB01 [0000]
-STMT00007 ( ??? ... ??? )
- [000025] ----------- ▌ JTRUE void
- [000026] J------N--- └──▌ NE int
- [000027] ----------- ├──▌ LCL_VAR ref V01 loc0
- [000028] ----------- └──▌ CNS_INT ref null
+STMT00008 ( 0x016[E-] ... ??? )
+ ( 7, 6) [000030] ----------- ▌ JTRUE void
+ ( 5, 4) [000031] J------N--- └──▌ EQ int
+ ( 3, 2) [000032] ----------- ├──▌ LCL_VAR ref V00 arg0
+ ( 1, 1) [000033] ----------- └──▌ CNS_INT ref null
------------- BB02 [0001] [004..016) -> BB05(0.5),BB04(0.5) (cond), preds={BB04} succs={BB04,BB05}
+------------ BB02 [0001] [004..016) -> BB05(0.5),BB04(0.5) (cond), preds={BB01,BB04} succs={BB04,BB05}
***** BB02 [0001]
STMT00005 ( 0x004[E-] ... ??? )
[000015] DA-XG+----- ▌ STORE_LCL_VAR ref V00 arg0
[000021] ---XG+----- └──▌ IND ref
[000024] -----+----- └──▌ ADD byref
[000012] -----+----- ├──▌ LCL_VAR ref V00 arg0
[000023] -----+----- └──▌ CNS_INT long 24 Fseq[<unknown field>]
***** BB02 [0001]
STMT00006 ( 0x00C[E-] ... 0x012 )
[000019] DAC-G+----- ▌ STORE_LCL_VAR ref V01 loc0
[000018] --C-G+----- └──▌ CALL help ref CORINFO_HELP_ISINSTANCEOFCLASS
[000016] -----+----- arg1 in rdx ├──▌ LCL_VAR ref V00 arg0
[000017] H----+-N--- arg0 in rcx └──▌ CNS_INT(h) long 0x7fff78421508 class System.Xml.Schema.XmlSchema
***** BB02 [0001]
STMT00001 ( 0x013[E-] ... 0x014 )
[000005] -----+----- ▌ JTRUE void
[000004] J----+-N--- └──▌ NE int
[000002] -----+----- ├──▌ LCL_VAR ref V01 loc0
[000003] -----+----- └──▌ CNS_INT ref null
------------- BB04 [0003] [016..019) -> BB02(1),BB05(0) (cond), preds={BB01,BB02} succs={BB05,BB02}
+------------ BB04 [0003] [016..019) -> BB02(1),BB05(0) (cond), preds={BB02} succs={BB05,BB02}
***** BB04 [0003]
STMT00003 ( 0x016[E-] ... 0x017 )
- [000011] -----+----- ▌ JTRUE void
- [000010] J----+-N--- └──▌ NE int
- [000008] -----+----- ├──▌ LCL_VAR ref V00 arg0
- [000009] -----+----- └──▌ CNS_INT ref null
+ ( 7, 6) [000011] ----------- ▌ JTRUE void
+ ( 5, 4) [000010] J------N--- └──▌ NE int
+ ( 3, 2) [000008] ----------- ├──▌ LCL_VAR ref V00 arg0
+ ( 1, 1) [000009] ----------- └──▌ CNS_INT ref null
------------ BB05 [0004] [019..01B) (return), preds={BB01,BB02,BB04} succs={}
***** BB05 [0004]
STMT00002 ( 0x019[E-] ... 0x01A )
[000007] -----+----- ▌ RETURN ref
[000006] -----+----- └──▌ LCL_VAR ref V01 loc0
-------------------------------------------------------------------------------------------------------------------
The "duplicate tail" transformation here results in the loop no longer being in loop-inverted shape, so the final block layout looks worse.
System.Double:Equals(System.Object):ubyte:this (Instrumented Tier1)
@@ -12,71 +12,70 @@
;
; V00 this [V00,T01] ( 4, 2.65) byref -> rsi this single-def
; V01 arg1 [V01,T00] ( 5, 4.02) ref -> rbx class-hnd single-def <System.Object>
-; V02 loc0 [V02,T04] ( 4, 1.56) double -> mm0
+; V02 loc0 [V02,T03] ( 4, 1.56) double -> mm6
; V03 OutArgs [V03 ] ( 1, 1 ) struct (32) [rsp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
-;* V04 tmp1 [V04,T02] ( 0, 0 ) int -> zero-ref "spilling qmarkNull"
-; V05 tmp2 [V05,T03] ( 4, 1.04) ubyte -> rax "Inline return value spill temp"
-; V06 tmp3 [V06,T05] ( 3, 0.78) double -> mm0 "Inlining Arg"
+;* V04 tmp1 [V04 ] ( 0, 0 ) int -> zero-ref "spilling qmarkNull"
+; V05 tmp2 [V05,T02] ( 4, 1.04) ubyte -> rax "Inline return value spill temp"
+; V06 tmp3 [V06,T04] ( 3, 0.78) double -> mm0 "Inlining Arg"
;
-; Lcl frame size = 40
+; Lcl frame size = 56
G_M46727_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
push rsi
push rbx
- sub rsp, 40
+ sub rsp, 56
+ vmovaps xmmword ptr [rsp+0x20], xmm6
mov rsi, rcx
; byrRegs +[rsi]
mov rbx, rdx
; gcrRegs +[rbx]
- ;; size=12 bbWeight=1 PerfScore 2.75
+ ;; size=18 bbWeight=1 PerfScore 4.75
G_M46727_IG02: ; bbWeight=1, gcrefRegs=0008 {rbx}, byrefRegs=0040 {rsi}, byref, isz
test rbx, rbx
- jne SHORT G_M46727_IG05
+ je SHORT G_M46727_IG07
;; size=5 bbWeight=1 PerfScore 1.25
-G_M46727_IG03: ; bbWeight=0.48, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
+G_M46727_IG03: ; bbWeight=0.50, gcrefRegs=0008 {rbx}, byrefRegs=0040 {rsi}, byref, isz
+ mov rcx, 0xD1FFAB1E ; System.Double
+ cmp qword ptr [rbx], rcx
+ jne SHORT G_M46727_IG07
+ ;; size=15 bbWeight=0.50 PerfScore 2.12
+G_M46727_IG04: ; bbWeight=0.52, gcrefRegs=0008 {rbx}, byrefRegs=0040 {rsi}, byref, isz
+ mov rcx, 0xD1FFAB1E
+ call CORINFO_HELP_COUNTPROFILE32
+ ; gcr arg pop 0
+ vmovsd xmm6, qword ptr [rbx+0x08]
+ vucomisd xmm6, qword ptr [rsi]
+ jp SHORT G_M46727_IG09
+ jne SHORT G_M46727_IG09
+ ;; size=28 bbWeight=0.52 PerfScore 6.37
+G_M46727_IG05: ; bbWeight=0.26, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
; gcrRegs -[rbx]
; byrRegs -[rsi]
+ mov eax, 1
+ ;; size=5 bbWeight=0.26 PerfScore 0.07
+G_M46727_IG06: ; bbWeight=0.52, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, epilog, nogc
+ vmovaps xmm6, xmmword ptr [rsp+0x20]
+ add rsp, 56
+ pop rbx
+ pop rsi
+ ret
+ ;; size=13 bbWeight=0.52 PerfScore 3.25
+G_M46727_IG07: ; bbWeight=0.48, gcVars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref
mov rcx, 0xD1FFAB1E
call CORINFO_HELP_COUNTPROFILE32
; gcr arg pop 0
xor eax, eax
;; size=17 bbWeight=0.48 PerfScore 0.72
-G_M46727_IG04: ; bbWeight=0.48, epilog, nogc, extend
- add rsp, 40
+G_M46727_IG08: ; bbWeight=0.48, epilog, nogc, extend
+ vmovaps xmm6, xmmword ptr [rsp+0x20]
+ add rsp, 56
pop rbx
pop rsi
ret
- ;; size=7 bbWeight=0.48 PerfScore 1.08
-G_M46727_IG05: ; bbWeight=0.50, gcVars=0000000000000000 {}, gcrefRegs=0008 {rbx}, byrefRegs=0040 {rsi}, gcvars, byref, isz
- ; gcrRegs +[rbx]
- ; byrRegs +[rsi]
- mov rcx, 0xD1FFAB1E ; System.Double
- cmp qword ptr [rbx], rcx
- jne SHORT G_M46727_IG03
- ;; size=15 bbWeight=0.50 PerfScore 2.12
-G_M46727_IG06: ; bbWeight=0.52, gcrefRegs=0008 {rbx}, byrefRegs=0040 {rsi}, byref, isz
- mov rcx, 0xD1FFAB1E
- call CORINFO_HELP_COUNTPROFILE32
- ; gcr arg pop 0
- vmovsd xmm0, qword ptr [rbx+0x08]
- vucomisd xmm0, qword ptr [rsi]
- jp SHORT G_M46727_IG09
- jne SHORT G_M46727_IG09
- ;; size=28 bbWeight=0.52 PerfScore 6.37
-G_M46727_IG07: ; bbWeight=0.26, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
- ; gcrRegs -[rbx]
- ; byrRegs -[rsi]
- mov eax, 1
- ;; size=5 bbWeight=0.26 PerfScore 0.07
-G_M46727_IG08: ; bbWeight=0.52, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, epilog, nogc
- add rsp, 40
- pop rbx
- pop rsi
- ret
- ;; size=7 bbWeight=0.52 PerfScore 1.17
+ ;; size=13 bbWeight=0.48 PerfScore 3.00
G_M46727_IG09: ; bbWeight=0.26, gcVars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0040 {rsi}, gcvars, byref, isz
; byrRegs +[rsi]
- vucomisd xmm0, xmm0
+ vucomisd xmm6, xmm6
jp SHORT G_M46727_IG10
je SHORT G_M46727_IG11
;; size=8 bbWeight=0.26 PerfScore 1.04
@@ -85,15 +84,15 @@ G_M46727_IG10: ; bbWeight=0.13, gcrefRegs=0000 {}, byrefRegs=0040 {rsi},
vucomisd xmm0, xmm0
setp al
movzx rax, al
- jmp SHORT G_M46727_IG08
+ jmp SHORT G_M46727_IG06
;; size=16 bbWeight=0.13 PerfScore 1.20
G_M46727_IG11: ; bbWeight=0.13, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
; byrRegs -[rsi]
xor eax, eax
- jmp SHORT G_M46727_IG08
+ jmp SHORT G_M46727_IG06
;; size=4 bbWeight=0.13 PerfScore 0.29
-; Total bytes of code 124, prolog size 6, PerfScore 18.07, instruction count 38, allocated bytes for code 124 (MethodHash=203a4978) for method System.Double:Equals(System.Object):ubyte:this (Instrumented Tier1)
+; Total bytes of code 142, prolog size 12, PerfScore 24.07, instruction count 41, allocated bytes for code 142 (MethodHash=203a4978) for method System.Double:Equals(System.Object):ubyte:this (Instrumented Tier1)
Looks like after we folded some control flow we managed to compact some more blocks, and this now results in worse block layout.
System.Data.Common.DataStorage:GetStorageType(System.Type):int (Tier1)
@@ -9,22 +9,22 @@
; 3 inlinees with PGO data; 0 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
-; V00 arg0 [V00,T03] ( 12, 46.60) ref -> rbx class-hnd single-def <System.Type>
-; V01 loc0 [V01,T09] ( 3, 0 ) int -> rax
-; V02 loc1 [V02,T01] ( 6, 65.17) int -> rsi
+; V00 arg0 [V00,T03] ( 15, 20.18) ref -> rbx class-hnd single-def <System.Type>
+; V01 loc0 [V01,T07] ( 3, 0 ) int -> rax
+; V02 loc1 [V02,T00] ( 10, 65.17) int -> rsi
; V03 OutArgs [V03 ] ( 1, 1 ) struct (32) [rsp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
-; V04 tmp1 [V04,T05] ( 2, 0 ) ubyte -> rax "Inline return value spill temp"
-; V05 tmp2 [V05,T00] ( 5, 92.71) ref -> rbp class-hnd "Inlining Arg" <System.Type>
-;* V06 tmp3 [V06,T06] ( 0, 0 ) int -> zero-ref "spilling qmarkNull"
-; V07 tmp4 [V07,T08] ( 3, 0.04) int -> rax "Inline return value spill temp"
-; V08 tmp5 [V08,T07] ( 3, 0.08) int -> rax "guarded devirt return temp"
+; V04 tmp1 [V04,T08] ( 2, 0 ) ubyte -> rax "Inline return value spill temp"
+; V05 tmp2 [V05,T01] ( 7, 63.39) ref -> rbp class-hnd "Inlining Arg" <System.Type>
+;* V06 tmp3 [V06 ] ( 0, 0 ) int -> zero-ref "spilling qmarkNull"
+; V07 tmp4 [V07,T06] ( 3, 0.04) int -> rax "Inline return value spill temp"
+; V08 tmp5 [V08,T05] ( 3, 0.08) int -> rax "guarded devirt return temp"
;* V09 tmp6 [V09 ] ( 0, 0 ) ref -> zero-ref class-hnd exact "guarded devirt this exact temp" <System.RuntimeType>
;* V10 tmp7 [V10 ] ( 0, 0 ) int -> zero-ref "Inline return value spill temp"
;* V11 tmp8 [V11 ] ( 0, 0 ) ref -> zero-ref class-hnd exact "Inlining Arg" <System.RuntimeType>
;* V12 tmp9 [V12 ] ( 0, 0 ) ref -> zero-ref class-hnd "Inline stloc first use temp" <System.RuntimeType>
-; V13 tmp10 [V13,T02] ( 2, 63.09) ref -> rcx "arr expr"
+; V13 tmp10 [V13,T02] ( 4, 63.09) ref -> rax "arr expr"
;* V14 tmp11 [V14 ] ( 0, 0 ) ref -> zero-ref "argument with side effect"
-; V15 cse0 [V15,T04] ( 2, 16.77) ref -> rdi hoist "CSE #01: aggressive"
+; V15 cse0 [V15,T04] ( 4, 16.78) ref -> rdi hoist multi-def "CSE #01: aggressive"
;
; Lcl frame size = 40
@@ -37,65 +37,93 @@ G_M6739_IG01: ; bbWeight=1.00, gcrefRegs=0000 {}, byrefRegs=0000 {}, byre
mov rbx, rcx
; gcrRegs +[rbx]
;; size=11 bbWeight=1.00 PerfScore 4.49
-G_M6739_IG02: ; bbWeight=1.00, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, byref
+G_M6739_IG02: ; bbWeight=1.00, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, byref, isz
xor esi, esi
+ test rbx, rbx
+ je SHORT G_M6739_IG09
+ mov rcx, 0xD1FFAB1E ; System.RuntimeType
+ cmp qword ptr [rbx], rcx
+ jne SHORT G_M6739_IG09
mov rcx, 0xD1FFAB1E ; const ptr
mov rdi, gword ptr [rcx]
; gcrRegs +[rdi]
- ;; size=15 bbWeight=1.00 PerfScore 2.49
-G_M6739_IG03: ; bbWeight=15.77, gcrefRegs=0088 {rbx rdi}, byrefRegs=0000 {}, byref, isz
- mov rcx, rdi
- ; gcrRegs +[rcx]
- mov eax, esi
- mov rbp, gword ptr [rcx+8*rax+0x10]
+ ;; size=35 bbWeight=1.00 PerfScore 7.98
+G_M6739_IG03: ; bbWeight=15.62, gcrefRegs=0088 {rbx rdi}, byrefRegs=0000 {}, byref, isz
+ mov rax, rdi
+ ; gcrRegs +[rax]
+ mov ecx, esi
+ mov rbp, gword ptr [rax+8*rcx+0x10]
; gcrRegs +[rbp]
cmp rbx, rbp
- je SHORT G_M6739_IG12
- ;; size=15 bbWeight=15.77 PerfScore 59.15
-G_M6739_IG04: ; bbWeight=14.83, gcrefRegs=00A8 {rbx rbp rdi}, byrefRegs=0000 {}, byref, isz
- ; gcrRegs -[rcx]
- test rbx, rbx
- je SHORT G_M6739_IG07
- ;; size=5 bbWeight=14.83 PerfScore 18.54
-G_M6739_IG05: ; bbWeight=14.81, gcrefRegs=00A8 {rbx rbp rdi}, byrefRegs=0000 {}, byref, isz
- test rbp, rbp
- je SHORT G_M6739_IG07
- ;; size=5 bbWeight=14.81 PerfScore 18.51
-G_M6739_IG06: ; bbWeight=13.88, gcrefRegs=00A8 {rbx rbp rdi}, byrefRegs=0000 {}, byref, isz
- mov rcx, 0xD1FFAB1E ; System.RuntimeType
- cmp qword ptr [rbx], rcx
- jne SHORT G_M6739_IG14
- ;; size=15 bbWeight=13.88 PerfScore 58.98
-G_M6739_IG07: ; bbWeight=15.81, gcrefRegs=0088 {rbx rdi}, byrefRegs=0000 {}, byref, isz
- ; gcrRegs -[rbp]
+ je SHORT G_M6739_IG15
+ ;; size=15 bbWeight=15.62 PerfScore 58.56
+G_M6739_IG04: ; bbWeight=15.66, gcrefRegs=0088 {rbx rdi}, byrefRegs=0000 {}, byref, isz
+ ; gcrRegs -[rax rbp]
inc esi
cmp esi, 41
jl SHORT G_M6739_IG03
- ;; size=7 bbWeight=15.81 PerfScore 23.72
-G_M6739_IG08: ; bbWeight=0.04, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, byref, isz
+ ;; size=7 bbWeight=15.66 PerfScore 23.48
+G_M6739_IG05: ; bbWeight=0.04, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, byref
; gcrRegs -[rdi]
test rbx, rbx
- je SHORT G_M6739_IG16
- ;; size=5 bbWeight=0.04 PerfScore 0.05
-G_M6739_IG09: ; bbWeight=0.04, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, byref, isz
+ je G_M6739_IG19
+ ;; size=9 bbWeight=0.04 PerfScore 0.05
+G_M6739_IG06: ; bbWeight=0.04, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, byref
mov rcx, 0xD1FFAB1E ; System.RuntimeType
cmp qword ptr [rbx], rcx
- jne SHORT G_M6739_IG15
- ;; size=15 bbWeight=0.04 PerfScore 0.18
-G_M6739_IG10: ; bbWeight=0.04, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, byref
+ jne G_M6739_IG18
+ ;; size=19 bbWeight=0.04 PerfScore 0.18
+G_M6739_IG07: ; bbWeight=0.04, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, byref
mov rcx, rbx
; gcrRegs +[rcx]
call [System.RuntimeType:GetTypeCodeImpl():int:this]
; gcrRegs -[rcx rbx]
; gcr arg pop 0
;; size=9 bbWeight=0.04 PerfScore 0.13
-G_M6739_IG11: ; bbWeight=0.04, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
- jmp SHORT G_M6739_IG17
- ;; size=2 bbWeight=0.04 PerfScore 0.08
-G_M6739_IG12: ; bbWeight=0.96, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
+G_M6739_IG08: ; bbWeight=0.04, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
+ jmp G_M6739_IG20
+ ;; size=5 bbWeight=0.04 PerfScore 0.08
+G_M6739_IG09: ; bbWeight=0.01, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, byref
+ ; gcrRegs +[rbx]
+ mov rax, 0xD1FFAB1E ; const ptr
+ mov rdi, gword ptr [rax]
+ ; gcrRegs +[rdi]
+ ;; size=13 bbWeight=0.01 PerfScore 0.02
+G_M6739_IG10: ; bbWeight=0.16, gcrefRegs=0088 {rbx rdi}, byrefRegs=0000 {}, byref, isz
+ mov rax, rdi
+ ; gcrRegs +[rax]
+ mov edx, esi
+ mov rbp, gword ptr [rax+8*rdx+0x10]
+ ; gcrRegs +[rbp]
+ cmp rbx, rbp
+ je SHORT G_M6739_IG15
+ ;; size=15 bbWeight=0.16 PerfScore 0.59
+G_M6739_IG11: ; bbWeight=0.15, gcrefRegs=00A8 {rbx rbp rdi}, byrefRegs=0000 {}, byref, isz
+ ; gcrRegs -[rax]
+ test rbx, rbx
+ je SHORT G_M6739_IG14
+ ;; size=5 bbWeight=0.15 PerfScore 0.19
+G_M6739_IG12: ; bbWeight=0.15, gcrefRegs=00A8 {rbx rbp rdi}, byrefRegs=0000 {}, byref, isz
+ test rbp, rbp
+ je SHORT G_M6739_IG14
+ ;; size=5 bbWeight=0.15 PerfScore 0.19
+G_M6739_IG13: ; bbWeight=0.14, gcrefRegs=00A8 {rbx rbp rdi}, byrefRegs=0000 {}, byref, isz
+ mov rax, 0xD1FFAB1E ; System.RuntimeType
+ cmp qword ptr [rbx], rax
+ jne SHORT G_M6739_IG17
+ ;; size=15 bbWeight=0.14 PerfScore 0.59
+G_M6739_IG14: ; bbWeight=0.16, gcrefRegs=0088 {rbx rdi}, byrefRegs=0000 {}, byref, isz
+ ; gcrRegs -[rbp]
+ inc esi
+ cmp esi, 41
+ jl SHORT G_M6739_IG10
+ jmp SHORT G_M6739_IG05
+ ;; size=9 bbWeight=0.16 PerfScore 0.55
+G_M6739_IG15: ; bbWeight=0.96, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
+ ; gcrRegs -[rbx rdi]
mov eax, esi
;; size=2 bbWeight=0.96 PerfScore 0.24
-G_M6739_IG13: ; bbWeight=0.96, epilog, nogc, extend
+G_M6739_IG16: ; bbWeight=0.96, epilog, nogc, extend
add rsp, 40
pop rbx
pop rbp
@@ -103,7 +131,7 @@ G_M6739_IG13: ; bbWeight=0.96, epilog, nogc, extend
pop rdi
ret
;; size=9 bbWeight=0.96 PerfScore 3.12
-G_M6739_IG14: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=00A8 {rbx rbp rdi}, byrefRegs=0000 {}, gcvars, byref, isz
+G_M6739_IG17: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=00A8 {rbx rbp rdi}, byrefRegs=0000 {}, gcvars, byref, isz
; gcrRegs +[rbx rbp rdi]
mov rdx, rbp
; gcrRegs +[rdx]
@@ -112,7 +140,7 @@ G_M6739_IG14: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=00A8 {r
; gcrRegs -[rdx] +[rax]
; gcr arg pop 0
test rax, rax
- jne SHORT G_M6739_IG07
+ jne SHORT G_M6739_IG14
mov rcx, rbx
; gcrRegs +[rcx]
mov rdx, rbp
@@ -124,10 +152,10 @@ G_M6739_IG14: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=00A8 {r
; gcrRegs -[rcx rdx rbp]
; gcr arg pop 0
test eax, eax
- je SHORT G_M6739_IG07
- jmp SHORT G_M6739_IG12
+ je SHORT G_M6739_IG14
+ jmp SHORT G_M6739_IG15
;; size=48 bbWeight=0 PerfScore 0.00
-G_M6739_IG15: ; bbWeight=0, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, byref, isz
+G_M6739_IG18: ; bbWeight=0, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, byref
; gcrRegs -[rdi]
mov rcx, rbx
; gcrRegs +[rcx]
@@ -136,17 +164,17 @@ G_M6739_IG15: ; bbWeight=0, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, byre
call [rax+0x10]<unknown method>
; gcrRegs -[rcx rbx]
; gcr arg pop 0
- jmp SHORT G_M6739_IG11
- ;; size=18 bbWeight=0 PerfScore 0.00
-G_M6739_IG16: ; bbWeight=0.00, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
+ jmp G_M6739_IG08
+ ;; size=21 bbWeight=0 PerfScore 0.00
+G_M6739_IG19: ; bbWeight=0.00, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
xor eax, eax
;; size=2 bbWeight=0.00 PerfScore 0.00
-G_M6739_IG17: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
+G_M6739_IG20: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
xor ecx, ecx
cmp eax, 1
cmove eax, ecx
;; size=8 bbWeight=0 PerfScore 0.00
-G_M6739_IG18: ; bbWeight=0, epilog, nogc, extend
+G_M6739_IG21: ; bbWeight=0, epilog, nogc, extend
add rsp, 40
pop rbx
pop rbp
@@ -155,7 +183,7 @@ G_M6739_IG18: ; bbWeight=0, epilog, nogc, extend
ret
;; size=9 bbWeight=0 PerfScore 0.00
-; Total bytes of code 200, prolog size 11, PerfScore 189.68, instruction count 67, allocated bytes for code 200 (MethodHash=2fe8e5ac) for method System.Data.Common.DataStorage:GetStorageType(System.Type):int (Tier1)
+; Total bytes of code 271, prolog size 11, PerfScore 100.44, instruction count 83, allocated bytes for code 271 (MethodHash=2fe8e5ac) for method System.Data.Common.DataStorage:GetStorageType(System.Type):int (Tier1)
This one just looks like loop cloning now managed to kick in.
|
/azp run Fuzzlyn
|
Azure Pipelines successfully started running 1 pipeline(s). |
Recognizing a series of handle compares as a switch is not legal since those handles may change values.
|
cc @dotnet/jit-contrib PTAL @AndyAyersMS Diffs. They are quite mixed, but overall a decent improvement. TP also improves quite a bit. I analyzed some of the regressions above -- a lot of them seem to come from this change leading to different block layout decisions.
|
|
||
| // We're looking for "X EQ/NE CNS" or "CNS EQ/NE X" pattern | ||
| if (op1->IsCnsIntOrI() ^ op2->IsCnsIntOrI()) | ||
| if ((op1->IsCnsIntOrI() && !op1->IsIconHandle()) ^ (op2->IsCnsIntOrI() && !op2->IsIconHandle())) |
This was already merged as a separate change in #104634.
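To illustrate the pattern check quoted above, here is a minimal, self-contained sketch. It assumes a hypothetical `Node` struct whose two flags stand in for the JIT's `IsCnsIntOrI()` and `IsIconHandle()` queries; it is not the real `GenTree` type, and `IsSwitchCandidateCompare` is an illustrative name only. The idea is that a compare qualifies only when exactly one operand is an integer constant that is not a handle.

```cpp
#include <cassert>

// Hypothetical stand-in for a JIT tree node; NOT the real GenTree type.
struct Node
{
    bool isIntConstant; // models IsCnsIntOrI()
    bool isHandle;      // models IsIconHandle()
};

// A constant we may treat as a fixed switch case value:
// an integer constant that is not a handle.
static bool IsStableConstant(const Node& n)
{
    return n.isIntConstant && !n.isHandle;
}

// Matches "X EQ/NE CNS" or "CNS EQ/NE X": exactly one operand must be
// a stable constant. Using != on bools is equivalent to the ^ in the diff.
bool IsSwitchCandidateCompare(const Node& op1, const Node& op2)
{
    return IsStableConstant(op1) != IsStableConstant(op2);
}
```

With this shape, a compare against a handle constant is rejected in the same way as a compare between two non-constants.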
src/coreclr/jit/fgopt.cpp
Outdated
| // No point duplicating this block if it would not remove (part of) the joint. | ||
| if ((target->GetTrueTarget() == target) || (target->GetFalseTarget() == target)) | ||
| { | ||
| return false; | ||
| } |
I was hitting some divergence in fgUpdateFlowGraph without this check. It only has a handful of diffs on its own.
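A rough sketch of the self-loop check discussed in this thread, using a hypothetical `Block` struct rather than the JIT's `BasicBlock`. The `TrueTargetIs`/`FalseTargetIs` helpers mirror the names suggested in the review, but their bodies here are assumptions, as is the `WorthDuplicating` name. The reasoning modeled: if the conditional block branches back to itself, a copy of its branch placed in a predecessor would still jump to the original block, so the join would not go away.

```cpp
#include <cassert>

// Hypothetical stand-in for a conditional basic block; NOT the real BasicBlock.
struct Block
{
    Block* trueTarget  = nullptr;
    Block* falseTarget = nullptr;

    bool TrueTargetIs(const Block* b) const { return trueTarget == b; }
    bool FalseTargetIs(const Block* b) const { return falseTarget == b; }
};

// Returns false when duplicating `target` into a predecessor could not
// remove (part of) the join, i.e. when `target` loops back to itself.
bool WorthDuplicating(const Block* target)
{
    assert(target != nullptr);
    if (target->TrueTargetIs(target) || target->FalseTargetIs(target))
    {
        return false;
    }
    return true;
}
```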
src/coreclr/jit/fgopt.cpp
Outdated
| return false; | ||
| } | ||
|
|
||
| // No point duplicating this block if it would not remove (part of) the joint. |
| // No point duplicating this block if it would not remove (part of) the joint. | |
| // No point duplicating this block if it would not remove (part of) the join. |
src/coreclr/jit/fgopt.cpp
Outdated
| } | ||
|
|
||
| // No point duplicating this block if it would not remove (part of) the joint. | ||
| if ((target->GetTrueTarget() == target) || (target->GetFalseTarget() == target)) |
| if ((target->GetTrueTarget() == target) || (target->GetFalseTarget() == target)) | |
| if (target->TrueTargetIs(target) || target->FalseTargetIs(target)) |
|
Thanks for looking into the regressions!
Interesting -- I would hope more compaction wouldn't regress block layout (I'm guessing it's instead pessimizing some other flow opts). I haven't looked closely at the regressions from #103785 though... |
|
I think we can certainly consider taking this. Can you look at this diff in benchmarks_pgo? Curious if it's more cloning or whatnot... |
It doesn't look to be more loop cloning. From the codegen diff it looks like different block layout that leads to different register allocation: https://www.diffchecker.com/O3MjiTIP/ |
|
This diff is interesting (coming from benchmarks.run_pgo.windows.x64.checked.mch), where it stores
-G_M58686_IG01: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
- ;; size=0 bbWeight=0 PerfScore 0.00
-G_M58686_IG02: ; bbWeight=0, gcrefRegs=0002 {rcx}, byrefRegs=0000 {}, byref
+G_M58686_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
+ ;; size=0 bbWeight=1 PerfScore 0.00
+G_M58686_IG02: ; bbWeight=1, gcrefRegs=0002 {rcx}, byrefRegs=0000 {}, byref
; gcrRegs +[rcx]
xor eax, eax
mov dword ptr [rcx+0x18], eax
- and dword ptr [rcx+0x18], 0xD1FFAB1E
- ;; size=12 bbWeight=0 PerfScore 0.00
-G_M58686_IG03: ; bbWeight=0, epilog, nogc, extend
+ mov eax, dword ptr [rcx+0x18]
+ and eax, 0xD1FFAB1E
+ mov dword ptr [rcx+0x18], eax
+ ;; size=16 bbWeight=1 PerfScore 4.50
+G_M58686_IG03: ; bbWeight=1, epilog, nogc, extend
ret
- ;; size=1 bbWeight=0 PerfScore 0.00
+ ;; size=1 bbWeight=1 PerfScore 1.00
-; Total bytes of code 13, prolog size 0, PerfScore 0.00, instruction count 4, allocated bytes for code 13 (MethodHash=f3a21ac1) for method System.Threading.Tasks.Task+SetOnInvokeMres:.ctor():this (Tier1)
+; Total bytes of code 17, prolog size 0, PerfScore 5.50, instruction count 6, allocated bytes for code 17 (MethodHash=f3a21ac1) for method System.Threading.Tasks.Task+SetOnInvokeMres:.ctor():this (Tier1) |
Looks like in the diff we did a CSE that stops us from doing the containment. The current collections do the same CSE even with the baseline it seems, so I can't repro it. |
|
Ok, managed to find the right context for the above (turns out my collection was the outdated one). Before flowgraph opts we have these blocks: ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds weight IBC [IL range] [jump] [EH region] [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000] 1 1 100 [000..001)-> BB02(1) (always) i IBC
BB02 [0013] 1 BB01 0 0 [000..001)-> BB06(1) (always) i IBC rare
BB03 [0014] 0 1 100 [000..001)-> BB05(1),BB04(0.000315) ( cond ) i IBC
BB04 [0015] 1 BB03 0.00 0 [000..001)-> BB06(1) (always) i IBC
BB05 [0016] 1 BB03 1.00 100 [000..001)-> BB06(1) (always) i IBC
BB06 [0017] 3 BB02,BB04,BB05 1 100 [000..001)-> BB08(1),BB07(0) ( cond ) i IBC
BB07 [0009] 1 BB06 0 0 [000..001) (throw ) i IBC rare hascall gcsafe
BB08 [0010] 1 BB06 1 100 [000..008)-> BB10(1) (always) i IBC
BB09 [0041] 0 0 0 [000..001)-> BB10(1) (always) i IBC rare hascall gcsafe
BB10 [0042] 2 BB08,BB09 1 100 [000..001)-> BB12(1),BB11(0) ( cond ) i IBC
BB11 [0046] 1 BB10 0 0 [000..001)-> BB12(1) (always) i IBC rare hascall gcsafe
BB12 [0047] 2 BB10,BB11 1 100 [000..009) (return) i IBC
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
It looks like in the base, the flowgraph optimizations we do leave us with all zero-weight basic blocks going into CSE:
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds weight IBC [IL range] [jump] [EH region] [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000] 1 0 0 [000..001)-> BB03(1),BB02(0) ( cond ) i IBC rare
BB02 [0009] 1 BB01 0 0 [000..001) (throw ) i IBC rare hascall gcsafe
BB03 [0010] 1 BB01 0 0 [000..008)-> BB05(1),BB04(0) ( cond ) i IBC rare
BB04 [0046] 1 BB03 0 0 [000..001)-> BB05(1) (always) i IBC rare hascall gcsafe
BB05 [0047] 2 BB03,BB04 0 0 [000..009) (return) i IBC rare
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
CSE decides to do no CSE because of the 0 weights. OTOH, with the early folding done by this PR there is just one block going into CSE, which appeared after a bunch of compaction. It seems the result is a different weight that means CSE does end up happening:
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds weight IBC [IL range] [jump] [EH region] [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000] 1 1 100 [000..009) (return) i IBC
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
The "inconsistent weights" problem is probably (similar to) #96614. In the end we also ought to recognize the address mode regardless of the intervening local created by CSE (it ends up single-def, single-use after a bunch of transformations). That goes under the #104538 umbrella.
After early jump threading has kicked in we can frequently prove that a branch will go in one direction. This was previously left up to RBO; instead, try to fold it immediately and straighten out the flow before we build SSA, so that we can refine the phis.
Fix #101176
Diffs here are very mixed... Would have expected it to be generally a good thing to straighten out flow like this, but apparently not unequivocally so.
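As a rough illustration of what "folding" a branch with a known direction means for the flow graph, here is a toy model. `FlowBlock`, `JumpKind`, and `FoldKnownBranch` are hypothetical names for this sketch and do not correspond to the JIT's actual data structures or functions. The conditional block becomes an unconditional jump to the taken successor, and the untaken successor loses one incoming reference, which matches the kind of change visible in the dumps above (BB01 going from `( cond )` to `(always)` and BB05 dropping from 3 preds to 2).

```cpp
#include <cassert>

enum class JumpKind { Always, Cond };

// Hypothetical, simplified block: NOT the JIT's BasicBlock.
struct FlowBlock
{
    JumpKind   kind;
    FlowBlock* trueTarget;  // for Always blocks this holds the single target
    FlowBlock* falseTarget; // unused (nullptr) for Always blocks
    int        refs;        // number of incoming edges
};

// Folds a conditional branch whose outcome is known. Returns true if the
// block was rewritten into an unconditional jump.
bool FoldKnownBranch(FlowBlock& block, bool conditionValue)
{
    if (block.kind != JumpKind::Cond)
    {
        return false;
    }

    FlowBlock* taken = conditionValue ? block.trueTarget : block.falseTarget;
    FlowBlock* dead  = conditionValue ? block.falseTarget : block.trueTarget;
    assert(taken != nullptr && dead != nullptr);

    // The untaken edge disappears, so its target loses one reference.
    dead->refs--;

    block.kind        = JumpKind::Always;
    block.trueTarget  = taken;
    block.falseTarget = nullptr;
    return true;
}
```

Once the untaken edge is gone, the block may have a single successor with a single predecessor, which is the situation where compaction can merge blocks before SSA is built.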