Skip to content

Conversation

@AndyAyersMS
Copy link
Member

Handle simple side effects in the box; br pattern based optimization.

Fixes #1713.

Handle simple side effects in the `box; br` pattern based optimization.

Fixes dotnet#1713.
@maryamariyan maryamariyan added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 15, 2020
@AndyAyersMS
Copy link
Member Author

cc @dotnet/jit-contrib @jkotas

Seems like an important enough case to warrant handling in the jit, despite the small number of impacted methods. Does not hit outside of SPC.

TIER0

PMI CodeSize Diffs for System.Private.CoreLib.dll [tier0] for x64 default jit
Summary of Code Size diffs:
(Lower is better)
Total bytes of diff: -60 (-0.00% of base)
    diff is an improvement.
Top file improvements (bytes):
         -60 : System.Private.CoreLib.dasm (-0.00% of base)
1 total files with Code Size differences (1 improved, 0 regressed), 0 unchanged.
Top method improvements (bytes):
         -21 (-6.86% of base) : System.Private.CoreLib.dasm - GenericEqualityComparer`1:IndexOf(ref,Vector`1,int,int):int:this
         -21 (-6.82% of base) : System.Private.CoreLib.dasm - GenericEqualityComparer`1:LastIndexOf(ref,Vector`1,int,int):int:this
          -9 (-3.23% of base) : System.Private.CoreLib.dasm - AsyncMethodBuilderCore:Start(byref)
          -5 (-0.21% of base) : System.Private.CoreLib.dasm - StringBuilder:AppendJoinCore(long,int,ref):StringBuilder:this (7 methods)
          -2 (-0.65% of base) : System.Private.CoreLib.dasm - ObjectEqualityComparer`1:IndexOf(ref,Vector`1,int,int):int:this
          -2 (-0.65% of base) : System.Private.CoreLib.dasm - ObjectEqualityComparer`1:LastIndexOf(ref,Vector`1,int,int):int:this
Top method improvements (percentages):
         -21 (-6.86% of base) : System.Private.CoreLib.dasm - GenericEqualityComparer`1:IndexOf(ref,Vector`1,int,int):int:this
         -21 (-6.82% of base) : System.Private.CoreLib.dasm - GenericEqualityComparer`1:LastIndexOf(ref,Vector`1,int,int):int:this
          -9 (-3.23% of base) : System.Private.CoreLib.dasm - AsyncMethodBuilderCore:Start(byref)
          -2 (-0.65% of base) : System.Private.CoreLib.dasm - ObjectEqualityComparer`1:IndexOf(ref,Vector`1,int,int):int:this
          -2 (-0.65% of base) : System.Private.CoreLib.dasm - ObjectEqualityComparer`1:LastIndexOf(ref,Vector`1,int,int):int:this
          -5 (-0.21% of base) : System.Private.CoreLib.dasm - StringBuilder:AppendJoinCore(long,int,ref):StringBuilder:this (7 methods)
6 total methods with Code Size differences (6 improved, 0 regressed), 21938 unchanged.

TIER1

Tried a couple of experiments to mitigate the size regressions, but no luck. They are small.

PMI CodeSize Diffs for System.Private.CoreLib.dll, framework assemblies for x64 default jit
Summary of Code Size diffs:
(Lower is better)
Total bytes of diff: 74 (0.00% of base)
    diff is a regression.
Top file regressions (bytes):
          74 : System.Private.CoreLib.dasm (0.00% of base)
1 total files with Code Size differences (0 improved, 1 regressed), 164 unchanged.
Top method regressions (bytes):
          29 ( 7.36% of base) : System.Private.CoreLib.dasm - GenericArraySortHelper`1:SwapIfGreater(Span`1,int,int) (5 methods)
          28 ( 1.77% of base) : System.Private.CoreLib.dasm - GenericArraySortHelper`2:DownHeap(Span`1,Span`1,int,int,int) (5 methods)
          23 ( 2.31% of base) : System.Private.CoreLib.dasm - GenericArraySortHelper`1:DownHeap(Span`1,int,int,int) (5 methods)
           9 ( 1.52% of base) : System.Private.CoreLib.dasm - GenericArraySortHelper`2:SwapIfGreaterWithValues(Span`1,Span`1,int,int) (5 methods)
           5 ( 0.31% of base) : System.Private.CoreLib.dasm - StringBuilder:AppendJoinCore(long,int,ref):StringBuilder:this (7 methods)
Top method improvements (bytes):
          -9 (-5.06% of base) : System.Private.CoreLib.dasm - ObjectEqualityComparer`1:IndexOf(ref,Vector`1,int,int):int:this
          -9 (-5.00% of base) : System.Private.CoreLib.dasm - ObjectEqualityComparer`1:LastIndexOf(ref,Vector`1,int,int):int:this
          -2 (-0.88% of base) : System.Private.CoreLib.dasm - AsyncMethodBuilderCore:Start(byref)
Top method regressions (percentages):
          29 ( 7.36% of base) : System.Private.CoreLib.dasm - GenericArraySortHelper`1:SwapIfGreater(Span`1,int,int) (5 methods)
          23 ( 2.31% of base) : System.Private.CoreLib.dasm - GenericArraySortHelper`1:DownHeap(Span`1,int,int,int) (5 methods)
          28 ( 1.77% of base) : System.Private.CoreLib.dasm - GenericArraySortHelper`2:DownHeap(Span`1,Span`1,int,int,int) (5 methods)
           9 ( 1.52% of base) : System.Private.CoreLib.dasm - GenericArraySortHelper`2:SwapIfGreaterWithValues(Span`1,Span`1,int,int) (5 methods)
           5 ( 0.31% of base) : System.Private.CoreLib.dasm - StringBuilder:AppendJoinCore(long,int,ref):StringBuilder:this (7 methods)
Top method improvements (percentages):
          -9 (-5.06% of base) : System.Private.CoreLib.dasm - ObjectEqualityComparer`1:IndexOf(ref,Vector`1,int,int):int:this
          -9 (-5.00% of base) : System.Private.CoreLib.dasm - ObjectEqualityComparer`1:LastIndexOf(ref,Vector`1,int,int):int:this
          -2 (-0.88% of base) : System.Private.CoreLib.dasm - AsyncMethodBuilderCore:Start(byref)
8 total methods with Code Size differences (3 improved, 5 regressed), 244488 unchanged.

@jkotas
Copy link
Member

jkotas commented Jan 15, 2020

What do the size regressions look like?

Copy link
Contributor

@CarolEidt CarolEidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@AndyAyersMS
Copy link
Member Author

Here is a typical diff, from GenericArraySortHelper`2:DownHeap: we CSE an address rather than a load from that address. Happens twice in this method, partially counteracted by one less register save.

;;; before
;;; r11 is a span's pointer field, ecx is the span length, r15 the index

G_M45276_IG05:
       lea      r14d, [rax+rbp]
       lea      r15d, [r14-1]
       cmp      r15d, ecx
       jae      G_M45276_IG13
       movsxd   r15, r15d
       mov      r15, qword ptr [r11+8*r15]
       cmp      r14d, ecx
       jae      G_M45276_IG13
       movsxd   r14, r14d
       mov      r14, qword ptr [r11+8*r14]
       cmp      r15, r14
       jl       SHORT G_M45276_IG08
						;; bbWeight=2    PerfScore 18.50
G_M45276_IG06:
       cmp      r15, r14
       jle      SHORT G_M45276_IG09
				
;;; after
G_M45276_IG05:
       lea      r14d, [rax+rbp]
       lea      r15d, [r14-1]
       cmp      r15d, ecx
       jae      G_M45276_IG13
       movsxd   r15, r15d
       lea      r15, bword ptr [r11+8*r15]
       cmp      dword ptr [r15], r15d           // residual null check
       cmp      r14d, ecx
       jae      G_M45276_IG13
       movsxd   r14, r14d
       mov      r14, qword ptr [r11+8*r14]
       cmp      qword ptr [r15], r14
       jl       SHORT G_M45276_IG08
						;; bbWeight=2    PerfScore 23.00
G_M45276_IG06:
       cmp      qword ptr [r15], r14
       jle      SHORT G_M45276_IG09

Without this change, at Tier1, the box;br .. sequence gets handled by the general importer box ops which express the nullcheck as an unconsumed indir; this happens to provide a dominating cse def for the subsequent loads.

So sometimes there's a benefit to a GT_IND based nullcheck, sometimes there's a benefit to GT_NULLCHECK; we can't tell locally which might end up working out better, and we can't normalize to whichever form is better downstream.

@AndyAyersMS AndyAyersMS merged commit 882c087 into dotnet:master Jan 15, 2020
@AndyAyersMS AndyAyersMS deleted the BoxPatternInvestigation branch January 15, 2020 18:35
@AndyAyersMS AndyAyersMS added optimization tenet-performance Performance related issue labels Mar 7, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 11, 2020
MichalStrehovsky pushed a commit to MichalStrehovsky/runtime that referenced this pull request Dec 9, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI optimization tenet-performance Performance related issue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Tiered compilation fails to elide value type null check

4 participants