
JIT: consider accounting for the ability to remove bounds checks in CSE heuristics #45658

@AndyAyersMS


In this example

using System;
using System.Runtime.CompilerServices;

class X
{
    int[] data;

    [MethodImpl(MethodImplOptions.NoInlining)]
    public int F()
    {
        if (data[1] == 50)
        {
            data[1] = 100;
        }
        return data[1];
    }

    public static int Main()
    {
        X x = new X();
        x.data = new int[10];
        x.data[1] = 50;
        return x.F();
    }
}

the JIT is able to eliminate the bounds checks on the second and third data[1] accesses because they are redundant.

; Assembly listing for method X:F():int:this
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 this         [V00,T01] (  3,  3   )     ref  ->  rcx         this class-hnd
;  V01 OutArgs      [V01    ] (  1,  1   )  lclBlk (32) [rsp+0x00]   "OutgoingArgSpace"
;  V02 tmp1         [V02,T00] (  3,  6   )     ref  ->  rdx         "arr expr"
;  V03 tmp2         [V03,T04] (  2,  2   )     ref  ->  rdx         "arr expr"
;  V04 tmp3         [V04,T02] (  2,  4   )     ref  ->  rax         "arr expr"
;  V05 cse0         [V05,T03] (  4,  3.50)     ref  ->  rax         "CSE - aggressive"
;  V06 cse1         [V06,T05] (  2,  2   )     int  ->  rcx         "CSE - aggressive"
;
; Lcl frame size = 40

G_M53578_IG01:              ;; offset=0000H
       4883EC28             sub      rsp, 40
                                                ;; bbWeight=1    PerfScore 0.25
G_M53578_IG02:              ;; offset=0004H
       488B4108             mov      rax, gword ptr [rcx+8]
       488BD0               mov      rdx, rax
       8B4A08               mov      ecx, dword ptr [rdx+8]
       83F901               cmp      ecx, 1
       7618                 jbe      SHORT G_M53578_IG06
       837A1432             cmp      dword ptr [rdx+20], 50
       750A                 jne      SHORT G_M53578_IG04
                                                ;; bbWeight=1    PerfScore 8.50
G_M53578_IG03:              ;; offset=0019H
       488BD0               mov      rdx, rax
       C7421464000000       mov      dword ptr [rdx+20], 100
                                                ;; bbWeight=0.50 PerfScore 0.62
G_M53578_IG04:              ;; offset=0023H
       8B4014               mov      eax, dword ptr [rax+20]
                                                ;; bbWeight=1    PerfScore 2.00
G_M53578_IG05:              ;; offset=0026H
       4883C428             add      rsp, 40
       C3                   ret
                                                ;; bbWeight=1    PerfScore 1.25
G_M53578_IG06:              ;; offset=002BH
       E800E8775F           call     CORINFO_HELP_RNGCHKFAIL
       CC                   int3
                                                ;; bbWeight=0    PerfScore 0.00

However, and somewhat unexpectedly, this optimization doesn't happen if CSE is disabled:

; ----------------
; COMPlus_JitNoCSE=1
; ----------------
; Assembly listing for method X:F():int:this
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 this         [V00,T00] (  5,  4.50)     ref  ->  rcx         this class-hnd
;  V01 OutArgs      [V01    ] (  1,  1   )  lclBlk (32) [rsp+0x00]   "OutgoingArgSpace"
;  V02 tmp1         [V02,T01] (  3,  6   )     ref  ->  rax         "arr expr"
;  V03 tmp2         [V03,T03] (  3,  3   )     ref  ->  rax         "arr expr"
;  V04 tmp3         [V04,T02] (  3,  6   )     ref  ->  rax         "arr expr"
;
; Lcl frame size = 40

G_M53578_IG01:              ;; offset=0000H
       4883EC28             sub      rsp, 40
                                                ;; bbWeight=1    PerfScore 0.25
G_M53578_IG02:              ;; offset=0004H
       488B4108             mov      rax, gword ptr [rcx+8]
       83780801             cmp      dword ptr [rax+8], 1
       7629                 jbe      SHORT G_M53578_IG06
       83781432             cmp      dword ptr [rax+20], 50
       7511                 jne      SHORT G_M53578_IG04
                                                ;; bbWeight=1    PerfScore 8.00
G_M53578_IG03:              ;; offset=0014H
       488B4108             mov      rax, gword ptr [rcx+8]
       83780801             cmp      dword ptr [rax+8], 1
       7619                 jbe      SHORT G_M53578_IG06
       C7401464000000       mov      dword ptr [rax+20], 100
                                                ;; bbWeight=0.50 PerfScore 3.00
G_M53578_IG04:              ;; offset=0025H
       488B4108             mov      rax, gword ptr [rcx+8]
       83780801             cmp      dword ptr [rax+8], 1
       7608                 jbe      SHORT G_M53578_IG06
       8B4014               mov      eax, dword ptr [rax+20]
                                                ;; bbWeight=1    PerfScore 7.00
G_M53578_IG05:              ;; offset=0032H
       4883C428             add      rsp, 40
       C3                   ret
                                                ;; bbWeight=1    PerfScore 1.25
G_M53578_IG06:              ;; offset=0037H
       E8C4E7785F           call     CORINFO_HELP_RNGCHKFAIL
       CC                   int3
                                                ;; bbWeight=0    PerfScore 0.00

This coupling seems unfortunate because CSE is budget dependent: if a method has more CSE candidates, this particular CSE may not be performed, and we then lose the bounds check optimization along with it.

I suspect this happens because CSE carries the liberal value number (VN) of the initial array reference down to the subsequent uses, whereas without CSE the later reloads of data see a different liberal VN.

Seems like we should either never eliminate these checks or always eliminate them, regardless of CSE.
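Until the heuristics account for this, a possible source-level workaround (an assumption based on the liberal VN hypothesis above, not something verified here) is to load the field into a local once, so that every access uses the same array reference regardless of whether CSE kicks in. The class name XLocal is hypothetical; the method body otherwise mirrors X.F() from the example:

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical variant of X (the name XLocal is illustrative only).
// The field is read once into a local, so all three accesses go through
// the same array reference instead of reloading this.data from the heap.
class XLocal
{
    public int[] data;

    [MethodImpl(MethodImplOptions.NoInlining)]
    public int F()
    {
        int[] d = data;   // single load of the field

        // d is never reassigned and array length is immutable, so once the
        // check on d[1] passes, later checks on d[1] are redundant in
        // principle. Whether the JIT removes them without CSE is unverified.
        if (d[1] == 50)
        {
            d[1] = 100;
        }
        return d[1];
    }
}
```

Whether this actually yields the same codegen with and without CSE would need to be confirmed by dumping the disassembly (for example with COMPlus_JitDisasm) under COMPlus_JitNoCSE=1. Either way it only sidesteps the problem; it does not address the heuristic coupling this issue is about.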

cc @briansull @dotnet/jit-contrib

category:cq
theme:cse
skill-level:intermediate
cost:small
impact:medium

Labels

area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI)