RyuJIT: Optimize "X / POW2_CNS" via cmovns #41549

@EgorBo

int Test(int x) => x / 4;

Current codegen:

       8BC1                 mov      eax, ecx
       C1F81F               sar      eax, 31
       83E003               and      eax, 3
       03C1                 add      eax, ecx
       C1F802               sar      eax, 2
       C3                   ret      
; Total bytes of code: 14

Expected codegen:

       8D4103               lea      eax, [rcx+3]
       85C9                 test     ecx, ecx
       0F49C1               cmovns   eax, ecx
       C1F802               sar      eax, 2
       C3                   ret      
; Total bytes of code: 12
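
To make the equivalence of the two sequences concrete, here is a minimal C# sketch that emulates each of them and compares the result with `x / 4`. The class and method names (`DivBy4Sketch`, `MaskForm`, `SelectForm`) are made up for illustration and are not part of RyuJIT. The conditional expression only models what `cmovns` computes; the JIT may well compile it to a branch.

```csharp
using System;

static class DivBy4Sketch
{
    // Mirrors the current codegen: sar 31 / and 3 / add / sar 2.
    public static int MaskForm(int x)
    {
        int bias = (x >> 31) & 3;          // 3 when x is negative, 0 otherwise
        return unchecked(x + bias) >> 2;   // biased arithmetic shift truncates toward zero
    }

    // Mirrors the expected codegen: lea [x+3] / test / cmovns / sar 2.
    public static int SelectForm(int x)
    {
        int biased = unchecked(x + 3);     // lea eax, [rcx+3]
        int chosen = x >= 0 ? x : biased;  // test ecx, ecx + cmovns eax, ecx
        return chosen >> 2;                // sar eax, 2
    }

    public static void Main()
    {
        int[] samples = { int.MinValue, -9, -8, -5, -4, -1, 0, 1, 3, 4, 7, int.MaxValue };
        foreach (int x in samples)
        {
            if (MaskForm(x) != x / 4 || SelectForm(x) != x / 4)
                throw new Exception($"mismatch at {x}");
        }
        Console.WriteLine("all samples match x / 4");
    }
}
```

Both forms add 3 only for negative inputs, because an arithmetic right shift alone rounds toward negative infinity while C# integer division truncates toward zero.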

This micro peephole optimization was recently added to LLVM (PR, see BuildSDIVPow2); godbolt comparison: https://godbolt.org/z/M153rj
My working (ugly) prototype for RyuJIT: EgorBo@9b1d149. I believe it should be done differently: I'd introduce a GT_SELECT/GT_CMOV operator so that we can later reuse it for other cmov-based optimizations, e.g. removing branches.
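
As a rough illustration of the shape such a select-based transform would take for any power-of-two divisor, the sketch below writes `x / 2^k` as bias, select, and arithmetic shift. The `Select` helper is a hypothetical stand-in for a select/cmov node, not an existing RyuJIT or .NET API, and the restriction to `1 <= k <= 30` is an assumption made to keep the divisor a positive `int`.

```csharp
using System;

static class DivPow2Sketch
{
    // Hypothetical stand-in for a JIT-level select node (not a real RyuJIT API):
    // returns whenTrue if the condition holds, otherwise whenFalse.
    static int Select(bool condition, int whenTrue, int whenFalse)
        => condition ? whenTrue : whenFalse;

    // x / (1 << k) for 1 <= k <= 30, written as bias + select + arithmetic shift.
    public static int DivPow2(int x, int k)
    {
        int bias = (1 << k) - 1;                               // e.g. 3 for k = 2
        int adjusted = Select(x >= 0, x, unchecked(x + bias)); // keep x when non-negative
        return adjusted >> k;
    }

    public static void Main()
    {
        foreach (int k in new[] { 2, 3, 4, 5 })                // divisors 4, 8, 16, 32
            for (int x = -100; x <= 100; x++)
                if (DivPow2(x, k) != x / (1 << k))
                    throw new Exception($"mismatch: x={x}, k={k}");
        Console.WriteLine("bias + select + shift matches x / 2^k");
    }
}
```

The loop checks the same four divisors that the benchmark in this issue uses.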

Benchmark

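using System.Runtime.CompilerServices;
using BenchmarkDotNet.Attributes;

public class DivPow2Benchmark
{
    [Benchmark]
    public void Test()
    {
        for (int i = 0; i < 10000; i++)
            Consume(i / 4 + i / 8 + i / 16 + i / 32);
    }

    // NoInlining keeps the call opaque so the divisions are not optimized away.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void Consume(int x) { }
}

|    Branch | Method |     Mean |    Error |   StdDev | Note        |
|---------- |------- |---------:|---------:|---------:|------------ |
|    master |   Test | 19.14 us | 0.017 us | 0.013 us |             |
| prototype |   Test | 17.02 us | 0.016 us | 0.013 us | ~11% faster |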

/cc @AntonLapounov

category:cq
theme:basic-cq
skill-level:beginner
cost:small
impact:small

Labels: area-CodeGen-coreclr, tenet-performance