Labels: area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI), tenet-performance (Performance related issue)
Description
int Test(int x) => x / 4;

Current codegen:
8BC1 mov eax, ecx
C1F81F sar eax, 31
83E003 and eax, 3
03C1 add eax, ecx
C1F802 sar eax, 2
C3 ret
; Total bytes of code: 14

Expected codegen:
8D4103 lea eax, [rcx+3]
85C9 test ecx, ecx
0F49C1 cmovns eax, ecx
C1F802 sar eax, 2
C3 ret
; Total bytes of code: 12

This micro peephole optimization was recently added to LLVM (PR, see BuildSDIVPow2); see godbolt: https://godbolt.org/z/M153rj

My working (ugly) prototype for RyuJIT: EgorBo@9b1d149. I believe it should be done differently: I'd introduce a GT_SELECT/GT_CMOV operator so we can later use it for other cmov-based optimizations, e.g. removing branches.
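To make the equivalence of the two sequences easier to check, here is a minimal C# sketch that mirrors each of them step by step and compares the results with the built-in `/` operator. The helper names `DivBy4ViaMask` and `DivBy4ViaSelect` are hypothetical and exist only for illustration; they are not part of RyuJIT or of the prototype, and whether the JIT actually emits a `cmov` for the conditional expression is not assumed here.

```csharp
using System;

// Hypothetical helpers: they model the two instruction sequences, not JIT internals.

// Models the current codegen: sar 31 / and 3 / add / sar 2.
static int DivBy4ViaMask(int x)
{
    int bias = (x >> 31) & 3;   // 3 when x is negative, 0 otherwise
    return (x + bias) >> 2;     // the bias makes the arithmetic shift round toward zero
}

// Models the expected codegen: lea [x+3] / test / cmovns / sar 2.
static int DivBy4ViaSelect(int x)
{
    int biased = unchecked(x + 3);      // lea eax, [rcx+3]; may wrap, but then it is not selected
    int chosen = x >= 0 ? x : biased;   // cmovns eax, ecx
    return chosen >> 2;                 // sar eax, 2
}

// Spot-check both forms against x / 4, including the extreme values.
int[] samples = { int.MinValue, -9, -8, -7, -5, -4, -3, -1, 0, 1, 3, 4, 5, 7, int.MaxValue };
foreach (int x in samples)
{
    if (DivBy4ViaMask(x) != x / 4 || DivBy4ViaSelect(x) != x / 4)
        throw new InvalidOperationException($"Mismatch for x = {x}");
}
Console.WriteLine("Both sequences agree with x / 4 on all samples.");
```

The same pattern applies to any divisor 2^k by using a bias of 2^k - 1 and a shift of k, which is why the benchmark below exercises `/ 8`, `/ 16` and `/ 32` as well.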
Benchmark
    [Benchmark]
    public void Test()
    {
        for (int i = 0; i < 10000; i++)
            Consume(i / 4 + i / 8 + i / 16 + i / 32);
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    static void Consume(int x) { }

| Build     | Method |     Mean |    Error |   StdDev |
|-----------|--------|---------:|---------:|---------:|
| master    | Test   | 19.14 us | 0.017 us | 0.013 us |
| prototype | Test   | 17.02 us | 0.016 us | 0.013 us |

The prototype is ~11% faster.
/cc @AntonLapounov
category:cq
theme:basic-cq
skill-level:beginner
cost:small
impact:small