@EgorBo EgorBo commented Feb 6, 2025

When we need to copy a block that contains GC references to a non-heap destination (from any source), we can use SIMD as long as the whole copy is placed in a no-GC region. This is valid because nobody else is expected to read the destination from the stack in parallel (that would be UB). The existing code is conservative in that regard: it only allows this for blocks <= 64 bytes, and otherwise avoids SIMD in favor of a slow rep movsq or the bulk write barrier.

This is primarily needed to address possible performance regressions that #112060 may introduce, but it should be an improvement regardless of whether that PR lands.

NOTE: arm64 already has similar logic.

Example (I use ref structs here, but it works for regular structs as well):

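using System.Runtime.CompilerServices; // needed for [InlineArray]

// Instance method on the Benchmarks class (see the listings below).
void Foo(ref MyStruct a, ref MyStruct b)
{
    a = b; // copies 16 GC references (128 bytes on x64)
}

// 16 string references stored inline.
[InlineArray(16)]
ref struct MyStruct {
    public string _element0;
}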

Main:

; Assembly listing for method Benchmarks:Foo(byref,byref):this (FullOpts)
       sub      rsp, 40
       vzeroupper 
       cmp      byte  ptr [rdx], dl
       mov      rcx, rdx
       cmp      byte  ptr [r8], r8b
       mov      rdx, r8
       mov      r8d, 128
       call     [CORINFO_HELP_BULK_WRITEBARRIER] ;; JIT knows it doesn't need a write barrier, but uses the helper for large sizes
       nop      
       add      rsp, 40
       ret      
; Total bytes of code 36

PR:

; Assembly listing for method Benchmarks:Foo(byref,byref):this (FullOpts)
G_M54163_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
G_M54163_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0104 {rdx r8}, byref, nogc
                            ; byrRegs +[rdx r8]
       vmovdqu32 zmm0, zmmword ptr [r8]
       vmovdqu32 zmmword ptr [rdx], zmm0
       vmovdqu32 zmm0, zmmword ptr [r8+0x40]
       vmovdqu32 zmmword ptr [rdx+0x40], zmm0
G_M54163_IG03:        ; bbWeight=1, epilog, nogc, extend
       vzeroupper 
       ret      
; Total bytes of code 30
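
The change only affects how the copy is emitted, so the observable result of the assignment should stay the same. A minimal sketch of a sanity check for that, under assumptions: it uses a regular struct copied into a stack local (the description says regular structs are covered too), and the names CopyDemo, CopyAndCheck and Run are hypothetical and not part of this PR. It does not show which instructions the JIT picks; it only checks that all 16 references survive the copy.

```csharp
using System;
using System.Runtime.CompilerServices;

// Regular-struct variant of the example type (assumption: not a ref struct,
// so the inline-array indexer can be used directly). 16 inline string references.
[InlineArray(16)]
public struct MyStruct
{
    public string _element0;
}

public static class CopyDemo
{
    // Copies a block of GC references from an arbitrary source into a stack local
    // and verifies that every element still points to the same object.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static bool CopyAndCheck(ref MyStruct source)
    {
        MyStruct local = source; // destination is a stack local (non-heap)
        for (int i = 0; i < 16; i++)
        {
            if (!ReferenceEquals(local[i], source[i]))
                return false;
        }
        return true;
    }

    // Fills a struct with 16 distinct strings and runs the check.
    public static bool Run()
    {
        MyStruct s = default;
        for (int i = 0; i < 16; i++)
            s[i] = "item" + i;
        return CopyAndCheck(ref s);
    }

    public static void Main()
    {
        Console.WriteLine(Run() ? "copy preserved all 16 references" : "copy mismatch");
    }
}
```

This needs a C# 12 or newer compiler for the inline-array indexer. The JIT is free to optimize the local copy differently from the listings above, so treat it as a correctness check rather than a way to reproduce the disassembly.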

@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 6, 2025
EgorBo commented Feb 6, 2025

@MihuBot

EgorBo commented Feb 6, 2025

PTAL @jakobbotsch @dotnet/jit-contrib. Diffs. We replace either rep movsq (super slow) or the bulk write barrier (slow) with unrolled SIMD. This is needed for #112060 because, due to the conservative behavior, we could have ended up with double bulk barrier calls.

@EgorBo EgorBo requested a review from jakobbotsch February 6, 2025 12:02
@EgorBo EgorBo merged commit 666bb9d into dotnet:main Feb 6, 2025
109 of 112 checks passed
@EgorBo EgorBo deleted the fix-unrolling-for-gc-structs branch February 6, 2025 13:04
@github-actions github-actions bot locked and limited conversation to collaborators Mar 9, 2025