Skip to content

Conversation

@EgorBo
Copy link
Member

@EgorBo EgorBo commented Oct 5, 2024

LowerCallMemmove already does a good job unrolling Memmove. Unfortunately, it leaves some opportunities on the table when the source can be a constant (esp, GPR constant), so we can skip a memory load. The reason is that LowerCallMemmove cannot rely on VN for that as it's too late.

void Test_utf16(Span<char> destination) => "True".CopyTo(destination);
void Test_utf8(Span<byte> destination) => "True"u8.CopyTo(destination);

Codegen diff:

; Method Benchmarks:Test_utf16(System.Span`1[ushort]):this (FullOpts)
       sub      rsp, 40
       mov      rax, bword ptr [rdx]
       mov      ecx, dword ptr [rdx+0x08]
       cmp      ecx, 4
       jl       SHORT G_M39687_IG04
-      mov      rcx, 0x21A00204FFC
-      mov      rdx, qword ptr [rcx]
-      mov      qword ptr [rax], rdx
+      mov      rcx, 0x65007500720054
+      mov      qword ptr [rax], rcx
       add      rsp, 40
       ret      
G_M39687_IG04:
       call     [System.ThrowHelper:ThrowArgumentException_DestinationTooShort()]
       int3     
-; Total bytes of code: 43
+; Total bytes of code: 40


; Method Benchmarks:Test_utf8(System.Span`1[ubyte]):this (FullOpts)
       sub      rsp, 40
       mov      rax, bword ptr [rdx]
       mov      ecx, dword ptr [rdx+0x08]
       cmp      ecx, 4
       jl       SHORT G_M38560_IG04
-      mov      rcx, 0x2C391CC5380      ; static handle
-      mov      edx, dword ptr [rcx]
-      mov      dword ptr [rax], edx
+      mov      dword ptr [rax], 0x65757254
       add      rsp, 40
       ret      
G_M38560_IG04:
       call     [System.ThrowHelper:ThrowArgumentException_DestinationTooShort()]
       int3     
-; Total bytes of code: 41
+; Total bytes of code: 33

Diffs

@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Oct 5, 2024
@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@EgorBo

This comment was marked as resolved.

@EgorBo EgorBo marked this pull request as ready for review October 5, 2024 22:06
@EgorBo

This comment was marked as resolved.

@EgorBo
Copy link
Member Author

EgorBo commented Oct 6, 2024

@MihuBot

@EgorBo
Copy link
Member Author

EgorBo commented Oct 6, 2024

/azp run runtime-coreclr outerloop, runtime-coreclr jitstress, runtime-coreclr pgo

@azure-pipelines
Copy link

Azure Pipelines successfully started running 3 pipeline(s).

Co-authored-by: Jakob Botsch Nielsen <[email protected]>
@EgorBo
Copy link
Member Author

EgorBo commented Oct 7, 2024

PTAL @jakobbotsch @dotnet/jit-contrib
Diffs (678 improved, 370 regressed) there are size "regressions" from the unrollings as expected. Outerloop passed.

@EgorBo EgorBo requested a review from jakobbotsch October 7, 2024 11:08
@EgorBo EgorBo merged commit 667c007 into dotnet:main Oct 8, 2024
@EgorBo EgorBo deleted the opt-memmove-cns branch October 8, 2024 13:04
@xtqqczze
Copy link
Contributor

xtqqczze commented Oct 8, 2024

@MihuBot -arm64

@xtqqczze
Copy link
Contributor

xtqqczze commented Oct 8, 2024

@EgorBo Regressions on arm64?

Method Toolchain Mean Error Ratio
WriteBools Main 604.1 ns 0.33 ns 1.00
WriteBools PR 477.4 ns 0.03 ns 0.79

EgorBot/runtime-utils#110 (comment)

Method Toolchain Mean Error Ratio
WriteBools Main 604.1 ns 0.12 ns 1.00
WriteBools PR 531.0 ns 0.73 ns 0.88

EgorBot/runtime-utils#116 (comment)

@EgorBo
Copy link
Member Author

EgorBo commented Oct 8, 2024

@EgorBo Regressions on arm64?

load might be faster than up to 4 instructions to compose a 64bit constant for "FALS" or "TRUE" utf16 strings on arm64, but it's tricky in the real world, the load might be under contention. Anyway, we'll see once dotnet/performance run finishes

@github-actions github-actions bot locked and limited conversation to collaborators Nov 8, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants