Simplify Span.Slice(int, int) bounds check for improved x64 codegen #119689

@rameel

Description

All examples: Godbolt

The current implementation of Slice(int, int) uses a platform-specific bounds check:

#if TARGET_64BIT
// Since start and length are both 32-bit, their sum can be computed across a 64-bit domain
// without loss of fidelity. The cast to uint before the cast to ulong ensures that the
// extension from 32- to 64-bit is zero-extending rather than sign-extending. The end result
// of this is that if either input is negative or if the input sum overflows past Int32.MaxValue,
// that information is captured correctly in the comparison against the backing _length field.
// We don't use this same mechanism in a 32-bit process due to the overhead of 64-bit arithmetic.
if ((ulong)(uint)start + (ulong)(uint)length > (ulong)(uint)_length)
    ThrowHelper.ThrowArgumentOutOfRangeException();
#else
if ((uint)start > (uint)_length || (uint)length > (uint)(_length - start))
    ThrowHelper.ThrowArgumentOutOfRangeException();
#endif
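
To make it easier to reason about the two branches, here is a minimal sketch that restates both conditions as standalone predicates and compares them on a handful of edge-case inputs. The class and method names are hypothetical, and `spanLength` stands in for the `_length` field; the sketch assumes `spanLength` is non-negative, as it always is for a real span.

```csharp
using System;

static class SliceChecks
{
    // Mirrors the TARGET_64BIT condition: zero-extend both inputs and compare the 64-bit sum.
    public static bool FailsWide(int start, int length, int spanLength) =>
        (ulong)(uint)start + (ulong)(uint)length > (ulong)(uint)spanLength;

    // Mirrors the 32-bit condition: two separate unsigned comparisons.
    public static bool FailsSplit(int start, int length, int spanLength) =>
        (uint)start > (uint)spanLength || (uint)length > (uint)(spanLength - start);

    // Returns true when both predicates agree on every sampled combination.
    public static bool AgreeOnSamples()
    {
        int[] values = { int.MinValue, -1, 0, 1, 2, 7, int.MaxValue - 1, int.MaxValue };
        int[] lengths = { 0, 1, 2, 7, int.MaxValue }; // span lengths are never negative

        foreach (int spanLength in lengths)
            foreach (int start in values)
                foreach (int length in values)
                    if (FailsWide(start, length, spanLength) != FailsSplit(start, length, spanLength))
                        return false;

        return true;
    }
}
```

The two forms reject the same inputs; the difference discussed here is only in how easily the JIT can fold each form against facts it already knows.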

The x64 version has a more complex single condition, which the JIT often fails to optimize effectively even when prior assumptions are available, leading to larger code than the x86 version.

The x86 version, with its simpler and separate conditions, is much friendlier to JIT optimizations such as range analysis.

The one case where the x64 version generates better code is when the JIT knows nothing about either start or length. Otherwise, it produces either identical code or suboptimal code with redundant checks.

For example, consider this common slicing pattern:

span[index..];

which desugars roughly to:

span.Slice(index, span.Length - index);

On x64, with index = 2, the generated assembly looks like this:

       push     rax
       lea      edx, [rsi-0x02]
       mov      eax, edx
       add      rax, 2
       mov      ecx, esi
       cmp      rax, rcx
       ja       SHORT G_M40604_IG04
       lea      rax, bword ptr [rdi+0x04]
       add      rsp, 8
       ret
G_M40604_IG04:
       call     [System.ThrowHelper:ThrowArgumentOutOfRangeException()]
       int3
; Total bytes of code 33, instruction count 12

A simplified bounds check (mirroring the x86 style) could reduce this to:

       push     rax
       lea      edx, [rsi-0x02]
       cmp      esi, 2
       jl       SHORT G_M9789_IG04
       lea      rax, bword ptr [rdi+0x04]
       add      rsp, 8
       ret
G_M9789_IG04:
       call     [Program:<Slice>g__Error_OutOfRange|25_0[char]()]
       int3
; Total bytes of code 25, instruction count 9
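
The single `cmp esi, 2` / `jl` pair above is what the x86-style condition collapses to once `start` is the constant 2 and `length` is `n - 2`: the second comparison becomes `(uint)(n - 2) > (uint)(n - 2)`, which is never true, so only `(uint)2 > (uint)n` remains. The sketch below checks that reduction by brute force; the names are hypothetical and `spanLength` stands in for `_length`.

```csharp
static class ConstantStartCheck
{
    // The 32-bit style condition from Slice(int, int).
    public static bool FailsSplit(int start, int length, int spanLength) =>
        (uint)start > (uint)spanLength || (uint)length > (uint)(spanLength - start);

    // Returns true when, for every n in [0, maxLength], the check for
    // Slice(2, n - 2) fails exactly when n < 2.
    public static bool ReducesToLengthCompare(int maxLength)
    {
        for (int n = 0; n <= maxLength; n++)
        {
            bool fails = FailsSplit(2, n - 2, n);
            if (fails != (n < 2))
                return false;
        }

        return true;
    }
}
```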

The difference becomes even more pronounced when the Slice call is guarded, because the JIT can leverage the outer check:

if (span.Length >= 2)
{
    return span.Slice(2, span.Length - 2);
}
else
{
    return default;
}

The current x64 code generates:

       push     rbp
       mov      rbp, rsp
       cmp      esi, 2
       jl       SHORT G_M588_IG05
       lea      edx, [rsi-0x02]
       mov      eax, edx
       add      rax, 2
       mov      ecx, esi
       cmp      rax, rcx
       ja       SHORT G_M588_IG07
       lea      rax, bword ptr [rdi+0x04]
       pop      rbp
       ret
G_M588_IG05:
       xor      rax, rax
       xor      edx, edx
       pop      rbp
       ret
G_M588_IG07:
       call     [System.ThrowHelper:ThrowArgumentOutOfRangeException()]
       int3
; Total bytes of code 44, instruction count 19

With a simplified check, the JIT eliminates the redundant conditions entirely:

       push     rax
       cmp      esi, 2
       jl       SHORT G_M46701_IG05
       lea      edx, [rsi-0x02]
       lea      rax, bword ptr [rdi+0x04]
       add      rsp, 8
       ret
G_M46701_IG05:
       xor      rax, rax
       xor      edx, edx
       add      rsp, 8
       ret
; Total bytes of code 27, instruction count 11

In summary, the x64-specific check can sometimes produce larger and less efficient code than the x86 version, especially when the JIT struggles to optimize the complex condition. The x86-style checks, with their simpler and separate conditions, align more closely with common usage patterns, such as guard conditions that precede the method call. This alignment lets the JIT recognize and eliminate redundant checks. In contrast, the complex x64 condition may obscure these opportunities, limiting the JIT's ability to optimize effectively.
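
For anyone who wants to experiment outside the runtime, one way to approximate the x86-style check is a standalone helper like the sketch below. It is not the BCL implementation: `SpanSliceSketch` and `SliceSplitCheck` are hypothetical names, the helper throws directly instead of going through `ThrowHelper`, and it rebuilds the span with `MemoryMarshal.CreateReadOnlySpan` because the internal `Span` constructor is not accessible to user code.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

static class SpanSliceSketch
{
    // Hypothetical helper that applies the two separate unsigned comparisons
    // on every platform, then forms the sub-span manually.
    public static ReadOnlySpan<T> SliceSplitCheck<T>(this ReadOnlySpan<T> span, int start, int length)
    {
        if ((uint)start > (uint)span.Length || (uint)length > (uint)(span.Length - start))
            throw new ArgumentOutOfRangeException();

        // Safe only because the check above has validated start and length.
        return MemoryMarshal.CreateReadOnlySpan(
            ref Unsafe.Add(ref MemoryMarshal.GetReference(span), start), length);
    }
}
```

A call such as `span.SliceSplitCheck(2, span.Length - 2)` placed behind an `if (span.Length >= 2)` guard can then be inspected in a disassembler to see whether the JIT folds the inner check on a given runtime version.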

Metadata

Labels: area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI)
