All examples: Godbolt
The current implementation of Slice(int, int) uses a platform-specific bounds check:
runtime/src/libraries/System.Private.CoreLib/src/System/Span.cs
Lines 414 to 426 in b09c933
#if TARGET_64BIT
    // Since start and length are both 32-bit, their sum can be computed across a 64-bit domain
    // without loss of fidelity. The cast to uint before the cast to ulong ensures that the
    // extension from 32- to 64-bit is zero-extending rather than sign-extending. The end result
    // of this is that if either input is negative or if the input sum overflows past Int32.MaxValue,
    // that information is captured correctly in the comparison against the backing _length field.
    // We don't use this same mechanism in a 32-bit process due to the overhead of 64-bit arithmetic.
    if ((ulong)(uint)start + (ulong)(uint)length > (ulong)(uint)_length)
        ThrowHelper.ThrowArgumentOutOfRangeException();
#else
    if ((uint)start > (uint)_length || (uint)length > (uint)(_length - start))
        ThrowHelper.ThrowArgumentOutOfRangeException();
#endif
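To make the two branches easier to compare side by side, here is a minimal standalone sketch that lifts each condition into its own predicate. The class and method names (`BoundsCheckSketch`, `FailsWide`, `FailsSplit`) are hypothetical and exist only for illustration; `spanLength` stands in for the `_length` field and is assumed to be non-negative, as it always is for a real span.

```csharp
using System;

public static class BoundsCheckSketch
{
    // 64-bit style: one comparison over a zero-extended 64-bit sum.
    // Returns true when (start, length) is NOT a valid slice.
    public static bool FailsWide(int start, int length, int spanLength)
    {
        unchecked
        {
            return (ulong)(uint)start + (ulong)(uint)length > (ulong)(uint)spanLength;
        }
    }

    // 32-bit style: two separate unsigned comparisons.
    // Returns true when (start, length) is NOT a valid slice.
    public static bool FailsSplit(int start, int length, int spanLength)
    {
        unchecked
        {
            return (uint)start > (uint)spanLength
                || (uint)length > (uint)(spanLength - start);
        }
    }
}
```

Both predicates are expected to agree for every input as long as `spanLength` is non-negative; the difference discussed in this issue is only in how well the JIT can reason about each shape.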
The x64 version folds everything into a single, more complex condition. The JIT often fails to optimize it effectively even when prior assumptions about start or length are available, which leads to larger code than the x86 version.
The x86 version, with its simpler, separate conditions, is much friendlier to JIT optimizations such as range analysis.
The one case where the x64 version generates better code is when the JIT knows nothing about either start or length. Otherwise, it produces either identical code or suboptimal code with redundant checks.
For example, consider this common slicing pattern:
span[index..];
which desugars roughly to:
span.Slice(index, span.Length - index);
On x64, the generated assembly looks like this (the listings use an index of 2):
push rax
lea edx, [rsi-0x02]
mov eax, edx
add rax, 2
mov ecx, esi
cmp rax, rcx
ja SHORT G_M40604_IG04
lea rax, bword ptr [rdi+0x04]
add rsp, 8
ret
G_M40604_IG04:
call [System.ThrowHelper:ThrowArgumentOutOfRangeException()]
int3
; Total bytes of code 33, instruction count 12
A simplified bounds check (mirroring the x86 style) could reduce this to:
push rax
lea edx, [rsi-0x02]
cmp esi, 2
jl SHORT G_M9789_IG04
lea rax, bword ptr [rdi+0x04]
add rsp, 8
ret
G_M9789_IG04:
call [Program:<Slice>g__Error_OutOfRange|25_0[char]()]
int3
; Total bytes of code 25, instruction count 9
The difference becomes even larger when Slice is guarded by an outer check that the JIT can leverage:
if (span.Length >= 2)
{
    return span.Slice(2, span.Length - 2);
}
else
{
    return default;
}
The current x64 code generates:
push rbp
mov rbp, rsp
cmp esi, 2
jl SHORT G_M588_IG05
lea edx, [rsi-0x02]
mov eax, edx
add rax, 2
mov ecx, esi
cmp rax, rcx
ja SHORT G_M588_IG07
lea rax, bword ptr [rdi+0x04]
pop rbp
ret
G_M588_IG05:
xor rax, rax
xor edx, edx
pop rbp
ret
G_M588_IG07:
call [System.ThrowHelper:ThrowArgumentOutOfRangeException()]
int3
; Total bytes of code 44, instruction count 19
With a simplified check, the JIT eliminates the redundant conditions entirely:
push rax
cmp esi, 2
jl SHORT G_M46701_IG05
lea edx, [rsi-0x02]
lea rax, bword ptr [rdi+0x04]
add rsp, 8
ret
G_M46701_IG05:
xor rax, rax
xor edx, edx
add rsp, 8
ret
; Total bytes of code 27, instruction count 11
In summary, the x64-specific check can produce larger and less efficient code than the x86 version, especially when the JIT struggles to optimize the complex condition. The x86-style checks, with their simpler and separate conditions, align more closely with common usage patterns, such as guard conditions that precede the method call. This alignment lets the JIT recognize and eliminate redundant checks. In contrast, the complex x64 condition can obscure these opportunities and limit the JIT's ability to optimize effectively.
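For anyone who wants to reproduce the comparison outside CoreLib, the following is a rough sketch of a slice helper that uses the x86-style check, plus the guarded caller from the example above. The names `SliceSketch`, `SliceSimple`, `ThrowOutOfRange`, and `SkipTwo` are hypothetical, and this is not necessarily how the linked experiment was written; it is only one plausible way to set it up.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

public static class SliceSketch
{
    // Hypothetical stand-in for Slice(int, int) that always uses the
    // two-condition (x86-style) bounds check, regardless of platform.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static ReadOnlySpan<T> SliceSimple<T>(ReadOnlySpan<T> span, int start, int length)
    {
        if ((uint)start > (uint)span.Length || (uint)length > (uint)(span.Length - start))
            ThrowOutOfRange();

        // Bounds were validated above, so build the result without re-checking.
        return MemoryMarshal.CreateReadOnlySpan(
            ref Unsafe.Add(ref MemoryMarshal.GetReference(span), start), length);
    }

    // Kept out of line so the throw does not bloat the inlined fast path.
    private static void ThrowOutOfRange() => throw new ArgumentOutOfRangeException();

    // The guarded pattern from the issue: the outer length check gives the JIT
    // a range fact it may use to drop both inner conditions.
    public static ReadOnlySpan<char> SkipTwo(ReadOnlySpan<char> span)
    {
        if (span.Length >= 2)
        {
            return SliceSimple(span, 2, span.Length - 2);
        }
        else
        {
            return default;
        }
    }
}
```

Inspecting the JIT output for `SkipTwo` (for example via the `DOTNET_JitDisasm` environment variable on a recent runtime) should show whether the inner checks were eliminated on a given build; the exact listing will vary by runtime version.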