Description
It's quite common to compare a string against a predefined one using an Ordinal or OrdinalIgnoreCase comparison (I found plenty of cases in the dotnet/aspnetcore and dotnet/runtime repos), e.g. patterns like:
* string.Equals(str, "cns", StringComparison.OrdinalIgnoreCase); // or just Ordinal
* str.Equals("cns", StringComparison.OrdinalIgnoreCase); // or just Ordinal
* StringComparer.OrdinalIgnoreCase.Equals(str, "cns"); // or just Ordinal

So Roslyn/ILLink/JIT could optimize such comparisons by inlining a more optimized check (perhaps it should be the JIT, since it can handle more cases after inlining, and Roslyn/ILLink know nothing about the target arch). E.g. here is what we can do for small strings (1-4 chars), keeping in mind that strings are 8-byte aligned (for simplicity I only cared about the AMD64 arch):
    [Benchmark(Baseline = true)]
    [Arguments("true")]
    [Arguments("TRUE")]
    public bool StringEquals(string str)
    {
        return string.Equals(str, "True", StringComparison.OrdinalIgnoreCase);
    }

    [Benchmark]
    [Arguments("true")]
    [Arguments("TRUE")]
    public bool Inlined(string str)
    {
        // Note: unlike string.Equals, this assumes str is not null.
        return object.ReferenceEquals(str, "True") ||
            (str.Length == 4 &&
                // The string's content fits into a 64-bit register. The following code looks awful,
                // but it's a simple 'or + cmp' in the codegen.
                (Unsafe.ReadUnaligned<ulong>(ref Unsafe.As<char, byte>(
                    ref MemoryMarshal.GetReference(str.AsSpan())))
                    | 0x0020002000200020 /* 'to lower': valid because the constant contains only [a-zA-Z] */)
                    == 0x0065007500720074 /* "true" as four little-endian UTF-16 chars */);
    }

| Method | str | Mean | Error | StdDev | Ratio |
|------------- |----- |----------:|----------:|----------:|------:|
| StringEquals | TRUE | 4.1723 ns | 0.0023 ns | 0.0020 ns | 1.00 |
| Inlined | TRUE | 0.3416 ns | 0.0024 ns | 0.0022 ns | 0.08 |
| | | | | | |
| StringEquals | true | 4.1718 ns | 0.0023 ns | 0.0021 ns | 1.00 |
| Inlined | true | 0.3406 ns | 0.0019 ns | 0.0016 ns | 0.08 |
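
The two constants in `Inlined` can be derived mechanically from the literal. The following is a minimal sketch of that derivation using safe APIs only, not what the JIT would actually emit. `Pack4` and `EqualsIgnoreCase4` are hypothetical helper names, and the sketch assumes the literal consists of exactly four ASCII letters:

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper: reinterprets exactly four UTF-16 chars as one 64-bit value.
static ulong Pack4(ReadOnlySpan<char> s) =>
    MemoryMarshal.Read<ulong>(MemoryMarshal.AsBytes(s.Slice(0, 4)));

// Hypothetical helper: OrdinalIgnoreCase-style check against a 4-char literal.
// Assumption: every char of `literal` is an ASCII letter; for other chars,
// OR-ing with 0x20 would wrongly accept inputs that differ only in that bit.
static bool EqualsIgnoreCase4(string str, string literal)
{
    const ulong LowerMask = 0x0020_0020_0020_0020; // bit 5 of each char: 'T' (0x54) -> 't' (0x74)
    return str is not null
        && str.Length == 4
        && (Pack4(str) | LowerMask) == (Pack4(literal) | LowerMask);
}

ulong expected = Pack4("True") | 0x0020_0020_0020_0020UL;
Console.WriteLine($"0x{expected:X16}");                // 0x0065007500720074 on little-endian hardware
Console.WriteLine(EqualsIgnoreCase4("TRUE", "True"));  // True
Console.WriteLine(EqualsIgnoreCase4("Tru3", "True"));  // False
```

In a real implementation the right-hand side would be folded into a constant at compile time, so only the load, the `or` and the `cmp` remain at run time.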
We can also inline SIMD code for longer strings, e.g. here I check that an HTTP header is "Proxy-Authenticate" (18 chars) using two AVX2 vectors: https://gist.github.com/EgorBo/c8e8490ddd6f9a0d5b72c413ddd81d44
| Method | headerName | Mean | Error | StdDev | Ratio | Gen 0 | Gen 1 | Gen 2 | Allocated |
|----------------- |------------------- |----------:|----------:|----------:|------:|------:|------:|------:|----------:|
| StringEqauls | PROXY-AUTHENTICATE | 10.296 ns | 0.0087 ns | 0.0078 ns | 1.00 | - | - | - | - |
| StringEqauls_AVX | PROXY-AUTHENTICATE | 2.560 ns | 0.0042 ns | 0.0038 ns | 0.25 | - | - | - | - |
| | | | | | | | | | |
| StringEqauls | proxy-authenticate | 10.298 ns | 0.0078 ns | 0.0065 ns | 1.00 | - | - | - | - |
| StringEqauls_AVX | proxy-authenticate | 2.563 ns | 0.0071 ns | 0.0066 ns | 0.25 | - | - | - | - |
When the input string is, say, 30 chars long, the results are even better:
| Method | headerName | Mean | Error | StdDev | Ratio | Gen 0 | Gen 1 | Gen 2 | Allocated |
|----------------- |--------------------- |----------:|----------:|----------:|------:|------:|------:|------:|----------:|
| StringEqauls | PROXY(...)WORLD [30] | 15.396 ns | 0.0141 ns | 0.0117 ns | 1.00 | - | - | - | - |
| StringEqauls_AVX | PROXY(...)WORLD [30] | 2.558 ns | 0.0010 ns | 0.0008 ns | 0.17 | - | - | - | - |
| | | | | | | | | | |
| StringEqauls | proxy(...)World [30] | 14.997 ns | 0.0089 ns | 0.0079 ns | 1.00 | - | - | - | - |
| StringEqauls_AVX | proxy(...)World [30] | 2.550 ns | 0.0038 ns | 0.0030 ns | 0.17 | - | - | - | - |
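
The gist itself is not reproduced here. As a rough sketch of the two-vector idea (not the gist's code), the version below uses the cross-platform `Vector256` API from .NET 7+ instead of raw AVX2 intrinsics. It covers a 17..32 char literal with two overlapping 16-char loads and lowers only the lanes where the literal has an ASCII letter, so that characters such as `-` are compared exactly. `Load16`, `LetterMask16` and `EqualsIgnoreCaseAscii` are hypothetical names, and the literal is assumed to be pure ASCII:

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

// Hypothetical helper: loads 16 chars starting at `offset` into a 256-bit vector.
static Vector256<ushort> Load16(ReadOnlySpan<char> s, int offset) =>
    Vector256.Create<ushort>(MemoryMarshal.Cast<char, ushort>(s.Slice(offset, 16)));

// Hypothetical helper: 0x20 in every lane where the literal has an ASCII letter, 0 elsewhere.
static Vector256<ushort> LetterMask16(ReadOnlySpan<char> literal, int offset)
{
    Span<ushort> lanes = stackalloc ushort[16];
    for (int i = 0; i < 16; i++)
        lanes[i] = char.IsAsciiLetter(literal[offset + i]) ? (ushort)0x20 : (ushort)0;
    ReadOnlySpan<ushort> readOnlyLanes = lanes;
    return Vector256.Create<ushort>(readOnlyLanes);
}

// Assumption: `literal` is pure ASCII and 17..32 chars long.
static bool EqualsIgnoreCaseAscii(string str, string literal)
{
    if (literal.Length is < 17 or > 32)
        throw new ArgumentOutOfRangeException(nameof(literal));
    if (str is null || str.Length != literal.Length)
        return false;

    int tail = literal.Length - 16; // the second load overlaps the first one
    Vector256<ushort> m0 = LetterMask16(literal, 0);
    Vector256<ushort> m1 = LetterMask16(literal, tail);

    // A JIT would emit the masks and the lowered literal as constants;
    // this sketch recomputes them on every call for clarity.
    return (Load16(str, 0) | m0) == (Load16(literal, 0) | m0)
        && (Load16(str, tail) | m1) == (Load16(literal, tail) | m1);
}

Console.WriteLine(EqualsIgnoreCaseAscii("PROXY-AUTHENTICATE", "Proxy-Authenticate")); // True
Console.WriteLine(EqualsIgnoreCaseAscii("proxy_authenticate", "Proxy-Authenticate")); // False
```

Because the two loads overlap, the same code handles every length in the bucket without a scalar tail loop.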
So for [0..32] chars (string.Length) we can emit an inlined super-fast comparison:
* [0..4]: using a single 64-bit GP register
* [5..8]: using two 64-bit GP registers
* [9..16]: using two 128-bit vectors
* [17..32]: using two 256-bit vectors
* [33...]: leave as is.
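
The length buckets above could be expressed as a simple selection on the literal's length. This is only an illustrative sketch: `PickStrategy` is a hypothetical name, and a real implementation would pick the code shape at compile time rather than return a description:

```csharp
using System;

// Hypothetical helper: names the strategy the list above assigns to a literal of the given length.
static string PickStrategy(int length) => length switch
{
    < 0 => throw new ArgumentOutOfRangeException(nameof(length)),
    <= 4 => "one 64-bit GP register",
    <= 8 => "two 64-bit GP registers",
    <= 16 => "two 128-bit vectors",
    <= 32 => "two 256-bit vectors",
    _ => "regular string.Equals call",
};

foreach (int n in new[] { 4, 5, 18, 30, 33 })
    Console.WriteLine($"{n,2} chars: {PickStrategy(n)}");
```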