Skip to content

Conversation

@EgorBo
Copy link
Member

@EgorBo EgorBo commented Apr 9, 2022

Use a slightly faster "no matches" check in IndexOf and IndexOfAny.

Benchmark:

public class Benchmarks
{
    public IEnumerable<object[]> TestData()
    {
        // Small inputs which are handled via SIMD 
        yield return new object[] { "12345678", '1' };
        yield return new object[] { "123456789", '9' };
        yield return new object[] { "1234567812345678", '1' };
        yield return new object[] { "1234567812345678", '0' };
        yield return new object[] { "Console WriteLine Hello World", 'o' };
        yield return new object[] { "Console WriteLine Hello World", 'd' };

        // Large inputs 
        yield return new object[] { new string('x', 64), 'y' };
        yield return new object[] { new string('x', 200), 'y' };
        yield return new object[] { new string('x', 1000), 'y' };
        yield return new object[] { new string('x', 1000000), 'y' };
    }

    [Benchmark]
    [ArgumentsSource(nameof(TestData))]
    public int IndexOf_byte(string str, char c)
    {
        return MemoryMarshal.Cast<char,byte>(str.AsSpan()).IndexOf((byte)c);
    }

    [Benchmark]
    [ArgumentsSource(nameof(TestData))]
    public int IndexOf_char(string str, char c)
    {
        return str.AsSpan().IndexOf(c);
    }
}

IndexOf_byte

|  Method |               Toolchain |                  str | c |          Mean |      Error |     StdDev | Ratio |
|-------- |------------------------ |--------------------- |-- |--------------:|-----------:|-----------:|------:|
| IndexOf |      /Core_Root/corerun |             12345678 | 1 |      1.692 ns |  0.0022 ns |  0.0020 ns |  1.00 |
| IndexOf | /Core_Root_base/corerun |             12345678 | 1 |      1.686 ns |  0.0017 ns |  0.0013 ns |  1.00 |
|         |                         |                      |   |               |            |            |       |
| IndexOf |      /Core_Root/corerun |     1234567812345678 | 0 |     10.262 ns |  0.0127 ns |  0.0119 ns |  1.78 |
| IndexOf | /Core_Root_base/corerun |     1234567812345678 | 0 |      5.763 ns |  0.0030 ns |  0.0025 ns |  1.00 |
|         |                         |                      |   |               |            |            |       |
| IndexOf |      /Core_Root/corerun |     1234567812345678 | 1 |      1.676 ns |  0.0032 ns |  0.0030 ns |  1.01 |
| IndexOf | /Core_Root_base/corerun |     1234567812345678 | 1 |      1.665 ns |  0.0022 ns |  0.0020 ns |  1.00 |
|         |                         |                      |   |               |            |            |       |
| IndexOf |      /Core_Root/corerun |            123456789 | 9 |      4.197 ns |  0.0089 ns |  0.0083 ns |  1.00 |
| IndexOf | /Core_Root_base/corerun |            123456789 | 9 |      4.193 ns |  0.0032 ns |  0.0029 ns |  1.00 |
|         |                         |                      |   |               |            |            |       |
| IndexOf |      /Core_Root/corerun | Conso(...)World [29] | d |      7.010 ns |  0.0049 ns |  0.0043 ns |  1.00 |
| IndexOf | /Core_Root_base/corerun | Conso(...)World [29] | d |      7.014 ns |  0.0018 ns |  0.0017 ns |  1.00 |
|         |                         |                      |   |               |            |            |       |
| IndexOf |      /Core_Root/corerun | Conso(...)World [29] | o |      1.679 ns |  0.0006 ns |  0.0006 ns |  1.00 |
| IndexOf | /Core_Root_base/corerun | Conso(...)World [29] | o |      1.678 ns |  0.0008 ns |  0.0007 ns |  1.00 |
|         |                         |                      |   |               |            |            |       |
| IndexOf |      /Core_Root/corerun | xxxxx(...)xxxxx [64] | y |      7.322 ns |  0.0028 ns |  0.0024 ns |  0.80 |
| IndexOf | /Core_Root_base/corerun | xxxxx(...)xxxxx [64] | y |      9.204 ns |  0.0032 ns |  0.0029 ns |  1.00 |
|         |                         |                      |   |               |            |            |       |
| IndexOf |      /Core_Root/corerun |  xxxx(...)xxxx [200] | y |     13.269 ns |  0.0024 ns |  0.0022 ns |  0.62 |
| IndexOf | /Core_Root_base/corerun |  xxxx(...)xxxx [200] | y |     21.317 ns |  0.2285 ns |  0.2138 ns |  1.00 |
|         |                         |                      |   |               |            |            |       |
| IndexOf |      /Core_Root/corerun | xxxx(...)xxxx [1000] | y |     58.107 ns |  0.0458 ns |  0.0429 ns |  0.65 |
| IndexOf | /Core_Root_base/corerun | xxxx(...)xxxx [1000] | y |     89.510 ns |  0.0272 ns |  0.0241 ns |  1.00 |
|         |                         |                      |   |               |            |            |       |
| IndexOf |      /Core_Root/corerun |  xx(...)xx [1000000] | y | 44,056.624 ns |  9.0990 ns |  8.5112 ns |  0.56 |
| IndexOf | /Core_Root_base/corerun |  xx(...)xx [1000000] | y | 78,269.573 ns | 26.4061 ns | 22.0503 ns |  1.00 |

IndexOf_char

|  Method |               Toolchain |                  str | c |          Mean |      Error |     StdDev | Ratio |
|-------- |------------------------ |--------------------- |-- |--------------:|-----------:|-----------:|------:|
| IndexOf |      /Core_Root/corerun |             12345678 | 1 |      1.382 ns |  0.0020 ns |  0.0018 ns |  1.00 |
| IndexOf | /Core_Root_base/corerun |             12345678 | 1 |      1.382 ns |  0.0023 ns |  0.0021 ns |  1.00 |
|         |                         |                      |   |               |            |            |       |
| IndexOf |      /Core_Root/corerun |     1234567812345678 | 0 |     11.378 ns |  0.0141 ns |  0.0132 ns |  1.07 |
| IndexOf | /Core_Root_base/corerun |     1234567812345678 | 0 |     10.634 ns |  0.0240 ns |  0.0187 ns |  1.00 |
|         |                         |                      |   |               |            |            |       |
| IndexOf |      /Core_Root/corerun |     1234567812345678 | 1 |      1.375 ns |  0.0016 ns |  0.0014 ns |  1.01 |
| IndexOf | /Core_Root_base/corerun |     1234567812345678 | 1 |      1.366 ns |  0.0034 ns |  0.0030 ns |  1.00 |
|         |                         |                      |   |               |            |            |       |
| IndexOf |      /Core_Root/corerun |            123456789 | 9 |      2.612 ns |  0.0014 ns |  0.0011 ns |  1.00 |
| IndexOf | /Core_Root_base/corerun |            123456789 | 9 |      2.615 ns |  0.0029 ns |  0.0027 ns |  1.00 |
|         |                         |                      |   |               |            |            |       |
| IndexOf |      /Core_Root/corerun | Conso(...)World [29] | d |      5.784 ns |  0.0043 ns |  0.0038 ns |  0.91 |
| IndexOf | /Core_Root_base/corerun | Conso(...)World [29] | d |      6.389 ns |  0.0059 ns |  0.0052 ns |  1.00 |
|         |                         |                      |   |               |            |            |       |
| IndexOf |      /Core_Root/corerun | Conso(...)World [29] | o |      1.689 ns |  0.0016 ns |  0.0014 ns |  1.00 |
| IndexOf | /Core_Root_base/corerun | Conso(...)World [29] | o |      1.694 ns |  0.0007 ns |  0.0007 ns |  1.00 |
|         |                         |                      |   |               |            |            |       |
| IndexOf |      /Core_Root/corerun | xxxxx(...)xxxxx [64] | y |      7.009 ns |  0.0025 ns |  0.0023 ns |  0.76 |
| IndexOf | /Core_Root_base/corerun | xxxxx(...)xxxxx [64] | y |      9.204 ns |  0.0025 ns |  0.0023 ns |  1.00 |
|         |                         |                      |   |               |            |            |       |
| IndexOf |      /Core_Root/corerun |  xxxx(...)xxxx [200] | y |     14.192 ns |  0.0070 ns |  0.0062 ns |  0.67 |
| IndexOf | /Core_Root_base/corerun |  xxxx(...)xxxx [200] | y |     21.267 ns |  0.2695 ns |  0.2521 ns |  1.00 |
|         |                         |                      |   |               |            |            |       |
| IndexOf |      /Core_Root/corerun | xxxx(...)xxxx [1000] | y |     64.790 ns |  0.2031 ns |  0.1800 ns |  0.72 |
| IndexOf | /Core_Root_base/corerun | xxxx(...)xxxx [1000] | y |     89.485 ns |  0.0196 ns |  0.0174 ns |  1.00 |
|         |                         |                      |   |               |            |            |       |
| IndexOf |      /Core_Root/corerun |  xx(...)xx [1000000] | y | 60,098.282 ns |  3.1361 ns |  2.7801 ns |  0.77 |
| IndexOf | /Core_Root_base/corerun |  xx(...)xx [1000000] | y | 78,264.715 ns | 20.9694 ns | 19.6148 ns |  1.00 |

up to +40% faster for large inputs can 0.5-1ns regress in the worst case - if input is found in the very first vector.

There are weird regressions

| IndexOf |      /Core_Root/corerun |     1234567812345678 | 0 |     10.262 ns |  0.0127 ns |  0.0119 ns |  1.78 |
| IndexOf | /Core_Root_base/corerun |     1234567812345678 | 0 |      5.763 ns |  0.0030 ns |  0.0025 ns |  1.00 |

| IndexOf |      /Core_Root/corerun |     1234567812345678 | 0 |     11.378 ns |  0.0141 ns |  0.0132 ns |  1.07 |
| IndexOf | /Core_Root_base/corerun |     1234567812345678 | 0 |     10.634 ns |  0.0240 ns |  0.0187 ns |  1.00 |

but it goes away in FullPGO mode (and the PR becomes faster than the base as expected in this case) - it means the loop is not properly aligned by default. So the regression will be fixed once we update *.mibc for corelib (IndexOf definitely participates in the trainings)

@ghost ghost assigned EgorBo Apr 9, 2022
@ghost ghost added the area-System.Memory label Apr 9, 2022
@ghost
Copy link

ghost commented Apr 9, 2022

Tagging subscribers to this area: @dotnet/area-system-memory
See info in area-owners.md if you want to be subscribed.

Issue Details

Use a slightly faster "no matches" check in IndexOf and IndexOfAny.

Benchmark:

public class Benchmarks
{
    public IEnumerable<object[]> TestData()
    {
        // Small inputs which are handled via SIMD 
        yield return new object[] { "12345678", '1' };
        yield return new object[] { "123456789", '9' };
        yield return new object[] { "1234567812345678", '1' };
        yield return new object[] { "1234567812345678", '0' };
        yield return new object[] { "Console WriteLine Hello World", 'o' };
        yield return new object[] { "Console WriteLine Hello World", 'd' };

        // Large inputs 
        yield return new object[] { new string('x', 64), 'y' };
        yield return new object[] { new string('x', 200), 'y' };
        yield return new object[] { new string('x', 1000), 'y' };
        yield return new object[] { new string('x', 1000000), 'y' };
    }

    [Benchmark]
    [ArgumentsSource(nameof(TestData))]
    public int IndexOf_byte(string str, char c)
    {
        return MemoryMarshal.Cast<char,byte>(str.AsSpan()).IndexOf((byte)c);
    }

    [Benchmark]
    [ArgumentsSource(nameof(TestData))]
    public int IndexOf_char(string str, char c)
    {
        return str.AsSpan().IndexOf(c);
    }
}

IndexOf_byte

|  Method |               Toolchain |                  str | c |          Mean |      Error |     StdDev | Ratio |
|-------- |------------------------ |--------------------- |-- |--------------:|-----------:|-----------:|------:|
| IndexOf |      /Core_Root/corerun |             12345678 | 1 |      1.692 ns |  0.0022 ns |  0.0020 ns |  1.00 |
| IndexOf | /Core_Root_base/corerun |             12345678 | 1 |      1.686 ns |  0.0017 ns |  0.0013 ns |  1.00 |
|         |                         |                      |   |               |            |            |       |
| IndexOf |      /Core_Root/corerun |     1234567812345678 | 0 |     10.262 ns |  0.0127 ns |  0.0119 ns |  1.78 |
| IndexOf | /Core_Root_base/corerun |     1234567812345678 | 0 |      5.763 ns |  0.0030 ns |  0.0025 ns |  1.00 |
|         |                         |                      |   |               |            |            |       |
| IndexOf |      /Core_Root/corerun |     1234567812345678 | 1 |      1.676 ns |  0.0032 ns |  0.0030 ns |  1.01 |
| IndexOf | /Core_Root_base/corerun |     1234567812345678 | 1 |      1.665 ns |  0.0022 ns |  0.0020 ns |  1.00 |
|         |                         |                      |   |               |            |            |       |
| IndexOf |      /Core_Root/corerun |            123456789 | 9 |      4.197 ns |  0.0089 ns |  0.0083 ns |  1.00 |
| IndexOf | /Core_Root_base/corerun |            123456789 | 9 |      4.193 ns |  0.0032 ns |  0.0029 ns |  1.00 |
|         |                         |                      |   |               |            |            |       |
| IndexOf |      /Core_Root/corerun | Conso(...)World [29] | d |      7.010 ns |  0.0049 ns |  0.0043 ns |  1.00 |
| IndexOf | /Core_Root_base/corerun | Conso(...)World [29] | d |      7.014 ns |  0.0018 ns |  0.0017 ns |  1.00 |
|         |                         |                      |   |               |            |            |       |
| IndexOf |      /Core_Root/corerun | Conso(...)World [29] | o |      1.679 ns |  0.0006 ns |  0.0006 ns |  1.00 |
| IndexOf | /Core_Root_base/corerun | Conso(...)World [29] | o |      1.678 ns |  0.0008 ns |  0.0007 ns |  1.00 |
|         |                         |                      |   |               |            |            |       |
| IndexOf |      /Core_Root/corerun | xxxxx(...)xxxxx [64] | y |      7.322 ns |  0.0028 ns |  0.0024 ns |  0.80 |
| IndexOf | /Core_Root_base/corerun | xxxxx(...)xxxxx [64] | y |      9.204 ns |  0.0032 ns |  0.0029 ns |  1.00 |
|         |                         |                      |   |               |            |            |       |
| IndexOf |      /Core_Root/corerun |  xxxx(...)xxxx [200] | y |     13.269 ns |  0.0024 ns |  0.0022 ns |  0.62 |
| IndexOf | /Core_Root_base/corerun |  xxxx(...)xxxx [200] | y |     21.317 ns |  0.2285 ns |  0.2138 ns |  1.00 |
|         |                         |                      |   |               |            |            |       |
| IndexOf |      /Core_Root/corerun | xxxx(...)xxxx [1000] | y |     58.107 ns |  0.0458 ns |  0.0429 ns |  0.65 |
| IndexOf | /Core_Root_base/corerun | xxxx(...)xxxx [1000] | y |     89.510 ns |  0.0272 ns |  0.0241 ns |  1.00 |
|         |                         |                      |   |               |            |            |       |
| IndexOf |      /Core_Root/corerun |  xx(...)xx [1000000] | y | 44,056.624 ns |  9.0990 ns |  8.5112 ns |  0.56 |
| IndexOf | /Core_Root_base/corerun |  xx(...)xx [1000000] | y | 78,269.573 ns | 26.4061 ns | 22.0503 ns |  1.00 |

IndexOf_char

|  Method |               Toolchain |                  str | c |          Mean |      Error |     StdDev | Ratio |
|-------- |------------------------ |--------------------- |-- |--------------:|-----------:|-----------:|------:|
| IndexOf |      /Core_Root/corerun |             12345678 | 1 |      1.382 ns |  0.0020 ns |  0.0018 ns |  1.00 |
| IndexOf | /Core_Root_base/corerun |             12345678 | 1 |      1.382 ns |  0.0023 ns |  0.0021 ns |  1.00 |
|         |                         |                      |   |               |            |            |       |
| IndexOf |      /Core_Root/corerun |     1234567812345678 | 0 |     11.378 ns |  0.0141 ns |  0.0132 ns |  1.07 |
| IndexOf | /Core_Root_base/corerun |     1234567812345678 | 0 |     10.634 ns |  0.0240 ns |  0.0187 ns |  1.00 |
|         |                         |                      |   |               |            |            |       |
| IndexOf |      /Core_Root/corerun |     1234567812345678 | 1 |      1.375 ns |  0.0016 ns |  0.0014 ns |  1.01 |
| IndexOf | /Core_Root_base/corerun |     1234567812345678 | 1 |      1.366 ns |  0.0034 ns |  0.0030 ns |  1.00 |
|         |                         |                      |   |               |            |            |       |
| IndexOf |      /Core_Root/corerun |            123456789 | 9 |      2.612 ns |  0.0014 ns |  0.0011 ns |  1.00 |
| IndexOf | /Core_Root_base/corerun |            123456789 | 9 |      2.615 ns |  0.0029 ns |  0.0027 ns |  1.00 |
|         |                         |                      |   |               |            |            |       |
| IndexOf |      /Core_Root/corerun | Conso(...)World [29] | d |      5.784 ns |  0.0043 ns |  0.0038 ns |  0.91 |
| IndexOf | /Core_Root_base/corerun | Conso(...)World [29] | d |      6.389 ns |  0.0059 ns |  0.0052 ns |  1.00 |
|         |                         |                      |   |               |            |            |       |
| IndexOf |      /Core_Root/corerun | Conso(...)World [29] | o |      1.689 ns |  0.0016 ns |  0.0014 ns |  1.00 |
| IndexOf | /Core_Root_base/corerun | Conso(...)World [29] | o |      1.694 ns |  0.0007 ns |  0.0007 ns |  1.00 |
|         |                         |                      |   |               |            |            |       |
| IndexOf |      /Core_Root/corerun | xxxxx(...)xxxxx [64] | y |      7.009 ns |  0.0025 ns |  0.0023 ns |  0.76 |
| IndexOf | /Core_Root_base/corerun | xxxxx(...)xxxxx [64] | y |      9.204 ns |  0.0025 ns |  0.0023 ns |  1.00 |
|         |                         |                      |   |               |            |            |       |
| IndexOf |      /Core_Root/corerun |  xxxx(...)xxxx [200] | y |     14.192 ns |  0.0070 ns |  0.0062 ns |  0.67 |
| IndexOf | /Core_Root_base/corerun |  xxxx(...)xxxx [200] | y |     21.267 ns |  0.2695 ns |  0.2521 ns |  1.00 |
|         |                         |                      |   |               |            |            |       |
| IndexOf |      /Core_Root/corerun | xxxx(...)xxxx [1000] | y |     64.790 ns |  0.2031 ns |  0.1800 ns |  0.72 |
| IndexOf | /Core_Root_base/corerun | xxxx(...)xxxx [1000] | y |     89.485 ns |  0.0196 ns |  0.0174 ns |  1.00 |
|         |                         |                      |   |               |            |            |       |
| IndexOf |      /Core_Root/corerun |  xx(...)xx [1000000] | y | 60,098.282 ns |  3.1361 ns |  2.7801 ns |  0.77 |
| IndexOf | /Core_Root_base/corerun |  xx(...)xx [1000000] | y | 78,264.715 ns | 20.9694 ns | 19.6148 ns |  1.00 |

up +40% faster for large inputs can 0.5-1ns regress in the worst case - if input is found in the very first vector.

Author: EgorBo
Assignees: EgorBo
Labels:

area-System.Memory

Milestone: -

@EgorBo EgorBo force-pushed the arm-spanhelpers-fast branch from 9e59caa to 8ab651d Compare April 9, 2022 20:13
Copy link
Contributor

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@EgorBo EgorBo merged commit a4503d6 into dotnet:main Apr 12, 2022
@adamsitnik adamsitnik added the tenet-performance Performance related issue label Apr 14, 2022
@dakersnar
Copy link
Contributor

Performance improvement on Windows arm64: dotnet/perf-autofiling-issues#4732

@ghost ghost locked as resolved and limited conversation to collaborators May 22, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants