Faster IndexOf for arm64 #67811
Tagging subscribers to this area: @dotnet/area-system-memory
Issue Details
Use a slightly faster "no matches" check in IndexOf and IndexOfAny.
Benchmark:
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;
using BenchmarkDotNet.Attributes;

public class Benchmarks
{
    // Each entry is { input string, character to search for }.
    public IEnumerable<object[]> TestData()
    {
        // Small inputs which are handled via SIMD
        yield return new object[] { "12345678", '1' };
        yield return new object[] { "123456789", '9' };
        yield return new object[] { "1234567812345678", '1' };
        yield return new object[] { "1234567812345678", '0' };
        yield return new object[] { "Console WriteLine Hello World", 'o' };
        yield return new object[] { "Console WriteLine Hello World", 'd' };
        // Large inputs
        yield return new object[] { new string('x', 64), 'y' };
        yield return new object[] { new string('x', 200), 'y' };
        yield return new object[] { new string('x', 1000), 'y' };
        yield return new object[] { new string('x', 1000000), 'y' };
    }

    // Reinterprets the UTF-16 chars as bytes to exercise the byte overload of IndexOf.
    [Benchmark]
    [ArgumentsSource(nameof(TestData))]
    public int IndexOf_byte(string str, char c)
    {
        return MemoryMarshal.Cast<char, byte>(str.AsSpan()).IndexOf((byte)c);
    }

    // Exercises the char overload of IndexOf.
    [Benchmark]
    [ArgumentsSource(nameof(TestData))]
    public int IndexOf_char(string str, char c)
    {
        return str.AsSpan().IndexOf(c);
    }
}
Results (IndexOf_byte, IndexOf_char): up to 40% faster for large inputs; can regress by 0.5-1 ns in the worst case, when the match is found in the very first vector.
9e59caa to 8ab651d
kunalspathak left a comment:
LGTM
Performance improvement on Windows arm64: dotnet/perf-autofiling-issues#4732
Use a slightly faster "no matches" check in IndexOf and IndexOfAny.
Benchmark results for IndexOf_byte and IndexOf_char: up to 40% faster for large inputs; can regress by 0.5-1 ns in the worst case, when the match is found in the very first vector.
There are some unexpected regressions, but they go away in FullPGO mode (where the PR becomes faster than the base, as expected). This suggests the loop is not properly aligned by default, so the regression should be fixed once we update the *.mibc for corelib (IndexOf definitely participates in the training).
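To illustrate the general shape of a "no matches" fast path in a vectorized search, here is a minimal sketch. It is not the CoreLib code and not the arm64-specific check from this PR: it assumes .NET 7 or later and uses only the portable Vector128 APIs, and the class name IndexOfSketch is hypothetical. Each iteration first asks the cheap question "did any lane match?" and only computes the exact lane index when the answer is yes. That ordering also suggests why a match in the very first vector could be slightly slower: the extra check is paid before the index is computed.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

// Hypothetical illustration only; the real implementation lives in CoreLib's SpanHelpers.
public static class IndexOfSketch
{
    public static int IndexOf(ReadOnlySpan<byte> span, byte value)
    {
        int i = 0;

        if (Vector128.IsHardwareAccelerated && span.Length >= Vector128<byte>.Count)
        {
            ref byte start = ref MemoryMarshal.GetReference(span);
            Vector128<byte> target = Vector128.Create(value);
            int lastVectorStart = span.Length - Vector128<byte>.Count;

            for (; i <= lastVectorStart; i += Vector128<byte>.Count)
            {
                Vector128<byte> data = Vector128.LoadUnsafe(ref start, (nuint)i);
                Vector128<byte> matches = Vector128.Equals(data, target);

                // Cheap "no matches" check: the common case for long inputs.
                if (matches == Vector128<byte>.Zero)
                {
                    continue;
                }

                // Slower path, taken once: locate the first matching lane.
                uint mask = matches.ExtractMostSignificantBits();
                return i + BitOperations.TrailingZeroCount(mask);
            }
        }

        // Scalar loop for short inputs and the remaining tail.
        for (; i < span.Length; i++)
        {
            if (span[i] == value)
            {
                return i;
            }
        }

        return -1;
    }
}
```

For example, IndexOfSketch.IndexOf(new byte[] { 1, 2, 3 }, 3) takes the scalar path because the input is shorter than one vector, while a 1000-byte input with no match runs only the cheap check on every full vector.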