-
Notifications
You must be signed in to change notification settings - Fork 5.3k
Replace ushort usage with int #32694
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Interesting, how did you determine this was a performance issue? Just trying the change? How much difference does it make? |
|
I had assumed it was using ushort because signed integers can be a little slower. When indexing into an array, you see an extra Though |
|
Any throughput-focused change like this should have performance numbers accompanying it to demonstrate the value in the churn and to prove its actually a win rather than a potential regression for unforseen reasons. |
|
@scalablecory using ushort for indexing means you end up having to extend the 16bit value on every loop iteration. You can see the overhead on a simple method like char Get(char* source, ushort index) => source[index];Get(Char*, UInt16)
L0000: movzx eax, dx
L0003: movsxd rax, eax
L0006: movzx eax, word [rcx+rax*2]
L000a: ret
Get(Char*, Int32)
L0000: movsxd rax, edx
L0003: movzx eax, word [rcx+rax*2]
L0007: ret
Get(Char*, Int64)
L0000: movzx eax, word [rdx+r8*2]
L0005: retMore examples of the same. I don't think there is ever an upside for using ushort over 32/64bit types when iterating.
I was surprised to see ushort being used in this way when first looking at Uri code. I ran a quick microbenchmark comparing between int16, uint16, int32, uint32, int64, uint64 for such cases just to make sure there isn't some ushort magic I'm missing out on. Uri uses ushort to store indexes in a packed struct to reduce the size of a Uri allocation and that makes sense (this is also the source of the Uri length limitations). The perf diff of this change is anywhere from 0 to ~10% depending on the code paths hit. For the simple scenario of doing |
|
Thanks. Do you see any perf change with |
| validShortName = true; | ||
| } | ||
| else if (name[i] < '0' || name[i] > '9') | ||
| else if (c < '0' || c > '9') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
c - '0' > '9' - '0' ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Saw this as well, I am planning on changing similar checks as a part of a different PR in the uri-cleanup series.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to make char.IsInRange public for that. The hack I suggested is widely used in the BCL but looks ugly. I had a draft PR to JIT to handle this pattern there dotnet/coreclr#27480
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking of isolating common character checks into a Uri internal CharHelper class like
internal static class CharHelper
{
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static bool IsAsciiDigit(char c) =>
(uint)(c - '0') <= ('9' - '0');
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static bool IsAsciiLowercaseLetter(char c) =>
(uint)(c - 'a') <= ('z' - 'a');
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static bool IsAsciiUppercaseLetter(char c) =>
(uint)(c - 'A') <= ('Z' - 'A');
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static bool IsAsciiLetter(char c) =>
(uint)((c - 'A') & ~0x20) <= ('Z' - 'A');
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static bool IsAsciiLetterOrDigit(char c) =>
((uint)((c - 'A') & ~0x20) <= ('Z' - 'A')) ||
((uint)(c - '0') <= ('9' - '0'));
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static bool IsHexDigit(char c) =>
((uint)(c - '0') <= ('9' - '0')) ||
((uint)((c - 'A') & ~0x20) <= ('F' - 'A'));
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static bool IsInInclusiveRange(char c, char min, char max) =>
(uint)(c - min) <= (uint)(max - min);
}
@MihaZupan 0% - 15% fluctuations looks suspiciously large. Could you please post the statistical metrics calculated by BenchmarkDotNet for the original and changed versions? Especially, Mean and Standard Deviation. |
|
@alnikola For reference, here's the dump of 40 benchmark runs, ran on the work laptop for [Benchmark] public Uri NewUri() => new Uri("https://www.microsoft.com/");The average of those runs is a 3.3% improvement. I will post results of the runs from a desktop machine when I get home. @adamsitnik When running BenchmarkDotNet from the command line with |
The following command should work: --job LongRunJob --corerun $path |
|
And a verylong run on the same machine yields
|
This looks fine to me. |
alnikola
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
wfurt
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
16bit integers should generally be avoided for perf as I understand. Except for storage size. |
|
Are there any places where we may have been eliminating bounds checks, e.g. because the JIT could prove that the value was < the length (and being unsigned it definitely wasn't < 0), and now with a signed int it may no longer be able to eliminate those bounds checks? |
|
Exactly. Hence my question 😉 Thanks for the concrete example. |
|
I will recheck and fix if necessary, tho I haven't seen any explicit casts of |
I don't see any such cases |
|
@MihaZupan, should this be merged? |
Uri code wastes cpu cycles by using ushort for indexes and lengths. This PR changes those to int.
Also removed
FindEndOfComponent, as it was effectively a glorified and confusingIndexOf.