Skip to content

Conversation

@MihaZupan
Copy link
Member

Uri code wastes cpu cycles by using ushort for indexes and lengths. This PR changes those to int.

Also removed FindEndOfComponent, as it was effectively a glorified and confusing IndexOf.

@MihaZupan MihaZupan added this to the 5.0 milestone Feb 22, 2020
@MihaZupan MihaZupan requested a review from a team February 22, 2020 10:42
@danmoseley
Copy link
Member

Interesting, how did you determine this was a performance issue? Just trying the change? How much difference does it make?

@scalablecory
Copy link
Contributor

I had assumed it was using ushort because signed integers can be a little slower. When indexing into an array, you see an extra movsxd:

movsxd rdx, edx
mov rax, [rcx + rdx]

Though uint may make more sense than ushort. I believe on x64 most 16-bit instructions require an extra prefix that will bloat your code size.

@stephentoub
Copy link
Member

stephentoub commented Feb 22, 2020

Any throughput-focused change like this should have performance numbers accompanying it to demonstrate the value in the churn and to prove its actually a win rather than a potential regression for unforseen reasons.

@MihaZupan
Copy link
Member Author

MihaZupan commented Feb 22, 2020

@scalablecory using ushort for indexing means you end up having to extend the 16bit value on every loop iteration.

You can see the overhead on a simple method like

char Get(char* source, ushort index) => source[index];
Get(Char*, UInt16)
    L0000: movzx eax, dx
    L0003: movsxd rax, eax
    L0006: movzx eax, word [rcx+rax*2]
    L000a: ret

Get(Char*, Int32)
    L0000: movsxd rax, edx
    L0003: movzx eax, word [rcx+rax*2]
    L0007: ret

Get(Char*, Int64)
    L0000: movzx eax, word [rdx+r8*2]
    L0005: ret

More examples of the same.

I don't think there is ever an upside for using ushort over 32/64bit types when iterating.

Interesting, how did you determine this was a performance issue? Just trying the change? How much difference does it make?

I was surprised to see ushort being used in this way when first looking at Uri code. I ran a quick microbenchmark comparing between int16, uint16, int32, uint32, int64, uint64 for such cases just to make sure there isn't some ushort magic I'm missing out on.

Uri uses ushort to store indexes in a packed struct to reduce the size of a Uri allocation and that makes sense (this is also the source of the Uri length limitations).
But it also takes steps to ensure that it uses ushort for indexes, lengths in a lot of places, and in such scenarios it doesn't make sense.

The perf diff of this change is anywhere from 0 to ~10% depending on the code paths hit. For the simple scenario of doing new Uri("https://www.microsoft.com/"), the difference is ~5% (re-running the benchmark multiple times as it yields results from 0 to 15%, averaging at ~5).

@scalablecory
Copy link
Contributor

Thanks. Do you see any perf change with uint? It will get rid of that stray movsxd vs the int version.

validShortName = true;
}
else if (name[i] < '0' || name[i] > '9')
else if (c < '0' || c > '9')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

c - '0' > '9' - '0' ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Saw this as well, I am planning on changing similar checks as a part of a different PR in the uri-cleanup series.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to make char.IsInRange public for that. The hack I suggested is widely used in the BCL but looks ugly. I had a draft PR to JIT to handle this pattern there dotnet/coreclr#27480

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking of isolating common character checks into a Uri internal CharHelper class like

internal static class CharHelper
{
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static bool IsAsciiDigit(char c) =>
        (uint)(c - '0') <= ('9' - '0');

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static bool IsAsciiLowercaseLetter(char c) =>
        (uint)(c - 'a') <= ('z' - 'a');

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static bool IsAsciiUppercaseLetter(char c) =>
        (uint)(c - 'A') <= ('Z' - 'A');

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static bool IsAsciiLetter(char c) =>
        (uint)((c - 'A') & ~0x20) <= ('Z' - 'A');

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static bool IsAsciiLetterOrDigit(char c) =>
        ((uint)((c - 'A') & ~0x20) <= ('Z' - 'A')) ||
        ((uint)(c - '0') <= ('9' - '0'));

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static bool IsHexDigit(char c) =>
        ((uint)(c - '0') <= ('9' - '0')) ||
        ((uint)((c - 'A') & ~0x20) <= ('F' - 'A'));

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static bool IsInInclusiveRange(char c, char min, char max) =>
        (uint)(c - min) <= (uint)(max - min);
}

@alnikola
Copy link
Contributor

The perf diff of this change is anywhere from 0 to ~10% depending on the code paths hit. For the simple scenario of doing new Uri("https://www.microsoft.com/"), the difference is ~5% (re-running the benchmark multiple times as it yields results from 0 to 15%, averaging at ~5).

@MihaZupan 0% - 15% fluctuations looks suspiciously large. Could you please post the statistical metrics calculated by BenchmarkDotNet for the original and changed versions? Especially, Mean and Standard Deviation.

@MihaZupan
Copy link
Member Author

@alnikola For reference, here's the dump of 40 benchmark runs, ran on the work laptop for

[Benchmark] public Uri NewUri() => new Uri("https://www.microsoft.com/");

The average of those runs is a 3.3% improvement.

I will post results of the runs from a desktop machine when I get home.

@adamsitnik When running BenchmarkDotNet from the command line with --coreRuns specified, is there a way to specify the job type (for example LongRunJob)? Or can the coreRuns be specified in the ManualConfig?

@adamsitnik
Copy link
Member

When running BenchmarkDotNet from the command line with --coreRuns specified, is there a way to specify the job type (for example LongRunJob)? Or can the coreRuns be specified in the ManualConfig?

The following command should work:

--job LongRunJob --corerun $path

@MihaZupan
Copy link
Member Author

And a verylong run on the same machine yields

Method Toolchain Mean Error StdDev Median Ratio RatioSD
NewUri \clean\CoreRun.exe 199.3 ns 0.59 ns 7.89 ns 198.0 ns 1.04 0.04
NewUri \new\CoreRun.exe 191.6 ns 0.17 ns 2.20 ns 192.4 ns 1.00 0.00

@alnikola
Copy link
Contributor

And a verylong run on the same machine yields

This looks fine to me.

Copy link
Contributor

@alnikola alnikola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@wfurt wfurt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jaykrell
Copy link
Contributor

16bit integers should generally be avoided for perf as I understand. Except for storage size.

@stephentoub
Copy link
Member

Are there any places where we may have been eliminating bounds checks, e.g. because the JIT could prove that the value was < the length (and being unsigned it definitely wasn't < 0), and now with a signed int it may no longer be able to eliminate those bounds checks?

@EgorBo
Copy link
Member

EgorBo commented Feb 24, 2020

Are there any places where we may have been eliminating bounds checks, e.g. because the JIT could prove that the value was < the length (and being unsigned it definitely wasn't < 0), and now with a signed int it may no longer be able to eliminate those bounds checks?

In theory it's possible 🙂 https://sharplab.io/#v2:C4LghgzgtgPgAgJgIwFgBQcDMACR2DC26A3tmUWuRVVrgCzYCySAFAJYB2wA2gLrZgAToLABPADTYArhAAWAe0HBsbAJToqxDVXJsAZthYspnYKqEjRAOgAyAUw4BzYLOwA+Feso6fFsdzZ+AF5sAAYAbm0yAF8o6nJaOAZGBHYuPgFhMUlTT2wAenzsAGNZMCc7ABNsYHkVLjitbx99Q2NTcyzreycXd084nzI/UQDgsMjm7Fi0aKA=

@stephentoub
Copy link
Member

In theory it's possible

Exactly. Hence my question 😉 Thanks for the concrete example.

@MihaZupan
Copy link
Member Author

I will recheck and fix if necessary, tho I haven't seen any explicit casts of string.Length in the code, so there might not be any regressions related to it, but we can definitely improve it in a few places.

@MihaZupan
Copy link
Member Author

Are there any places where we may have been eliminating bounds checks, e.g. because the JIT could prove that the value was < the length (and being unsigned it definitely wasn't < 0), and now with a signed int it may no longer be able to eliminate those bounds checks?

I don't see any such cases

@stephentoub
Copy link
Member

@MihaZupan, should this be merged?

@MihaZupan MihaZupan merged commit d16afce into dotnet:master Mar 13, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 10, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants