Eliminate range checks from ConcatAsHexSuffix + Instrinsics #18406

benaadams · 2020-01-17T08:26:54Z

Uses dotnet/runtime#1644 for the non-instrinsic path to eliminate range checks/branches

            Method | Branch |     Mean |     Error |         Op/s | Allocated | Precent |
------------------ |--------|---------:|----------:|-------------:|----------:|--------:|
 ConcatAsHexSuffix | master | 21.18 ns | 0.1040 ns | 47,221,986.9 |  34.33 MB |         |
 ConcatAsHexSuffix |     pr | 16.98 ns | 0.0905 ns | 58,881,989.3 |  34.33 MB | 124.7 % |

src/Shared/ServerInfrastructure/StringUtilities.cs

EgorBo · 2020-01-17T13:00:56Z

src/Shared/ServerInfrastructure/StringUtilities.cs

+                buffer[i + 3] = (char)hexEncodeMap[(number >> 20) & 0xF];
+                buffer[i + 2] = (char)hexEncodeMap[(number >> 24) & 0xF];
+                buffer[i + 1] = (char)hexEncodeMap[(number >> 28) & 0xF];
                buffer[i] = tupleSeparator;


it can be vectorized I think, load tupleNumber into an 256bit vector, do some shuffle magic and dump to a string (ideally - to FastAllocateString)

FastAllocateString is internal so it can't be used here in ASP.NET Core?

256bit vec is too large, 128bit should work, as there get 9 elements written to buffer. So with Vector128<ushor> it's 8 elements + the remaining 1.

Preferable this should be done in a different PR -- but nice idea!

Nitpicking: uint and little-endian 😉 (but that's irrelevant to the idea).

@gfoidl yeah I meant Vector128 :-)

@gfoidl @benaadams here is a vectorized string ToHex(int num) : https://pastebin.com/F2ZUYkwG

Looks good.

Makes it sense to add int -> hex to runtime? I think it's quite common to convert a number to hex. Therefore is int.ToString("X"), but it allocates a string. So there should be something like int.ToString(Span<char>, Format) (I know the API needs work, maybe on a better type like a special formatter) and at this central place all kind of optimization could take place (SIMD or SWAR).

Makes it sense to add int -> hex to runtime?

That conversation has been going on for years https://github.com/dotnet/corefx/issues/10013

Gowan then

Method | Branch | Mean | Error | Op/s | Allocated | Precent | ------------------ |--------|---------:|----------:|-------------:|----------:|--------:| ConcatAsHexSuffix | master | 21.18 ns | 0.1040 ns | 47,221,986.9 | 34.33 MB | | ConcatAsHexSuffix | pr | 17.26 ns | 0.0844 ns | 57,953,720.1 | 34.33 MB | 123 % |

I have to search within my brain, but I believe there is a SWAR variant that needs less instructions than this SIMD variant. I'll give an update here...(later today).

That conversation has been going on for years

Maybe split the hex-part out of PopulateSpanWithHexSuffix to have such a method at least here in aspnetcore?

src/Shared/ServerInfrastructure/StringUtilities.cs

gfoidl · 2020-01-18T14:20:36Z

src/Shared/ServerInfrastructure/StringUtilities.cs

+            return string.Create(length, (str, separator, number), s_populateSpanWithHexSuffix);
+        }
+
+        private static void PopulateSpanWithHexSuffix(Span<char> buffer, (string str, char separator, uint number) tuple)


I'll start a new comment to keep it clean.

I have a variant with SWAR

SWAR code

[MethodImpl(MethodImplOptions.AggressiveInlining)] private static void Number2HexSwar(int number, ref char dest) { Debug.Assert(Bmi2.X64.IsSupported); ulong input = (ulong)(uint)number; // cast to uint avoids movsxd and uses mov ulong isolatedNibbles = Bmi2.X64.ParallelBitDeposit(input, 0x0F0F0F0F_0F0F0F0Ful); Debug.Assert((isolatedNibbles & 0xF0F0F0F0_F0F0F0F0ul) == 0); // JIT will treat these as constants ulong packedAscii0 = Packed('0'); ulong packedCorrection = Packed(0x80 - ('9' - '0' + 1)); ulong packed0x80 = Packed(0x80); ulong packedHexShiftToAF = Packed('A' - '0' - 10); ulong asciiHexNumbers = isolatedNibbles + packedAscii0; ulong asciiHexChars = isolatedNibbles + packedCorrection; ulong flag = asciiHexChars & packed0x80; ulong hexAFMask = flag - (flag >> 7); ulong hex = asciiHexNumbers + (hexAFMask & packedHexShiftToAF); hex = BinaryPrimitives.ReverseEndianness(hex); ulong hex0 = hex; ulong hex1 = hex >> 32; Unsafe.Add(ref Unsafe.As<char, ulong>(ref dest), 0) = Bmi2.X64.ParallelBitDeposit(hex0, 0x00FF00FF_00FF00FFul); Unsafe.Add(ref Unsafe.As<char, ulong>(ref dest), 1) = Bmi2.X64.ParallelBitDeposit(hex1, 0x00FF00FF_00FF00FFul); static ulong Packed(ulong a) => a * 0x01010101_01010101ul; }

In the microbenchmarks it is ~20...30% slower than the SIMD-variant. But it has no memory access, so maybe it is a good choice for real-world scenarios. @benaadams can you please run your benchmarks with this? I don't have a setup for this here.

benaadams · 2020-01-18T16:19:52Z

Improved with feedback

            Method |   Branch |     Mean |     Error |         Op/s | Allocated | Precent |
------------------ |----------|---------:|----------:|-------------:|----------:|--------:|
 ConcatAsHexSuffix |   master | 21.18 ns | 0.1040 ns | 47,221,986.9 |  34.33 MB |         |
 ConcatAsHexSuffix | original | 17.26 ns | 0.0844 ns | 57,953,720.1 |  34.33 MB | 122.7 % |
 ConcatAsHexSuffix | feedback | 16.98 ns | 0.0905 ns | 58,881,989.3 |  34.33 MB | 124.7 % |

benaadams · 2020-01-18T16:36:00Z

With the swar

            Method |   Branch |     Mean |     Error |         Op/s | Allocated | Precent |
------------------ |----------|---------:|----------:|-------------:|----------:|--------:|
 ConcatAsHexSuffix |   master | 21.18 ns | 0.1040 ns | 47,221,986.9 |  34.33 MB |         |
 ConcatAsHexSuffix | original | 17.26 ns | 0.0844 ns | 57,953,720.1 |  34.33 MB | 122.7 % |
 ConcatAsHexSuffix | feedback | 16.98 ns | 0.0905 ns | 58,881,989.3 |  34.33 MB | 124.7 % |
 ConcatAsHexSuffix |     swar | 17.03 ns | 0.0382 ns | 58,730,769.9 |  34.33 MB | 124.4 % |

Though PDEP is pretty slow on Rzyen/AMD (don't have to test) as used in the SWAR version

gfoidl · 2020-01-18T17:24:04Z

Thanks for the test. So let's leave the SWAR variant out.

gfoidl

Otherwise LGTM 😃

src/Shared/ServerInfrastructure/StringUtilities.cs

Co-Authored-By: Günther Foidl <[email protected]>

gfoidl

👍

BrennanConroy

The bit manipulation looks good 👍

Anyone else want to take a look?

src/Shared/ServerInfrastructure/StringUtilities.cs

BrennanConroy · 2020-02-12T20:46:25Z

src/Shared/ServerInfrastructure/StringUtilities.cs

+                // This must be explicity typed as ReadOnlySpan<byte>
+                // This then becomes a non-allocating mapping to the data section of the assembly.
+                // If it is a var, Span<byte> or byte[], it allocates the byte array per call.
+                ReadOnlySpan<byte> hexEncodeMap = new byte[] { (byte)'0', (byte)'1', (byte)'2', (byte)'3', (byte)'4', (byte)'5', (byte)'6', (byte)'7', (byte)'8', (byte)'9', (byte)'A', (byte)'B', (byte)'C', (byte)'D', (byte)'E', (byte)'F' };


nit: this is the same as the asciiUpperCaseData above, so we can just lift it outside the if statement.

analogrelay · 2020-02-19T21:26:16Z

@halter73 let's get this in soon.

Co-Authored-By: Brennan <[email protected]>

Eliminate range checks from ConcatAsHexSuffix

3a373f7

benaadams mentioned this pull request Jan 17, 2020

JIT: Optimize range checks for a[i & C], a[i % c] and a[(i & c1)>>c2)] patterns dotnet/runtime#1644

Merged

gfoidl reviewed Jan 17, 2020

View reviewed changes

src/Shared/ServerInfrastructure/StringUtilities.cs Outdated Show resolved Hide resolved

EgorBo reviewed Jan 17, 2020

View reviewed changes

Pilchie added the area-servers label Jan 17, 2020

benaadams added 2 commits January 18, 2020 02:29

Use Intrinsics

40a4d29

Remove unsafe

f9e403e

benaadams changed the title ~~Eliminate range checks from ConcatAsHexSuffix~~ Eliminate range checks from ConcatAsHexSuffix + Instrinsics Jan 18, 2020

gfoidl reviewed Jan 18, 2020

View reviewed changes

Feedback

619933b

benaadams force-pushed the ConcatAsHexSuffix branch from 92ec46c to 619933b Compare January 18, 2020 16:10

gfoidl reviewed Jan 18, 2020

View reviewed changes

src/Shared/ServerInfrastructure/StringUtilities.cs Outdated Show resolved Hide resolved

Update src/Shared/ServerInfrastructure/StringUtilities.cs

ddbe090

Co-Authored-By: Günther Foidl <[email protected]>

gfoidl approved these changes Jan 18, 2020

View reviewed changes

jkotalik assigned BrennanConroy Feb 7, 2020

BrennanConroy approved these changes Feb 12, 2020

View reviewed changes

analogrelay assigned halter73 Feb 19, 2020

Apply suggestions from code review

1c5a78d

Co-Authored-By: Brennan <[email protected]>

analogrelay added this to the 5.0.0-preview2 milestone Feb 26, 2020

Merge branch 'master' into ConcatAsHexSuffix

af367e3

analogrelay merged commit 07ce586 into dotnet:master Feb 26, 2020

EgorBo mentioned this pull request Oct 31, 2020

Vectorize HexConverter.EncodeToUtf16 using SSSE3 dotnet/runtime#44111

Merged

amcasey added area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions and removed area-runtime labels Aug 24, 2023

Eliminate range checks from ConcatAsHexSuffix + Instrinsics #18406

Eliminate range checks from ConcatAsHexSuffix + Instrinsics #18406

Uh oh!

Conversation

benaadams commented Jan 17, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

EgorBo Jan 17, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gfoidl Jan 17, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gfoidl Jan 18, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

benaadams commented Jan 18, 2020

Uh oh!

benaadams commented Jan 18, 2020

Uh oh!

gfoidl commented Jan 18, 2020

Uh oh!

gfoidl left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

gfoidl left a comment

Choose a reason for hiding this comment

Uh oh!

BrennanConroy left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

analogrelay commented Feb 19, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

8 participants

benaadams commented Jan 17, 2020 •

edited

Loading

EgorBo Jan 17, 2020 •

edited

Loading

gfoidl Jan 17, 2020 •

edited

Loading

gfoidl Jan 18, 2020 •

edited

Loading