-
Notifications
You must be signed in to change notification settings - Fork 10.5k
Eliminate range checks from ConcatAsHexSuffix + Instrinsics #18406
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| buffer[i + 3] = (char)hexEncodeMap[(number >> 20) & 0xF]; | ||
| buffer[i + 2] = (char)hexEncodeMap[(number >> 24) & 0xF]; | ||
| buffer[i + 1] = (char)hexEncodeMap[(number >> 28) & 0xF]; | ||
| buffer[i] = tupleSeparator; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it can be vectorized I think, load tupleNumber into an 256bit vector, do some shuffle magic and dump to a string (ideally - to FastAllocateString)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FastAllocateString is internal so it can't be used here in ASP.NET Core?
256bit vec is too large, 128bit should work, as there get 9 elements written to buffer. So with Vector128<ushor> it's 8 elements + the remaining 1.
Preferable this should be done in a different PR -- but nice idea!
Nitpicking: uint and little-endian 😉 (but that's irrelevant to the idea).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gfoidl yeah I meant Vector128 :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gfoidl @benaadams here is a vectorized string ToHex(int num) : https://pastebin.com/F2ZUYkwG
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
Makes it sense to add int -> hex to runtime? I think it's quite common to convert a number to hex. Therefore is int.ToString("X"), but it allocates a string. So there should be something like int.ToString(Span<char>, Format) (I know the API needs work, maybe on a better type like a special formatter) and at this central place all kind of optimization could take place (SIMD or SWAR).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes it sense to add int -> hex to runtime?
That conversation has been going on for years https://github.com/dotnet/corefx/issues/10013
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gowan then
Method | Branch | Mean | Error | Op/s | Allocated | Precent |
------------------ |--------|---------:|----------:|-------------:|----------:|--------:|
ConcatAsHexSuffix | master | 21.18 ns | 0.1040 ns | 47,221,986.9 | 34.33 MB | |
ConcatAsHexSuffix | pr | 17.26 ns | 0.0844 ns | 57,953,720.1 | 34.33 MB | 123 % |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to search within my brain, but I believe there is a SWAR variant that needs less instructions than this SIMD variant. I'll give an update here...(later today).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That conversation has been going on for years
Maybe split the hex-part out of PopulateSpanWithHexSuffix to have such a method at least here in aspnetcore?
| return string.Create(length, (str, separator, number), s_populateSpanWithHexSuffix); | ||
| } | ||
|
|
||
| private static void PopulateSpanWithHexSuffix(Span<char> buffer, (string str, char separator, uint number) tuple) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll start a new comment to keep it clean.
I have a variant with SWAR
SWAR code
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static void Number2HexSwar(int number, ref char dest)
{
Debug.Assert(Bmi2.X64.IsSupported);
ulong input = (ulong)(uint)number; // cast to uint avoids movsxd and uses mov
ulong isolatedNibbles = Bmi2.X64.ParallelBitDeposit(input, 0x0F0F0F0F_0F0F0F0Ful);
Debug.Assert((isolatedNibbles & 0xF0F0F0F0_F0F0F0F0ul) == 0);
// JIT will treat these as constants
ulong packedAscii0 = Packed('0');
ulong packedCorrection = Packed(0x80 - ('9' - '0' + 1));
ulong packed0x80 = Packed(0x80);
ulong packedHexShiftToAF = Packed('A' - '0' - 10);
ulong asciiHexNumbers = isolatedNibbles + packedAscii0;
ulong asciiHexChars = isolatedNibbles + packedCorrection;
ulong flag = asciiHexChars & packed0x80;
ulong hexAFMask = flag - (flag >> 7);
ulong hex = asciiHexNumbers + (hexAFMask & packedHexShiftToAF);
hex = BinaryPrimitives.ReverseEndianness(hex);
ulong hex0 = hex;
ulong hex1 = hex >> 32;
Unsafe.Add(ref Unsafe.As<char, ulong>(ref dest), 0) = Bmi2.X64.ParallelBitDeposit(hex0, 0x00FF00FF_00FF00FFul);
Unsafe.Add(ref Unsafe.As<char, ulong>(ref dest), 1) = Bmi2.X64.ParallelBitDeposit(hex1, 0x00FF00FF_00FF00FFul);
static ulong Packed(ulong a) => a * 0x01010101_01010101ul;
}92ec46c to
619933b
Compare
|
Improved with feedback |
|
With the swar Though |
|
Thanks for the test. So let's leave the SWAR variant out. |
gfoidl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise LGTM 😃
Co-Authored-By: Günther Foidl <[email protected]>
gfoidl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
BrennanConroy
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bit manipulation looks good 👍
Anyone else want to take a look?
| // This must be explicity typed as ReadOnlySpan<byte> | ||
| // This then becomes a non-allocating mapping to the data section of the assembly. | ||
| // If it is a var, Span<byte> or byte[], it allocates the byte array per call. | ||
| ReadOnlySpan<byte> hexEncodeMap = new byte[] { (byte)'0', (byte)'1', (byte)'2', (byte)'3', (byte)'4', (byte)'5', (byte)'6', (byte)'7', (byte)'8', (byte)'9', (byte)'A', (byte)'B', (byte)'C', (byte)'D', (byte)'E', (byte)'F' }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this is the same as the asciiUpperCaseData above, so we can just lift it outside the if statement.
|
@halter73 let's get this in soon. |
Co-Authored-By: Brennan <[email protected]>

Uses dotnet/runtime#1644 for the non-instrinsic path to eliminate range checks/branches