Skip to content

Conversation

@tannergooding
Copy link
Member

Found that the implementation was less efficient than expected when working on Int128 and UInt128. This updates the 64-bit paths to be closer to what the 32-bit paths do, but with keeping the "decomposed" operation on 32-bit platforms.

@ghost ghost added the area-System.Numerics label May 3, 2022
@ghost ghost assigned tannergooding May 3, 2022
@ghost
Copy link

ghost commented May 3, 2022

Tagging subscribers to this area: @dotnet/area-system-numerics
See info in area-owners.md if you want to be subscribed.

Issue Details

Found that the implementation was less efficient than expected when working on Int128 and UInt128. This updates the 64-bit paths to be closer to what the 32-bit paths do, but with keeping the "decomposed" operation on 32-bit platforms.

Author: tannergooding
Assignees: -
Labels:

area-System.Numerics

Milestone: -

@danmoseley
Copy link
Member

Do you plan to add a few int128 formatting/parsing benchmarks in the perf repo?

@tannergooding
Copy link
Member Author

tannergooding commented May 3, 2022

Do you plan to add a few int128 formatting/parsing benchmarks in the perf repo?

After Int128/UInt128 is added to the repo, yes. This is just some intermediate cleanup that will make adding those types simpler.

I'll get benchmark results for Int64/UInt64 after CI is passing.

@tannergooding
Copy link
Member Author

A few small wins (running the Int64/UInt64 formatting benchmarks from dotnet/performance)

Before

BenchmarkDotNet=v0.13.1.1786-nightly, OS=Windows 11 (10.0.22000.652/21H2)
AMD Ryzen 9 5950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK=7.0.100-preview.3.22179.4
[Host] : .NET 7.0.0 (7.0.22.17504), X64 RyuJIT
Job-POIVPS : .NET 7.0.0 (7.0.22.17504), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000 IterationTime=250.0000 ms MaxIterationCount=20
MinIterationCount=15 WarmupCount=1

Method value Mean Error StdDev Median Min Max Gen 0 Allocated
Int64.ToString -9223372036854775808 26.764 ns 0.2994 ns 0.2800 ns 26.720 ns 26.379 ns 27.132 ns 0.0038 64 B
Int64.TryFormat -9223372036854775808 23.900 ns 0.0877 ns 0.0820 ns 23.875 ns 23.766 ns 24.033 ns - -
Int64.ToString 12345 9.028 ns 0.2440 ns 0.2810 ns 9.050 ns 8.457 ns 9.452 ns 0.0019 32 B
Int64.TryFormat 12345 7.715 ns 0.0250 ns 0.0234 ns 7.718 ns 7.673 ns 7.745 ns - -
Int64.ToString 9223372036854775807 21.241 ns 0.1508 ns 0.1410 ns 21.168 ns 21.060 ns 21.472 ns 0.0037 64 B
Int64.TryFormat 9223372036854775807 20.602 ns 0.0370 ns 0.0328 ns 20.595 ns 20.567 ns 20.676 ns - -
Method value Mean Error StdDev Median Min Max Gen 0 Allocated
UInt64.ToString 0 1.769 ns 0.0124 ns 0.0116 ns 1.767 ns 1.754 ns 1.792 ns - -
UInt64.TryFormat 0 3.529 ns 0.0163 ns 0.0145 ns 3.523 ns 3.515 ns 3.560 ns - -
UInt64.ToString 12345 7.765 ns 0.0914 ns 0.0855 ns 7.783 ns 7.585 ns 7.886 ns 0.0019 32 B
UInt64.TryFormat 12345 6.541 ns 0.0188 ns 0.0167 ns 6.538 ns 6.521 ns 6.571 ns - -
UInt64.ToString 18446744073709551615 25.132 ns 0.2716 ns 0.2540 ns 25.039 ns 24.767 ns 25.642 ns 0.0038 64 B
UInt64.TryFormat 18446744073709551615 21.153 ns 0.0718 ns 0.0599 ns 21.138 ns 21.094 ns 21.300 ns - -

After

BenchmarkDotNet=v0.13.1.1786-nightly, OS=Windows 11 (10.0.22000.652/21H2)
AMD Ryzen 9 5950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK=7.0.100-preview.3.22179.4
[Host] : .NET 7.0.0 (7.0.22.17504), X64 RyuJIT
Job-GUQJVF : .NET 7.0.0 (42.42.42.42424), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000 Toolchain=CoreRun IterationTime=250.0000 ms
MaxIterationCount=20 MinIterationCount=15 WarmupCount=1

Method value Mean Error StdDev Median Min Max Gen 0 Allocated
Int64.ToString -9223372036854775808 26.619 ns 0.5673 ns 0.5572 ns 26.582 ns 25.885 ns 27.767 ns 0.0037 64 B
Int64.TryFormat -9223372036854775808 24.076 ns 0.1343 ns 0.1190 ns 24.028 ns 23.938 ns 24.273 ns - -
Int64.ToString 12345 7.222 ns 0.1829 ns 0.2032 ns 7.223 ns 6.942 ns 7.606 ns 0.0019 32 B
Int64.TryFormat 12345 6.803 ns 0.0371 ns 0.0347 ns 6.805 ns 6.749 ns 6.860 ns - -
Int64.ToString 9223372036854775807 20.991 ns 0.1958 ns 0.1831 ns 20.980 ns 20.701 ns 21.270 ns 0.0038 64 B
Int64.TryFormat 9223372036854775807 18.136 ns 0.0604 ns 0.0535 ns 18.122 ns 18.076 ns 18.244 ns - -
Method value Mean Error StdDev Median Min Max Gen 0 Allocated
UInt64.ToString 0 1.525 ns 0.0038 ns 0.0033 ns 1.525 ns 1.520 ns 1.530 ns - -
UInt64.TryFormat 0 1.596 ns 0.0075 ns 0.0063 ns 1.595 ns 1.583 ns 1.603 ns - -
UInt64.ToString 12345 6.792 ns 0.0835 ns 0.0781 ns 6.804 ns 6.618 ns 6.922 ns 0.0019 32 B
UInt64.TryFormat 12345 4.410 ns 0.0089 ns 0.0074 ns 4.410 ns 4.400 ns 4.428 ns - -
UInt64.ToString 18446744073709551615 21.219 ns 0.1725 ns 0.1613 ns 21.203 ns 21.002 ns 21.544 ns 0.0038 64 B
UInt64.TryFormat 18446744073709551615 18.520 ns 0.0704 ns 0.0658 ns 18.524 ns 18.424 ns 18.641 ns - -

}

[MethodImpl(MethodImplOptions.AggressiveInlining)] // called from only one location
[MethodImpl(MethodImplOptions.AggressiveInlining)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was the comment stale, or this PR is causing it to be called in more places?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. We now handle both byte and char based versions so its at least 2 places

return true;
}

[MethodImpl(MethodImplOptions.AggressiveInlining)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're ok with whatever additional code size this leads to? (General question for the AggressiveInlinings added in the PR)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was previously manually inlined in a few cases already.

Otherwise, this is a decently hot loop that executes under 10 iterations for 32-bit and 20 iterations for 64-bit, so its worth inlining.

ulong value = (ulong)(-input);

int bufferLength = Math.Max(digits, FormattingHelpers.CountDigits((ulong)(-input))) + sNegative.Length;
int bufferLength = Math.Max(digits, FormattingHelpers.CountDigits((ulong)(-value))) + sNegative.Length;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any benefit to doing value = -value once here rather than negating it both here and down below?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Int128/UInt128 there will be. For Int64/UInt64 I'd expect the JIT to do the relevant CSE here.

I'm fine with manually extracting it for consistency if we want.

do
{
ulong remainder;
(value, remainder) = Math.DivRem(value, 10);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use Math.DivRem here and elsewhere but separate % and / operations above in Int64DivMod1E9, and in both cases the divisor is a constant. Is there a reason for the difference?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DivRem is more efficient. Int64DivMod1E9 is a bit of a special case where we are effectively doing both 32-bit = 64-bit % 32-bit and 64-bit = 64-bit / 32-bit.

The former can be optimized on some platforms, but I didn't do any investigation to confirm if the JIT was doing anything special here or not.


internal static unsafe string UInt64ToDecStr(ulong value)
{
// Intrinsified in mono interpreter
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this comment refer to?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe its referring to CountDigits. It's a pre-existing comment that I preserved

@tannergooding tannergooding requested a review from dakersnar May 3, 2022 16:35
Copy link
Contributor

@dakersnar dakersnar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@AndyAyersMS
Copy link
Member

AndyAyersMS commented May 12, 2022

Improvements seen on arm64: dotnet/perf-autofiling-issues#5277 and dotnet/perf-autofiling-issues#5282

Also dotnet/perf-autofiling-issues#5278 shows both better and worse results (bimodal with new highs and lows)

@ghost ghost locked as resolved and limited conversation to collaborators Jun 11, 2022
@tannergooding tannergooding deleted the formatting-uint64 branch July 1, 2025 14:40
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants