Description
While implementing BFloat16, I examined the parsing/formatting traits of the other floating-point types and found that HalfNumberBufferLength is incorrect.
runtime/src/libraries/Common/src/System/Number.NumberBuffer.cs
Lines 14 to 23 in 52c14fb
```csharp
internal const int DecimalNumberBufferLength = 29 + 1 + 1; // 29 for the longest input + 1 for rounding
internal const int DoubleNumberBufferLength = 767 + 1 + 1; // 767 for the longest input + 1 for rounding: 4.9406564584124654E-324
internal const int Int32NumberBufferLength = 10 + 1; // 10 for the longest input: 2,147,483,647
internal const int Int64NumberBufferLength = 19 + 1; // 19 for the longest input: 9,223,372,036,854,775,807
internal const int Int128NumberBufferLength = 39 + 1; // 39 for the longest input: 170,141,183,460,469,231,731,687,303,715,884,105,727
internal const int SingleNumberBufferLength = 112 + 1 + 1; // 112 for the longest input + 1 for rounding: 1.40129846E-45
internal const int HalfNumberBufferLength = 21; // 19 for the longest input + 1 for rounding (+1 for the null terminator)
internal const int UInt32NumberBufferLength = 10 + 1; // 10 for the longest input: 4,294,967,295
internal const int UInt64NumberBufferLength = 20 + 1; // 20 for the longest input: 18,446,744,073,709,551,615
internal const int UInt128NumberBufferLength = 39 + 1; // 39 for the longest input: 340,282,366,920,938,463,463,374,607,431,768,211,455
```
Calculation
The buffer length is determined by the largest possible number of significant digits for the type. That maximum occurs for the value with BiasedExponent set to 1 and all TrailingSignificand bits set. For Half, this bit pattern is 0x07FF.
Let e = Abs(MinExponent) = ExponentBias - 1 and m = TrailingSignificandLength.
The significand of this value is (2 - 2^-m) and its exponent is -e, so the value is (2 - 2^-m) * 2^-e.
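A minimal C# sketch of these steps for Half, assuming .NET 6 or later (for BitConverter.UInt16BitsToHalf and the explicit Half to double conversion) and top-level statements. It builds the bit pattern from its fields and checks that (2 - 2^-m) * 2^-e reproduces the decoded value. The local names m and e only mirror the notation above.

```csharp
using System;

// Half (IEEE 754 binary16): 5 exponent bits, 10 trailing significand bits, bias 15.
const int m = 10;   // TrailingSignificandLength
const int e = 14;   // Abs(MinExponent) = ExponentBias - 1 = 15 - 1

// BiasedExponent = 1 (smallest normal exponent), TrailingSignificand = all ones.
ushort bits = (ushort)((1 << m) | ((1 << m) - 1));
Console.WriteLine($"0x{bits:X4}");                 // 0x07FF

// (2 - 2^-m) * 2^-e; Math.ScaleB multiplies by an exact power of two.
double fromFormula = (2 - Math.ScaleB(1.0, -m)) * Math.ScaleB(1.0, -e);
double fromBits = (double)BitConverter.UInt16BitsToHalf(bits);   // Half -> double is exact
Console.WriteLine(fromFormula == fromBits);        // True
```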
Written as a fraction, the value is (2^(m+1) - 1) / 2^(e+m).
Multiplying both numerator and denominator by 5^(e+m) gives a decimal fraction: (2^(m+1) - 1) * 5^(e+m) / 10^(e+m)
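As a worked check of these two steps, plugging in the Half parameters (e = 14, m = 10):

```math
\left(2 - 2^{-10}\right) \cdot 2^{-14}
  = \frac{2^{11} - 1}{2^{24}}
  = \frac{2047 \cdot 5^{24}}{10^{24}}
  = \frac{122010707855224609375}{10^{24}}
```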
Both 2^(m+1) - 1 and 5^(e+m) are odd, so the numerator is odd and has no trailing zeros. The number of significant digits of the value is therefore the number of decimal digits of the numerator:
(2^(m+1) - 1)*5^(e+m) ≈ 2^(m+1) * 5^(e+m) = 10^(m+1) * 5^(e-1)
So the number of significant digits is m+1+Log10(5^(e-1)) = m+1+(e-1)*Log10(5), rounded up.
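To cross-check the rounded-up estimate against an exact count, here is a small sketch using System.Numerics.BigInteger (top-level statements assumed). ExactDigits and EstimatedDigits are hypothetical helper names introduced only for this illustration.

```csharp
using System;
using System.Numerics;

// Hypothetical helper: exact digit count of the numerator (2^(m+1) - 1) * 5^(e+m).
// The numerator is odd, so it has no trailing zeros and every digit is significant.
static int ExactDigits(int m, int e)
{
    BigInteger numerator = ((BigInteger.One << (m + 1)) - 1) * BigInteger.Pow(5, e + m);
    return numerator.ToString().Length;
}

// Hypothetical helper: the closed-form estimate ceil(m + 1 + (e - 1) * log10(5)).
static int EstimatedDigits(int m, int e) =>
    (int)Math.Ceiling(m + 1 + (e - 1) * Math.Log10(5));

var formats = new[] { ("double", 52, 1022), ("float", 23, 126), ("Half", 10, 14) };
foreach (var (name, trailingBits, absMinExponent) in formats)
{
    Console.WriteLine(
        $"{name}: exact = {ExactDigits(trailingBits, absMinExponent)}, " +
        $"estimate = {EstimatedDigits(trailingBits, absMinExponent)}");
}
// double: exact = 767, estimate = 767
// float: exact = 112, estimate = 112
// Half: exact = 21, estimate = 21
```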
I'm going to add a comment explaining this calculation together with the BFloat16 change.
For double, e = 1022, m = 52, total digits = 766.6483744270753
For float, e = 126, m = 23, total digits = 111.37125054200236
For Half, e = 14, m = 10, total digits = 20.086610056368244
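The double and float results (767 and 112) match the existing DoubleNumberBufferLength and SingleNumberBufferLength, but the longest Half value has 21 significant digits, not the 19 assumed by the comment on HalfNumberBufferLength.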
Converting this value to double and printing it exactly gives 0.000122010707855224609375, which indeed has 21 significant digits.
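One way to reproduce this check is sketched below. It assumes .NET 6 or later for the Half APIs, and relies on double formatting since .NET Core 3.0 returning exact digits when a large precision specifier is requested.

```csharp
using System;
using System.Globalization;

Half value = BitConverter.UInt16BitsToHalf(0x07FF);
double d = (double)value;   // widening Half -> double is exact

// 2047 / 2^24 terminates after 24 decimal places, so "F24" prints it exactly.
string exact = d.ToString("F24", CultureInfo.InvariantCulture);

// Drop the integer part and leading zeros to count the significant digits.
string significant = exact.Substring(exact.IndexOf('.') + 1).TrimStart('0');

Console.WriteLine(exact);
Console.WriteLine(significant.Length);
```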
Observation
BitConverter.UInt16BitsToHalf(0x07FF).ToString(99) throws IndexOutOfRangeException.
This also reproduces on .NET 6, so the bug has probably been present since Half was introduced. We should fix it and backport the fix, at least to .NET 8.0.
The file has since been refactored, so the backport may need to be done manually.