FromNaN32ps() converts a float32 NaN to a binary16 NaN without
changing the quiet bit, so a signalling NaN stays signalling.
It can be inlined, so it is faster than Fromfloat32().
This was needed because Fromfloat32() is 100% compatible with the
AMD and Intel F16C instructions. Unfortunately, that compatibility
means NaN inputs are always converted to a NaN with the quiet bit
set. FromNaN32ps() offers an alternative.
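
For illustration only, here is a minimal sketch of what "without changing the quiet bit" can look like at the bit level. It is not the code from this commit: `nan32BitsToHalfBits` is a hypothetical helper, it works on raw IEEE 754 bit patterns rather than the package's Float16 type, and forcing the lowest coefficient bit when the payload would otherwise truncate to zero is an assumption about how to keep the result a NaN.

```go
package main

import "fmt"

// nan32BitsToHalfBits is a hypothetical helper, not the library's API.
// It maps the bits of a float32 NaN to the bits of a binary16 NaN,
// keeping the sign and the top 10 coefficient bits (including the
// quiet bit) unchanged. It returns false if the input is not a NaN.
func nan32BitsToHalfBits(u32 uint32) (uint16, bool) {
	sign := u32 & 0x80000000
	exp := u32 & 0x7f800000
	coef := u32 & 0x007fffff

	// A NaN has all exponent bits set and a non-zero coefficient.
	if exp != 0x7f800000 || coef == 0 {
		return 0, false
	}

	// Truncate the 23-bit coefficient to 10 bits. Bit 22 (the float32
	// quiet bit) lands on bit 9 (the binary16 quiet bit) untouched.
	u16 := uint16(sign>>16) | 0x7c00 | uint16(coef>>13)

	// Assumption: if truncation cleared every coefficient bit, the
	// pattern would read as infinity, so set the lowest bit to keep
	// it a (signalling) NaN.
	if u16&0x03ff == 0 {
		u16 |= 0x0001
	}
	return u16, true
}

func main() {
	// A quiet NaN, a signalling NaN, and a signalling NaN whose
	// payload sits entirely in the truncated low bits.
	for _, in := range []uint32{0x7fc00000, 0x7fa00000, 0x7f800001} {
		out, ok := nan32BitsToHalfBits(in)
		fmt.Printf("0x%08x -> 0x%04x (ok=%v, quiet=%v)\n", in, out, ok, out&0x0200 != 0)
	}
}
```

In this sketch the quiet input 0x7fc00000 maps to 0x7e00 with the quiet bit still set, while the signalling inputs 0x7fa00000 and 0x7f800001 map to 0x7d00 and 0x7c01 with the quiet bit still clear. An F16C-compatible conversion would instead set bit 0x0200 on every NaN result.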
README.md (20 additions & 16 deletions)
@@ -31,7 +31,7 @@ Current status:
 * core API is done and breaking API changes are unlikely.
 * 100% of unit tests pass:
 * short mode (`go test -short`) tests around 65765 conversions in 0.005s.
-* normal mode (`go test`) tests all possible 4+ billion conversions in about 75s.
+* normal mode (`go test`) tests all possible 4+ billion conversions in about 95s.
 * 100% code coverage with both short mode and normal mode.
 * tested on amd64 but it should work on all little-endian platforms supported by Go.
@@ -49,7 +49,7 @@ Unit tests take a fraction of a second to check all 65536 expected values for fl
 ## Float32 to Float16 Conversion
 Conversions from float32 to float16 use IEEE 754 default rounding ("Round-to-Nearest RoundTiesToEven"). All 4294967296 possible float32 to float16 conversions (in pure Go) are confirmed to be correct.

-Unit tests in normal mode take about 60-90 seconds to check all 4+ billion expected values for float32 to float16 conversions as well as PrecisionFromfloat32() for each.
+Unit tests in normal mode take about 1-2 minutes to check all 4+ billion float32 input values and results for Fromfloat32(), FromNaN32ps(), and PrecisionFromfloat32().

 Unit tests in short mode use a small subset (around 229 float32 inputs) and finish in under 0.01 second while still reaching 100% code coverage.
@@ -66,7 +66,7 @@ pi32 := pi16.Float32()
 // Only convert if there's no data loss (useful for CBOR encoders)
 // PrecisionFromfloat32() is faster than the overhead of calling a function
 if float16.Precision(pi) == float16.PrecisionExact {
 // this should only happen if both input and output are NaN
-if !(f16.IsNaN() && IsNaN32(f32)) {
+if !(f16.IsNaN() && isNaN32(f32)) {
 	t.Errorf("i=%d, PrecisionFromfloat32 in f32bits=0x%08x (%f), out f16bits=0x%04x, back=0x%08x (%f), got PrecisionExact when roundtrip failed with non-special value", i, u32, f32, u16, u32bis, f32bis)