
TST/BENCH: Adding test coverage and benchmarks for floating point umath functions#19485

Merged
mattip merged 7 commits into numpy:main from r-devulap:fp-tests
Aug 4, 2021

Conversation

@r-devulap
Member

Expands test coverage for the umath module in the following ways:

  1. Adds tests to verify correctness for special floating-point values (NaN, Inf, etc.)
  2. Adds tests to ensure FP exceptions are raised when appropriate (divide-by-zero, invalid, overflow, and underflow)
  3. Expands the accuracy-validation datasets to cover more umath functions; the source code used to generate them is also provided.
  4. Expands benchmarking to cover all the unary umath functions.
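For illustration, the first two kinds of checks can be sketched with np.errstate; this is a generic sketch of the testing pattern, not the PR's actual test code:

```python
import numpy as np

# 1. log(0) must signal the IEEE divide-by-zero exception:
with np.errstate(divide='raise'):
    try:
        np.log(np.float32(0.0))
        raised = False
    except FloatingPointError:
        raised = True
assert raised

# 2. sqrt of a negative must signal 'invalid':
with np.errstate(invalid='raise'):
    try:
        np.sqrt(np.float32(-1.0))
        raised = False
    except FloatingPointError:
        raised = True
assert raised

# 3. Special inputs must propagate per IEEE 754: NaN in, NaN out,
#    and sin/cos of infinity are NaN.
assert np.isnan(np.sin(np.float32(np.nan)))
assert np.isnan(np.cos(np.float32(np.inf)))
```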

@tylerjereddy
Contributor

That's a lot of CSV data files!

I see there's a README at that path explaining what is going on.

Super minor point, but the README says: Add a file 'umath-validation-set-<ufuncname>.txt'; looks like the team is going with .csv instead.

@r-devulap
Member Author

That's a lot of CSV data files!

yup, but it is in line with this wish list #13515. This patch covers most of the umath functions and hence the long list of csv files. Commit a655aa8 also adds the C++ code I wrote to generate these files. I am not sure if you would want to include that in the repository. If you don't care for it, I am okay reverting that commit.

Super minor point, but the README says: Add a file 'umath-validation-set-<ufuncname>.txt'; looks like the team is going with .csv instead.

Thanks for pointing that out. I will update the README file.

@r-devulap
Member Author

The umath validation tests fail on the manylinux2010 container, which seems to have really bad ULP errors (max difference of 9.21431e+18 ULP for sin and cos). For now, I have enabled these tests only if both the compiler and the CPU support AVX512 (which essentially restricts testing to just the AVX512 code paths in NumPy).

@r-devulap
Member Author

The umath validation tests fail on the manylinux2010 container, which seems to have really bad ULP errors (max difference of 9.21431e+18 ULP for sin and cos). For now, I have enabled these tests only if both the compiler and the CPU support AVX512 (which essentially restricts testing to just the AVX512 code paths in NumPy).

umm, on second thought, this still might not work :/

@charris
Member

charris commented Jul 16, 2021

We have dropped 2010 for the development numpy wheels, looks like we need to do it locally also.

@r-devulap
Member Author

We have dropped 2010 for the development numpy wheels, looks like we need to do it locally also.

I will stop worrying about fixing the CI failures then, both the failing checks are from manylinux2010.

@charris
Member

charris commented Jul 16, 2021

See #19498.

@r-devulap
Member Author

See #19498.

I will rebase after that gets merged.

@charris
Member

charris commented Jul 16, 2021

I will stop worrying about fixing the CI failures then

I think we should wait to see if things work with 2014.

@r-devulap
Member Author

I will stop worrying about fixing the CI failures then

I think we should wait to see if things work with 2014.

It passes on manylinux2014_i686 on my local machine at least; I'm optimistic it will be okay after I rebase.

@charris
Member

charris commented Jul 16, 2021

OK, the change to manylinux2014 is in.

@r-devulap
Member Author

rebased ..

@mattip
Member

mattip commented Jul 18, 2021

This extends the existing set of csv files (for np.float32) with more functions and adds np.float64 data as well. It ups the tolerated ULP from 2 to 4. Should we be less tolerant of np.float64 errors? I realize that this might slow down processing, but I think our approach has always been accuracy before performance.

@r-devulap
Member Author

It ups the tolerated ULP from 2 to 4. Should we be less tolerant of np.float64 errors?

A max error of 4 ULP means that up to two bits at the end of a result's significand may be incorrect, so the top 51 bits are still correct. I think the results are still pretty accurate, and it's not a major concern IMO. For comparison, note that such an error can easily accumulate after just a few (possibly just 8) floating-point additions, multiplications, or divisions in a row.
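The "two trailing bits" claim can be checked directly. A minimal sketch, where ulp_diff is an illustrative helper (not NumPy's API) valid for finite, same-signed doubles:

```python
import numpy as np

# Illustrative helper: ULP distance between two finite, same-signed
# doubles via their ordered integer bit patterns.
def ulp_diff(a, b):
    ia = np.array(a, dtype=np.float64).view(np.int64)
    ib = np.array(b, dtype=np.float64).view(np.int64)
    return abs(int(ia) - int(ib))

x = np.float64(1.0)
y = x
for _ in range(4):                 # walk 4 representable values upward
    y = np.nextafter(y, np.float64(2.0))

assert ulp_diff(x, y) == 4         # 4 ULP apart
assert y - x == 4 * np.spacing(x)  # only the last two significand bits differ
```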

What level of accuracy would be acceptable for these functions?

@charris
Member

charris commented Jul 21, 2021

What level of accuracy would be acceptable for these functions?

Hard to say, I expect it would depend on the application. Astronomy might be an area where exact coordinate transformations or timing might matter. Outside of that, my rule of thumb is that if you end up with 10 digits of accuracy at the end of a calculation, you have done well enough. @mhvk might have some thoughts.

@mhvk
Contributor

mhvk commented Jul 21, 2021

Hard to make a generic statement for astropy - I assume this is mostly for more complicated functions (trig, exp, log, etc.)? For those the loss of a few bits should probably be OK, though it may well be that tests would start failing as they assume more stringent constraints even if those are not needed scientifically. For addition/multiplication, we do try very hard to keep losses as small as possible (especially in our Time class).

But I am obviously very intrigued by the very large speed-ups mentioned in #19478!

@charris
Member

charris commented Jul 21, 2021

@r-devulap Are there timings available for the high precision versions?

@r-devulap
Member Author

@r-devulap Are there timings available for the high precision versions?

I don't have specific numbers at the moment. But since higher accuracy means lower performance (which is the trade-off), they certainly will be slower than the ones in #19478.

@charris
Member

charris commented Jul 21, 2021

which is the trade-off

It would be nice to know what the trade-off is.

@seberg
Member

seberg commented Jul 21, 2021

The one thing I wonder is whether we could set the max-ulp to something function-specific (and lower than 4) where we know it should be more precise (i.e. most functions, both for Intel SVML and other math libraries).

Not because I think 4 is unreasonable for pretty much any function, but it also "documents" what our precision typically is. And it might help us get a comparison with the current precision; I still have the feeling that these are more comparable to glibc than the 4 ULP may sound like.

It would be interesting to report if some exotic system is not within the precision limits we would like to see (rather than fail a test), but that is far beyond here :).

@mattip
Member

mattip commented Jul 25, 2021

If you tighten the ULP to 2, what tests fail?

@r-devulap
Member Author

If you tighten the ULP to 2, what tests fail?

Nothing fails, actually. Depending on the polynomial approximation used, the max ULP can occur at different points, and it's hard for a validation suite to capture those points for all the functions. I am working on getting hold of some data that might help with understanding the performance and accuracy trade-offs.

@r-devulap
Member Author

The one thing I wonder is whether we could set the max-ulp to something function-specific (and lower than 4) where we know it should be more precise (i.e. most functions, both for Intel SVML and other math libraries).

We absolutely could set the max ulp to something function specific. I am looking into it.

It would be interesting to report if some exotic system is not within the precision limits we would like to see (rather than fail a test), but that is far beyond here :).

We already know that the sin and cos implementations in some 32-bit libraries are terrible for largish numbers (see 742f3f1). I would imagine it is quite a bit of work to document precision limits across all the platforms that NumPy supports.

@r-devulap
Member Author

Here are the ULP errors per function for the low-accuracy (LA) SVML functions in #19478. Accuracy for single-precision functions of one argument was determined by exhaustive testing, so these are the true maximum ULP errors. For the double-precision functions, the worst-case error was determined by measuring accuracy on billions of test points per function (exhaustive testing is infeasible for double precision).

SVML     float32 LA ULP   float64 LA ULP
tan      3.91             3.93
cos      2.36             3.38
sin      2.37             3.3
asin     3.12             2.55
atan     2.3              2.52
atan2    3.26             2.37
expm1    2.62             2.1
cosh     2.48             1.97
log1p    1.96             1.93
log10    3.5              1.92
sinh     1.55             1.89
log2     2.12             1.84
cbrt     1.94             1.82
acos     2.1              1.67
log      1.84             1.67
exp      3.76             1.53
asinh    1.01             1.48
atanh    1.45             1.46
tanh     1.38             1.19
pow      1.23             1.06
acosh    1.16             1.05
exp2     1.01             1.04

Is it safe to assume that an accuracy of 2 ULP is acceptable for double-precision functions? For those with worse than 2 ULP error (tan, cos, sin, asin, atan, expm1), moving from the low-accuracy (LA) SVML implementation to high-accuracy (HA) SVML slows them down by roughly 1.3x to 1.6x (depending on the function).
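The exhaustive float32 approach described above can be sketched by sweeping bit patterns directly, since every 32-bit pattern is a float32 input. This is an illustrative sketch (max_ulp_error is a hypothetical helper, not the actual SVML harness); only the bit range covering [1.0, 2.0) is swept here to keep it quick:

```python
import numpy as np

def max_ulp_error(ufunc, ref, start, stop, chunk=1 << 16):
    """Worst ULP error of `ufunc` vs a float64 reference over a bit range."""
    worst = 0
    for lo in range(start, stop, chunk):
        bits = np.arange(lo, min(lo + chunk, stop), dtype=np.uint32)
        x = bits.view(np.float32)              # reinterpret bits as inputs
        got = ufunc(x)
        want = ref(x.astype(np.float64)).astype(np.float32)
        mask = np.isfinite(got) & np.isfinite(want)
        # int32 views give ULP distance; valid here since sin on [1, 2)
        # yields same-signed (positive) results.
        d = np.abs(got[mask].view(np.int32).astype(np.int64)
                   - want[mask].view(np.int32).astype(np.int64))
        if d.size:
            worst = max(worst, int(d.max()))
    return worst

# Bit patterns 0x3F800000..0x40000000 cover float32 values in [1.0, 2.0).
err = max_ulp_error(np.sin, np.sin, 0x3F800000, 0x40000000)
assert err <= 4   # within the tolerance discussed in this thread
```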

@charris
Member

charris commented Jul 28, 2021

@r-devulap Thanks. It is unfortunate that the biggest errors belong to some of the most used functions :) I wonder how much of that is due to the range over which the tests are run?

@r-devulap
Member Author

@r-devulap Thanks. It is unfortunate that the biggest errors belong to some of the most used functions :) I wonder how much of that is due to the range over which the tests are run?

Not 100% sure, but that is likely the case. When I was developing float32 sin, cos, exp, and log algorithms on my own (#13368 and #13134), that was indeed the case. The range-reduction step that reduces large numbers to a certain range introduces some error, and then the polynomial approximation adds some more.

@seberg
Member

seberg commented Jul 28, 2021

Nice, this table is cool to have! Frankly, it may even be cool to have as some utility script that users can run if they are curious.

Hmm, the table here looks a bit different? And some of these, e.g. tan, seem ⪅1 ULP in typical math libraries. For sin/cos, I guess the problem may just be very large/small values?

Looking at the official tables, the high-accuracy 1 ULP versions seem in the ballpark of about 50% slower on high-end CPUs (less so on older CPUs).

@r-devulap
Member Author

Hmm, the table here looks a bit different?

The table you are looking at is VML which is part of oneAPI Math Kernel Library. As @akolesov-intel clarified in #19478 (comment), SVML is the internal library in Intel Compilers which has its own implementations.

And some of these, e.g. tan, seem ⪅1 ULP in typical math libraries. For sin/cos, I guess the problem may just be very large/small values?

Yes, generally ULP errors tend to be worse for very large values. sin, cos, and tan should be fairly accurate for very small values.

Looking at the official tables, the high-accuracy 1 ULP versions seem in the ballpark of about 50% slower on high-end CPUs (less so on older CPUs).

That looks to be about the same for SVML too.

@r-devulap
Member Author

Updated csv files to reflect the appropriate ULP error.

@rgommers
Member

@r-devulap it seems to me that it would also be quite useful to see typical distributions of error. You are reporting only the worst accuracy over the full range, but it also matters whether the median error is similar to that or way lower. Maybe just take one or a couple of functions and plot a histogram of abs(actual - expected)? The mean or median of such a distribution would be relevant as well.

It looks like if you tell people to choose between "high accuracy" and "low accuracy", they will go for high. But if the average error is < 1 ULP, I wouldn't call it "low accuracy" exactly.

@carlkl
Member

carlkl commented Jul 31, 2021

You may also be interested in Paul Zimmermann's (@zimmermann6) work:
https://members.loria.fr/PZimmermann/papers/accuracy.pdf compares the accuracy of math functions across several math libraries.
https://gitlab.inria.fr/zimmerma/math_accuracy provides tools to check the accuracy of the mathematical functions mentioned in the paper above.

@carlkl
Member

carlkl commented Jul 31, 2021

As for precision vs accuracy: this was discussed, e.g., in this issue thread: shibatch/sleef#107, as well as in some other threads of the sleef project. Basically:

  1. Correctly performed domain reduction (range reduction) reduces performance regardless of the accuracy of the function.
  2. Precision around 1 ULP needs higher intermediate precision. In the case of sleef this is done by its own SIMD implementation of double-double arithmetic.
  3. FMA instructions help with 2. Needs Haswell and above.

I agree with @rgommers: if you are not sure what you need, you will tend toward higher precision anyway.
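Point 2 above can be illustrated with the classic error-free transformation at the heart of double-double arithmetic. A minimal scalar sketch (sleef implements the SIMD analogue):

```python
# Knuth's branch-free two-sum: s is the rounded sum and e the exact
# rounding error, so the pair (s, e) represents a + b without any loss.
# This is the building block of double-double intermediate precision.
def two_sum(a, b):
    s = a + b
    t = s - a
    e = (a - (s - t)) + (b - t)
    return s, e

s, e = two_sum(1.0, 1e-20)
assert s == 1.0      # the rounded double sum loses the tiny term...
assert e == 1e-20    # ...but the error term captures it exactly
```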

@r-devulap
Member Author

r-devulap commented Aug 2, 2021

@r-devulap it seems to me that it would also be quite useful to see typical distributions of error. You are reporting only the worst accuracy over the full range, but it also matter if the median error is similar to that, or way lower. Maybe just take one or a couple of functions and plot a histogram of abs(actual - expected)? The mean or median of such a distribution would be relevant as well.

This is pretty easy to compute. We can compare the double-precision output computed via SVML to the corresponding higher-accuracy 128-bit float using the nulp_diff function in NumPy:

import numpy as np
from numpy.testing._private.utils import nulp_diff  # private helper

N = 10000000
arr1 = np.random.uniform(low=-1.0, high=1.0, size=N)
svml_out = np.sin(arr1)
true_out = np.sin(arr1.astype(np.float128))
ulp = nulp_diff(svml_out, true_out, dtype=np.float64)

Here are the stats for roughly a billion uniformly sampled numbers for the worst ULP error LA functions:

ufunc    0 ulp     1 ulp     2 ulp    3 ulp
sin      39.35 %   54.95 %   5.62 %   0.07 %
cos      36.13 %   55.11 %   8.66 %   0.10 %
tan      59.80 %   39.15 %   1.02 %   0.02 %
arctan   72.42 %   27.59 %   0.00 %   0.00 %
arcsin   73.74 %   25.40 %   0.86 %   0.00 %
expm1    96.67 %    3.33 %   0.00 %   0.00 %

It looks like if you tell people to choose between "high accuracy" and "low accuracy", they will go for high. But if the average error is < 1 ULP, I wouldn't call it "low accuracy" exactly.

From the table it is clear that more than 90% of the errors are within 1 ULP (i.e. under 1.5 ULP). For comparison, the corresponding output for the libm library in glibc 2.31 shows that errors are closer to 0.5 ULP:

ufunc    0 ulp     1 ulp     2 ulp    3 ulp
sin      99.88 %    0.12 %   0.00 %   0.00 %
cos      99.87 %    0.13 %   0.00 %   0.00 %
tan      99.97 %    0.03 %   0.00 %   0.00 %
arctan   99.97 %    0.03 %   0.00 %   0.00 %
arcsin   99.97 %    0.03 %   0.00 %   0.00 %
expm1    90.79 %    9.21 %   0.00 %   0.00 %

@r-devulap
Member Author

For the sake of completeness: the nulp_diff function snaps the float128 value to the nearest float64 and then computes the ULP diff. So a 0-ulp diff really means < 0.5 ULP, 1 ulp means < 1.5 ULP, and so on.
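The percentage tables above can be derived from a per-element ULP array with a simple bincount. A sketch with stand-in data (the values here are illustrative only, not the measured ones):

```python
import numpy as np

# Stand-in for the real per-element ULP differences (e.g. the `ulp`
# array produced by nulp_diff).
ulp = np.array([0, 0, 1, 1, 1, 2, 0, 1, 0, 3])

counts = np.bincount(ulp, minlength=4)   # occurrences per ULP bucket
pct = 100.0 * counts / ulp.size          # percentage per bucket
print(["%.2f %%" % p for p in pct])
```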

//{"log",log,log,1.84,1.67},
{"log10",log10,log10,3.5,1.92},
{"log1p",log1p,log1p,1.96,1.93},
{"log2",log2,log2,2.12,1.84},
Member

I am confused about the f32ulp and f64ulp values here (maybe move the struct definition closer to this code so it is clear what the fields are). How did you calculate them as floats?

Member

Cleared up on a community call: these are actually the difference between the values so they should be floating point.

@mattip merged commit 07cbfff into numpy:main Aug 4, 2021
@mattip
Member

mattip commented Aug 4, 2021

Thanks @r-devulap


8 participants