TST/BENCH: Adding test coverage and benchmarks for floating point umath functions #19485
mattip merged 7 commits into numpy:main
Conversation
tylerjereddy left a comment:
That's a lot of CSV data files!
I see there's a README at that path explaining what is going on.
Super minor point, but the README says: "Add a file 'umath-validation-set-<ufuncname>.txt'"; it looks like the team is going with .csv instead.
Yup, but it is in line with this wish list: #13515. This patch covers most of the umath functions, hence the long list of CSV files. Commit a655aa8 also adds the C++ code I wrote to generate these files. I am not sure if you would want to include that in the repository. If you don't care for it, I am okay with reverting that commit.
Thanks for pointing that out. I will update the README.
The umath validation tests fail on the manylinux2010 container, which seems to have really bad ULP errors (max difference is 9.21431e+18 ULP for sin and cos). For now, I have enabled these tests only if both the compiler and the CPU support AVX512 (which essentially restricts testing to just the AVX512 code paths in NumPy).
Umm, on second thought, this still might not work :/
We have dropped manylinux2010 for the development NumPy wheels; looks like we need to do it locally as well.
I will stop worrying about fixing the CI failures then; both of the failing checks are from manylinux2010.
See #19498.
I will rebase after that gets merged. |
I think we should wait to see if things work with 2014. |
It passes on manylinux2014_i686 on my local machine at least; optimistic it will be okay after I rebase.
OK, the change to manylinux2014 is in.
Rebased.
This extends the existing set of …
A 4-ULP max error means that up to two bits at the end of a result's significand may be incorrect, so the top 51 bits are still correct. The results are still pretty accurate and not a major concern IMO. For comparison, note that such an error can easily accumulate after a few (possibly just 8) floating-point additions, multiplications, or divisions in a row. What level of accuracy would be acceptable for these functions?
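As a side note, the "bits at the end of the significand" reading of ULP distance can be made concrete by mapping IEEE-754 bit patterns onto a monotone integer scale. A minimal sketch (`ordered` and `ulp_error` are illustrative helper names, not part of this patch):

```python
import math
import struct

def ordered(x: float) -> int:
    """Map a double to an integer scale where adjacent floats are adjacent ints."""
    u = struct.unpack("<Q", struct.pack("<d", x))[0]
    # negative floats carry the sign bit; flip them so the scale stays monotone
    return u if u < 1 << 63 else (1 << 63) - u

def ulp_error(a: float, b: float) -> int:
    """Number of representable doubles between a and b."""
    return abs(ordered(a) - ordered(b))

# a 4-ULP error flips at most the last two bits of the stored significand
assert ulp_error(1.0, 1.0 + 4 * 2**-52) == 4
assert ulp_error(1.0, math.nextafter(1.0, 2.0)) == 1
```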
Hard to say; I expect it would depend on the application. Astronomy might be an area where exact coordinate transformations or timing might matter. Outside of that, my rule of thumb is that if you end up with 10 digits of accuracy at the end of a calculation, you have done well enough. @mhvk might have some thoughts.
Hard to make a generic statement for astropy - I assume this is mostly for more complicated functions (trig, exp, log, etc.)? For those, the loss of a few bits should probably be OK, though it may well be that tests would start failing as they assume more stringent constraints, even if those are not needed scientifically. For addition/multiplication, we do try very hard to keep losses as small as possible (especially in our …).

But I am obviously very intrigued by the very large speed-ups mentioned in #19478!
@r-devulap Are there timings available for the high precision versions? |
I don't have specific numbers at the moment. But since higher accuracy means lower performance (which is the trade-off), they certainly will be slower than the ones in #19478. |
It would be nice to know what the trade-off is. |
The one thing I wonder is whether we could set the max-ulp to something function-specific (and lower than 4) where we know it should be more precise (i.e. most functions, both for Intel SVML and other math libraries). Not because I think 4 is unreasonable for pretty much any function, but it also "documents" what our precision typically is. And it might help us get a comparison with the current precision; I still have the feeling that these are more comparable to …

It would be interesting to report if some exotic system is not within the precision limits we would like to see (rather than fail a test), but that is far beyond here :).
If you tighten the ULP to 2, what tests fail? |
Nothing fails, actually. Depending on the polynomial approximation used, the max ULP error can occur at different points, and it's hard for a validation suite to capture those points for all the functions. I am working on getting hold of some data that might help with understanding the performance and accuracy trade-offs.
We absolutely could set the max ULP to something function-specific. I am looking into it.
We already know that the sin and cos implementations in some 32-bit libraries are terrible for largish numbers (see 742f3f1). I would imagine it is quite a bit of work to document precision limits across all the platforms that NumPy supports.
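If per-function budgets were adopted, they could plug straight into `np.testing.assert_array_max_ulp`, which NumPy already provides for elementwise ULP checks. A sketch with placeholder budgets (the numbers below are illustrative, not measured limits), using a `np.longdouble` computation rounded back to double as the reference:

```python
import numpy as np
from numpy.testing import assert_array_max_ulp

# hypothetical per-function ULP budgets -- illustrative values, not NumPy's actual limits
MAX_ULP = {"sin": 3, "cos": 3, "exp": 3, "log": 3}

x = np.linspace(0.1, 10.0, 1000, dtype=np.float64)
for name, budget in MAX_ULP.items():
    func = getattr(np, name)
    # higher-precision computation, rounded back to double, serves as the reference
    reference = func(x.astype(np.longdouble)).astype(np.float64)
    assert_array_max_ulp(func(x), reference, maxulp=budget)
print("all functions within their ULP budgets")
```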
Here are the ULP errors per function for the low accuracy (LA) SVML functions in #19478. Accuracy for single precision functions of one argument was determined by exhaustive testing, so these are the true maximum ULP errors. For the double precision functions, the worst-case error was determined by measuring accuracy for billions of test points per function (exhaustive testing is infeasible for double precision).
Is it safe to assume that an accuracy of 2 ULP is acceptable for double precision functions? For those that are worse than 2 ULP error (…)
@r-devulap Thanks. It is unfortunate that the biggest errors belong to some of the most used functions :) I wonder how much of that is due to the range over which the tests are run?
Not 100% sure, but that is likely the case. When I was developing the float32 sin, cos, exp and log algorithms on my own (#13368 and #13134), that was indeed the case. The range reduction step to bring large numbers into a certain range introduces some errors, and then the polynomial approximation adds some more.
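The range-reduction effect is easy to demonstrate from Python: reducing a large argument with the rounded double value of 2π amplifies that rounding error by the number of whole periods, whereas libm's `sin` reduces the argument accurately. A sketch (assuming the platform libm handles large arguments correctly; `naive_sin` is an illustrative name):

```python
import math

def naive_sin(x: float) -> float:
    # fmod against the *rounded* double 2*pi: its rounding error (~2.4e-16)
    # is multiplied by the number of whole periods in x (~1.6e11 for x = 1e12)
    return math.sin(math.fmod(x, 2.0 * math.pi))

# libm's sin does an accurate argument reduction internally, so for large x
# the naive version visibly disagrees with it
errors = [abs(naive_sin(x) - math.sin(x)) for x in (1.0e12, 1.0e12 + 1.0)]
print("worst disagreement:", max(errors))  # orders of magnitude above 1e-16
```

Two sample points one radian apart are used so that at least one of them sits away from a zero of cos, where the argument error would be invisible in the result.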
Nice, this table is cool to have! Frankly, it may even be cool to have as some utility script that users can run if they are curious. Hmm, the table here looks a bit different? And some of these, e.g. …

Looking at the official tables, it seems the high accuracy 1 ULP versions are in the ballpark of about 50% slower on the high-end CPUs (less so on older CPUs).
The table you are looking at is VML, which is part of the oneAPI Math Kernel Library. As @akolesov-intel clarified in #19478 (comment), SVML is the internal library in Intel compilers, which has its own implementations.
Yes, generally ULP errors tend to be worse for very large values. sin, cos and tan should be fairly accurate for very small values.
That looks to be about the same as SVML too.
Updated the CSV files to reflect the appropriate ULP errors.
@r-devulap it seems to me that it would also be quite useful to see typical distributions of error. You are reporting only the worst accuracy over the full range, but it also matters whether the median error is similar to that or way lower. Maybe just take one or a couple of functions and plot a histogram of …

It looks like if you tell people to choose between "high accuracy" and "low accuracy", they will go for high. But if the average error is < 1 ULP, I wouldn't call it "low accuracy" exactly.
You may also be interested in Paul Zimmermann's (@zimmermann6) work:
As for precision vs accuracy: this was discussed e.g. in this issue thread: shibatch/sleef#107, as well as in some other threads of the sleef project. Basically:
I agree with @rgommers: if you are not sure what you need, you will tend toward higher precision anyway.
This is pretty easy to compute. We can compare the double precision output computed via SVML to the corresponding higher accuracy 128-bit float using the …

Here are the stats for roughly a billion uniformly sampled numbers for the worst ULP error LA functions:
From the table it is clear that more than 90% of the errors are …
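A scaled-down sketch of that kind of measurement, using `np.longdouble` as the higher-precision reference (80-bit extended on most x86 Linux builds rather than a true 128-bit float), applied here to NumPy's default sin rather than the SVML build, and with a much smaller sample than the billion points above:

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(0.0, 100.0, 100_000)  # scaled down from ~1e9 samples

got = np.sin(x)  # double-precision result under test (default NumPy here, not SVML)
ref = np.sin(x.astype(np.longdouble)).astype(np.float64)  # higher-precision reference

# bit-pattern difference equals ULP distance when both results share a sign,
# which holds here since got and ref agree to within a few ULP and are not ~0
ulp = np.abs(got.view(np.int64) - ref.view(np.int64))

print("max ULP error:", ulp.max())
print("fraction within 1 ULP:", (ulp <= 1).mean())
```

From the resulting distribution one can report the median and percentile errors rather than just the worst case.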
For the sake of completeness: the …
//{"log",log,log,1.84,1.67},
{"log10",log10,log10,3.5,1.92},
{"log1p",log1p,log1p,1.96,1.93},
{"log2",log2,log2,2.12,1.84},
I am confused about the f32ulp and f64ulp values here (maybe move the struct definition closer to this code so it is clear what the fields are). How did you calculate them as floats?
Cleared up on a community call: these are actually the differences between the values, so they should be floating point.
Thanks @r-devulap |
Expands test coverage for the umath module in the following ways: