ENH: Use high accuracy SVML for double precision umath functions #24006

r-devulap · 2023-06-20T16:41:15Z

Todo:

~~Add sin, cos and tanh~~
Benchmark numbers

r-devulap · 2023-06-21T21:44:56Z

Benchmark numbers: On average the double precision high accuracy version seems 1.23x slower than the LA versions (results do not include sin, cos and tanh):

       before           after         ratio
     [92ee012a]       [a0bb1b5e]
     <main>           <svml_ha>
+     14.2±0.07μs      22.7±0.02μs     1.60  bench_ufunc_strides.UnaryFP.time_unary(<ufunc 'tan'>, 1, 1, 'd')
+     13.2±0.02μs       19.3±0.1μs     1.46  bench_ufunc_strides.UnaryFP.time_unary(<ufunc 'expm1'>, 1, 1, 'd')
+     27.0±0.05μs      34.8±0.01μs     1.29  bench_ufunc_strides.UnaryFP.time_unary(<ufunc 'arccosh'>, 1, 1, 'd')
+     29.6±0.02μs      37.5±0.02μs     1.27  bench_ufunc_strides.BinaryFP.time_binary(<ufunc 'power'>, 1, 1, 1, 'd')
+     11.9±0.01μs         15.0±0μs     1.26  bench_ufunc_strides.UnaryFP.time_unary(<ufunc 'log10'>, 1, 1, 'd')
+     14.9±0.01μs       18.6±0.5μs     1.25  bench_ufunc_strides.UnaryFP.time_unary(<ufunc 'arcsin'>, 1, 1, 'd')
+     12.0±0.03μs      14.9±0.01μs     1.24  bench_ufunc_strides.UnaryFP.time_unary(<ufunc 'log2'>, 1, 1, 'd')
+     11.9±0.03μs      14.7±0.01μs     1.23  bench_ufunc_strides.UnaryFP.time_unary(<ufunc 'log'>, 1, 1, 'd')
+      15.7±0.2μs       19.0±0.3μs     1.21  bench_ufunc_strides.UnaryFP.time_unary(<ufunc 'log1p'>, 1, 1, 'd')
+      25.0±0.6μs         30.2±2μs     1.21  bench_ufunc_strides.BinaryFP.time_binary(<ufunc 'arctan2'>, 1, 1, 1, 'd')
+     17.3±0.05μs       20.2±0.1μs     1.17  bench_ufunc_strides.UnaryFP.time_unary(<ufunc 'arccos'>, 1, 1, 'd')
+     15.7±0.03μs         17.7±0μs     1.13  bench_ufunc_strides.UnaryFP.time_unary(<ufunc 'sinh'>, 1, 1, 'd')
+      21.5±0.4μs      24.1±0.01μs     1.12  bench_ufunc_strides.UnaryFP.time_unary(<ufunc 'arctanh'>, 1, 1, 'd')
+     11.1±0.03μs         12.4±0μs     1.11  bench_ufunc_strides.UnaryFP.time_unary(<ufunc 'cosh'>, 1, 1, 'd')
+      18.6±0.9μs      20.0±0.02μs     1.07  bench_ufunc_strides.UnaryFP.time_unary(<ufunc 'arctan'>, 1, 1, 'd')
+     14.3±0.08μs       15.2±0.2μs     1.06  bench_ufunc_strides.UnaryFP.time_unary(<ufunc 'cbrt'>, 1, 1, 'd')
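The 1.23x figure quoted above can be reproduced from the ratio column of the table. The sketch below is only an illustration: the list is copied by hand from the rows above, and the geometric mean is added as an extra point of comparison that was not part of the original report.

```python
from statistics import geometric_mean, mean

# Ratio column of the asv table above (after / before), copied by hand.
ratios = [
    1.60, 1.46, 1.29, 1.27, 1.26, 1.25, 1.24, 1.23,
    1.21, 1.21, 1.17, 1.13, 1.12, 1.11, 1.07, 1.06,
]

arithmetic = mean(ratios)
geometric = geometric_mean(ratios)  # often preferred for averaging ratios

print(f"arithmetic mean slowdown: {arithmetic:.2f}x")
print(f"geometric mean slowdown:  {geometric:.2f}x")
```

The arithmetic mean rounds to 1.23, which matches the number in the comment.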

seberg · 2023-06-28T10:48:37Z

> seems 1.23x slower than the LA versions

Wow, I thought it would be much more of a difference, although I will note that those benchmarks seem to run in 10-20μs and may have ~1μs overhead, so the slowdown should be a bit more in practice.
Don't we have larger benchmarks, or do they just report a constant time because all of these are in practice memory throughput bound?

TBH, if this slowdown is what it is, I am comfortable with the high precision as default.

Some of the tests seem to fail due to lower error bounds (i.e. the smoke test). I am not sure if that is the system lib or not, but I guess that just means we need to relax those bounds slightly.

This test is failing due to a compiler error because inlining failed?
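To make the overhead remark concrete: if each measurement includes a fixed cost of roughly 1μs that is the same before and after, subtracting it from both timings gives a slightly larger slowdown. The sketch below assumes a purely additive, constant overhead of 1μs (the rough figure mentioned above, not a measured value) and uses three rows from the benchmark table; the helper name is hypothetical.

```python
def overhead_corrected_ratio(before_us, after_us, overhead_us=1.0):
    """Slowdown after removing an assumed fixed per-call overhead from both timings."""
    return (after_us - overhead_us) / (before_us - overhead_us)

# (before, after) timings in microseconds, taken from the table above.
timings = {"tan": (14.2, 22.7), "log": (11.9, 14.7), "cbrt": (14.3, 15.2)}

for name, (before, after) in timings.items():
    raw = after / before
    corrected = overhead_corrected_ratio(before, after)
    print(f"{name:>5}: raw {raw:.3f}x, overhead-corrected {corrected:.3f}x")
```

Because the same constant is removed from numerator and denominator, the corrected ratio is always a little larger than the raw one whenever the new code is slower.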

r-devulap · 2023-06-28T16:27:28Z

> Wow, I thought it would be much more of a difference, although I will note that those benchmarks seem to run in 10-20μs and may have ~1μs overhead, so the slowdown should be a bit more in practice. Don't we have larger benchmarks, or do they just report a constant time because all of these are in practice memory throughput bound?

We benchmark 10000 elements. Let me try increasing the array size, but I suspect the numbers won't be too different.

> TBH, if this slowdown is what it is, I am comfortable with the high precision as default.

I agree. 20-30% speed isn't worth all the fuss about accuracy.

> Some of the tests seem to fail due to lower error bounds (i.e. the smoke test), I am not sure if that is the system lib or not, but I guess just means we need to relax those bounds slightly.

The 4 ULP tanh was converted to universal intrinsics and we aren't using SVML for it for now. I was thinking I could backport SVML to universal intrinsics, but it might need more time than I thought.

> This test is failing due to a compiler error because inlining failed?

Not sure, I will spend some time today/tomorrow fixing up the CI failures.
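One way to check whether array size changes the picture is to time a ufunc at several sizes and compare the cost per element. This is a rough sketch, not the asv benchmark used in this PR: the helper name, the sizes, the input range and the repeat counts are all arbitrary choices made for illustration.

```python
import timeit

import numpy as np


def time_per_element_ns(ufunc, size, repeat=5, number=20):
    """Best-of-`repeat` time per element, in nanoseconds, for a unary float64 ufunc."""
    rng = np.random.default_rng(0)
    x = rng.uniform(0.1, 1.0, size)  # float64 input inside the domain of tan/log
    out = np.empty_like(x)           # preallocate so allocation is not timed
    best = min(timeit.repeat(lambda: ufunc(x, out=out), repeat=repeat, number=number))
    return best / number / size * 1e9


for size in (10_000, 100_000, 1_000_000):
    print(f"{size:>9} elements: {time_per_element_ns(np.tan, size):.2f} ns/element")
```

If the per-element cost stays roughly flat as the size grows, fixed call overhead is not dominating the 10000-element result. Running the same loop on builds from `main` and from the `svml_ha` branch would give a size-dependent ratio.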

r-devulap · 2023-06-29T04:12:24Z

I think I will defer sin, cos and tanh to another PR. I will try and convert sin/cos HA SVML to universal intrinsics, which might take me some time.

r-devulap · 2023-06-29T17:07:50Z

Huh, it looks like libm float64 cbrt has a 2 ULP accuracy.
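An observation like this can be checked by comparing a double precision result against a higher precision reference and expressing the difference in ULPs. The sketch below uses only the standard library and is not the NumPy test suite's method: the helper names are hypothetical, the reference comes from `decimal` at 50 digits, and the function under test is `math.cbrt` where available (Python 3.11+), with `x ** (1/3)` as a fallback. An `np.cbrt` result could be passed through the same `ulp_error` helper.

```python
import math
import random
from decimal import Decimal, localcontext


def ulp_error(computed, reference):
    """Distance between a float result and a Decimal reference, in float64 ULPs."""
    with localcontext() as ctx:
        ctx.prec = 50
        one_ulp = Decimal(math.ulp(float(reference)))
        return float(abs(Decimal(computed) - reference) / one_ulp)


def max_ulp_error_cbrt(samples):
    """Largest observed ULP error of a double precision cube root over `samples`."""
    # math.cbrt exists from Python 3.11; the fallback is an assumption for older versions.
    cbrt = getattr(math, "cbrt", None) or (lambda v: v ** (1.0 / 3.0))
    worst = 0.0
    with localcontext() as ctx:
        ctx.prec = 50
        third = Decimal(1) / Decimal(3)
        for x in samples:
            reference = Decimal(x) ** third  # 50-digit reference value
            worst = max(worst, ulp_error(cbrt(x), reference))
    return worst


random.seed(0)
samples = [random.uniform(0.5, 1000.0) for _ in range(2000)]
print(f"max observed cbrt error: {max_ulp_error_cbrt(samples):.3f} ULP")
```

The number printed depends on the platform's math library, which is the point of the comment above: a test tolerance has to cover the least accurate implementation the code may fall back to.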

r-devulap · 2023-06-30T16:56:56Z

The CI looks stuck?

charris · 2023-06-30T21:10:12Z

The pyodide error looks unrelated. @hoodmane Could you take a look?

charris · 2023-06-30T21:10:32Z

Thanks @r-devulap .

hoodmane · 2023-06-30T21:37:13Z

Definitely looks like it's our fault in some way. Probably we needed to pin pydantic, looks like they released a breaking change and we don't have a pin on them. Pinging more Pyodide people who might have a better idea about this: @rth @ryanking13

hoodmane · 2023-07-01T00:13:16Z

Opened #24091.

ENH: Use high accuracy SVML for float64 umath functions

6e0df62

github-actions bot added the 01 - Enhancement label Jun 20, 2023

Raghuveer Devulapalli added 2 commits June 20, 2023 09:43

Update SVML submodule

d13bc1f

Update ULP tolerance to 1 ULP for double precision

e6b4e4a

Use 2 ULP for tanh

678afb7

BLD: Link SVML FP16 only for latest assembler

12d0df7

r-devulap force-pushed the svml_ha branch from e700045 to bab1030 Compare June 29, 2023 17:10

set 2 ULP accuracy for float64 cbrt

bab1030

Empty-Commit

b591eb6

charris merged commit d87c7e6 into numpy:main Jun 30, 2023

r-devulap mentioned this pull request Jul 16, 2023

BUG: Build fail on Fedora #24195

Closed
