
Conversation

@xta0 (Contributor) commented May 30, 2019

This is a follow-up to James's PR: #19041. The idea is to replace the legacy `sinh` / `cosh` ops, which are currently dispatched to TH, with the operations defined in `Vec256` for better performance.

Benchmark (from James's script):

```python
import torch, time
ops = ['sinh', 'cosh']
x = torch.rand(1024, 1024)
NITER = 10000

print('op', 'time per iter (ms)', 'gops/s', 'GB/s', sep='\t')
for op in ops:
    s = time.time()
    for i in range(NITER):
        getattr(x, op)()
    elapsed_sec = ((time.time() - s) / NITER)
    print(op, elapsed_sec * 1000, (1024*1024/elapsed_sec)/1e9, (1024*1024*4*2) / elapsed_sec / 1e9, sep='\t')
```

code on master:

```
op	time per iter (ms)	gops/s	GB/s
sinh	3.37614369392395	0.3105839369002935	2.484671495202348
cosh	3.480502033233643	0.3012714803748572	2.4101718429988574
```

after the change (on a MacBook Pro 2018):

```
op	time per iter (ms)	gops/s	GB/s
sinh	0.8956503868103027	1.1707425301677301	9.365940241341841
cosh	0.9392147302627564	1.1164390487217428	8.931512389773943
```
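
For context, here is a minimal sketch (not the exact diff in this PR) of how a Vec256-backed unary kernel is typically wired up in ATen, using the `cpu_kernel_vec` helper from `aten/src/ATen/native/cpu/Loops.h`. The function name, dispatch string, and include list are illustrative assumptions:

```cpp
#include <cmath>
#include <ATen/Dispatch.h>
#include <ATen/native/TensorIterator.h>
#include <ATen/native/cpu/Loops.h>
#include <ATen/cpu/vec256/vec256.h>

using namespace at;
using namespace at::vec256;

// Sketch of a vectorized sinh kernel: the first lambda is the scalar fallback
// for trailing elements, the second handles full SIMD lanes through Vec256
// (whose float/double specializations call into SLEEF).
static void sinh_kernel(TensorIterator& iter) {
  AT_DISPATCH_FLOATING_TYPES(iter.dtype(), "sinh_cpu", [&]() {
    at::native::cpu_kernel_vec(
        iter,
        [](scalar_t a) -> scalar_t { return std::sinh(a); },
        [](Vec256<scalar_t> a) { return a.sinh(); });
  });
}
```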

@pytorchbot added the labels module: cpu (CPU specific problem, e.g., perf, algorithm) and module: operators on May 30, 2019
```cpp
IMPLEMENT_UNARY_OP_VEC(ceil)
IMPLEMENT_UNARY_OP_VEC(cos)
IMPLEMENT_UNARY_OP_TH(cosh)
IMPLEMENT_UNARY_OP_VEC(cosh)
```
Contributor commented:

Can we remove the IMPLEMENT_UNARY_OP_TH macro defined above?

xta0 (Contributor, Author) replied:

Sounds good.
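
For readers outside this file: the two macros differ in which backend the CPU out-variant calls into. The block below is a simplified sketch under that assumption, not ATen's literal macro definitions; once nothing expands `IMPLEMENT_UNARY_OP_TH`, the macro itself is dead code and can be removed.

```cpp
// Simplified sketch, not the literal ATen macros. The _VEC form routes the op
// through a TensorIterator plus a dispatched CPU stub (the kernel registered
// in UnaryOpsKernel.cpp, which may use Vec256), while the _TH form forwards
// to the legacy TH binding. The legacy _th_ call below is illustrative.
#define IMPLEMENT_UNARY_OP_VEC(op)                                   \
  Tensor& _##op##_out_cpu(Tensor& result, const Tensor& self) {      \
    auto iter = TensorIterator::unary_op(result, self);              \
    op##_stub(kCPU, iter); /* dispatches to the registered kernel */ \
    return result;                                                   \
  }

#define IMPLEMENT_UNARY_OP_TH(op)                                    \
  Tensor& _##op##_out_cpu(Tensor& result, const Tensor& self) {      \
    return legacy::th::_th_##op##_out(result, self);                 \
  }
```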

@ljk53 (Contributor) commented May 30, 2019

And did you try removing the TH CPU backend for _th_cosh & _th_sinh from aten/src/ATen/Declarations.cwrap?

@cpuhrsch (Contributor) commented:
Please notice this comment and also this comment

cc @VitalyFedyunin

@cpuhrsch requested a review from VitalyFedyunin on May 30, 2019 at 14:31
@cpuhrsch (Contributor) commented:
glibc's math library doesn't vectorize sinh and cosh, and the float specializations of Vec256's sinh and cosh don't use anything better than that.

@cpuhrsch (Contributor) commented:
The SLEEF sinh operations are accurate to 1 ULP only within a limited range, whereas glibc guarantees 2 ULP over the full range. Looking at the function, tight accuracy sort of doesn't make sense in the tails (since the function grows so large), and yet we still want to produce values that are standardized on something. We also don't want to produce zeros, which is what SLEEF would do.

Now, VML is compliant with glibc as far as I know, but there are issues with clang as mentioned. sinh and cosh were disabled as part of this PR. It's true that I should have added more detail, but we have yet to do a full investigation into why this fails for clang, or at least to disable it only for clang.

I'd say the better strategy is to

  1. Reenable VML for sinh and cosh via the performant macro used for the other unary functions and within the vml header (see the VML sketch below)
  2. Closely monitor clang's behavior and investigate any failures.

But the current PR likely mostly takes advantage of the improved memory read patterns. Take care that the transition penalty bug isn't happening, and see how other unary functions deal with it.
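
Regarding point 1 above, here is a minimal sketch of what the underlying VML calls look like, assuming an Intel MKL build. `vsSinh` / `vdSinh` are real MKL VML entry points, but the wrapper names and the way ATen's vml.h actually integrates them are assumptions for illustration:

```cpp
#include <cstdint>
#include <mkl.h>  // assumes an MKL build; provides vsSinh / vdSinh

// Whole-buffer hyperbolic sine over contiguous data. MKL picks a vectorized
// implementation internally; accuracy mode (HA/LA/EP) is set via vmlSetMode.
inline void vml_sinh(const float* in, float* out, int64_t n) {
  vsSinh(static_cast<MKL_INT>(n), in, out);
}

inline void vml_sinh(const double* in, double* out, int64_t n) {
  vdSinh(static_cast<MKL_INT>(n), in, out);
}
```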

@pytorchbot added the label module: internals (Related to internal abstractions in c10 and ATen) on May 30, 2019
@xta0 (Contributor, Author) commented May 30, 2019

> And did you try removing the TH CPU backend for _th_cosh & _th_sinh from aten/src/ATen/Declarations.cwrap?

Yes, I've removed the cpu backend for both of them.

@xta0 (Contributor, Author) commented May 30, 2019

> glibc's math library doesn't vectorize sinh and cosh, and the float specializations of Vec256's sinh and cosh don't use anything better than that.

Yeah, I switched the method from unary_kernel_vec to unary_kernel.
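
To make that distinction concrete, here is a minimal sketch of the scalar-only variant, assuming ATen's `cpu_kernel` helper from `native/cpu/Loops.h` (the `unary_kernel` / `unary_kernel_vec` wrappers mentioned above are assumed to sit on top of these helpers). It drops the Vec256 lambda and applies `std::sinh` element-wise, so the speedup over TH would come mainly from TensorIterator's memory traversal rather than from SIMD:

```cpp
#include <cmath>
#include <ATen/Dispatch.h>
#include <ATen/native/TensorIterator.h>
#include <ATen/native/cpu/Loops.h>

// Scalar-only kernel: no Vec256 lambda, just an element-wise std::sinh.
static void sinh_kernel_scalar(at::TensorIterator& iter) {
  AT_DISPATCH_FLOATING_TYPES(iter.dtype(), "sinh_cpu", [&]() {
    at::native::cpu_kernel(iter, [](scalar_t a) -> scalar_t {
      return std::sinh(a);
    });
  });
}
```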

@facebook-github-bot (Contributor) left a comment:

@xta0 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@fmassa (Member) commented May 31, 2019

While you are at it, can you also do the same thing for CUDA? It shouldn't be much more work, and it makes everything simpler.
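
As a hedged sketch only (not part of this PR), the analogous CUDA port could be written against ATen's `gpu_kernel` helper from `native/cuda/Loops.cuh`; it would live in a .cu file compiled by nvcc, and the exact helper names available in 2019-era ATen may differ:

```cpp
#include <ATen/Dispatch.h>
#include <ATen/native/TensorIterator.h>
#include <ATen/native/cuda/Loops.cuh>

// Element-wise sinh on GPU via a device lambda handed to gpu_kernel.
void sinh_kernel_cuda(at::TensorIterator& iter) {
  AT_DISPATCH_FLOATING_TYPES(iter.dtype(), "sinh_cuda", [&]() {
    at::native::gpu_kernel(iter, [] GPU_LAMBDA (scalar_t a) -> scalar_t {
      return ::sinh(a);
    });
  });
}
```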

```cpp
IMPLEMENT_UNARY_OP_VEC(ceil)
IMPLEMENT_UNARY_OP_VEC(cos)
IMPLEMENT_UNARY_OP_TH(cosh)
IMPLEMENT_UNARY_OP_VEC(cosh)
```
Contributor commented:

Can we add a comment here noting that these kernels are not using Vec256 (because _VEC is confusing)?

@VitalyFedyunin (Contributor) left a comment:

Code looks good. The PR would be perfect if you also removed the dead TH operator code.

@facebook-github-bot (Contributor) commented:

@xta0 merged this pull request in 052bab7.

zdevito pushed a commit to zdevito/ATen that referenced this pull request May 31, 2019
Pull Request resolved: pytorch/pytorch#21115

Reviewed By: ljk53

Differential Revision: D15574580

Pulled By: xta0

fbshipit-source-id: 392546a0df11ed4f0945f2bc84bf5dea2750b60e
@VitalyFedyunin added the label module: porting (Issues related to porting TH/THNN legacy to ATen native) on Jul 11, 2019