
optimize BFloat16 elemwise operators CPU: sigmoid, sigmoid_backward, tanh_backward, addcmul, addcdiv#55221

Closed
mingfeima wants to merge 24 commits into gh/mingfeima/19/base from gh/mingfeima/19/head

Conversation

@mingfeima (Collaborator) commented Apr 2, 2021

Stack from ghstack:

Differential Revision: D28836797

@facebook-github-bot (Contributor) commented Apr 2, 2021

💊 CI failures summary and remediations

As of commit 6cc108f (more details on the Dr. CI page):


  • 2/2 failures possibly* introduced in this PR
    • 1/2 non-scanned failure(s)

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See GitHub Actions build linux-bionic-py3.8-gcc9-coverage / test (default, 1, 2, linux.2xlarge) (1/1)

Step: "Test PyTorch" (full log | diagnosis details | 🔁 rerun)

2021-08-18T03:00:36.3795161Z Build left local git repository checkout dirty
2021-08-18T03:00:28.3723603Z real	84m57.530s
2021-08-18T03:00:28.3724134Z user	191m29.737s
2021-08-18T03:00:28.3724487Z sys	25m30.980s
2021-08-18T03:00:28.3725112Z + assert_git_not_dirty
2021-08-18T03:00:28.3726217Z + [[ linux-bionic-py3.8-gcc9-coverage-default != *rocm* ]]
2021-08-18T03:00:28.3727798Z + [[ linux-bionic-py3.8-gcc9-coverage-default != *xla* ]]
2021-08-18T03:00:28.3728540Z ++ git status --porcelain
2021-08-18T03:00:36.3792906Z + git_status='?? third_party/pocketfft/'
2021-08-18T03:00:36.3793672Z + [[ -n ?? third_party/pocketfft/ ]]
2021-08-18T03:00:36.3794662Z + echo 'Build left local git repository checkout dirty'
2021-08-18T03:00:36.3795161Z Build left local git repository checkout dirty
2021-08-18T03:00:36.3795697Z + echo 'git status --porcelain:'
2021-08-18T03:00:36.3796141Z git status --porcelain:
2021-08-18T03:00:36.3796604Z + echo '?? third_party/pocketfft/'
2021-08-18T03:00:36.3796977Z ?? third_party/pocketfft/
2021-08-18T03:00:36.3797294Z + exit 1
2021-08-18T03:00:36.3797552Z + cleanup
2021-08-18T03:00:36.3797836Z + retcode=1
2021-08-18T03:00:36.3798095Z + set +x
2021-08-18T03:00:36.3798442Z =================== sccache compilation log ===================
2021-08-18T03:00:36.3974513Z =========== If your build fails, please take a look at the log above for possible reasons ===========

ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI.

mingfeima added a commit that referenced this pull request Apr 2, 2021
…l, addcdiv

ghstack-source-id: 1c4aa33
Pull Request resolved: #55221
@mingfeima (Collaborator, Author) commented:

Since this PR is not related to the parallelization feature, only single-core performance is tested.
NB: tanh_backward previously had no BFloat16 support; the "before" numbers refer to a simple implementation with the following dispatch change:

```diff
-    AT_DISPATCH_FLOATING_TYPES(iter.dtype(), "tanh_backward_cpu", [&]() {
+    AT_DISPATCH_FLOATING_TYPES_AND(kBFloat16, iter.dtype(), "tanh_backward_cpu", [&]() {
```
  • Performance update on an AVX-512 machine (Xeon(R) Gold 6248 CPU @ 2.50GHz), shape 1x128x1024:

| op | fp32 before | fp32 after | bf16 before | bf16 after |
| --- | --- | --- | --- | --- |
| sigmoid | 0.101 ms | 0.109 ms | 0.272 ms | 0.144 ms |
| sigmoid backward | 0.114 ms | 0.111 ms | 0.154 ms | 0.091 ms |
| tanh backward | 0.110 ms | 0.111 ms | 0.185 ms | 0.085 ms |

  • Performance update on an AVX2 machine (Xeon(R) CPU E5-2680 v3 @ 2.50GHz), shape 1x128x1024:

| op | fp32 before | fp32 after | bf16 before | bf16 after |
| --- | --- | --- | --- | --- |
| sigmoid | 0.241 ms | 0.237 ms | 0.480 ms | 0.238 ms |
| sigmoid backward | 0.128 ms | 0.131 ms | 0.299 ms | 0.149 ms |
| tanh backward | 0.134 ms | 0.136 ms | 0.299 ms | 0.131 ms |

@mingfeima changed the title from "elewise op bf16 cpu: sigmoid, sigmoid_backward, tanh_backward, addcmul, addcdiv" to "optimize BFloat16 elemwise operators CPU: sigmoid, sigmoid_backward, tanh_backward, addcmul, addcdiv" on Apr 2, 2021
…_backward, tanh_backward, addcmul, addcdiv"

[ghstack-poisoned]
mingfeima added a commit to mingfeima/pytorch that referenced this pull request Apr 28, 2021
mingfeima added 2 commits May 13, 2021 10:52
dgl-intel pushed a commit to dgl-intel/pytorch that referenced this pull request May 14, 2021
@VitalyFedyunin (Contributor) commented:
@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.


@VitalyFedyunin (Contributor) left a review comment:

Please change the code pattern from dispatching on all types and specializing on bfloat16 to checking the type first and dispatching only on the right types.

Three comment threads on aten/src/ATen/native/cpu/BinaryOpsKernel.cpp (outdated)
@VitalyFedyunin (Contributor) commented:
@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@mingfeima (Collaborator, Author) commented:
rebased!


@VitalyFedyunin (Contributor) commented:
@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor) commented:
@VitalyFedyunin merged this pull request in 94d6215.


4 participants