[bert/RoBERTa] Optimize LayerNorm with explicit vectorization using Vec256 #29104
…ec256 We would like to provide the vectorized implementation for layer norm. This PR reuses #23349. Differential Revision: [D18293522](https://our.internmc.facebook.com/intern/diff/D18293522/) [ghstack-poisoned]
I can reproduce it on a Skylake machine before my PR with Output:
Fixed the issue by reusing
jamesr66a left a comment:
LGTM. Does this improve some benchmarks?
Will check the performance before landing. Thanks!
Updated the performance numbers in the summary.
…ion using Vec256"

We would like to provide the vectorized implementation for layer norm. This PR reuses #23349.

Single Core:

(Note that our benchmark generates batch_size=47 for the first case and batch_size=56 for the second case. Despite the larger batch, the vectorized version is still faster than the original, non-vectorized reference C version.)

- Before the PR:
```
native_layer_norm 0.81% 5.884ms 0.81% 5.884ms 122.580us NaN 0.000us 0.000us 48 [[47, 1, 1024], [1024], [1024]]
```
- After the PR:
```
native_layer_norm 0.68% 5.053ms 0.68% 5.053ms 105.272us NaN 0.000us 0.000us 48 [[56, 1, 1024], [1024], [1024]]
```

20 Cores:

- Before the PR:
```
native_layer_norm 1.65% 41.682ms 1.65% 41.682ms 868.365us NaN 0.000us 0.000us 48 [[61, 64, 1024], [1024], [1024]]
```
- After the PR:
```
native_layer_norm 1.34% 33.829ms 1.34% 33.829ms 704.771us NaN 0.000us 0.000us 48 [[61, 64, 1024], [1024], [1024]]
```

Differential Revision: [D18293522](https://our.internmc.facebook.com/intern/diff/D18293522/) [ghstack-poisoned]
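To put the profiler rows quoted in the summary into perspective, here is a rough back-of-the-envelope calculation. It assumes the `us` column is the average time per `native_layer_norm` call (the column headers are not shown, but the total time divided by the 48 calls matches it) and that the work scales with the product of the leading input dimensions. The `runs` table and the `speedups` helper are illustrative names, not part of the PR.

```python
# Average time per call (microseconds) and number of normalized rows,
# copied from the profiler rows quoted in the PR summary.
# Assumption: rows = product of the leading input dims, e.g. 47 * 1.
runs = {
    "single core": {"before": (122.580, 47 * 1), "after": (105.272, 56 * 1)},
    "20 cores": {"before": (868.365, 61 * 64), "after": (704.771, 61 * 64)},
}


def speedups(before, after):
    """Return (per-call speedup, per-row speedup) for one before/after pair."""
    (t_before, rows_before), (t_after, rows_after) = before, after
    per_call = t_before / t_after
    per_row = (t_before / rows_before) / (t_after / rows_after)
    return per_call, per_row


for name, run in runs.items():
    per_call, per_row = speedups(run["before"], run["after"])
    print(f"{name}: {per_call:.2f}x per call, {per_row:.2f}x per row")
```

Under those assumptions the single-core case is about 1.16x faster per call, and about 1.39x faster per row once the different batch sizes (47 vs. 56) are taken into account. The 20-core case uses the same shape before and after, so both figures are about 1.23x.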
This pull request has been merged in d6d6075.
…29104)

Summary:
Pull Request resolved: pytorch#29104

We would like to provide the vectorized implementation for layer norm. This PR reuses pytorch#23349.

Test Plan:
buck test mode/dev-nosan //caffe2/test:nn -- "LayerNorm"
buck test mode/dev-nosan //caffe2/test:nn -- "test_LayerNorm_1d_no_elementwise_affine_eval"
python run_test.py -i nn -- TestNN.test_LayerNorm_1d_no_elementwise_affine_eval

Differential Revision: D18293522

fbshipit-source-id: f4cfed6e62bac1b43ee00c32b495ecc836bd9ec5
We would like to provide the vectorized implementation for layer norm. This PR reuses #23349.
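For readers unfamiliar with the idea, the sketch below illustrates in plain Python what "vectorizing" the layer norm reductions means: instead of one running sum, the loop keeps one partial sum per vector lane and combines them at the end. This is not the kernel from this PR. The lane count of 8 (a 256-bit vector of float32), the function names, and the `eps` default are all assumptions made for illustration.

```python
import math

LANES = 8  # assumption: a 256-bit vector holds 8 float32 values


def layer_norm_reference(row, gamma, beta, eps=1e-5):
    """Scalar reference: normalize one row, then apply the affine transform."""
    n = len(row)
    mean = sum(row) / n
    var = sum((x - mean) ** 2 for x in row) / n
    rstd = 1.0 / math.sqrt(var + eps)
    return [(x - mean) * rstd * g + b for x, g, b in zip(row, gamma, beta)]


def layer_norm_lanes(row, gamma, beta, eps=1e-5):
    """Same computation, with the reductions kept as LANES partial sums,
    mimicking a SIMD loop that holds one accumulator per vector lane."""
    n = len(row)
    main = n - n % LANES  # portion that fills whole vectors
    acc_sum = [0.0] * LANES
    acc_sq = [0.0] * LANES
    for i in range(0, main, LANES):
        for lane in range(LANES):  # one "vector" step over LANES elements
            x = row[i + lane]
            acc_sum[lane] += x
            acc_sq[lane] += x * x
    total = sum(acc_sum)  # horizontal reduction across lanes
    total_sq = sum(acc_sq)
    for x in row[main:]:  # scalar tail for the leftover elements
        total += x
        total_sq += x * x
    mean = total / n
    var = max(total_sq / n - mean * mean, 0.0)  # E[x^2] - E[x]^2, clamped at 0
    rstd = 1.0 / math.sqrt(var + eps)
    scale, bias = rstd, -mean * rstd
    return [(x * scale + bias) * g + b for x, g, b in zip(row, gamma, beta)]
```

Both functions should agree up to floating-point rounding. The lane version computes mean and variance in a single pass over the row, which is one reason this formulation suits SIMD hardware.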
Single Core:
(Note that our benchmark generates batch_size=47 for the first case and batch_size=56 for the second case. Despite the larger batch, the vectorized version is still faster than the original, non-vectorized reference C version.)
20 Cores:
Differential Revision: D18293522