[bert/RoBERTa] Optimize LayerNorm with explicit vectorization using Vec256 (2/2) #29154
…ec256 (2/2) We would like to optimize LayerNorm with explicit vectorization using Vec256. This PR handles the special part of using fmadd with AVX256. Differential Revision: [D18307639](https://our.internmc.facebook.com/intern/diff/D18307639/) [ghstack-poisoned]
…ec256 (2/2) We would like to optimize LayerNorm with explicit vectorization using Vec256. This PR handles the special part of using fmadd with AVX256. Differential Revision: [D18307639](https://our.internmc.facebook.com/intern/diff/D18307639/) ghstack-source-id: 93220825 Pull Request resolved: #29154
…ec256 (2/2) Pull Request resolved: #29154 We would like to optimize LayerNorm with explicit vectorization using Vec256. This PR handles the special part of using fmadd with AVX256. ghstack-source-id: 93608764 Differential Revision: [D18307639](https://our.internmc.facebook.com/intern/diff/D18307639/)
jamesr66a left a comment
cool
Diff context (truncated in the original page):

    const T bias = -rstd_val * mean_val;
    for (int64_t j = 0; j < N; ++j) {
    for (j = 0; j < N / kVecSize * kVecSize; j += kVecSize) {
      const vec256::Vec256<T> gamma_vec = gamma_null

What does the emitted code look like with these conditionals?
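The diff context above folds the mean subtraction into `bias = -rstd_val * mean_val` and splits each row into a main loop over `N / kVecSize * kVecSize` elements plus a remainder. A minimal scalar sketch of that structure follows, assuming float data and `kVecSize = 8` (the number of floats in a 256-bit register). The function name `normalize_row` is hypothetical, and the inner `k` loop only stands in for one Vec256 operation; it is not the PR's code.

```cpp
#include <cassert>
#include <cstdint>

// Assumed vector width: a 256-bit register holds 8 floats (Vec256<float>).
constexpr std::int64_t kVecSize = 8;

// Hypothetical helper: normalizes one row of N floats in place.
// (x - mean) * rstd is rewritten as x * rstd + bias, with bias = -rstd * mean,
// so each element needs one multiply and one add (a single fmadd candidate).
void normalize_row(float* x, std::int64_t N, float mean, float rstd) {
  assert(N >= 0);
  const float bias = -rstd * mean;
  std::int64_t j = 0;
  // Main loop: N rounded down to a multiple of kVecSize. The inner k loop
  // stands in for one Vec256 operation on kVecSize lanes.
  for (; j < N / kVecSize * kVecSize; j += kVecSize) {
    for (std::int64_t k = 0; k < kVecSize; ++k) {
      x[j + k] = x[j + k] * rstd + bias;
    }
  }
  // Scalar tail: the remaining N % kVecSize elements.
  for (; j < N; ++j) {
    x[j] = x[j] * rstd + bias;
  }
}
```

For example, with N = 10 the main loop covers indices 0 through 7 and the scalar tail handles indices 8 and 9.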
Looks like this PR hasn't been updated in a while, so we're going to go ahead and mark this as …
Stack from ghstack:
We would like to optimize LayerNorm with explicit vectorization using Vec256. This PR, the second of two, handles the part that uses fused multiply-add (`fmadd`) on 256-bit AVX2 vectors.
Differential Revision: D18307639
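The fmadd idea in the description can be illustrated with a scalar sketch. Writing the LayerNorm output as `gamma * ((x - mean) * rstd) + beta` and substituting `bias = -rstd * mean` turns each element into two fused multiply-adds. The sketch below uses `std::fma` from `<cmath>` in place of Vec256; the function name `layer_norm_row_fma` and its signature are hypothetical, and this is one plausible formulation rather than the PR's actual code.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: applies LayerNorm to one row of N floats, writing y.
// gamma and beta are per-column weight and bias arrays of length N.
//   y[j] = gamma[j] * ((x[j] - mean) * rstd) + beta[j]
// is evaluated as two fused multiply-adds:
//   t    = fma(x[j], rstd, bias)      with bias = -rstd * mean
//   y[j] = fma(t, gamma[j], beta[j])
void layer_norm_row_fma(const float* x, const float* gamma, const float* beta,
                        float* y, std::int64_t N, float mean, float rstd) {
  assert(N >= 0);
  const float bias = -rstd * mean;
  for (std::int64_t j = 0; j < N; ++j) {
    const float t = std::fma(x[j], rstd, bias);  // normalized value
    y[j] = std::fma(t, gamma[j], beta[j]);       // affine transform
  }
}
```

With Vec256<float> on AVX2, each `std::fma` call would correspond to one 8-lane vector fused multiply-add (the `_mm256_fmadd_ps` intrinsic, which requires FMA support), replacing a subtract, two multiplies, and an add per element with two fused operations. Because an FMA rounds only once, results can differ from the unfused expression in the last bit.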