Add torch.nn.GELU for GELU activation #28944
Conversation
This pull request was exported from Phabricator. Differential Revision: D18240946

Force-pushed from 253e4b3 to 1450498

This pull request was exported from Phabricator. Differential Revision: D18240946
Link to #28947
Force-pushed from 1450498 to 1a0bd8b

This pull request was exported from Phabricator. Differential Revision: D18240946

Force-pushed from 1a0bd8b to 20d2c33

This pull request was exported from Phabricator. Differential Revision: D18240946

Force-pushed from 20d2c33 to 93c9246

This pull request was exported from Phabricator. Differential Revision: D18240946

Force-pushed from 93c9246 to dc2112f

This pull request was exported from Phabricator. Differential Revision: D18240946

CircleCI build failures summary, as of commit dc2112f: here are the reasons each build failed. This comment was automatically generated by Dr. CI. Please report bugs/suggestions on the GitHub issue tracker.

Force-pushed from dc2112f to ae1e7c8

This pull request was exported from Phabricator. Differential Revision: D18240946
I'm happy to see these lines indicating that another kernel might get implemented, too, but I wonder how you are going to expose this in the interface. I've had great results with that implementation as a custom module. Will it be a separate function/module, or rather a parameter setting?
Maybe we will add it as a parameter to gelu later. The reason we didn't do that now is that the approximation does not actually perform better than the current implementation based on MKL functions. In my testing, the tanh approximation's performance relies on the performance of tanh itself. Eigen, which is the backend of TensorFlow, provides a fast approximation of tanh. Without such a fast tanh implementation, I don't think it is necessary to add the approximation for gelu here.
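For readers following along, here is a minimal sketch of the two formulations being discussed: the exact GELU and the tanh approximation from the original GELU paper. The function names are just for illustration and are not part of this PR.

```python
import math
import torch

def gelu_exact(x):
    # Exact GELU: x * Phi(x), with the Gaussian CDF written via erf.
    return 0.5 * x * (1.0 + torch.erf(x / math.sqrt(2.0)))

def gelu_tanh_approx(x):
    # Tanh approximation from Hendrycks & Gimpel; its speed depends largely
    # on how fast the underlying tanh kernel is, which is the point above.
    return 0.5 * x * (1.0 + torch.tanh(
        math.sqrt(2.0 / math.pi) * (x + 0.044715 * x.pow(3))))

x = torch.randn(8)
# The two agree to a few decimal places but are not identical.
print(torch.allclose(gelu_exact(x), gelu_tanh_approx(x), atol=1e-3))
```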
Force-pushed from 53c2d01 to f303f02

This pull request was exported from Phabricator. Differential Revision: D18240946

Force-pushed from f303f02 to 74b6d8a

This pull request was exported from Phabricator. Differential Revision: D18240946
1 similar comment
This pull request was exported from Phabricator. Differential Revision: D18240946

Force-pushed from 74b6d8a to c97bc85

Force-pushed from c97bc85 to 49f826b

This pull request was exported from Phabricator. Differential Revision: D18240946
Summary:
Pull Request resolved: pytorch#28944

Add torch.nn.GELU for GELU activation

Test Plan: buck test mode/dev-nosan //caffe2/test:nn -- "GELU"

Reviewed By: hl475, houseroad

Differential Revision: D18240946

fbshipit-source-id: 708c41d2f328bdf137fb8b0c533a977725daab41
Force-pushed from 49f826b to 5a7e331

This pull request was exported from Phabricator. Differential Revision: D18240946

This pull request has been merged in 2460dce.
Summary:
Pull Request resolved: pytorch/pytorch#28944

Add torch.nn.GELU for GELU activation

Test Plan: buck test mode/dev-nosan //caffe2/test:nn -- "GELU"

Reviewed By: hl475, houseroad

Differential Revision: D18240946

fbshipit-source-id: 6284b30def9bd4c12bf7fb2ed08b1b2f0310bb78
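For anyone landing here from search, a minimal usage sketch of the module added by this PR, assuming a PyTorch build that includes it (1.4.0 or later):

```python
import torch
import torch.nn as nn

# Module form added by this PR.
act = nn.GELU()
x = torch.randn(2, 3)
y = act(x)

# Equivalent functional form.
y_fn = torch.nn.functional.gelu(x)
print(torch.allclose(y, y_fn))  # True
```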
Is this released in torch 1.3.1? It doesn't seem to be included in that version.
@calclavia This PR is not in PyTorch 1.3.1. It will be in our upcoming release 1.4.0. |
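A quick way to check whether an installed build already ships the new module (a hypothetical snippet for illustration, not part of the PR):

```python
import torch

print(torch.__version__)          # e.g. '1.4.0'
print(hasattr(torch.nn, "GELU"))  # True once this PR is in the installed release
```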