Conversation

@xiaomengy
Contributor

xiaomengy commented Oct 31, 2019

Summary:

  1. Add torch.nn.GELU for GELU activation
  2. Fix fp16 failures for torch.nn.functional.gelu on CUDA
  3. Add an infinitely differentiable backward implementation to support higher-order gradients (see the usage sketch after this list).
  4. Change the gelu_cpu kernel to use TensorIterator.
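
As a rough usage sketch (not code from the PR itself; the shapes and the gradgradcheck call are illustrative), points 1 and 3 can be exercised like this:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# 1. The new module wrapper around the existing functional op.
gelu = nn.GELU()
x = torch.randn(4, 8, requires_grad=True)
y = gelu(x)  # same result as F.gelu(x)

# 3. The backward is infinitely differentiable, so second-order
#    gradients (e.g. for gradient penalties) now work.
g = torch.autograd.grad(y.sum(), x, create_graph=True)[0]
gg = torch.autograd.grad(g.sum(), x)[0]

# Numerical check of the double backward; double precision is
# recommended for gradgradcheck.
xd = torch.randn(3, 5, dtype=torch.double, requires_grad=True)
assert torch.autograd.gradgradcheck(F.gelu, (xd,))
```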

Differential Revision: D18240946

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D18240946

@facebook-github-bot added the oncall: jit (Add this issue/PR to JIT oncall triage queue) label on Oct 31, 2019

@xiaomengy
Contributor Author

xiaomengy commented Oct 31, 2019

Link to #28947

@kostmo
Member

kostmo commented Oct 31, 2019

CircleCI build failures summary

As of commit dc2112f:

  • 0/1 flaky

This comment was automatically generated by Dr. CI.

@BramVanroy

BramVanroy commented Nov 3, 2019

I'm happy to see these lines indicating that another kernel might get implemented, too, but I wonder how you are going to implement this as an interface. I've had great results with that implementation as a custom module.

https://github.com/pytorch/pytorch/blob/53c2d01ebc0e40c8982ccd2e9cb4113e1d0b5bc1/aten/src/ATen/native/cpu/Activation.cpp#L138-L140

Will it be a separate function/module, or rather a parameter setting?

@xiaomengy
Contributor Author

> I'm happy to see these lines indicating that another kernel might get implemented, too, but I wonder how you are going to implement this as an interface. I've had great results with that implementation as a custom module.
>
> https://github.com/pytorch/pytorch/blob/53c2d01ebc0e40c8982ccd2e9cb4113e1d0b5bc1/aten/src/ATen/native/cpu/Activation.cpp#L138-L140
>
> Will it be a separate function/module, or rather a parameter setting?

Maybe we will add it as a parameter of gelu later. The reason we didn't do that yet is that the approximation does not provide better performance than the current implementation, which uses MKL functions. In my testing, the tanh approximation's performance relies on the performance of tanh itself. Eigen, which is the backend of TensorFlow, provides a fast approximation of tanh here:
https://github.com/eigenteam/eigen-git-mirror/blob/8e409c71423fc070239cad72afdbd973727b5e0f/Eigen/src/Core/MathFunctionsImpl.h#L26
Without a tanh implementation like that, I don't think it is necessary to add the approximation for gelu here.
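
For reference, the two variants under discussion look roughly like this in plain PyTorch (a comparison sketch only; the function names are made up and this is not the kernel code in this PR):

```python
import math
import torch

def gelu_exact(x):
    # Exact GELU: x * Phi(x), written via the error function.
    return 0.5 * x * (1.0 + torch.erf(x / math.sqrt(2.0)))

def gelu_tanh_approx(x):
    # Hendrycks & Gimpel tanh approximation discussed above; its speed
    # depends almost entirely on how fast the underlying tanh kernel is.
    return 0.5 * x * (1.0 + torch.tanh(math.sqrt(2.0 / math.pi)
                                       * (x + 0.044715 * x.pow(3))))

x = torch.randn(1 << 20)
# The two forms agree to within a small numerical error; the only
# reason to prefer the approximation would be speed.
print((gelu_exact(x) - gelu_tanh_approx(x)).abs().max())
```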

Summary:
Pull Request resolved: pytorch#28944

Add torch.nn.GELU for GELU activation

Test Plan: buck test mode/dev-nosan //caffe2/test:nn -- "GELU"

Reviewed By: hl475, houseroad

Differential Revision: D18240946

fbshipit-source-id: 708c41d2f328bdf137fb8b0c533a977725daab41

@facebook-github-bot
Contributor

This pull request has been merged in 2460dce.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Nov 4, 2019
Summary:
Pull Request resolved: pytorch/pytorch#28944

Add torch.nn.GELU for GELU activation

Test Plan: buck test mode/dev-nosan //caffe2/test:nn -- "GELU"

Reviewed By: hl475, houseroad

Differential Revision: D18240946

fbshipit-source-id: 6284b30def9bd4c12bf7fb2ed08b1b2f0310bb78
@xiaomengy deleted the export-D18240946 branch on November 4, 2019 at 19:01
@calclavia

Is this released in torch 1.3.1? It doesn't seem to be fixed in that version.

@yf225
Contributor

yf225 commented Nov 26, 2019

@calclavia This PR is not in PyTorch 1.3.1. It will be in our upcoming release 1.4.0.
