
Conversation

@xiaomengy (Contributor)

Summary: Add gelu gradient for pytorch

Differential Revision: D15589816

@pytorchbot added labels Jun 1, 2019: module: cpu (CPU specific problem, e.g., perf, algorithm), module: cuda (Related to torch.cuda, and CUDA support in general), module: internals (Related to internal abstractions in c10 and ATen), module: nn (Related to torch.nn), module: operators
@soumith (Contributor) left a comment

Please check the inline comment and provide a resolution on what's up with the tolerance adjustment. Once you get clarity on that and verify it's not a bug, do land.

Things reviewed in the diff:

  • MKL and non-MKL implementations match in formula
  • CUDA and CPU implementation match in formula

Things not reviewed in the diff:

  • gradient formula is correct (relying on gradcheck to say it's right; a reference form of the gradient is sketched below)
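
For reference, the formula being matched across the MKL, non-MKL, and CUDA paths is the exact (erf-based) GELU, gelu(x) = x * Φ(x), whose closed-form gradient is Φ(x) + x * φ(x). Below is a minimal sketch of that gradient checked against autograd; it is standard calculus written for illustration, not code taken from this diff:

```python
import math

import torch
import torch.nn.functional as F

def gelu_grad_reference(x):
    # d/dx [x * Phi(x)] = Phi(x) + x * phi(x), where Phi/phi are the
    # standard normal CDF/PDF (exact, erf-based GELU variant).
    cdf = 0.5 * (1.0 + torch.erf(x / math.sqrt(2.0)))
    pdf = torch.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return cdf + x * pdf

x = torch.randn(8, dtype=torch.double, requires_grad=True)
F.gelu(x).sum().backward()
# The autograd gradient should agree with the closed form.
assert torch.allclose(x.grad, gelu_grad_reference(x.detach()))
```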

@soumith (Contributor) commented on test/test_nn.py (outdated):

This looks suspicious: gradcheck runs in double precision, so a tolerance of 1e-3 looks really high, and a custom eps is usually not needed. Any idea what is going on? Can you inspect some sample inputs?

@xiaomengy (Contributor, Author) replied:

This came out of testing numerical stability for gradcheck. The custom atol has been removed; the test now uses the default value.
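
With the override gone, the check amounts to something like the sketch below; this is an illustration with a made-up input shape, not the actual test from test/test_nn.py:

```python
import torch
import torch.nn.functional as F
from torch.autograd import gradcheck

# gradcheck runs in double precision; with the custom atol removed,
# the defaults (eps=1e-6, atol=1e-5) apply.
inp = torch.randn(4, 6, dtype=torch.double, requires_grad=True)
assert gradcheck(F.gelu, (inp,))
```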

@pytorchbot added the module: docs label (Related to our documentation, both in docs/ and docblocks) Jun 2, 2019
xiaomengy added 2 commits on June 2, 2019 at 00:15
Summary: Add gelu activation forward on CPU in pytorch

Differential Revision: D15400974

fbshipit-source-id: 1c59104bea69cbe26ab96921e242131890db657e
Summary:
Pull Request resolved: pytorch#21237

Add gelu gradient for pytorch

Reviewed By: zheng-xq

Differential Revision: D15589816

fbshipit-source-id: 2feb4ed779cda1dec3fe03fcfba29861b4a86d12
@xiaomengy deleted the export-D15589816 branch June 2, 2019 16:46
zdevito pushed a commit to zdevito/ATen that referenced this pull request Jun 2, 2019
Summary:
Pull Request resolved: pytorch/pytorch#21237

Add gelu gradient for pytorch

Reviewed By: zheng-xq

Differential Revision: D15589816

fbshipit-source-id: 76fda7c413afed5b6cc3abe3a26c258d393a53ce
@facebook-github-bot (Contributor):

This pull request has been merged in 31c79b7.
