Supports infinite symbolic diff for gelu #81725
Conversation
❌ 1 new failure as of commit 32c130f (more details on the Dr. CI page).
🕵️ 1 new failure recognized by patterns; it does not appear to be due to upstream breakages.
|
|
@yueyericardo Thanks for the PR. Can you explain the motivation for this change? (i.e. why do we prefer writing out the decomposition instead of calling gelu_backward directly?) |
|
@davidberard98 Thanks for the quick reply! However, if we call gelu_backward directly (pytorch/aten/src/ATen/native/Activation.cpp, Lines 737 to 744 in 2c0b11b),
it will not be fused and will create many more kernels for the derivatives! |
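For context, here is a minimal sketch (an illustration, not the PR's code) of why writing the op out of primitives matters: once gelu is expressed with ordinary differentiable elementwise ops, autograd can differentiate it any number of times and a fuser sees plain pointwise ops it can fuse with neighbors, whereas an opaque aten::gelu_backward call is a single special-cased kernel with a hand-written derivative.

```python
import torch

# Sketch only: the exact (erf-based) gelu written out of primitives.
def gelu_exact(x):
    M_SQRT1_2 = 0.70710678118654752440
    return 0.5 * x * (1.0 + torch.erf(x * M_SQRT1_2))

x = torch.randn(8, dtype=torch.double, requires_grad=True)
y = gelu_exact(x).sum()

# Because every op above has a symbolic derivative, we can keep differentiating.
(g1,) = torch.autograd.grad(y, x, create_graph=True)        # 1st derivative
(g2,) = torch.autograd.grad(g1.sum(), x, create_graph=True)  # 2nd derivative
(g3,) = torch.autograd.grad(g2.sum(), x)                      # 3rd derivative
print(g1, g2, g3)
```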
|
Thanks for the help from @zasdfgbnm on debugging the failed unittest. @jjsjann123, the failed test here is from:
And it could be fixed by csarofeen#1848; could you help cherry-pick this commit into upstream? |
|
@davidberard98 @jjsjann123 |
Sure thing. Will push that to upstream directly. |
```
M_SQRT2 = 1.41421356237309504880
M_SQRT1_2 = 0.70710678118654752440
M_2_SQRTPI = 1.12837916709551257390
if approximate == "tanh":
```
I get nervous whenever I see control flow. In the case of approximate, torch.nn.modules.GELU has approximate annotated as __constant__, so that should be fine.
But using the gelu api directly with approximate as an input would stop fusion of gelu with things outside of the block. Fusing gelu_backward with neighboring ops is important.
I know this is harder, but could we instead inject the infinitely differentiable primitive implementation at gelu_backward? Basically, add an entry here for gelu_backward while keeping the entry for gelu unchanged.
cc'ing @kevinstephano
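For illustration (this is an assumption about why the branch is tolerable, not part of the PR): because torch.nn.GELU lists approximate in __constants__, scripting bakes the string into the graph, so a branch on it inside a decomposition can be resolved statically rather than turning into runtime control flow.

```python
import torch

# `approximate` is a __constant__ on torch.nn.GELU, so the scripted graph
# carries it as a constant string rather than a runtime value.
m = torch.jit.script(torch.nn.GELU(approximate="tanh"))
print(m.graph)  # the constant "tanh" should appear in the printed graph
```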
Let me know if I understand correctly: so I need to add an entry here for gelu_backward and also define gelu_double_backward?
I found an implementation of gelu_double_backward in
pytorch/torch/csrc/autograd/FunctionsManual.cpp
Lines 3026 to 3030 in 84c8a9f:
```
Tensor gelu_double_backward(
    const Tensor& ggI,
    const Tensor& gO,
    const Tensor& input,
    c10::string_view approximate) {
```
And adding it to symbolic_script.cpp: yueyericardo@88a6b37
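For reference, a sketch of what the exact (approximate == "none") case of gelu_double_backward reduces to mathematically; it mirrors the signature above but is not the FunctionsManual.cpp code verbatim.

```python
import torch

# Exact (erf) case only. With gelu(x) = x * Phi(x):
#   gelu'(x)  = Phi(x) + x * phi(x)
#   gelu''(x) = (2 - x*x) * phi(x)        # since phi'(x) = -x * phi(x)
# The double backward contracts ggI with gO * gelu''(input) to give the
# gradient w.r.t. input (the `self` entry in derivatives.yaml).
def gelu_double_backward_exact(ggI, gO, x):
    M_SQRT1_2 = 0.70710678118654752440
    M_2_SQRTPI = 1.12837916709551257390
    k_inv_sqrt_2pi = M_2_SQRTPI * M_SQRT1_2 * 0.5   # 1 / sqrt(2*pi)
    pdf = k_inv_sqrt_2pi * torch.exp(-0.5 * x * x)  # phi(x)
    return ggI * gO * (2.0 - x * x) * pdf
```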
But I found that:
- If I call `torch.autograd.grad()` to calculate derivatives, it calls `GeluBackwardCUDAKernelImpl` for the derivatives instead of the fused kernel.
- If I run `out.backward()`, it does use the fused kernel for backward, but it matches the gelu_backward schema in `torch/csrc/jit/codegen/cuda/parser.cpp` instead of the entry defined in `symbolic_script.cpp` (a repro sketch for checking which kernels run follows the schema excerpt below):
pytorch/torch/csrc/jit/codegen/cuda/parser.cpp
Lines 4095 to 4098 in 8d753c8
```
static auto gelu_backward_schema =
    getOperatorForLiteral(
        "aten::gelu_backward(Tensor grad_output, Tensor self, *, str approximate='none') -> Tensor")
        ->schema();
```
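As referenced above, a hypothetical repro sketch (not from the PR) for checking which backward kernels actually run; it assumes a CUDA build and that the hand-written kernel appears under a name like GeluBackwardCUDAKernelImpl in the profile.

```python
import torch

def fn(x):
    return torch.nn.functional.gelu(x) * 2.0

x = torch.randn(1024, device="cuda", requires_grad=True)
scripted = torch.jit.script(fn)
for _ in range(3):                      # warm up so the profiling executor / fuser kicks in
    scripted(x).sum().backward()

# Profile one backward pass and inspect which CUDA kernels were launched.
with torch.profiler.profile(
    activities=[torch.profiler.ProfilerActivity.CUDA]
) as prof:
    scripted(x).sum().backward()
print(prof.key_averages().table(sort_by="cuda_time_total"))
```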
|
@jjsjann123 Thanks for the quick reply! |
|
Started the cherry-pick to unblock your PR. #81792 |
Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/

Code changes includes:
- codegen improvements:
  1. removes un-necessary sync from redundant thread compute analysis
  2. symmetric API for BestEffortReplay
  3. support merge on trivial reductions
  4. Ampere async copy improvements
- bug fixes:
  1. vectorization bug fixes
  2. type inference patch: fixes upstream #81725
  3. segmenter bug fix with deterministic iteration ordering
- parser update
  1. added leaky_relu
- scheduler
  1. normalization scheduler clean up.
  2. simplifies matmul scheduling with new transform propagator
  3. merge all dimensions in PW scheduler
  4. various gemm related improvements
- debuggability
  1. nsight compute support
  2. debug dump for InlinePropagator
  3. Add `UnaryOpType::Print`

Squashed commits to WAR github API. Commits that's actually in this PR from the devel branch:

```
dfe02f3 Merge remote-tracking branch 'csarofeen/devel' into HEAD
1617373 Add `TensorViewBuilder::shape(std::vector<Val*> shape)` (#1884)
7cfb779 Merge pull request #1887 from csarofeen/upstream_merge_0803
3399f6d Merge remote-tracking branch 'origin/viable/strict' into HEAD
01208f5 Add `UnaryOpType::Print` which can be helpful for debugging (#1878)
0646522 Remove redundant TORCH_INTERNAL_ASSERT in lower_magic_zero.cpp (#1881)
7bc76aa Fix most inlined propagator for mismatched dims (#1875)
501f4aa Nonaffine swizzle formulation ep.2: Loop swizzle variant. (#1826)
d863d69 Ampere async copy ep.2: circular buffering extension to support pipelined matmul operand load (#1827)
e0ae11a Larger sized mma instructions to support full vectorization (#1824)
9bb4cf7 fragment iteration to support fully unrolled mma ops (#1823)
a48270a Merge all dims in pointwise scheduler (#1872)
172fb36 Make MostInlined and BestEffort inline propagation no longer assert replayed (#1868)
a64462a Allow trivial reduction to be merged (#1871)
440102b Symmetric API for BestEffortReplay (#1870)
d1caf33 Some misc cleanups/refactor split out from #1854 (#1867)
1013eda Remove some welford specific logic. (#1864)
51589d3 Some cleanups on tests and heuristics params (#1866)
a6b3e70 Segmenter bug fix, and deterministic iteration ordering. (#1865)
1b665b9 Add nullptr checks to IrBuilder (#1861)
1cd9451 Simplify matmul scheduling with the new transform propagator. (#1817)
bbc1fb9 Add leaky_relu operation (#1852)
e842a9b Minor cleanup in pointwise scheduler (#1858)
9ee850c Fix stringstream usage (#1857)
20a36c1 Improve nsight compute support (#1855)
4059103 Remove debugging `true ||` from getPointwiseHeuristics (#1822)
01117bf Misc cleanup (#1853)
5cc6494 Apply the magic-zero protection to each indexed domain individually for predicate indexing (#1846)
92e6f02 Cleanup normalization scheduler (#1845)
db89c65 Type inference patch (#1848)
102fe93 Add debug dump for InlinePropagator (#1847)
b7a4d93 Redundant thread compute analysis to avoid un-necessary sync insertion (#1687)
942be5b Upstream ci build fixes (#1842)
0b83645 Fix vectorization bug introduced in #1831 (#1840)
63630f1 Move MaxProducerPosUpdater into InlinePropagator::tearDown (#1825)
9135a96 Fix transpose benchmark dtype (#1839)
2c9a6c0 Add extra configurability to `parallelizeAllLike` (#1831)
```

RUN_TORCHBENCH: nvfuser
ghstack-source-id: 3745722
Pull Request resolved: #83067
|
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale. |
|
/easycla As part of the transition to the PyTorch Foundation, this project now requires contributions be covered under the new CLA. See #85559 for additional details. This comment will trigger a new check of this PR. If you are already covered, you will simply see a new "EasyCLA" check that passes. If you are not covered, a bot will leave a new comment with a link to sign. |
|
|
As per title.
gelu_backward code taken from pytorch/torch/_decomp/decompositions.py, Lines 199 to 226 in 80231d0:
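For readers without the file open, a paraphrased sketch of that decomposition (both branches); the authoritative version is the one in torch/_decomp/decompositions.py and may differ in detail.

```python
import torch
from torch import Tensor

# Paraphrased sketch of the gelu_backward decomposition, written entirely with
# differentiable primitives so it can be symbolically differentiated and fused.
def gelu_backward(grad: Tensor, self: Tensor, approximate: str = "none") -> Tensor:
    M_SQRT2 = 1.41421356237309504880
    M_SQRT1_2 = 0.70710678118654752440
    M_2_SQRTPI = 1.12837916709551257390
    if approximate == "tanh":
        kBeta = M_SQRT2 * M_2_SQRTPI * 0.5      # sqrt(2/pi)
        kKappa = 0.044715
        x_sq = self * self
        x_cube = x_sq * self
        inner = kBeta * (self + kKappa * x_cube)
        tanh_inner = torch.tanh(inner)
        left = 0.5 * self
        right = 1.0 + tanh_inner
        left_derivative = 0.5 * right
        tanh_derivative = 1.0 - tanh_inner * tanh_inner
        inner_derivative = kBeta * (1.0 + 3.0 * kKappa * x_sq)
        right_derivative = left * tanh_derivative * inner_derivative
        return grad * (left_derivative + right_derivative)
    else:
        kAlpha = M_SQRT1_2
        kBeta = M_2_SQRTPI * M_SQRT1_2 * 0.5    # 1 / sqrt(2*pi)
        cdf = 0.5 * (1.0 + torch.erf(self * kAlpha))
        pdf = kBeta * torch.exp(self * self * -0.5)
        return grad * (cdf + self * pdf)
```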