
[ONNX] Support gelu for fp16 export #50487

Merged
BowenBao merged 5 commits into pytorch:onnx_ms_1 from BowenBao:onnx_mp_gelu
Jan 21, 2021

Conversation

@BowenBao (Collaborator)

Need to replace the dtype of export-created scalars from float to double. (In torch's implicit conversion logic, Python numbers are doubles.)

Test case is skipped in CI because the current CI job environment does not have CUDA support.
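As a hedged illustration of the point above (not the PR's actual diff; tensor values are arbitrary): in PyTorch's type promotion, a Python scalar is a double but participates as a "wrapped number", so it adapts to the tensor's dtype instead of upcasting a half-precision input, whereas a dimensioned float32 constant would force an upcast:

```python
import torch

x = torch.ones(3, dtype=torch.float16)

# A Python scalar is a double, but acts as a "wrapped number" in type
# promotion: it adapts to the tensor's dtype instead of upcasting it.
print((x * 0.5).dtype)  # torch.float16

# A dimensioned float32 constant, by contrast, wins the promotion and
# silently upcasts the half-precision input.
c = torch.full((3,), 0.5, dtype=torch.float32)
print((x * c).dtype)  # torch.float32
```

This is why export-created scalar constants typed as double keep an fp16 gelu graph in fp16, while float32 constants would not.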

@facebook-github-bot (Contributor) commented Jan 13, 2021

💊 CI failures summary and remediations

As of commit ffb185e (more details on the Dr. CI page):



❄️ 3 failures tentatively classified as flaky

but reruns have not yet been triggered to confirm:

See CircleCI build pytorch_linux_xenial_py3_clang7_onnx_ort_test2 (1/3)

Step: "Run tests" ❄️

Jan 14 06:25:47 test/onnx/test_pytorch_onnx_onnxruntime_cuda.py::TestONNXRuntime::test_kldiv_loss PASSED [ 93%]
Jan 14 06:25:47 test/onnx/test_pytorch_onnx_onnxruntime_cuda.py::TestONNXRuntime::test_l1_norm PASSED [ 93%]
Jan 14 06:25:47 test/onnx/test_pytorch_onnx_onnxruntime_cuda.py::TestONNXRuntime::test_l2_norm PASSED [ 93%]
Jan 14 06:25:47 test/onnx/test_pytorch_onnx_onnxruntime_cuda.py::TestONNXRuntime::test_layer_norm PASSED [ 93%]
Jan 14 06:25:47 test/onnx/test_pytorch_onnx_onnxruntime_cuda.py::TestONNXRuntime::test_le PASSED [ 93%]
Jan 14 06:25:48 test/onnx/test_pytorch_onnx_onnxruntime_cuda.py::TestONNXRuntime::test_le_scalar PASSED [ 93%]
Jan 14 06:25:48 test/onnx/test_pytorch_onnx_onnxruntime_cuda.py::TestONNXRuntime::test_len SKIPPED [ 94%]
Jan 14 06:25:48 test/onnx/test_pytorch_onnx_onnxruntime_cuda.py::TestONNXRuntime::test_len_list PASSED [ 94%]
Jan 14 06:25:48 test/onnx/test_pytorch_onnx_onnxruntime_cuda.py::TestONNXRuntime::test_list SKIPPED [ 94%]
Jan 14 06:25:48 test/onnx/test_pytorch_onnx_onnxruntime_cuda.py::TestONNXRuntime::test_list_pass PASSED [ 94%]
Jan 14 06:25:49 test/onnx/test_pytorch_onnx_onnxruntime_cuda.py::TestONNXRuntime::test_listunpack Fatal Python error: Segmentation fault
Jan 14 06:25:49 
Jan 14 06:25:49 Thread 0x00007f9f6276e700 (most recent call first):
Jan 14 06:25:49   File "/opt/conda/lib/python3.6/threading.py", line 299 in wait
Jan 14 06:25:49   File "/opt/conda/lib/python3.6/threading.py", line 551 in wait
Jan 14 06:25:49   File "/opt/conda/lib/python3.6/site-packages/tqdm/_monitor.py", line 59 in run
Jan 14 06:25:49   File "/opt/conda/lib/python3.6/threading.py", line 916 in _bootstrap_inner
Jan 14 06:25:49   File "/opt/conda/lib/python3.6/threading.py", line 884 in _bootstrap
Jan 14 06:25:49 
Jan 14 06:25:49 Current thread 0x00007f9f97d69700 (most recent call first):
Jan 14 06:25:49   File "/opt/conda/lib/python3.6/site-packages/_pytest/_io/saferepr.py", line 58 in repr_instance

See CircleCI build pytorch_linux_xenial_py3_clang7_onnx_ort_test1 (2/3)

Step: "Run tests" ❄️

Jan 14 06:30:48 test/onnx/test_pytorch_onnx_onnxruntime_cuda.py::TestONNXRuntime::test_kldiv_loss PASSED [ 93%]
Jan 14 06:30:48 test/onnx/test_pytorch_onnx_onnxruntime_cuda.py::TestONNXRuntime::test_l1_norm PASSED [ 93%]
Jan 14 06:30:48 test/onnx/test_pytorch_onnx_onnxruntime_cuda.py::TestONNXRuntime::test_l2_norm PASSED [ 93%]
Jan 14 06:30:48 test/onnx/test_pytorch_onnx_onnxruntime_cuda.py::TestONNXRuntime::test_layer_norm PASSED [ 93%]
Jan 14 06:30:48 test/onnx/test_pytorch_onnx_onnxruntime_cuda.py::TestONNXRuntime::test_le PASSED [ 93%]
Jan 14 06:30:48 test/onnx/test_pytorch_onnx_onnxruntime_cuda.py::TestONNXRuntime::test_le_scalar PASSED [ 93%]
Jan 14 06:30:48 test/onnx/test_pytorch_onnx_onnxruntime_cuda.py::TestONNXRuntime::test_len SKIPPED [ 94%]
Jan 14 06:30:48 test/onnx/test_pytorch_onnx_onnxruntime_cuda.py::TestONNXRuntime::test_len_list PASSED [ 94%]
Jan 14 06:30:48 test/onnx/test_pytorch_onnx_onnxruntime_cuda.py::TestONNXRuntime::test_list SKIPPED [ 94%]
Jan 14 06:30:48 test/onnx/test_pytorch_onnx_onnxruntime_cuda.py::TestONNXRuntime::test_list_pass PASSED [ 94%]
Jan 14 06:30:49 test/onnx/test_pytorch_onnx_onnxruntime_cuda.py::TestONNXRuntime::test_listunpack Fatal Python error: Segmentation fault
Jan 14 06:30:49 
Jan 14 06:30:49 Thread 0x00007f126cd7c700 (most recent call first):
Jan 14 06:30:49   File "/opt/conda/lib/python3.6/threading.py", line 299 in wait
Jan 14 06:30:49   File "/opt/conda/lib/python3.6/threading.py", line 551 in wait
Jan 14 06:30:49   File "/opt/conda/lib/python3.6/site-packages/tqdm/_monitor.py", line 59 in run
Jan 14 06:30:49   File "/opt/conda/lib/python3.6/threading.py", line 916 in _bootstrap_inner
Jan 14 06:30:49   File "/opt/conda/lib/python3.6/threading.py", line 884 in _bootstrap
Jan 14 06:30:49 
Jan 14 06:30:49 Current thread 0x00007f12a2377700 (most recent call first):
Jan 14 06:30:49   File "/opt/conda/lib/python3.6/site-packages/_pytest/_io/saferepr.py", line 58 in repr_instance

See CircleCI build pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test2 (3/3)

Step: "Run tests" ❄️

Jan 14 08:09:26 RuntimeError: CUDA error: an illegal memory access was encountered
Jan 14 08:09:26                        ~~~~ <--- HERE
Jan 14 08:09:26 RuntimeError: CUDA error: an illegal memory access was encountered
Jan 14 08:09:26 
Jan 14 08:09:26 
Jan 14 08:09:26 ======================================================================
Jan 14 08:09:26 ERROR [0.203s]: test_where_and_typing (__main__.TestTEFuser)
Jan 14 08:09:26 ----------------------------------------------------------------------
Jan 14 08:09:26 Traceback (most recent call last):
Jan 14 08:09:26   File "test_jit_fuser_te.py", line 1142, in test_where_and_typing
Jan 14 08:09:26     x = torch.randn(4, 4, dtype=torch.double, device=device)
Jan 14 08:09:26 RuntimeError: CUDA error: an illegal memory access was encountered
Jan 14 08:09:26 
Jan 14 08:09:26 ======================================================================
Jan 14 08:09:26 ERROR [0.175s]: test_zero_element_tensors_cuda (__main__.TestTEFuser)
Jan 14 08:09:26 ----------------------------------------------------------------------
Jan 14 08:09:26 Traceback (most recent call last):
Jan 14 08:09:26   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 888, in wrapper
Jan 14 08:09:26     method(*args, **kwargs)
Jan 14 08:09:26   File "test_jit_fuser_te.py", line 178, in test_zero_element_tensors_cuda
Jan 14 08:09:26     self._test_zero_element_tensors(device="cuda")
Jan 14 08:09:26   File "test_jit_fuser_te.py", line 174, in _test_zero_element_tensors

ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI.


import unittest

from test_pytorch_onnx_onnxruntime import TestONNXRuntime

class TestONNXRuntime_cuda(unittest.TestCase):
Contributor

Thanks for setting this up. Are you going to add a CI job for these tests?

Collaborator (Author)

I see that this test was already accidentally added to a CI job; I'll update to disable it for now.
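A minimal sketch of the skip behavior discussed above (the class name mirrors the quoted test file; the `HAS_CUDA` flag and `test_gelu_fp16` method are hypothetical stand-ins, since the real suite would query `torch.cuda.is_available()`):

```python
import unittest

# Hypothetical stand-in for torch.cuda.is_available(); set to False to
# model a CI environment without CUDA support.
HAS_CUDA = False

class TestONNXRuntime_cuda(unittest.TestCase):
    @unittest.skipUnless(HAS_CUDA, "CUDA not available in this CI environment")
    def test_gelu_fp16(self):
        # Would export an fp16 gelu model and compare against ONNX Runtime.
        self.fail("should only run when CUDA is present")

suite = unittest.TestLoader().loadTestsFromTestCase(TestONNXRuntime_cuda)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(len(result.skipped))  # 1
```

With this pattern the test is reported as skipped rather than failed on CUDA-less CI workers.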

BowenBao merged commit c363592 into pytorch:onnx_ms_1 Jan 21, 2021
BowenBao added a commit that referenced this pull request Jan 21, 2021 ([ghstack-poisoned])
BowenBao added a commit that referenced this pull request Jan 21, 2021 (ghstack-source-id: d47760e; Pull Request resolved: #50911)
BowenBao added a commit that referenced this pull request Jan 21, 2021 ([ghstack-poisoned])
BowenBao added a commit that referenced this pull request Jan 22, 2021 ([ghstack-poisoned])
BowenBao added a commit that referenced this pull request Jan 22, 2021 (ghstack-source-id: 715eec1; Pull Request resolved: #50911)
BowenBao added a commit that referenced this pull request Jan 22, 2021 (Differential Revision: [D26023938](https://our.internmc.facebook.com/intern/diff/D26023938))
BowenBao added a commit that referenced this pull request Jan 25, 2021 ([ghstack-poisoned])
BowenBao added a commit that referenced this pull request Jan 25, 2021 (Differential Revision: [D26050889](https://our.internmc.facebook.com/intern/diff/D26050889))
BowenBao added a commit that referenced this pull request Jan 26, 2021 (Differential Revision: D26050889)
facebook-github-bot pushed a commit that referenced this pull request Jan 28, 2021
Summary:
Pull Request resolved: #50911

Need to replace dtype of export created scalars from float to double. (In torch implicit conversion logic, python numbers are double)

Test case skipped in CI due to that current CI job env does not have CUDA support.

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D26050889

Pulled By: SplitInfinity

fbshipit-source-id: 1fdde23a68d4793e6b9a82840acc213e5c3aa760
BowenBao added a commit to BowenBao/pytorch that referenced this pull request Jan 28, 2021 (ghstack-source-id: 1be05b0; Pull Request resolved: pytorch#50911)


4 participants