
[ONNX] Support gelu for fp16 export #50487

Merged
BowenBao merged 5 commits into pytorch:onnx_ms_1 from BowenBao:onnx_mp_gelu
Jan 21, 2021

Conversation

@BowenBao (Collaborator)

Need to replace the dtype of export-created scalars from float to double. (In torch's implicit conversion logic, Python numbers are doubles.)

Test case is skipped in CI because the current CI job environment does not have CUDA support.
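As a hedged illustration of the point above (not the PR's actual diff; tensor values are arbitrary): in PyTorch's type promotion, a Python scalar is a double but participates as a "wrapped number", so it adapts to the tensor's dtype instead of upcasting a half-precision input, whereas a dimensioned float32 constant would force an upcast:

```python
import torch

x = torch.ones(3, dtype=torch.float16)

# A Python scalar is a double, but acts as a "wrapped number" in type
# promotion: it adapts to the tensor's dtype instead of upcasting it.
print((x * 0.5).dtype)  # torch.float16

# A dimensioned float32 constant, by contrast, wins the promotion and
# silently upcasts the half-precision input.
c = torch.full((3,), 0.5, dtype=torch.float32)
print((x * c).dtype)  # torch.float32
```

This is why export-created scalar constants typed as double keep an fp16 gelu graph in fp16, while float32 constants would not.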

@facebook-github-bot (Contributor) commented Jan 13, 2021

💊 CI failures summary and remediations

As of commit ffb185e (more details on the Dr. CI page):



❄️ 3 failures tentatively classified as flaky

but reruns have not yet been triggered to confirm:

See CircleCI build pytorch_linux_xenial_py3_clang7_onnx_ort_test2 (1/3)

Step: "Run tests" ❄️

Jan 14 06:25:47 test/onnx/test_pytorch_onnx_onnxruntime_cuda.py::TestONNXRuntime::test_kldiv_loss PASSED [ 93%]
Jan 14 06:25:47 test/onnx/test_pytorch_onnx_onnxruntime_cuda.py::TestONNXRuntime::test_l1_norm PASSED [ 93%]
Jan 14 06:25:47 test/onnx/test_pytorch_onnx_onnxruntime_cuda.py::TestONNXRuntime::test_l2_norm PASSED [ 93%]
Jan 14 06:25:47 test/onnx/test_pytorch_onnx_onnxruntime_cuda.py::TestONNXRuntime::test_layer_norm PASSED [ 93%]
Jan 14 06:25:47 test/onnx/test_pytorch_onnx_onnxruntime_cuda.py::TestONNXRuntime::test_le PASSED [ 93%]
Jan 14 06:25:48 test/onnx/test_pytorch_onnx_onnxruntime_cuda.py::TestONNXRuntime::test_le_scalar PASSED [ 93%]
Jan 14 06:25:48 test/onnx/test_pytorch_onnx_onnxruntime_cuda.py::TestONNXRuntime::test_len SKIPPED [ 94%]
Jan 14 06:25:48 test/onnx/test_pytorch_onnx_onnxruntime_cuda.py::TestONNXRuntime::test_len_list PASSED [ 94%]
Jan 14 06:25:48 test/onnx/test_pytorch_onnx_onnxruntime_cuda.py::TestONNXRuntime::test_list SKIPPED [ 94%]
Jan 14 06:25:48 test/onnx/test_pytorch_onnx_onnxruntime_cuda.py::TestONNXRuntime::test_list_pass PASSED [ 94%]
Jan 14 06:25:49 test/onnx/test_pytorch_onnx_onnxruntime_cuda.py::TestONNXRuntime::test_listunpack Fatal Python error: Segmentation fault
Jan 14 06:25:49 
Jan 14 06:25:49 Thread 0x00007f9f6276e700 (most recent call first):
Jan 14 06:25:49   File "/opt/conda/lib/python3.6/threading.py", line 299 in wait
Jan 14 06:25:49   File "/opt/conda/lib/python3.6/threading.py", line 551 in wait
Jan 14 06:25:49   File "/opt/conda/lib/python3.6/site-packages/tqdm/_monitor.py", line 59 in run
Jan 14 06:25:49   File "/opt/conda/lib/python3.6/threading.py", line 916 in _bootstrap_inner
Jan 14 06:25:49   File "/opt/conda/lib/python3.6/threading.py", line 884 in _bootstrap
Jan 14 06:25:49 
Jan 14 06:25:49 Current thread 0x00007f9f97d69700 (most recent call first):
Jan 14 06:25:49   File "/opt/conda/lib/python3.6/site-packages/_pytest/_io/saferepr.py", line 58 in repr_instance

See CircleCI build pytorch_linux_xenial_py3_clang7_onnx_ort_test1 (2/3)

Step: "Run tests" ❄️

Jan 14 06:30:48 test/onnx/test_pytorch_onnx_onnxruntime_cuda.py::TestONNXRuntime::test_kldiv_loss PASSED [ 93%]
Jan 14 06:30:48 test/onnx/test_pytorch_onnx_onnxruntime_cuda.py::TestONNXRuntime::test_l1_norm PASSED [ 93%]
Jan 14 06:30:48 test/onnx/test_pytorch_onnx_onnxruntime_cuda.py::TestONNXRuntime::test_l2_norm PASSED [ 93%]
Jan 14 06:30:48 test/onnx/test_pytorch_onnx_onnxruntime_cuda.py::TestONNXRuntime::test_layer_norm PASSED [ 93%]
Jan 14 06:30:48 test/onnx/test_pytorch_onnx_onnxruntime_cuda.py::TestONNXRuntime::test_le PASSED [ 93%]
Jan 14 06:30:48 test/onnx/test_pytorch_onnx_onnxruntime_cuda.py::TestONNXRuntime::test_le_scalar PASSED [ 93%]
Jan 14 06:30:48 test/onnx/test_pytorch_onnx_onnxruntime_cuda.py::TestONNXRuntime::test_len SKIPPED [ 94%]
Jan 14 06:30:48 test/onnx/test_pytorch_onnx_onnxruntime_cuda.py::TestONNXRuntime::test_len_list PASSED [ 94%]
Jan 14 06:30:48 test/onnx/test_pytorch_onnx_onnxruntime_cuda.py::TestONNXRuntime::test_list SKIPPED [ 94%]
Jan 14 06:30:48 test/onnx/test_pytorch_onnx_onnxruntime_cuda.py::TestONNXRuntime::test_list_pass PASSED [ 94%]
Jan 14 06:30:49 test/onnx/test_pytorch_onnx_onnxruntime_cuda.py::TestONNXRuntime::test_listunpack Fatal Python error: Segmentation fault
Jan 14 06:30:49 
Jan 14 06:30:49 Thread 0x00007f126cd7c700 (most recent call first):
Jan 14 06:30:49   File "/opt/conda/lib/python3.6/threading.py", line 299 in wait
Jan 14 06:30:49   File "/opt/conda/lib/python3.6/threading.py", line 551 in wait
Jan 14 06:30:49   File "/opt/conda/lib/python3.6/site-packages/tqdm/_monitor.py", line 59 in run
Jan 14 06:30:49   File "/opt/conda/lib/python3.6/threading.py", line 916 in _bootstrap_inner
Jan 14 06:30:49   File "/opt/conda/lib/python3.6/threading.py", line 884 in _bootstrap
Jan 14 06:30:49 
Jan 14 06:30:49 Current thread 0x00007f12a2377700 (most recent call first):
Jan 14 06:30:49   File "/opt/conda/lib/python3.6/site-packages/_pytest/_io/saferepr.py", line 58 in repr_instance

See CircleCI build pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test2 (3/3)

Step: "Run tests" ❄️

Jan 14 08:09:26 RuntimeError: CUDA error: an illegal memory access was encountered
Jan 14 08:09:26                        ~~~~ <--- HERE
Jan 14 08:09:26 RuntimeError: CUDA error: an illegal memory access was encountered
Jan 14 08:09:26 
Jan 14 08:09:26 
Jan 14 08:09:26 ======================================================================
Jan 14 08:09:26 ERROR [0.203s]: test_where_and_typing (__main__.TestTEFuser)
Jan 14 08:09:26 ----------------------------------------------------------------------
Jan 14 08:09:26 Traceback (most recent call last):
Jan 14 08:09:26   File "test_jit_fuser_te.py", line 1142, in test_where_and_typing
Jan 14 08:09:26     x = torch.randn(4, 4, dtype=torch.double, device=device)
Jan 14 08:09:26 RuntimeError: CUDA error: an illegal memory access was encountered
Jan 14 08:09:26 
Jan 14 08:09:26 ======================================================================
Jan 14 08:09:26 ERROR [0.175s]: test_zero_element_tensors_cuda (__main__.TestTEFuser)
Jan 14 08:09:26 ----------------------------------------------------------------------
Jan 14 08:09:26 Traceback (most recent call last):
Jan 14 08:09:26   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 888, in wrapper
Jan 14 08:09:26     method(*args, **kwargs)
Jan 14 08:09:26   File "test_jit_fuser_te.py", line 178, in test_zero_element_tensors_cuda
Jan 14 08:09:26     self._test_zero_element_tensors(device="cuda")
Jan 14 08:09:26   File "test_jit_fuser_te.py", line 174, in _test_zero_element_tensors

ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI.


import unittest

from test_pytorch_onnx_onnxruntime import TestONNXRuntime

class TestONNXRuntime_cuda(unittest.TestCase):
Contributor

Thanks for setting this up. Are you going to add a CI job for these tests?

Collaborator (Author)

I see that this test was already accidentally added to a CI job; I'll update to disable it for now.
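A minimal sketch of the skip behavior discussed above (the class name mirrors the quoted test file; the `HAS_CUDA` flag and `test_gelu_fp16` method are hypothetical stand-ins, since the real suite would query `torch.cuda.is_available()`):

```python
import unittest

# Hypothetical stand-in for torch.cuda.is_available(); set to False to
# model a CI environment without CUDA support.
HAS_CUDA = False

class TestONNXRuntime_cuda(unittest.TestCase):
    @unittest.skipUnless(HAS_CUDA, "CUDA not available in this CI environment")
    def test_gelu_fp16(self):
        # Would export an fp16 gelu model and compare against ONNX Runtime.
        self.fail("should only run when CUDA is present")

suite = unittest.TestLoader().loadTestsFromTestCase(TestONNXRuntime_cuda)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(len(result.skipped))  # 1
```

With this pattern the test is reported as skipped rather than failed on CUDA-less CI workers.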

BowenBao merged commit c363592 into pytorch:onnx_ms_1 Jan 21, 2021
BowenBao added a commit that referenced this pull request Jan 21, 2021 ([ghstack-poisoned])
BowenBao added a commit that referenced this pull request Jan 21, 2021 (ghstack-source-id: d47760e; Pull Request resolved: #50911)
BowenBao added a commit that referenced this pull request Jan 21, 2021 ([ghstack-poisoned])
BowenBao added a commit that referenced this pull request Jan 22, 2021 ([ghstack-poisoned])
BowenBao added a commit that referenced this pull request Jan 22, 2021 (ghstack-source-id: 715eec1; Pull Request resolved: #50911)
BowenBao added a commit that referenced this pull request Jan 22, 2021 (Differential Revision: [D26023938](https://our.internmc.facebook.com/intern/diff/D26023938))
BowenBao added a commit that referenced this pull request Jan 25, 2021 ([ghstack-poisoned])
BowenBao added a commit that referenced this pull request Jan 25, 2021 (Differential Revision: [D26050889](https://our.internmc.facebook.com/intern/diff/D26050889))
BowenBao added a commit that referenced this pull request Jan 26, 2021 (Differential Revision: D26050889)
facebook-github-bot pushed a commit that referenced this pull request Jan 28, 2021
Summary:
Pull Request resolved: #50911

Need to replace dtype of export created scalars from float to double. (In torch implicit conversion logic, python numbers are double)

Test case skipped in CI due to that current CI job env does not have CUDA support.

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D26050889

Pulled By: SplitInfinity

fbshipit-source-id: 1fdde23a68d4793e6b9a82840acc213e5c3aa760
BowenBao added a commit to BowenBao/pytorch that referenced this pull request Jan 28, 2021 (ghstack-source-id: 1be05b0; Pull Request resolved: pytorch#50911)


4 participants