ptrblck commented May 7, 2020

Could fix #37725 by skipping the depthwise-workload check introduced in #22302.
We are currently running more checks to verify the functionality.

CC @VitalyFedyunin @ngimel @xwang233
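
For context, the linked issue is about conv2d output losing the channels_last memory format. As a rough illustration in plain Python (no PyTorch required; the helper names are made up for this sketch), channels-last keeps the logical NCHW shape but stores the elements in NHWC order, so the difference shows up only in the strides:

```python
def contiguous_strides(shape):
    """Element strides of a densely packed tensor stored in NCHW order."""
    n, c, h, w = shape
    return (c * h * w, h * w, w, 1)


def channels_last_strides(shape):
    """Element strides of the same logical NCHW shape stored in NHWC order."""
    n, c, h, w = shape
    return (h * w * c, 1, w * c, c)


def is_channels_last(shape, strides):
    # Simplified check: exact stride match. Real frameworks need extra rules
    # for ambiguous cases such as C == 1 or H == W == 1.
    return tuple(strides) == channels_last_strides(shape)


shape = (2, 8, 4, 4)  # N, C, H, W
print(contiguous_strides(shape))     # (128, 16, 4, 1)
print(channels_last_strides(shape))  # (128, 1, 32, 8)
print(is_channels_last(shape, contiguous_strides(shape)))  # False
```

An operator "loses" the property when it receives input with the second stride pattern but returns output with the first.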

dr-ci bot commented May 7, 2020

💊 CI failures summary and remediations

As of commit e154be6 (more details on the Dr. CI page):



🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_py3_clang5_asan_test (1/1)

Step: "Run tests"

May 07 21:04:07 SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /var/lib/jenkins/workspace/aten/src/ATen/Utils.cpp:11:3 in
May 07 21:04:07     #7 0x5629c9c5f74b in PyEval_EvalCode /tmp/build/80754af9/python_1585002248360/work/Python/ceval.c:731 
May 07 21:04:07     #8 0x5629c9cdf633 in run_mod /tmp/build/80754af9/python_1585002248360/work/Python/pythonrun.c:1025 
May 07 21:04:07     #9 0x5629c9cdf6cc in PyRun_StringFlags /tmp/build/80754af9/python_1585002248360/work/Python/pythonrun.c:949 
May 07 21:04:07     #10 0x5629c9cdf72e in PyRun_SimpleStringFlags /tmp/build/80754af9/python_1585002248360/work/Python/pythonrun.c:445 
May 07 21:04:07     #11 0x5629c9ce3532 in run_command /tmp/build/80754af9/python_1585002248360/work/Modules/main.c:301 
May 07 21:04:07     #12 0x5629c9ce3532 in Py_Main /tmp/build/80754af9/python_1585002248360/work/Modules/main.c:749 
May 07 21:04:07     #13 0x5629c9bae1fd in main /tmp/build/80754af9/python_1585002248360/work/Programs/python.c:69 
May 07 21:04:07     #14 0x7f76bfa1482f in __libc_start_main /build/glibc-LK5gWL/glibc-2.23/csu/../csu/libc-start.c:291 
May 07 21:04:07     #15 0x5629c9c8cc29 in _start /home/rdonnelly/mc/conda-bld/compilers_linux-64_1534865402226/work/.build/src/glibc-2.12.2/csu/../sysdeps/x86_64/elf/start.S:103 
May 07 21:04:07  
May 07 21:04:07 SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /var/lib/jenkins/workspace/aten/src/ATen/Utils.cpp:11:3 in  
May 07 21:04:07 + retcode=1 
May 07 21:04:07 + set -e 
May 07 21:04:07 + return 1 
May 07 21:04:07 + [[ pytorch-linux-xenial-py3-clang5-asan-test == *-NO_AVX-* ]] 
May 07 21:04:07 + [[ pytorch-linux-xenial-py3-clang5-asan-test == *-NO_AVX2-* ]] 
May 07 21:04:07 + '[' -n https://github.com/pytorch/pytorch/pull/38044 ']' 
May 07 21:04:07 ++ mktemp 
May 07 21:04:07 + DETERMINE_FROM=/tmp/tmp.qlnWzj7KNB 
May 07 21:04:07 + file_diff_from_base /tmp/tmp.qlnWzj7KNB 
May 07 21:04:07 + set +e 

❄️ 1 failure tentatively classified as flaky

but reruns have not yet been triggered to confirm:

See CircleCI build pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test (1/1)

Step: "Run tests"

May 07 21:43:40 ConnectionResetError: [Errno 104] Connection reset by peer
May 07 21:43:40   File "/opt/conda/lib/python3.6/multiprocessing/queues.py", line 113, in get 
May 07 21:43:40     return _ForkingPickler.loads(res) 
May 07 21:43:40   File "/opt/conda/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 282, in rebuild_storage_fd 
May 07 21:43:40     fd = df.detach() 
May 07 21:43:40   File "/opt/conda/lib/python3.6/multiprocessing/resource_sharer.py", line 58, in detach 
May 07 21:43:40     return reduction.recv_handle(conn) 
May 07 21:43:40   File "/opt/conda/lib/python3.6/multiprocessing/reduction.py", line 182, in recv_handle 
May 07 21:43:40     return recvfds(s, 1)[0] 
May 07 21:43:40   File "/opt/conda/lib/python3.6/multiprocessing/reduction.py", line 153, in recvfds 
May 07 21:43:40     msg, ancdata, flags, addr = sock.recvmsg(1, socket.CMSG_SPACE(bytes_size)) 
May 07 21:43:40 ConnectionResetError: [Errno 104] Connection reset by peer 
May 07 21:43:40  
May 07 21:43:40 Process ErrorTrackingProcess-120: 
May 07 21:43:40 Traceback (most recent call last): 
May 07 21:43:40   File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap 
May 07 21:43:40     self.run() 
May 07 21:43:40   File "/var/lib/jenkins/workspace/test/test_dataloader.py", line 362, in run 
May 07 21:43:40     super(ErrorTrackingProcess, self).run() 
May 07 21:43:40   File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 93, in run 
May 07 21:43:40     self._target(*self._args, **self._kwargs) 
May 07 21:43:40   File "/var/lib/jenkins/workspace/test/test_dataloader.py", line 630, in _test_proper_exit 

ci.pytorch.org: 1 failed



ezyang changed the title from "[DRAFT] relax depthwise conditions for channels-last convs" to "[WIP] relax depthwise conditions for channels-last convs" May 13, 2020

ezyang commented May 13, 2020

Remove WIP label when you're ready for review

VitalyFedyunin commented

Looks like we are fine to land it, as it was checked with 388k shapes.

facebook-github-bot left a comment

@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

ptrblck changed the title from "[WIP] relax depthwise conditions for channels-last convs" to "Relax depthwise conditions for channels-last convs" May 13, 2020

ptrblck commented May 19, 2020

@VitalyFedyunin, @ngimel
Xiao checked that the change skips the use_cudnn check for NHWC, so we'll add it to cudnn_conv_use_channels_last and try to relax it for the channels-last format based on the checked shapes.
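
To make the idea concrete, here is a minimal plain-Python sketch of the kind of routing being discussed. It is an assumption about the intent of this thread, not PyTorch's actual dispatch code: `is_depthwise` uses the common definition of a depthwise convolution, while `pick_backend` and its return labels are hypothetical names invented for illustration.

```python
def is_depthwise(in_channels, out_channels, groups):
    """Common definition: one group per input channel, and the number of
    output channels is a multiple of the number of input channels."""
    return groups > 1 and groups == in_channels and out_channels % in_channels == 0


def pick_backend(in_channels, out_channels, groups, channels_last, cudnn_available=True):
    """Hypothetical routing, NOT the real library logic."""
    if not cudnn_available:
        return "native"
    if channels_last:
        # Assumed intent of the relaxation: keep channels-last inputs on the
        # cuDNN path so the output can preserve the memory format.
        return "cudnn"
    if is_depthwise(in_channels, out_channels, groups):
        # Contiguous depthwise workloads may still go to a dedicated kernel.
        return "native_depthwise"
    return "cudnn"


print(pick_backend(32, 32, 32, channels_last=True))   # cudnn
print(pick_backend(32, 32, 32, channels_last=False))  # native_depthwise
```

The point of the sketch is only the ordering of the checks: the memory-format test comes before the depthwise test, so a depthwise workload no longer forces channels-last input off the cuDNN path.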

ptrblck commented May 22, 2020

Closing in favor of #38904

ptrblck closed this May 22, 2020

facebook-github-bot pushed a commit that referenced this pull request Jun 22, 2020
Summary:
Follow-up to #38044. Thanks to ptrblck and mcarilli for their help in discussing the changes!

Could fix #37725 by skipping the depthwise-workload check introduced in #22302. This PR also relaxed dilated convolution for channels-last.

The testing script is https://gist.github.com/xwang233/82a707f69bb710cb612349280a2c5f41. About 387k conv arguments were tested and no cudnn exception was thrown.

cc ngimel VitalyFedyunin ptrblck mcarilli
Pull Request resolved: #38904

Differential Revision: D22155797

Pulled By: VitalyFedyunin

fbshipit-source-id: 81b5736cec67ea263029121521c6acafd9dddba6
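
The commit message above mentions testing a large number of conv arguments. The linked gist is the authoritative script; the following is only a hypothetical, much smaller sketch of how such a sweep could be enumerated in plain Python. The parameter ranges and function names are assumptions chosen for illustration, and the actual convolution call is omitted.

```python
from itertools import product


def conv_output_size(size, kernel, stride, padding, dilation):
    """Standard output-size formula for one spatial dimension."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def depthwise_conv_configs():
    """Yield argument dictionaries for depthwise convolutions (groups == channels)."""
    batches = [1, 2]
    channels = [1, 8, 32]
    sizes = [7, 16]
    kernels = [1, 3]
    strides = [1, 2]
    paddings = [0, 1]
    dilations = [1, 2]
    for n, c, hw, k, s, p, d in product(
        batches, channels, sizes, kernels, strides, paddings, dilations
    ):
        # Skip combinations that would produce an empty output.
        if conv_output_size(hw, k, s, p, d) < 1:
            continue
        yield {
            "input_shape": (n, c, hw, hw),
            "groups": c,
            "kernel_size": k,
            "stride": s,
            "padding": p,
            "dilation": d,
        }


configs = list(depthwise_conv_configs())
print(len(configs), "configurations to test")
```

Each dictionary would then be passed to the convolution under test, once per memory format, with any raised exception recorded alongside the arguments that triggered it.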
Development

Successfully merging this pull request may close these issues.

Exception: Operator conv2d lost channels_last property