Closed
Labels
high priority
module: flaky-tests (Problem is a flaky test in CI)
module: rpc (Related to RPC, distributed autograd, RRef, and distributed optimizer)
oncall: distributed (Add this issue/PR to distributed oncall triage queue)
triage review
Description
🐛 test_backward_ddp_outside is flaky
Jun 22 00:59:56 ======================================================================
Jun 22 00:59:56 ERROR [61.552s]: test_backward_ddp_outside (__main__.TestDdpUnderDistAutogradWrapper)
Jun 22 00:59:56 ----------------------------------------------------------------------
Jun 22 00:59:56 Traceback (most recent call last):
Jun 22 00:59:56 File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/testing/_internal/common_distributed.py", line 204, in wrapper
Jun 22 00:59:56 self._join_processes(fn)
Jun 22 00:59:56 File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/testing/_internal/common_distributed.py", line 306, in _join_processes
Jun 22 00:59:56 self._check_return_codes(elapsed_time)
Jun 22 00:59:56 File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/testing/_internal/common_distributed.py", line 339, in _check_return_codes
Jun 22 00:59:56 raise RuntimeError(error)
Jun 22 00:59:56 RuntimeError: Processes 5 exited with error code 10
Jun 22 00:59:56
Jun 22 00:59:56 ----------------------------------------------------------------------
cc @ezyang @gchanan @zou3519 @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @xush6528 @osalpekar @jjlilley