[reland ci-all tests] move rebuild buckets from end of first iteration to beginning of second iteration #44893

Closed
zhaojuanmao wants to merge 1 commit into master from ci-all/yanlizhao/moveRebuildBucket

@zhaojuanmao
Contributor

This PR pushes the same commit as #44798 to a ci-all/... branch to trigger all tests.

Update for relanding: in ddp.join(), _rebuild_buckets is also moved from the end of backward to the beginning of forward.

As part of relanding PR #41954, this refactoring moves the rebuild_buckets call from the end of the first iteration to the beginning of the second iteration.
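
To make the ordering change concrete, here is a toy sketch in plain Python. It is not the DDP implementation: `ToyReducer`, `ToyDDP`, and `run` are hypothetical names, and the conditions used here (rebuild at most once, and only after a backward pass has recorded an order) are assumptions of the sketch. It only illustrates a rebuild call placed at the start of forward, so that it first takes effect in the second iteration.

```python
class ToyReducer:
    """Hypothetical stand-in for a gradient reducer; not a PyTorch class."""

    def __init__(self):
        self.grad_ready_order = []    # order recorded during backward (assumed)
        self.buckets_rebuilt = False  # assumption: buckets are rebuilt at most once
        self.events = []              # trace of calls, kept for inspection

    def record_backward(self, order):
        self.grad_ready_order = list(order)
        self.events.append("backward")

    def rebuild_buckets(self):
        # No-op until a backward pass has recorded an order, and after the
        # first successful rebuild.
        if self.buckets_rebuilt or not self.grad_ready_order:
            return False
        self.buckets_rebuilt = True
        self.events.append("rebuild_buckets")
        return True


class ToyDDP:
    """Hypothetical wrapper showing where the rebuild call sits."""

    def __init__(self):
        self.reducer = ToyReducer()

    def forward(self):
        # New placement: beginning of forward. In iteration 1 nothing has
        # been recorded yet, so this call does nothing.
        self.reducer.rebuild_buckets()
        self.reducer.events.append("forward")

    def backward(self, order):
        self.reducer.record_backward(order)
        # Old placement would have called rebuild_buckets() here, at the
        # end of the first iteration's backward.


def run(iterations):
    model = ToyDDP()
    for _ in range(iterations):
        model.forward()
        model.backward(order=[2, 1, 0])
    return model.reducer.events


print(run(2))
```

In this sketch the trace for two iterations shows the rebuild between the first backward and the second forward, and a single-iteration run never rebuilds at all.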

…to beginning of second iteration"

[test all]

Update for relanding: in ddp.join(), moved _rebuild_buckets from end of backward to beginning of forward as well.

Part of relanding PR #41954, this refactoring is to move rebuild_buckets call from end of first iteration to beginning of second iteration

Differential Revision: [D23735185](https://our.internmc.facebook.com/intern/diff/D23735185/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D23735185/)!

[ghstack-poisoned]
@dr-ci

dr-ci bot commented Sep 17, 2020

💊 CI failures summary and remediations

As of commit 8a6fd07 (more details on the Dr. CI page):



🕵️ 2 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_paralleltbb_linux_xenial_py3_6_gcc5_4_test (1/2)

Step: "Run tests"

Sep 17 19:26:24 [E request_callback_no_python.cpp:618] Received error while processing request type 2: RuntimeError: Can not pickle torch.futures.Future
Sep 17 19:26:24 At: 
Sep 17 19:26:24   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(93): serialize 
Sep 17 19:26:24   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(145): serialize 
Sep 17 19:26:24  
Sep 17 19:26:24 [E request_callback_no_python.cpp:618] Received error while processing request type 2: RuntimeError: Can not pickle torch.futures.Future 
Sep 17 19:26:24  
Sep 17 19:26:24 At: 
Sep 17 19:26:24   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(93): serialize 
Sep 17 19:26:24   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(145): serialize 
Sep 17 19:26:24  
Sep 17 19:26:24 [E request_callback_no_python.cpp:618] Received error while processing request type 2: RuntimeError: Can not pickle torch.futures.Future 
Sep 17 19:26:24  
Sep 17 19:26:24 At: 
Sep 17 19:26:24   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(93): serialize 
Sep 17 19:26:24   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(145): serialize 
Sep 17 19:26:24  
Sep 17 19:26:24 [W tensorpipe_agent.cpp:576] RPC agent for worker1 encountered error when reading incoming request from worker0: EOF: end of file (this is expected to happen during shutdown) 
Sep 17 19:26:24 [W tensorpipe_agent.cpp:576] RPC agent for worker2 encountered error when reading incoming request from worker1: EOF: end of file (this is expected to happen during shutdown) 
Sep 17 19:26:24 [W tensorpipe_agent.cpp:576] RPC agent for worker3 encountered error when reading incoming request from worker0: EOF: end of file (this is expected to happen during shutdown) 
Sep 17 19:26:24 ok (1.539s) 
Sep 17 19:26:25   test_return_future_remote (__main__.TensorPipeRpcTestWithSpawn) ... [W tensorpipe_agent.cpp:576] RPC agent for worker0 encountered error when reading incoming request from worker3: EOF: end of file (this is expected to happen during shutdown) 

See CircleCI build pytorch_linux_bionic_py3_8_gcc9_coverage_test (2/2)

Step: "Run tests"

Sep 17 19:29:41 [E request_callback_no_python.cpp:618] Received error while processing request type 2: RuntimeError: Can not pickle torch.futures.Future
Sep 17 19:29:41 At: 
Sep 17 19:29:41   /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py(93): serialize 
Sep 17 19:29:41   /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py(145): serialize 
Sep 17 19:29:41  
Sep 17 19:29:41 [E request_callback_no_python.cpp:618] Received error while processing request type 2: RuntimeError: Can not pickle torch.futures.Future 
Sep 17 19:29:41  
Sep 17 19:29:41 At: 
Sep 17 19:29:41   /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py(93): serialize 
Sep 17 19:29:41   /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py(145): serialize 
Sep 17 19:29:41  
Sep 17 19:29:41 [E request_callback_no_python.cpp:618] Received error while processing request type 2: RuntimeError: Can not pickle torch.futures.Future 
Sep 17 19:29:41  
Sep 17 19:29:41 At: 
Sep 17 19:29:41   /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py(93): serialize 
Sep 17 19:29:41   /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py(145): serialize 
Sep 17 19:29:41  
Sep 17 19:29:41 [W tensorpipe_agent.cpp:576] RPC agent for worker1 encountered error when reading incoming request from worker0: EOF: end of file (this is expected to happen during shutdown) 
Sep 17 19:29:41 [W tensorpipe_agent.cpp:576] RPC agent for worker2 encountered error when reading incoming request from worker0: EOF: end of file (this is expected to happen during shutdown) 
Sep 17 19:29:42 ok (1.442s) 
Sep 17 19:29:43   test_return_future_remote (__main__.TensorPipeRpcTestWithSpawn) ... [W tensorpipe_agent.cpp:576] RPC agent for worker1 encountered error when reading incoming request from worker0: EOF: end of file (this is expected to happen during shutdown) 
Sep 17 19:29:43 ok (1.440s) 

@zhaojuanmao
Contributor Author

The test failures are unrelated. For reference, the regular CI tests for the same commit also passed in #44798.

@zhaojuanmao
Contributor Author

test only

1 similar comment
@zhaojuanmao
Contributor Author

test only

@github-actions github-actions bot deleted the ci-all/yanlizhao/moveRebuildBucket branch February 9, 2024 01:55