
Conversation

@ailzhang (Contributor) commented May 19, 2020

ghstack PRs have their target branch set to gh/xxx/1234/base, so the merge didn't work. Change it to master by default.
IIRC we don't use ghstack with release branches, so this should be fine? cc: @ezyang
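For context, a minimal sketch of the kind of CI merge step this affects, assuming the job merges the PR into a base branch before building (the commands and branch names below are illustrative, not the actual PyTorch CI script):

```sh
# Illustrative only: ghstack sets the PR's target branch to gh/<user>/<N>/base,
# so merging against that target is effectively a no-op. Merging against
# origin/master instead gives the job an up-to-date merged state to test.
git fetch origin master
git merge --no-edit origin/master
```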

@ailzhang ailzhang force-pushed the fix_merge_master_on_ghstack branch from ed54aa4 to bd40bc5 Compare May 19, 2020 23:30
@ailzhang ailzhang requested review from ezyang and seemethere May 19, 2020 23:55
@ailzhang ailzhang changed the title Merge with origin/master for ghstack PRs. For jobs that need a merge, merge with origin/master for ghstack PRs. May 20, 2020

dr-ci bot commented May 20, 2020

💊 CI failures summary and remediations

As of commit bd40bc5 (more details on the Dr. CI page):


  • 2/3 failures possibly* introduced in this PR
    • 1/2 non-CircleCI failure(s)
  • 1/3 broken upstream at merge base 363a2d9 since May 19

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_ge_config_simple_test (1/1)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

May 20 01:21:12 test_nested_backward_accumulate_grads (__main__.TensorPipeAgentDistAutogradTestWithSpawn) ... [E request_callback_impl.cpp:96] Received error while processing request type 19: currentRpcAgent_ INTERNAL ASSERT FAILED at "/var/lib/jenkins/workspace/torch/csrc/distributed/rpc/rpc_agent.cpp":246, please report a bug to PyTorch. Current RPC agent is not set!
May 20 01:21:10 frame #10: clone + 0x6d (0x7f0230d2141d in /lib/x86_64-linux-gnu/libc.so.6) 
May 20 01:21:10  
May 20 01:21:10 [W tensorpipe_agent.cpp:222] RPC agent is being closed. Skip sending rpc response 
May 20 01:21:10 [W tensorpipe_agent.cpp:222] RPC agent is being closed. Skip sending rpc response 
May 20 01:21:10 [W tensorpipe_agent.cpp:258] Server read message: EOF: end of file 
May 20 01:21:10 [W tensorpipe_agent.cpp:383] Read response error: EOF: end of file 
May 20 01:21:10 [E container.cpp:248] Could not release Dist Autograd Context on node 0: EOF: end of file 
May 20 01:21:10 [W tensorpipe_agent.cpp:258] Server read message: EOF: end of file 
May 20 01:21:10 [W tensorpipe_agent.cpp:258] Server read message: EOF: end of file 
May 20 01:21:11 ok (10.140s) 
May 20 01:21:12   test_nested_backward_accumulate_grads (__main__.TensorPipeAgentDistAutogradTestWithSpawn) ... [E request_callback_impl.cpp:96] Received error while processing request type 19: currentRpcAgent_ INTERNAL ASSERT FAILED at "/var/lib/jenkins/workspace/torch/csrc/distributed/rpc/rpc_agent.cpp":246, please report a bug to PyTorch. Current RPC agent is not set! 
May 20 01:21:12 Exception raised from getCurrentRpcAgent at /var/lib/jenkins/workspace/torch/csrc/distributed/rpc/rpc_agent.cpp:246 (most recent call first): 
May 20 01:21:12 frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x69 (0x7f35d7151f79 in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so) 
May 20 01:21:12 frame #1: torch::distributed::rpc::RpcAgent::getCurrentRpcAgent() + 0x3f4 (0x7f35d1f65974 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so) 
May 20 01:21:12 frame #2: torch::distributed::autograd::CleanupAutogradContextReq::fromMessage(torch::distributed::rpc::Message const&) + 0x64 (0x7f35d1f58f04 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so) 
May 20 01:21:12 frame #3: torch::distributed::rpc::deserializeRequest(torch::distributed::rpc::Message const&) + 0x5f (0x7f35d1f9ddcf in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so) 
May 20 01:21:12 frame #4: torch::distributed::rpc::RequestCallbackImpl::processMessage(torch::distributed::rpc::Message&) const + 0xfa (0x7f35d8151bca in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so) 
May 20 01:21:12 frame #5: torch::distributed::rpc::RequestCallback::operator()(torch::distributed::rpc::Message&) const + 0x1e (0x7f35d1f64e6e in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so) 
May 20 01:21:12 frame #6: <unknown function> + 0xa64c13 (0x7f35d815bc13 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so) 
May 20 01:21:12 frame #7: c10::ThreadPool::main_loop(unsigned long) + 0x2fb (0x7f35d713f90b in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so) 
May 20 01:21:12 frame #8: <unknown function> + 0xc8421 (0x7f35d764b421 in /opt/conda/lib/libstdc++.so.6) 

🚧 1 ongoing upstream failure:

These were probably caused by upstream breakages that are not fixed yet:


ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI.

@facebook-github-bot (Contributor) left a comment

@ailzhang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor) commented:

@ailzhang merged this pull request in ca1978c.
