driazati commented Jun 18, 2021

Stack from ghstack:

facebook-github-bot added the oncall: jit and cla signed labels Jun 18, 2021

facebook-github-bot commented Jun 18, 2021

💊 CI failures summary and remediations

As of commit 467c4cc (more details on the Dr. CI page and at hud.pytorch.org/pr/60298):


  • 4/4 failures possibly* introduced in this PR
    • 1/4 non-scanned failure(s)

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_macos_10_13_py3_test (1/1)

Step: "Test"

Jun 21 08:09:58 test_udf_remote_message_delay...yUniqueId(created_on=0, local_id=0) to be created.
Jun 21 08:09:14 frame #13: c10::ThreadPool::main_loop(unsigned long) + 569 (0x10c1aa5e9 in libc10.dylib)
Jun 21 08:09:14 frame #14: void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, c10::ThreadPool::ThreadPool(int, int, std::__1::function<void ()>)::$_0> >(void*) + 67 (0x10c1aac93 in libc10.dylib)
Jun 21 08:09:14 frame #15: _pthread_start + 148 (0x7fff67967109 in libsystem_pthread.dylib)
Jun 21 08:09:14 frame #16: thread_start + 15 (0x7fff67962b8b in libsystem_pthread.dylib)
Jun 21 08:09:14 
Jun 21 08:09:15 ok (3.907s)
Jun 21 08:09:31   test_rpc_builtin_timeout (__main__.FaultyFaultyAgentRpcTestWithSpawn) ... ok (16.385s)
Jun 21 08:09:41   test_rpc_script_timeout (__main__.FaultyFaultyAgentRpcTestWithSpawn) ... ok (9.900s)
Jun 21 08:09:45   test_rref_to_here_timeout (__main__.FaultyFaultyAgentRpcTestWithSpawn) ... ok (3.961s)
Jun 21 08:09:53   test_udf_remote_message_delay_timeout (__main__.FaultyFaultyAgentRpcTestWithSpawn) ... ok (8.091s)
Jun 21 08:09:58   test_udf_remote_message_delay_timeout_to_self (__main__.FaultyFaultyAgentRpcTestWithSpawn) ... [E request_callback_no_python.cpp:552] Received error while processing request type 261: falseINTERNAL ASSERT FAILED at "../torch/csrc/distributed/rpc/rref_context.cpp":387, please report a bug to PyTorch. Expected OwnerRRef with id GloballyUniqueId(created_on=0, local_id=0) to be created.
Jun 21 08:09:58 Exception raised from getOwnerRRef at ../torch/csrc/distributed/rpc/rref_context.cpp:387 (most recent call first):
Jun 21 08:09:58 frame #0: c10::Error::Error(c10::SourceLocation, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >) + 98 (0x113cbe932 in libc10.dylib)
Jun 21 08:09:58 frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 106 (0x113cbd0aa in libc10.dylib)
Jun 21 08:09:58 frame #2: c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 64 (0x113cbd2e0 in libc10.dylib)
Jun 21 08:09:58 frame #3: torch::distributed::rpc::RRefContext::getOwnerRRef(torch::distributed::rpc::GloballyUniqueId const&, bool) + 1572 (0x11e751f84 in libtorch_cpu.dylib)
Jun 21 08:09:58 frame #4: torch::distributed::rpc::RequestCallbackNoPython::assignOwnerRRef(torch::distributed::rpc::GloballyUniqueId const&, torch::distributed::rpc::GloballyUniqueId const&, c10::intrusive_ptr<c10::ivalue::Future, c10::detail::intrusive_target_default_null_type<c10::ivalue::Future> >) const + 86 (0x11e73dbf6 in libtorch_cpu.dylib)
Jun 21 08:09:58 frame #5: torch::distributed::rpc::RequestCallbackImpl::processPythonRemoteCall(torch::distributed::rpc::RpcCommandBase&, std::__1::vector<c10::Stream, std::__1::allocator<c10::Stream> >) const + 179 (0x11a843433 in libtorch_python.dylib)
Jun 21 08:09:58 frame #6: torch::distributed::rpc::RequestCallbackNoPython::processRpc(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::__1::vector<c10::Stream, std::__1::allocator<c10::Stream> >) const + 512 (0x11e73c910 in libtorch_cpu.dylib)
Jun 21 08:09:58 frame #7: torch::distributed::rpc::RequestCallbackImpl::processRpcWithErrors(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::__1::vector<c10::Stream, std::__1::allocator<c10::Stream> >) const + 74 (0x11a84400a in libtorch_python.dylib)
Jun 21 08:09:58 frame #8: c10::intrusive_ptr<c10::ivalue::Future, c10::detail::intrusive_target_default_null_type<c10::ivalue::Future> > c10::ivalue::Future::thenAsync<torch::distributed::rpc::RequestCallbackNoPython::processMessage(torch::distributed::rpc::Message&, std::__1::vector<c10::Stream, std::__1::allocator<c10::Stream> >) const::$_1>(torch::distributed::rpc::RequestCallbackNoPython::processMessage(torch::distributed::rpc::Message&, std::__1::vector<c10::Stream, std::__1::allocator<c10::Stream> >) const::$_1, std::__1::shared_ptr<c10::Type>)::'lambda'(c10::ivalue::Future&)::operator()(c10::ivalue::Future&) + 223 (0x11e74433f in libtorch_cpu.dylib)

2 failures not recognized by patterns:

Job                                          Step
GitHub Actions Lint / shellcheck             Run ShellCheck
GitHub Actions clang-format / clang-format   Run clang-format

ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

driazati added a commit that referenced this pull request Jun 18, 2021
ghstack-source-id: 9c66869
Pull Request resolved: #60298
driazati added a commit that referenced this pull request Jun 21, 2021
ghstack-source-id: 7f72a14
Pull Request resolved: #60298
driazati closed this Jun 21, 2021
facebook-github-bot deleted the gh/driazati/46/head branch July 22, 2021 14:19
Labels: cla signed, oncall: jit