Skip to content

Conversation

@janeyx99
Copy link
Contributor

@janeyx99 janeyx99 commented Jul 15, 2021

Move non-libtorch Linux 11.3 scheduled CI job to GHA.
Libtorch builds will be migrated here: #61774

Successful run: https://github.com/pytorch/pytorch/actions/runs/1035592487

@facebook-github-bot
Copy link
Contributor

facebook-github-bot commented Jul 15, 2021

💊 CI failures summary and remediations

As of commit adf6d2e (more details on the Dr. CI page and at hud.pytorch.org/pr/61732):


  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_macos_10_13_py3_test (1/1)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

Jul 16 17:52:03 test_remote_message_script_de...yUniqueId(created_on=0, local_id=0) to be created.
Jul 16 17:51:35 frame #12: std::__1::__function::__func<std::__1::__bind<torch::distributed::rpc::ProcessGroupAgent::enqueueRecv(torch::distributed::rpc::RecvWork)::$_6, torch::distributed::rpc::RecvWork>, std::__1::allocator<std::__1::__bind<torch::distributed::rpc::ProcessGroupAgent::enqueueRecv(torch::distributed::rpc::RecvWork)::$_6, torch::distributed::rpc::RecvWork> >, void ()>::operator()() + 42 (0x11e1e7c2a in libtorch_cpu.dylib)
Jul 16 17:51:35 frame #13: c10::ThreadPool::main_loop(unsigned long) + 569 (0x118499369 in libc10.dylib)
Jul 16 17:51:35 frame #14: void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, c10::ThreadPool::ThreadPool(int, int, std::__1::function<void ()>)::$_0> >(void*) + 67 (0x118499a13 in libc10.dylib)
Jul 16 17:51:35 frame #15: _pthread_start + 148 (0x7fff7026f109 in libsystem_pthread.dylib)
Jul 16 17:51:35 frame #16: thread_start + 15 (0x7fff7026ab8b in libsystem_pthread.dylib)
Jul 16 17:51:35 
Jul 16 17:51:35 ok (4.029s)
Jul 16 17:51:43   test_remote_message_dropped_pickle (__main__.FaultyFaultyAgentRpcTestWithSpawn) ... ok (8.158s)
Jul 16 17:51:52   test_remote_message_dropped_pickle_to_self (__main__.FaultyFaultyAgentRpcTestWithSpawn) ... ok (8.423s)
Jul 16 17:51:59   test_remote_message_script_delay_timeout (__main__.FaultyFaultyAgentRpcTestWithSpawn) ... ok (7.285s)
Jul 16 17:52:03   test_remote_message_script_delay_timeout_to_self (__main__.FaultyFaultyAgentRpcTestWithSpawn) ... [E request_callback_no_python.cpp:555] Received error while processing request type 260: falseINTERNAL ASSERT FAILED at "../torch/csrc/distributed/rpc/rref_context.cpp":390, please report a bug to PyTorch. Expected OwnerRRef with id GloballyUniqueId(created_on=0, local_id=0) to be created.
Jul 16 17:52:03 Exception raised from getOwnerRRef at ../torch/csrc/distributed/rpc/rref_context.cpp:390 (most recent call first):
Jul 16 17:52:03 frame #0: c10::Error::Error(c10::SourceLocation, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >) + 98 (0x10a84f6b2 in libc10.dylib)
Jul 16 17:52:03 frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 106 (0x10a84de2a in libc10.dylib)
Jul 16 17:52:03 frame #2: c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 64 (0x10a84e060 in libc10.dylib)
Jul 16 17:52:03 frame #3: torch::distributed::rpc::RRefContext::getOwnerRRef(torch::distributed::rpc::GloballyUniqueId const&, bool) + 1711 (0x116c9312f in libtorch_cpu.dylib)
Jul 16 17:52:03 frame #4: torch::distributed::rpc::RequestCallbackNoPython::assignOwnerRRef(torch::distributed::rpc::GloballyUniqueId const&, torch::distributed::rpc::GloballyUniqueId const&, c10::intrusive_ptr<c10::ivalue::Future, c10::detail::intrusive_target_default_null_type<c10::ivalue::Future> >) const + 86 (0x116c7d986 in libtorch_cpu.dylib)
Jul 16 17:52:03 frame #5: torch::distributed::rpc::RequestCallbackImpl::processScriptRemoteCall(torch::distributed::rpc::RpcCommandBase&, std::__1::vector<c10::Stream, std::__1::allocator<c10::Stream> >) const + 376 (0x109e4b7a8 in libtorch_python.dylib)
Jul 16 17:52:03 frame #6: torch::distributed::rpc::RequestCallbackNoPython::processRpc(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::__1::vector<c10::Stream, std::__1::allocator<c10::Stream> >) const + 437 (0x116c7c5d5 in libtorch_cpu.dylib)
Jul 16 17:52:03 frame #7: torch::distributed::rpc::RequestCallbackImpl::processRpcWithErrors(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::__1::vector<c10::Stream, std::__1::allocator<c10::Stream> >) const + 74 (0x109e4c51a in libtorch_python.dylib)
Jul 16 17:52:03 frame #8: c10::intrusive_ptr<c10::ivalue::Future, c10::detail::intrusive_target_default_null_type<c10::ivalue::Future> > c10::ivalue::Future::thenAsync<torch::distributed::rpc::RequestCallbackNoPython::processMessage(torch::distributed::rpc::Message&, std::__1::vector<c10::Stream, std::__1::allocator<c10::Stream> >) const::$_1>(torch::distributed::rpc::RequestCallbackNoPython::processMessage(torch::distributed::rpc::Message&, std::__1::vector<c10::Stream, std::__1::allocator<c10::Stream> >) const::$_1, std::__1::shared_ptr<c10::Type>)::'lambda'(c10::ivalue::Future&)::operator()(c10::ivalue::Future&) + 223 (0x116c8429f in libtorch_cpu.dylib)

Preview docs built from this PR

This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

@janeyx99 janeyx99 requested a review from a team July 15, 2021 22:37
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to enable this and the other libtorch in a followup PR

@janeyx99 janeyx99 changed the title Move scheduled linux CI to GHA Move non-libtorch scheduled linux CI to GHA Jul 16, 2021
@facebook-github-bot
Copy link
Contributor

@janeyx99 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@janeyx99 janeyx99 force-pushed the move-scheduled-linux-tests branch from 15e958e to adf6d2e Compare July 16, 2021 16:09
@facebook-github-bot
Copy link
Contributor

@janeyx99 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@janeyx99 merged this pull request in 3fd9dcf.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants