Move non-libtorch scheduled linux CI to GHA #61732

janeyx99 · 2021-07-15T22:34:17Z

Move non-libtorch Linux 11.3 scheduled CI job to GHA.
Libtorch builds will be migrated here: #61774

Successful run: https://github.com/pytorch/pytorch/actions/runs/1035592487

facebook-github-bot · 2021-07-15T22:34:23Z

💊 CI failures summary and remediations

As of commit adf6d2e (more details on the Dr. CI page and at hud.pytorch.org/pr/61732):

1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

pytorch_macos_10_13_py3_test (1/1)

Step: "Test"

Jul 16 17:52:03 test_remote_message_script_de...yUniqueId(created_on=0, local_id=0) to be created.

Jul 16 17:51:35 frame #12: std::__1::__function::__func<std::__1::__bind<torch::distributed::rpc::ProcessGroupAgent::enqueueRecv(torch::distributed::rpc::RecvWork)::$_6, torch::distributed::rpc::RecvWork>, std::__1::allocator<std::__1::__bind<torch::distributed::rpc::ProcessGroupAgent::enqueueRecv(torch::distributed::rpc::RecvWork)::$_6, torch::distributed::rpc::RecvWork> >, void ()>::operator()() + 42 (0x11e1e7c2a in libtorch_cpu.dylib)
Jul 16 17:51:35 frame #13: c10::ThreadPool::main_loop(unsigned long) + 569 (0x118499369 in libc10.dylib)
Jul 16 17:51:35 frame #14: void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, c10::ThreadPool::ThreadPool(int, int, std::__1::function<void ()>)::$_0> >(void*) + 67 (0x118499a13 in libc10.dylib)
Jul 16 17:51:35 frame #15: _pthread_start + 148 (0x7fff7026f109 in libsystem_pthread.dylib)
Jul 16 17:51:35 frame #16: thread_start + 15 (0x7fff7026ab8b in libsystem_pthread.dylib)
Jul 16 17:51:35 
Jul 16 17:51:35 ok (4.029s)
Jul 16 17:51:43   test_remote_message_dropped_pickle (__main__.FaultyFaultyAgentRpcTestWithSpawn) ... ok (8.158s)
Jul 16 17:51:52   test_remote_message_dropped_pickle_to_self (__main__.FaultyFaultyAgentRpcTestWithSpawn) ... ok (8.423s)
Jul 16 17:51:59   test_remote_message_script_delay_timeout (__main__.FaultyFaultyAgentRpcTestWithSpawn) ... ok (7.285s)
Jul 16 17:52:03   test_remote_message_script_delay_timeout_to_self (__main__.FaultyFaultyAgentRpcTestWithSpawn) ... [E request_callback_no_python.cpp:555] Received error while processing request type 260: falseINTERNAL ASSERT FAILED at "../torch/csrc/distributed/rpc/rref_context.cpp":390, please report a bug to PyTorch. Expected OwnerRRef with id GloballyUniqueId(created_on=0, local_id=0) to be created.
Jul 16 17:52:03 Exception raised from getOwnerRRef at ../torch/csrc/distributed/rpc/rref_context.cpp:390 (most recent call first):
Jul 16 17:52:03 frame #0: c10::Error::Error(c10::SourceLocation, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >) + 98 (0x10a84f6b2 in libc10.dylib)
Jul 16 17:52:03 frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 106 (0x10a84de2a in libc10.dylib)
Jul 16 17:52:03 frame #2: c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 64 (0x10a84e060 in libc10.dylib)
Jul 16 17:52:03 frame #3: torch::distributed::rpc::RRefContext::getOwnerRRef(torch::distributed::rpc::GloballyUniqueId const&, bool) + 1711 (0x116c9312f in libtorch_cpu.dylib)
Jul 16 17:52:03 frame #4: torch::distributed::rpc::RequestCallbackNoPython::assignOwnerRRef(torch::distributed::rpc::GloballyUniqueId const&, torch::distributed::rpc::GloballyUniqueId const&, c10::intrusive_ptr<c10::ivalue::Future, c10::detail::intrusive_target_default_null_type<c10::ivalue::Future> >) const + 86 (0x116c7d986 in libtorch_cpu.dylib)
Jul 16 17:52:03 frame #5: torch::distributed::rpc::RequestCallbackImpl::processScriptRemoteCall(torch::distributed::rpc::RpcCommandBase&, std::__1::vector<c10::Stream, std::__1::allocator<c10::Stream> >) const + 376 (0x109e4b7a8 in libtorch_python.dylib)
Jul 16 17:52:03 frame #6: torch::distributed::rpc::RequestCallbackNoPython::processRpc(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::__1::vector<c10::Stream, std::__1::allocator<c10::Stream> >) const + 437 (0x116c7c5d5 in libtorch_cpu.dylib)
Jul 16 17:52:03 frame #7: torch::distributed::rpc::RequestCallbackImpl::processRpcWithErrors(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::__1::vector<c10::Stream, std::__1::allocator<c10::Stream> >) const + 74 (0x109e4c51a in libtorch_python.dylib)
Jul 16 17:52:03 frame #8: c10::intrusive_ptr<c10::ivalue::Future, c10::detail::intrusive_target_default_null_type<c10::ivalue::Future> > c10::ivalue::Future::thenAsync<torch::distributed::rpc::RequestCallbackNoPython::processMessage(torch::distributed::rpc::Message&, std::__1::vector<c10::Stream, std::__1::allocator<c10::Stream> >) const::$_1>(torch::distributed::rpc::RequestCallbackNoPython::processMessage(torch::distributed::rpc::Message&, std::__1::vector<c10::Stream, std::__1::allocator<c10::Stream> >) const::$_1, std::__1::shared_ptr<c10::Type>)::'lambda'(c10::ivalue::Future&)::operator()(c10::ivalue::Future&) + 223 (0x116c8429f in libtorch_cpu.dylib)

This comment was automatically generated by Dr. CI (expand for details).

.github/templates/linux_ci_workflow.yml.j2

janeyx99 · 2021-07-16T15:56:36Z

.github/scripts/generate_ci_workflows.py

I'm going to enable this and the other libtorch in a followup PR

facebook-github-bot · 2021-07-16T15:57:55Z

@janeyx99 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot · 2021-07-16T17:49:52Z

@janeyx99 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot · 2021-07-16T19:18:25Z

@janeyx99 merged this pull request in 3fd9dcf.

Move scheduled linux CI to GHA

94d7962

janeyx99 requested review from driazati, seemethere and zhouzhuojie as code owners July 15, 2021 22:34

facebook-github-bot added the cla signed label Jul 15, 2021

Actually run the tests to see if it works

8c25f5a

janeyx99 requested a review from a team July 15, 2021 22:37

seemethere reviewed Jul 15, 2021

.github/templates/linux_ci_workflow.yml.j2 (outdated)

janeyx99 commented Jul 16, 2021

janeyx99 changed the title ~~Move scheduled linux CI to GHA~~ Move non-libtorch scheduled linux CI to GHA Jul 16, 2021

Don't migrate libtorch in this PR just yet

adf6d2e

janeyx99 force-pushed the move-scheduled-linux-tests branch from 15e958e to adf6d2e Compare July 16, 2021 16:09

seemethere approved these changes Jul 16, 2021

malfet approved these changes Jul 16, 2021

facebook-github-bot closed this in 3fd9dcf Jul 16, 2021

facebook-github-bot added the Merged label Jul 16, 2021

janeyx99 mentioned this pull request Jul 19, 2021

Migrate build/test jobs from CircleCI to GitHub Actions #57686

Closed

71 tasks
