Skip to content

Conversation

@pritamdamania87
Copy link
Contributor

@pritamdamania87 pritamdamania87 commented Nov 4, 2019

Stack from ghstack:

We would skip rpc.join_rpc() in the case of clean_shutdown=False.
This would exit the process without properly cleaning up the local RPCAgent
resulting in a crash.

As a result, to fix this we still call rpc.join_rpc() even in an unclean
shutdown. Note that, rpc.join_rpc() needs to be replaced with a local
shutdown call eventually since we need a way to shutdown the local RPC agent
properly.

Differential Revision: D18306941

We would skip rpc.join_rpc() in the case of `clean_shutdown=False`.
This would exit the process without properly cleaning up the local RPCAgent
resulting in a crash.

As a result, to fix this we still call rpc.join_rpc() even in an unclean
shutdown. Note that, rpc.join_rpc() needs to be replaced with a local
`shutdown` call eventually since we need a way to shutdown the local RPC agent
properly.

Differential Revision: [D18306941](https://our.internmc.facebook.com/intern/diff/D18306941/)

[ghstack-poisoned]
pritamdamania87 pushed a commit that referenced this pull request Nov 4, 2019
We would skip rpc.join_rpc() in the case of `clean_shutdown=False`.
This would exit the process without properly cleaning up the local RPCAgent
resulting in a crash.

As a result, to fix this we still call rpc.join_rpc() even in an unclean
shutdown. Note that, rpc.join_rpc() needs to be replaced with a local
`shutdown` call eventually since we need a way to shutdown the local RPC agent
properly.

Differential Revision: [D18306941](https://our.internmc.facebook.com/intern/diff/D18306941/)

ghstack-source-id: 93216133
Pull Request resolved: #29148
@rohan-varma
Copy link
Contributor

Note that, rpc.join_rpc() needs to be replaced with a local
shutdown call eventually since we need a way to shutdown the local RPC agent
properly.

This should be unblocked once pytorch/gloo#228 is landed.

…False`."

We would skip rpc.join_rpc() in the case of `clean_shutdown=False`.
This would exit the process without properly cleaning up the local RPCAgent
resulting in a crash.

As a result, to fix this we still call rpc.join_rpc() even in an unclean
shutdown. Note that, rpc.join_rpc() needs to be replaced with a local
`shutdown` call eventually since we need a way to shutdown the local RPC agent
properly.

Differential Revision: [D18306941](https://our.internmc.facebook.com/intern/diff/D18306941/)

[ghstack-poisoned]
pritamdamania87 pushed a commit that referenced this pull request Nov 8, 2019
Pull Request resolved: #29148

We would skip rpc.join_rpc() in the case of `clean_shutdown=False`.
This would exit the process without properly cleaning up the local RPCAgent
resulting in a crash.

As a result, to fix this we still call rpc.join_rpc() even in an unclean
shutdown. Note that, rpc.join_rpc() needs to be replaced with a local
`shutdown` call eventually since we need a way to shutdown the local RPC agent
properly.

Differential Revision: [D18306941](https://our.internmc.facebook.com/intern/diff/D18306941/)
ghstack-source-id: 93561508
# since we need to shutdown the RPC agent. If we don't shutdown the
# RPC agent, tests would fail since RPC agent threads, locks and
# condition variables are not properly terminated.
rpc.join_rpc()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, replace this with the local shutdown call after the shutdown API is ready.

fut = rpc.rpc_async(dst_name, set_termination_signal, args=())
futs.append(fut)
for fut in futs:
assert fut.wait() is None, "Sending termination signal failed."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And replace this part with the rpc.all_shutdown() API to be added.

@facebook-github-bot
Copy link
Contributor

This pull request has been merged in 310343e.

@facebook-github-bot facebook-github-bot deleted the gh/pritamdamania87/22/head branch November 15, 2019 15:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants