-
Notifications
You must be signed in to change notification settings - Fork 339
Add abortWaitSend/abortWaitRecv to unbound_buffer #232
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
This pull request was exported from Phabricator. Differential Revision: D18321454 |
Summary: Pull Request resolved: pytorch#232 Closes pytorch#227. Adds 2 methods: abortWaitSend, abortWaitRecv so that we can interrupt a blocking waitRecv or waitSend, per pietern 's suggestion. This is useful if a gloo process is waiting for a message in one thread, but another thread decides to shutdown and wants to stop it gracefully. The methods set a boolean variable and notify a waiting condition variable. The condition variable predicate is changed to check this condition. The implementation is both for the tcp and uv transport methods. Unit tests are also added. Differential Revision: D18321454 fbshipit-source-id: fb75eedcbec46c2878e43209170f72ef6b48ce70
5baefea to
e2c184a
Compare
Error is relevant. |
|
This pull request was exported from Phabricator. Differential Revision: D18321454 |
Summary: Pull Request resolved: pytorch#232 Closes pytorch#227. Adds 2 methods: abortWaitSend, abortWaitRecv so that we can interrupt a blocking waitRecv or waitSend, per pietern 's suggestion. This is useful if a gloo process is waiting for a message in one thread, but another thread decides to shutdown and wants to stop it gracefully. The methods set a boolean variable and notify a waiting condition variable. The condition variable predicate is changed to check this condition. The implementation is both for the tcp and uv transport methods. Unit tests are also added. Differential Revision: D18321454 fbshipit-source-id: 2b0b5af02cf266fd49a4b78d722ff7f2be47bd15
e2c184a to
d3d59fc
Compare
|
This pull request was exported from Phabricator. Differential Revision: D18321454 |
It looks like this has been fixed by adding |
pietern
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @rohan-varma. LGTM.
|
This pull request has been merged in 7c54124. |
Summary: Update gloo submodule to use the new APIs introduced in pytorch/gloo#232. Done by `cd third_party/gloo && git checkout 7c54124` which is gloo's latest commit. Next step would be to consume the introduced APIs in `ProcessGroup::Work`. Then we can use this layer to be able to interrupt `ProcessGroupAgent` (only for the gloo backend). Pull Request resolved: #29248 Reviewed By: xush6528 Differential Revision: D18350654 Pulled By: rohan-varma fbshipit-source-id: e41f7446bbb500087a0ca3919173b2e8379c7ce7
After changes in the previous PR (#29928) and the gloo code (pytorch/gloo#232), we can now interrupt the listener thread in process group agent, which was blocked on `wait`ing for some work to be received. The `localShutdown` method aborts this thread and joins the other threads. A condition variable is also added to ensure that we don't abort the thread while `recvWork` is in an invalid statem. A destructor is also added which simply calls `localShutdown`, and we ensure that this function is not called multiple times. Differential Revision: [D5578006](https://our.internmc.facebook.com/intern/diff/D5578006/) [ghstack-poisoned]
After changes in the previous PR (#29928) and the gloo code (pytorch/gloo#232), we can now interrupt the listener thread in process group agent, which was blocked on `wait`ing for some work to be received. The `localShutdown` method aborts this thread and joins the other threads. A condition variable is also added to ensure that we don't abort the thread while `recvWork` is in an invalid statem. A destructor is also added which simply calls `localShutdown`, and we ensure that this function is not called multiple times. Differential Revision: [D5578006](https://our.internmc.facebook.com/intern/diff/D5578006/) [ghstack-poisoned]
…gent" After changes in the previous PR (#29928) and the gloo code (pytorch/gloo#232), we can now interrupt the listener thread in process group agent, which was blocked on `wait`ing for some work to be received. The `localShutdown` method aborts this thread and joins the other threads. A condition variable is also added to ensure that we don't abort the thread while `recvWork` is in an invalid statem. A destructor is also added which simply calls `localShutdown`, and we ensure that this function is not called multiple times. Differential Revision: [D5578006](https://our.internmc.facebook.com/intern/diff/D5578006/) [ghstack-poisoned]
…gent" After changes in the previous PR (#29928) and the gloo code (pytorch/gloo#232), we can now interrupt the listener thread in process group agent, which was blocked on `wait`ing for some work to be received. The `localShutdown` method aborts this thread and joins the other threads. A condition variable is also added to ensure that we don't abort the thread while `recvWork` is in an invalid statem. A destructor is also added which simply calls `localShutdown`, and we ensure that this function is not called multiple times. Differential Revision: [D5578006](https://our.internmc.facebook.com/intern/diff/D5578006/) [ghstack-poisoned]
…gent" After changes in the previous PR (#29928) and the gloo code (pytorch/gloo#232), we can now interrupt the listener thread in process group agent, which was blocked on `wait`ing for some work to be received. The `localShutdown` method aborts this thread and joins the other threads. A condition variable is also added to ensure that we don't abort the thread while `recvWork` is in an invalid statem. A destructor is also added which simply calls `localShutdown`, and we ensure that this function is not called multiple times. Differential Revision: [D5578006](https://our.internmc.facebook.com/intern/diff/D5578006/) [ghstack-poisoned]
…gent" After changes in the previous PR (#29928) and the gloo code (pytorch/gloo#232), we can now interrupt the listener thread in process group agent, which was blocked on `wait`ing for some work to be received. The `localShutdown` method aborts this thread and joins the other threads. A condition variable is also added to ensure that we don't abort the thread while `recvWork` is in an invalid statem. A destructor is also added which simply calls `localShutdown`, and we ensure that this function is not called multiple times. Differential Revision: [D5578006](https://our.internmc.facebook.com/intern/diff/D5578006/) [ghstack-poisoned]
…gent" After changes in the previous PR (#29928) and the gloo code (pytorch/gloo#232), we can now interrupt the listener thread in process group agent, which was blocked on `wait`ing for some work to be received. The `localShutdown` method aborts this thread and joins the other threads. A condition variable is also added to ensure that we don't abort the thread while `recvWork` is in an invalid statem. A destructor is also added which simply calls `localShutdown`, and we ensure that this function is not called multiple times. Differential Revision: [D5578006](https://our.internmc.facebook.com/intern/diff/D5578006/) [ghstack-poisoned]
…gent" After changes in the previous PR (#29928) and the gloo code (pytorch/gloo#232), we can now interrupt the listener thread in process group agent, which was blocked on `wait`ing for some work to be received. The `localShutdown` method aborts this thread and joins the other threads. A condition variable is also added to ensure that we don't abort the thread while `recvWork` is in an invalid statem. A destructor is also added which simply calls `localShutdown`, and we ensure that this function is not called multiple times. Differential Revision: [D5578006](https://our.internmc.facebook.com/intern/diff/D5578006/) [ghstack-poisoned]
…gent" After changes in the previous PR (#29928) and the gloo code (pytorch/gloo#232), we can now interrupt the listener thread in process group agent, which was blocked on `wait`ing for some work to be received. The `localShutdown` method aborts this thread and joins the other threads. A condition variable is also added to ensure that we don't abort the thread while `recvWork` is in an invalid statem. A destructor is also added which simply calls `localShutdown`, and we ensure that this function is not called multiple times. Differential Revision: [D5578006](https://our.internmc.facebook.com/intern/diff/D5578006/) [ghstack-poisoned]
After changes in the previous PR (#29928) and the gloo code (pytorch/gloo#232), we can now interrupt the listener thread in process group agent, which was blocked on `wait`ing for some work to be received. The `localShutdown` method aborts this thread and joins the other threads. A condition variable is also added to ensure that we don't abort the thread while `recvWork` is in an invalid statem. A destructor is also added which simply calls `localShutdown`, and we ensure that this function is not called multiple times. Differential Revision: [D5578006](https://our.internmc.facebook.com/intern/diff/D5578006/) [ghstack-poisoned]
After changes in the previous PR (#29928) and the gloo code (pytorch/gloo#232), we can now interrupt the listener thread in process group agent, which was blocked on `wait`ing for some work to be received. The `localShutdown` method aborts this thread and joins the other threads. A condition variable is also added to ensure that we don't abort the thread while `recvWork` is in an invalid statem. A destructor is also added which simply calls `localShutdown`, and we ensure that this function is not called multiple times. Differential Revision: [D5578006](https://our.internmc.facebook.com/intern/diff/D5578006/) [ghstack-poisoned]
After changes in the previous PR (#29928) and the gloo code (pytorch/gloo#232), we can now interrupt the listener thread in process group agent, which was blocked on `wait`ing for some work to be received. The `localShutdown` method aborts this thread and joins the other threads. A condition variable is also added to ensure that we don't abort the thread while `recvWork` is in an invalid statem. A destructor is also added which simply calls `localShutdown`, and we ensure that this function is not called multiple times. Differential Revision: [D5578006](https://our.internmc.facebook.com/intern/diff/D5578006/) [ghstack-poisoned]
After changes in the previous PR (#29928) and the gloo code (pytorch/gloo#232), we can now interrupt the listener thread in process group agent, which was blocked on `wait`ing for some work to be received. The `localShutdown` method aborts this thread and joins the other threads. A condition variable is also added to ensure that we don't abort the thread while `recvWork` is in an invalid statem. A destructor is also added which simply calls `localShutdown`, and we ensure that this function is not called multiple times. Differential Revision: [D5578006](https://our.internmc.facebook.com/intern/diff/D5578006/) [ghstack-poisoned]
After changes in the previous PR (#29928) and the gloo code (pytorch/gloo#232), we can now interrupt the listener thread in process group agent, which was blocked on `wait`ing for some work to be received. The `localShutdown` method aborts this thread and joins the other threads. A condition variable is also added to ensure that we don't abort the thread while `recvWork` is in an invalid statem. A destructor is also added which simply calls `localShutdown`, and we ensure that this function is not called multiple times. Differential Revision: [D5578006](https://our.internmc.facebook.com/intern/diff/D5578006/) [ghstack-poisoned]
After changes in the previous PR (#29928) and the gloo code (pytorch/gloo#232), we can now interrupt the listener thread in process group agent, which was blocked on `wait`ing for some work to be received. The `localShutdown` method aborts this thread and joins the other threads. A condition variable is also added to ensure that we don't abort the thread while `recvWork` is in an invalid statem. A destructor is also added which simply calls `localShutdown`, and we ensure that this function is not called multiple times. Differential Revision: [D5578006](https://our.internmc.facebook.com/intern/diff/D5578006/) [ghstack-poisoned]
After changes in the previous PR (#29928) and the gloo code (pytorch/gloo#232), we can now interrupt the listener thread in process group agent, which was blocked on `wait`ing for some work to be received. The `localShutdown` method aborts this thread and joins the other threads. A condition variable is also added to ensure that we don't abort the thread while `recvWork` is in an invalid statem. A destructor is also added which simply calls `localShutdown`, and we ensure that this function is not called multiple times. Differential Revision: [D5578006](https://our.internmc.facebook.com/intern/diff/D5578006/) [ghstack-poisoned]
…agent" After changes in the previous PR (#29928) and the gloo code (pytorch/gloo#232), we can now interrupt the listener thread in process group agent, which was blocked on `wait`ing for some work to be received. The `localShutdown` method aborts this thread and joins the other threads. A condition variable is also added to ensure that we don't abort the thread while `recvWork` is in an invalid statem. A destructor is also added which simply calls `localShutdown`, and we ensure that this function is not called multiple times. Differential Revision: [D5578006](https://our.internmc.facebook.com/intern/diff/D5578006/) [ghstack-poisoned]
After changes in the previous PR (#29928) and the gloo code (pytorch/gloo#232), we can now interrupt the listener thread in process group agent, which was blocked on `wait`ing for some work to be received. The `localShutdown` method aborts this thread and joins the other threads. A condition variable is also added to ensure that we don't abort the thread while `recvWork` is in an invalid statem. A destructor is also added which simply calls `localShutdown`, and we ensure that this function is not called multiple times. Differential Revision: [D5578006](https://our.internmc.facebook.com/intern/diff/D5578006/) [ghstack-poisoned]
…agent" After changes in the previous PR (#29928) and the gloo code (pytorch/gloo#232), we can now interrupt the listener thread in process group agent, which was blocked on `wait`ing for some work to be received. The `localShutdown` method aborts this thread and joins the other threads. A condition variable is also added to ensure that we don't abort the thread while `recvWork` is in an invalid statem. A destructor is also added which simply calls `localShutdown`, and we ensure that this function is not called multiple times. Differential Revision: [D5578006](https://our.internmc.facebook.com/intern/diff/D5578006/) [ghstack-poisoned]
After changes in the previous PR (#29928) and the gloo code (pytorch/gloo#232), we can now interrupt the listener thread in process group agent, which was blocked on `wait`ing for some work to be received. The `localShutdown` method aborts this thread and joins the other threads. A condition variable is also added to ensure that we don't abort the thread while `recvWork` is in an invalid statem. A destructor is also added which simply calls `localShutdown`, and we ensure that this function is not called multiple times. Differential Revision: [D5578006](https://our.internmc.facebook.com/intern/diff/D5578006/) [ghstack-poisoned]
…agent" After changes in the previous PR (#29928) and the gloo code (pytorch/gloo#232), we can now interrupt the listener thread in process group agent, which was blocked on `wait`ing for some work to be received. The `localShutdown` method aborts this thread and joins the other threads. A condition variable is also added to ensure that we don't abort the thread while `recvWork` is in an invalid statem. A destructor is also added which simply calls `localShutdown`, and we ensure that this function is not called multiple times. Differential Revision: [D5578006](https://our.internmc.facebook.com/intern/diff/D5578006/) [ghstack-poisoned]
After changes in the previous PR (#29928) and the gloo code (pytorch/gloo#232), we can now interrupt the listener thread in process group agent, which was blocked on `wait`ing for some work to be received. The `localShutdown` method aborts this thread and joins the other threads. A condition variable is also added to ensure that we don't abort the thread while `recvWork` is in an invalid statem. A destructor is also added which simply calls `localShutdown`, and we ensure that this function is not called multiple times. Differential Revision: [D5578006](https://our.internmc.facebook.com/intern/diff/D5578006/) [ghstack-poisoned]
…agent" After changes in the previous PR (#29928) and the gloo code (pytorch/gloo#232), we can now interrupt the listener thread in process group agent, which was blocked on `wait`ing for some work to be received. The `localShutdown` method aborts this thread and joins the other threads. A condition variable is also added to ensure that we don't abort the thread while `recvWork` is in an invalid statem. A destructor is also added which simply calls `localShutdown`, and we ensure that this function is not called multiple times. Differential Revision: [D5578006](https://our.internmc.facebook.com/intern/diff/D5578006/) [ghstack-poisoned]
After changes in the previous PR (#29928) and the gloo code (pytorch/gloo#232), we can now interrupt the listener thread in process group agent, which was blocked on `wait`ing for some work to be received. The `localShutdown` method aborts this thread and joins the other threads. A condition variable is also added to ensure that we don't abort the thread while `recvWork` is in an invalid statem. A destructor is also added which simply calls `localShutdown`, and we ensure that this function is not called multiple times. Differential Revision: [D5578006](https://our.internmc.facebook.com/intern/diff/D5578006/) [ghstack-poisoned]
…agent" After changes in the previous PR (#29928) and the gloo code (pytorch/gloo#232), we can now interrupt the listener thread in process group agent, which was blocked on `wait`ing for some work to be received. The `localShutdown` method aborts this thread and joins the other threads. A condition variable is also added to ensure that we don't abort the thread while `recvWork` is in an invalid statem. A destructor is also added which simply calls `localShutdown`, and we ensure that this function is not called multiple times. Differential Revision: [D5578006](https://our.internmc.facebook.com/intern/diff/D5578006/) [ghstack-poisoned]
After changes in the previous PR (#29928) and the gloo code (pytorch/gloo#232), we can now interrupt the listener thread in process group agent, which was blocked on `wait`ing for some work to be received. The `localShutdown` method aborts this thread and joins the other threads. A condition variable is also added to ensure that we don't abort the thread while `recvWork` is in an invalid statem. A destructor is also added which simply calls `localShutdown`, and we ensure that this function is not called multiple times. Differential Revision: [D5578006](https://our.internmc.facebook.com/intern/diff/D5578006/) [ghstack-poisoned]
…agent" After changes in the previous PR (#29928) and the gloo code (pytorch/gloo#232), we can now interrupt the listener thread in process group agent, which was blocked on `wait`ing for some work to be received. The `localShutdown` method aborts this thread and joins the other threads. A condition variable is also added to ensure that we don't abort the thread while `recvWork` is in an invalid statem. A destructor is also added which simply calls `localShutdown`, and we ensure that this function is not called multiple times. Differential Revision: [D5578006](https://our.internmc.facebook.com/intern/diff/D5578006/) [ghstack-poisoned]
After changes in the previous PR (#29928) and the gloo code (pytorch/gloo#232), we can now interrupt the listener thread in process group agent, which was blocked on `wait`ing for some work to be received. The `localShutdown` method aborts this thread and joins the other threads. A condition variable is also added to ensure that we don't abort the thread while `recvWork` is in an invalid statem. A destructor is also added which simply calls `localShutdown`, and we ensure that this function is not called multiple times. Differential Revision: [D5578006](https://our.internmc.facebook.com/intern/diff/D5578006/) [ghstack-poisoned]
…agent" After changes in the previous PR (#29928) and the gloo code (pytorch/gloo#232), we can now interrupt the listener thread in process group agent, which was blocked on `wait`ing for some work to be received. The `localShutdown` method aborts this thread and joins the other threads. A condition variable is also added to ensure that we don't abort the thread while `recvWork` is in an invalid statem. A destructor is also added which simply calls `localShutdown`, and we ensure that this function is not called multiple times. Differential Revision: [D5578006](https://our.internmc.facebook.com/intern/diff/D5578006/) [ghstack-poisoned]
After changes in the previous PR (#29928) and the gloo code (pytorch/gloo#232), we can now interrupt the listener thread in process group agent, which was blocked on `wait`ing for some work to be received. The `localShutdown` method aborts this thread and joins the other threads. A condition variable is also added to ensure that we don't abort the thread while `recvWork` is in an invalid statem. A destructor is also added which simply calls `localShutdown`, and we ensure that this function is not called multiple times. Differential Revision: [D5578006](https://our.internmc.facebook.com/intern/diff/D5578006/) [ghstack-poisoned]
…agent" After changes in the previous PR (#29928) and the gloo code (pytorch/gloo#232), we can now interrupt the listener thread in process group agent, which was blocked on `wait`ing for some work to be received. The `localShutdown` method aborts this thread and joins the other threads. A condition variable is also added to ensure that we don't abort the thread while `recvWork` is in an invalid statem. A destructor is also added which simply calls `localShutdown`, and we ensure that this function is not called multiple times. Differential Revision: [D5578006](https://our.internmc.facebook.com/intern/diff/D5578006/) [ghstack-poisoned]
After changes in the previous PR (#29928) and the gloo code (pytorch/gloo#232), we can now interrupt the listener thread in process group agent, which was blocked on `wait`ing for some work to be received. The `localShutdown` method aborts this thread and joins the other threads. A condition variable is also added to ensure that we don't abort the thread while `recvWork` is in an invalid statem. A destructor is also added which simply calls `localShutdown`, and we ensure that this function is not called multiple times. Differential Revision: [D5578006](https://our.internmc.facebook.com/intern/diff/D5578006/) [ghstack-poisoned]
…agent" After changes in the previous PR (#29928) and the gloo code (pytorch/gloo#232), we can now interrupt the listener thread in process group agent, which was blocked on `wait`ing for some work to be received. The `localShutdown` method aborts this thread and joins the other threads. A condition variable is also added to ensure that we don't abort the thread while `recvWork` is in an invalid statem. A destructor is also added which simply calls `localShutdown`, and we ensure that this function is not called multiple times. Differential Revision: [D5578006](https://our.internmc.facebook.com/intern/diff/D5578006/) [ghstack-poisoned]
After changes in the previous PR (#29928) and the gloo code (pytorch/gloo#232), we can now interrupt the listener thread in process group agent, which was blocked on `wait`ing for some work to be received. The `localShutdown` method aborts this thread and joins the other threads. A condition variable is also added to ensure that we don't abort the thread while `recvWork` is in an invalid statem. A destructor is also added which simply calls `localShutdown`, and we ensure that this function is not called multiple times. Differential Revision: [D5578006](https://our.internmc.facebook.com/intern/diff/D5578006/) [ghstack-poisoned]
…agent" After changes in the previous PR (#29928) and the gloo code (pytorch/gloo#232), we can now interrupt the listener thread in process group agent, which was blocked on `wait`ing for some work to be received. The `localShutdown` method aborts this thread and joins the other threads. A condition variable is also added to ensure that we don't abort the thread while `recvWork` is in an invalid statem. A destructor is also added which simply calls `localShutdown`, and we ensure that this function is not called multiple times. Differential Revision: [D5578006](https://our.internmc.facebook.com/intern/diff/D5578006/) [ghstack-poisoned]
After changes in the previous PR (#29928) and the gloo code (pytorch/gloo#232), we can now interrupt the listener thread in process group agent, which was blocked on `wait`ing for some work to be received. The `localShutdown` method aborts this thread and joins the other threads. A condition variable is also added to ensure that we don't abort the thread while `recvWork` is in an invalid statem. A destructor is also added which simply calls `localShutdown`, and we ensure that this function is not called multiple times. Differential Revision: [D5578006](https://our.internmc.facebook.com/intern/diff/D5578006/) [ghstack-poisoned]
…up agent and refactor join" After changes in the previous PR (#29928) and the gloo code (pytorch/gloo#232), we can now interrupt the listener thread in process group agent, which was blocked on `wait`ing for some work to be received. The `shutdown` method aborts this and joins the all the threads. A condition variable is also added to ensure that we don't abort the thread while `recvWork` is in an invalid state. The new way of shutting down is: ``` rpc.wait_all_workers() rpc.shutdown() ``` Differential Revision: [D5578006](https://our.internmc.facebook.com/intern/diff/D5578006/) [ghstack-poisoned]
…actor join" After changes in the previous PR (#29928) and the gloo code (pytorch/gloo#232), we can now interrupt the listener thread in process group agent, which was blocked on `wait`ing for some work to be received. The `shutdown` method aborts this and joins the all the threads. A condition variable is also added to ensure that we don't abort the thread while `recvWork` is in an invalid state. The new way of shutting down is: ``` rpc.wait_all_workers() rpc.shutdown() ``` Differential Revision: [D5578006](https://our.internmc.facebook.com/intern/diff/D5578006/) [ghstack-poisoned]
Summary:
Closes #227.
Adds 2 methods: abortWaitSend, abortWaitRecv so that we can interrupt a blocking waitRecv or waitSend, per pietern 's suggestion. This is useful if a gloo process is waiting for a message in one thread, but another thread decides to shutdown and wants to stop it gracefully.
The methods set a boolean variable and notify a waiting condition variable. The condition variable predicate is changed to check this condition. The implementation is both for the tcp and uv transport methods. Unit tests are also added.
Differential Revision: D18321454