-
Notifications
You must be signed in to change notification settings - Fork 26.3k
Support send/recv for the gloo process group #11387
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to store srcRank in work, let's also do that in MPI too
torch/lib/c10d/ProcessGroupGloo.cpp
Outdated
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
torch/lib/c10d/ProcessGroupGloo.cpp
Outdated
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
torch/lib/c10d/ProcessGroupGloo.hpp
Outdated
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
|
@teng-li In the MPI backend this is not needed as we capture it in the lambda. For the gloo backend we need to capture it because we don't have a "run" lambda. The operation gets kicked off and completes on the gloo device thread, not on the c10d thread pool. Then the returned work object can call Updated to use |
2a9777e to
dee3c39
Compare
|
build still failing |
|
The gloo bump surfaces a build issue with GCC 7. It's missing the include. Submitted pytorch/gloo#131 to fix. |
dee3c39 to
1c1b782
Compare
|
Updated gloo submodule to include the GCC 7 fix. |
teng-li
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Watch for the CI
facebook-github-bot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pietern has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
facebook-github-bot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pietern is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: This change removes the skips for the existing send/recv tests in the backwards compatibility layer. Pull Request resolved: pytorch#11387 Reviewed By: teng-li Differential Revision: D9729330 Pulled By: pietern fbshipit-source-id: f8899219a94d806386d03e9ef53bff622d8658a3
This change removes the skips for the existing send/recv tests in the backwards compatibility layer.