-
Notifications
You must be signed in to change notification settings - Fork 26.3k
Add RpcAgentOptions struct type, which bundles different required arguments for different RpcAgents
#29972
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
This pull request was exported from Phabricator. Differential Revision: D18549919 |
3a4e80e to
d64671f
Compare
|
This pull request was exported from Phabricator. Differential Revision: D18549919 |
torch/csrc/distributed/rpc/init.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docs for this one as well (can be done in separate PR)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason this is not using shared_ptr_class_?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't do it on purpose. What is the benefit of using shared_ptr_class_ ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From pybind doc:
If no such holder type template argument is given, the default for a type named Type is std::unique_ptr, which means that the object is deallocated when Python’s reference count goes to zero.
torch/distributed/rpc/api.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason c10d::Store is not quoted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should quote it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
backen -> backend
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a pretty long name, but I don't have a better suggestion. Does rpc_agent_options_factory work? If we do that init_backend_handler will become backend_factory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it is a backend_factory, it always creates the same type of backend. I can't think of a better name.
d64671f to
c6bd871
Compare
|
This pull request was exported from Phabricator. Differential Revision: D18549919 |
torch/distributed/rpc/__init__.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't init_method be subsumed by rpc_agent_options?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pritamdamania87 It's used by all backbends. So we don't want to put it into RpcAgentOptions.
torch/distributed/rpc/__init__.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General comment, should self_name and self_rank be name and rank? For example, init_process_group uses rank and not self_rank.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mrshenli What do you think? If we all agree. I will do the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for this change (drop self_ in the name)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
c6bd871 to
b9fa82e
Compare
|
This pull request was exported from Phabricator. Differential Revision: D18549919 |
b9fa82e to
59aa685
Compare
|
This pull request was exported from Phabricator. Differential Revision: D18549919 |
mrshenli
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks!
It looks like CI tests didn't launch properly (only 7 out of ~60 were launched). Please make sure all CI tests pass before landing.
facebook-github-bot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xush6528 is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
|
@pytorchbot retest this please |
…rguments for different `RpcAgent`s (pytorch#29972) Summary: pytorch#28226 introduced `worker_to_id` arg to the `def init_rpc` function for other `RpcAgent`. While it's not really used by `ProcessGroupAgent`. Cleanup is wanted for this, as described in pytorch#29031. To adapt to the difference of different `RpcAgent`, adding a `RpcAgentOptions` base classes, which allow leveraging inheritance to add extra fields. closes pytorch#29031 Pull Request resolved: pytorch#29972 Differential Revision: D18549919 Pulled By: xush6528 fbshipit-source-id: 7bea84148341d00bbcb29f9bc72707ba5e6ba238
59aa685 to
28a2e33
Compare
|
This pull request was exported from Phabricator. Differential Revision: D18549919 |
Summary:
#28226 introduced
worker_to_idarg to thedef init_rpcfunction for otherRpcAgent. While it's not really used byProcessGroupAgent. Cleanup is wanted for this, as described in #29031.To adapt to the difference of different
RpcAgent, adding aRpcAgentOptionsbase classes, which allow leveraging inheritance to add extra fields.closes #29031
Differential Revision: D18549919