Skip to content

Conversation

@xush6528
Copy link
Contributor

@xush6528 xush6528 commented Nov 16, 2019

Summary:

#28226 introduced worker_to_id arg to the def init_rpc function for other RpcAgent. While it's not really used by ProcessGroupAgent. Cleanup is wanted for this, as described in #29031.

To adapt to the difference of different RpcAgent, adding a RpcAgentOptions base classes, which allow leveraging inheritance to add extra fields.

closes #29031

Differential Revision: D18549919

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D18549919

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D18549919

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs for this one as well (can be done in separate PR)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason this is not using shared_ptr_class_?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't do it on purpose. What is the benefit of using shared_ptr_class_ ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From pybind doc:

If no such holder type template argument is given, the default for a type named Type is std::unique_ptr, which means that the object is deallocated when Python’s reference count goes to zero.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason c10d::Store is not quoted?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should quote it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

backen -> backend

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a pretty long name, but I don't have a better suggestion. Does rpc_agent_options_factory work? If we do that init_backend_handler will become backend_factory.

Copy link
Contributor Author

@xush6528 xush6528 Nov 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it is a backend_factory, it always creates the same type of backend. I can't think of a better name.

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D18549919

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't init_method be subsumed by rpc_agent_options?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pritamdamania87 It's used by all backbends. So we don't want to put it into RpcAgentOptions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General comment, should self_name and self_rank be name and rank? For example, init_process_group uses rank and not self_rank.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mrshenli What do you think? If we all agree. I will do the change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for this change (drop self_ in the name)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D18549919

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D18549919

Copy link
Contributor

@mrshenli mrshenli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks!

It looks like CI tests didn't launch properly (only 7 out of ~60 were launched). Please make sure all CI tests pass before landing.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xush6528 is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@xush6528
Copy link
Contributor Author

@pytorchbot retest this please

…rguments for different `RpcAgent`s (pytorch#29972)

Summary:
pytorch#28226 introduced `worker_to_id` arg to the `def init_rpc` function for other `RpcAgent`. While it's not really used by `ProcessGroupAgent`. Cleanup is wanted for this, as described in pytorch#29031.

To adapt to the difference of different `RpcAgent`, adding a `RpcAgentOptions` base classes, which allow leveraging inheritance to add extra fields.

closes pytorch#29031
Pull Request resolved: pytorch#29972

Differential Revision: D18549919

Pulled By: xush6528

fbshipit-source-id: 7bea84148341d00bbcb29f9bc72707ba5e6ba238
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D18549919

@facebook-github-bot
Copy link
Contributor

@xush6528 merged this pull request in 21dc1d4.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Remove worker_name_to_id from rpc init API

5 participants