@teng-li teng-li commented Sep 6, 2018

Fixed a few bugs in the c10d frontend APIs that were not covered by tests, including
get_rank, get_world_size, and destroy_process_group when called on a given group.

These APIs are now covered by the CI tests.

Also added all the group-related tests, covering both the full group and partial groups (the existing ones), since the two hit different code paths.

Also removed the experimental c10d APIs that were initially used in DDP, since we no longer use them.

@teng-li teng-li added the oncall: distributed label Sep 6, 2018
@teng-li teng-li requested a review from pietern September 6, 2018 02:10
@teng-li teng-li changed the title from "[c10d] Full-fledged group testings and bug fixes for c10d frontend APIs" to "[c10d] Full-fledged group testings and fixes for c10d frontend APIs" Sep 6, 2018
@teng-li teng-li force-pushed the more_test branch 6 times, most recently from 50127ec to 093f5bd Compare September 6, 2018 03:38
@facebook-github-bot facebook-github-bot left a comment

teng-li has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

teng-li commented Sep 6, 2018

The test failures are unrelated to this PR.

@facebook-github-bot facebook-github-bot left a comment

teng-li is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

PenghuiCheng pushed a commit to PenghuiCheng/pytorch that referenced this pull request Sep 11, 2018
…#11318)

Pull Request resolved: pytorch#11318

Reviewed By: pietern

Differential Revision: D9675896

Pulled By: teng-li

fbshipit-source-id: a2eac2c57933effa2d139855f786e64919a95bfc
@ezyang ezyang added the merged label Jun 26, 2019