Refactor core DistributedDataParallel tests #20235

pietern · 2019-05-07T19:25:45Z

Stack:
    :white_circle: #20236 Make DistributedDataParallel usable with CPU models  💚
    :black_circle: #20235 Refactor core DistributedDataParallel tests  💚
    :white_circle: #20234 Add c10d::broadcast_coalesced and tests  💚

The tests were written to run only with CUDA models. A future commit
needs them to work with CPU models as well, so we can no longer rely
on device identifiers always being integers. With this change we pass
both the materialized list of devices to use (as torch.device objects)
and an optional list of integers. The latter is specified to exercise
the code in the DistributedDataParallel constructor that turns a list
of integers into CUDA devices, if and only if it is used to wrap a
single-device CUDA module.
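
As a rough illustration of the two arguments described above, the sketch below builds a materialized device list plus an optional integer list. It is not the code from this PR: `Device` is a hypothetical stand-in for torch.device (so the snippet runs without PyTorch), and `to_devices` and `make_test_params` are made-up helper names.

```python
from collections import namedtuple

# Hypothetical stand-in for torch.device so the sketch runs without PyTorch.
Device = namedtuple("Device", ["type", "index"])


def to_devices(device_ids):
    """Turn a list of ints and/or Device objects into a list of Device objects.

    Bare integers are treated as CUDA device indices, which is the
    conversion the description attributes to the DDP constructor.
    """
    devices = []
    for d in device_ids:
        if isinstance(d, int):
            devices.append(Device("cuda", d))
        else:
            devices.append(d)
    return devices


def make_test_params(gpus, pass_integers):
    """Return (devices, device_ids) in the shape a refactored test might use.

    devices is always materialized; device_ids is the raw integer list
    only when the caller wants to exercise the integer code path.
    """
    devices = to_devices(gpus)
    device_ids = list(gpus) if pass_integers else None
    return devices, device_ids
```

Calling `make_test_params([0, 1], pass_integers=True)` yields both the device objects and the original integers, while `pass_integers=False` leaves `device_ids` as `None`, which is the form a CPU model would need.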

This commit also groups together the 'str' and non-'str' tests. These
used to test passing the list of devices as integers or as
torch.device instances. Both cases are now executed from the same test.

Differential Revision: D15245429

Differential Revision: D15245429 Differential Version: 81320055

mrshenli

If device_ids is indeed unnecessary, shall we consolidate?

mrshenli · 2019-05-07T19:44:58Z

test/test_c10d.py

-            copy.deepcopy(model).cuda(gpus[0]),
-            device_ids=gpus,
+            copy.deepcopy(model).to(devices[0]),
+            device_ids=device_ids,


device_ids in DDP should be able to directly handle devices, as we wrote:

device_ids (list of int or torch.device): CUDA devices. This should ...

pietern · 2019-05-07

The existing tests exercised both, and I didn't want to change that. We should test both the list-of-integers and the list-of-torch.device cases in the suite (this could be broken out into a separate lightweight test instead of performing a full end-to-end test).
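
A lightweight test along the lines suggested here might look like the sketch below. It is only an assumption about the shape such a test could take: `Device` again stands in for torch.device, and `device_indices` and `DeviceIdsFormTest` are hypothetical names, not part of test/test_c10d.py.

```python
import unittest
from collections import namedtuple

# Hypothetical stand-in for torch.device so the sketch runs without PyTorch.
Device = namedtuple("Device", ["type", "index"])


def device_indices(device_ids):
    """Hypothetical helper: reduce ints or Device objects to plain indices."""
    return [d if isinstance(d, int) else d.index for d in device_ids]


class DeviceIdsFormTest(unittest.TestCase):
    def test_both_forms_agree(self):
        # The same expectation is checked for each way of spelling device_ids.
        forms = {
            "integers": [0, 1],
            "devices": [Device("cuda", 0), Device("cuda", 1)],
        }
        for name, device_ids in forms.items():
            with self.subTest(form=name):
                self.assertEqual(device_indices(device_ids), [0, 1])


def run_suite():
    """Run the test case programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(DeviceIdsFormTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Using `subTest` keeps both forms in one test method while still reporting each form separately on failure, and it avoids setting up a process group just to check argument handling.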

Differential Revision: D15245429 Differential Version: 81326667

Differential Revision: D15245429 Differential Version: 81330801

Differential Revision: D15245429 Differential Version: 81461821

mrshenli · 2019-05-09T20:58:10Z

@pytorchbot retest this please

facebook-github-bot · 2019-05-09T22:35:53Z

This pull request has been merged in f32c9bd.

V1: Initial commit

eb9e637

Differential Revision: D15245429 Differential Version: 81320055

pietern requested review from apaszke and mrshenli as code owners May 7, 2019 19:25

pytorchbot added the oncall: distributed label May 7, 2019

This was referenced May 7, 2019

Add c10d::broadcast_coalesced and tests #20234

Closed

Make DistributedDataParallel usable with CPU models #20236

Closed

mrshenli approved these changes May 7, 2019


pietern added 3 commits May 7, 2019 13:40

V2: Address comments

d43b44b

Differential Revision: D15245429 Differential Version: 81326667

V3: Fix many-GPU tests

b422c36

Differential Revision: D15245429 Differential Version: 81330801

V4: Merge with parent diff changes

3fdd1bd

Differential Revision: D15245429 Differential Version: 81461821

facebook-github-bot closed this in f32c9bd May 9, 2019

facebook-github-bot added the merged label May 9, 2019

pietern deleted the export-D15245429 branch May 9, 2019 22:37

mruberry added the Merged label Oct 28, 2020
