Labels: oncall: distributed, triaged
Description
🐛 Bug
- Using DistributedDataParallel
- on a model that has at least one non-floating-point parameter with requires_grad=False
- with a WORLD_SIZE <= nGPUs/2 on the machine (so each process is given more than one GPU and DDP replicates the module)
results in the error "Only Tensors of floating point dtype can require gradients".
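The error message itself comes from nn.Parameter's constructor: requires_grad defaults to True, and torch.Tensor._make_subclass rejects that for non-floating-point dtypes (see the Parameter.__new__ frame in the trace below). A two-line illustration, independent of DDP:

import torch
from torch.nn import Parameter

# requires_grad defaults to True, which is rejected for integer dtypes:
Parameter(torch.zeros(1, dtype=torch.long))
# RuntimeError: Only Tensors of floating point dtype can require gradients

The same parameter is created successfully when requires_grad=False is passed explicitly, which is exactly what the model in the test does.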
To Reproduce
Steps to reproduce the behavior:
- Use a machine with at least 4 GPUs
- Build PyTorch from source for Python 3.6, or use one of the available Docker images.
- Run the following command: "BACKEND=nccl WORLD_SIZE=2 TEMP_DIR=/tmp python3.6 test_distributed.py --verbose TestDistBackend.test_DistributedDataParallel"
The model used in the test has a long (int64) parameter with requires_grad=False: https://github.com/pytorch/pytorch/blob/master/test/test_distributed.py#L59
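For reference, a minimal sketch of the failing setup (hypothetical ToyModel, not the exact test code; assumes a 4-GPU machine and the usual MASTER_ADDR/MASTER_PORT/RANK/WORLD_SIZE environment variables so init_process_group can rendezvous):

import torch
import torch.nn as nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 10)
        # Non-floating-point parameter that should never receive gradients,
        # analogous to the long parameter in the test model.
        self.counter = nn.Parameter(torch.zeros(1, dtype=torch.long),
                                    requires_grad=False)

    def forward(self, x):
        return self.fc(x)

dist.init_process_group(backend="nccl")
rank = dist.get_rank()
model = ToyModel().cuda(2 * rank)
# Two devices per process (WORLD_SIZE=2 on a 4-GPU machine), so DDP replicates
# the module across the device ids and re-wraps every parameter; the crash
# happens inside that replication step during DDP construction.
ddp = DDP(model, device_ids=[2 * rank, 2 * rank + 1])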
On a ROCm build of PyTorch, I get the below stack trace (although this issue isn't ROCm-specific):
Traceback (most recent call last):
File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "test_distributed.py", line 2097, in _run
getattr(self, self.id().split(".")[2])()
File "test_distributed.py", line 2023, in wrapper
fn(self)
File "test_distributed.py", line 117, in wrapper
return func(*args, **kwargs)
File "test_distributed.py", line 133, in wrapper
return func(*args, **kwargs)
File "test_distributed.py", line 1849, in test_DistributedDataParallel
self._test_DistributedDataParallel(gpu_subset=gpus, rank=rank)
File "test_distributed.py", line 1784, in _test_DistributedDataParallel
model_DDP, device_ids=gpu_subset
File "/root/.local/lib/python3.6/site-packages/torch/nn/parallel/distributed.py", line 305, in __init__
self._ddp_init_helper()
File "/root/.local/lib/python3.6/site-packages/torch/nn/parallel/distributed.py", line 323, in _ddp_init_helper
self._module_copies = replicate(self.module, self.device_ids, detach=True)
File "/root/.local/lib/python3.6/site-packages/torch/nn/parallel/replicate.py", line 147, in replicate
setattr(replica, key, Parameter(param))
File "/root/.local/lib/python3.6/site-packages/torch/nn/parameter.py", line 26, in __new__
return torch.Tensor._make_subclass(cls, data, requires_grad)
RuntimeError: Only Tensors of floating point dtype can require gradients
FAIL
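From the last three frames of the trace, replicate() re-wraps each parameter as Parameter(param), which drops the original requires_grad=False flag and falls back to Parameter's default of True; _make_subclass then rejects the int64 tensor. One possible fix (a sketch only, not a submitted patch) would be to forward the flag in torch/nn/parallel/replicate.py:

# Hypothetical change to the line shown in the trace (replicate.py:147),
# preserving the original flag instead of relying on Parameter's default:
setattr(replica, key, Parameter(param, requires_grad=param.requires_grad))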
Expected behavior
Test should pass.
Environment
Collecting environment information...
PyTorch version: 1.4.0a0+b8f50d9
Is debug build: No
CUDA used to build PyTorch: Could not collect
OS: Ubuntu 16.04.5 LTS
GCC version: Could not collect
CMake version: version 3.6.3
Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: Could not collect
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Could not collect
ROCm version: 2.10
Versions of relevant libraries:
[pip3] numpy==1.17.4
[pip3] torch==1.4.0a0+b8f50d9
[pip3] torchvision==0.4.2
[conda] Could not collect
cc @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @xush6528 @osalpekar