Closed
Labels: oncall: distributed, triaged
Description
🐛 Bug
If the module passed to DistributedDataParallel has no parameters that require gradients, the bucket-assignment call that uses `expect_sparse_gradient[0]` in `_ddp_init_helper` raises an internal assert error.
In this case, all parameters of `self.criterionPerceptron` are frozen beforehand with `for param in self.criterionPerceptron.parameters(): param.requires_grad = False`, and the module is then wrapped as follows:
self.criterionPerceptron = nn.parallel.DistributedDataParallel(self.criterionPerceptron, device_ids=[opt.local_rank], output_device=opt.local_rank)
File "/opt/miniconda2/lib/python2.7/site-packages/torch/nn/parallel/distributed.py", line 300, in __init__
self._ddp_init_helper()
File "/opt/miniconda2/lib/python2.7/site-packages/torch/nn/parallel/distributed.py", line 368, in _ddp_init_helper
expect_sparse_gradient[0])
RuntimeError: tensors.size() > 0 INTERNAL ASSERT FAILED at pytorch/torch/csrc/distributed/c10d/reducer.cpp:672, please report a bug to PyTorch. (compute_bucket_assignment_by_size at pytorch/torch/csrc/distributed/c10d/reducer.cpp:672)
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x6a (0x7f52bbd5877a in /opt/miniconda2/lib/python2.7/site-packages/torch/lib/libc10.so)
frame #1: c10d::compute_bucket_assignment_by_size(std::vector<at::Tensor, std::allocator<at::Tensor> > const&, std::vector<unsigned long, std::allocator<unsigned long> > const&, std::vector<bool, std::allocator<bool> > const&) + 0x951 (0x7f52d4e3f8a1 in /opt/miniconda2/lib/python2.7/site-packages/torch/lib/libtorch_python.so)
frame #2: <unknown function> + 0x6c2cb1 (0x7f52d4e2fcb1 in /opt/miniconda2/lib/python2.7/site-packages/torch/lib/libtorch_python.so)
frame #3: <unknown function> + 0x6c2f0e (0x7f52d4e2ff0e in /opt/miniconda2/lib/python2.7/site-packages/torch/lib/libtorch_python.so)
frame #4: <unknown function> + 0x1d24f0 (0x7f52d493f4f0 in /opt/miniconda2/lib/python2.7/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame #26: __libc_start_main + 0xf0 (0x7f52e2b66830 in /lib/x86_64-linux-gnu/libc.so.6)
frame #27: <unknown function> + 0x107f (0x55e911d9b07f in /opt/miniconda2/bin/python)
To Reproduce
Steps to reproduce the behavior:
- create a module none of whose parameters require gradients (e.g. set `requires_grad = False` on all of them)
- pass it to DistributedDataParallel; initialization fails with the internal assert above (see the sketch after this list)
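A minimal sketch of a reproduction, assuming a single-process gloo process group; the `nn.Linear` module and the address/port values are illustrative, not from the original report:

```python
import os
import torch.distributed as dist
import torch.nn as nn

# Single-process process group so DistributedDataParallel can initialize.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(backend="gloo", rank=0, world_size=1)

# A module whose parameters are all frozen.
module = nn.Linear(10, 1)
for param in module.parameters():
    param.requires_grad = False

# DDP only tracks parameters with requires_grad=True, so its internal
# parameter list is empty and compute_bucket_assignment_by_size asserts
# with "tensors.size() > 0 INTERNAL ASSERT FAILED" during __init__.
ddp_module = nn.parallel.DistributedDataParallel(module)
```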
Environment
- PyTorch Version: 1.2+
cc @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera