C++ Frontend data_parallel Does Not Update Weights

## 🐛 Bug

I adapted the `mnist.cpp` example to train on two GPUs using `torch::nn::parallel::data_parallel`. However, the loss does not decrease, and the accuracy never rises above 0.114 (the original single-GPU example reaches about 0.99).
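To make the symptom concrete: a data-parallel training step is generally expected to replicate the model onto each device, scatter the batch across the replicas, run forward and backward on each replica, and accumulate the replicas' gradients back into the original model's parameters before the optimizer step. The sketch below is a hypothetical toy in plain C++ (no libtorch; `Replica`, `data_parallel_step`, `train` and `converged` are illustrative names, not PyTorch APIs) with a single scalar weight. It assumes that skipping the reduction step makes the replicas behave like detached copies; that is an illustration of what such a failure looks like, not a confirmed diagnosis of this bug.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical toy model, NOT libtorch: y = w * x with a single scalar
// weight w and a mean-squared-error loss over the batch.

struct Replica {
  double w;     // this replica's copy of the master weight
  double grad;  // gradient computed on this replica's shard
};

// d/dw of mean((w*x - y)^2), restricted to xs[begin, end). Dividing by the
// full batch size means the shard gradients sum to the full-batch gradient.
double shard_grad(double w, const std::vector<double>& xs,
                  const std::vector<double>& ys, std::size_t begin,
                  std::size_t end) {
  double g = 0.0;
  for (std::size_t i = begin; i < end; ++i) {
    g += 2.0 * (w * xs[i] - ys[i]) * xs[i];
  }
  return g / static_cast<double>(xs.size());
}

// One data-parallel SGD step. When reduce_to_master is false, the replicas
// act like detached copies: their gradients are never summed back, so the
// master weight is returned unchanged.
double data_parallel_step(double master_w, const std::vector<double>& xs,
                          const std::vector<double>& ys,
                          std::size_t num_replicas, double lr,
                          bool reduce_to_master) {
  assert(num_replicas > 0 && xs.size() == ys.size());

  // 1. Replicate: every replica starts from the current master weight.
  std::vector<Replica> replicas(num_replicas, Replica{master_w, 0.0});

  // 2. Scatter and backward: each replica handles one contiguous shard.
  const std::size_t n = xs.size();
  const std::size_t chunk = (n + num_replicas - 1) / num_replicas;
  for (std::size_t r = 0; r < num_replicas; ++r) {
    const std::size_t begin = std::min(n, r * chunk);
    const std::size_t end = std::min(n, begin + chunk);
    replicas[r].grad = shard_grad(replicas[r].w, xs, ys, begin, end);
  }

  // 3. Reduce: accumulate replica gradients into the master gradient.
  double master_grad = 0.0;
  if (reduce_to_master) {
    for (const Replica& rep : replicas) master_grad += rep.grad;
  }

  // 4. Optimizer step, applied to the master weight only.
  return master_w - lr * master_grad;
}

// Trains on data from y = 3x, starting at w = 0, using two replicas.
double train(bool reduce_to_master, int steps) {
  const std::vector<double> xs = {1.0, 2.0, 3.0, 4.0};
  std::vector<double> ys;
  for (double x : xs) ys.push_back(3.0 * x);

  double w = 0.0;
  for (int s = 0; s < steps; ++s) {
    w = data_parallel_step(w, xs, ys, /*num_replicas=*/2, /*lr=*/0.05,
                           reduce_to_master);
  }
  return w;
}

bool converged(double w, double target, double tol) {
  return std::abs(w - target) < tol;
}
```

Here `train(true, 50)` converges to the optimum `w = 3`, while `train(false, 50)` returns the starting value `0.0` however many steps run, so the loss stays flat. A corresponding check in the real program (an untested suggestion) would be to inspect `grad()` on the original model's parameters after `loss.backward()`; if those gradients are undefined, the replicas' gradients are not reaching the module that the optimizer updates.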

## To Reproduce

Steps to reproduce the behavior:

1. Rename the attached `mnist_parallel.cpp.txt` to `mnist_parallel.cpp`, then compile it with the attached `CMakeLists.txt` and run it.
[mnist_parallel.cpp.txt](https://github.com/pytorch/pytorch/files/3100695/mnist_parallel.cpp.txt)
[CMakeLists.txt](https://github.com/pytorch/pytorch/files/3100696/CMakeLists.txt)


## Expected behavior

I expect the accuracy to improve during training, as it does with the single-GPU example, but it does not.

## Environment

```
PyTorch version: 1.0.1
Is debug build: No
CUDA used to build PyTorch: 10.0.130

OS: Ubuntu 16.04.6 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
CMake version: version 3.14.0

Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: 10.0.130
GPU models and configuration:
GPU 0: GeForce GTX 1080 Ti
GPU 1: GeForce GTX 1080 Ti

Nvidia driver version: 410.104
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.7.5.0
/usr/local/cuda-10.0/targets/x86_64-linux/lib/libcudnn.so.7

Versions of relevant libraries:
[pip3] numpy==1.15.0
[conda] blas                      1.0                         mkl
[conda] magma-cuda10              2.4.0                         1    cpbotha
[conda] magma-cuda100             2.5.0                         1    pytorch
[conda] magma-cuda90              2.5.0                         1    pytorch
[conda] mkl                       2019.3                      199
[conda] mkl-include               2019.3                      199
[conda] mkl-service               1.1.2            py36he904b0f_5
[conda] mkl_fft                   1.0.10           py36ha843d7b_0
[conda] mkl_random                1.0.2            py36hd81dba3_0
[conda] mkldnn                    0.16.1                        0    mingfeima
[conda] pytorch                   1.0.1           cuda100py36he554f03_0
[conda] torchvision               0.2.1                    py36_0
```

## Additional context

If this is not a bug and I am setting up the code incorrectly, please let me know; in that case I will open a feature request for a good `data_parallel` C++ example instead.

C++ Frontend data_parallel Does Not Update Weights #19540