
Error Sharing CUDA Models between Processes using torch.multiprocessing #9996

@yangky11

Description


I get a RuntimeError when trying to share a CUDA model with a child process spawned via torch.multiprocessing.

The code I was running:

import torch
import torch.multiprocessing as mp
import torch.nn as nn


def train(model):
    print(model)
    # do something...


if __name__ == '__main__':
    model = nn.Linear(10, 1)
    model.cuda()
    model.share_memory()

    mp.set_start_method('spawn')
    p = mp.Process(target=train, args=(model,))
    p.start()
    p.join()

The error traceback:

Traceback (most recent call last):
  File "test.py", line 18, in <module>
    p.start()
  File "/home/yangky/anaconda3/lib/python3.6/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/home/yangky/anaconda3/lib/python3.6/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/home/yangky/anaconda3/lib/python3.6/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/home/yangky/anaconda3/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/home/yangky/anaconda3/lib/python3.6/multiprocessing/popen_fork.py", line 26, in __init__
    self._launch(process_obj)
  File "/home/yangky/anaconda3/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/home/yangky/anaconda3/lib/python3.6/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "/home/yangky/anaconda3/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 179, in reduce_storage
    raise RuntimeError("Cannot pickle CUDA storage; try pickling a CUDA tensor instead")
RuntimeError: Cannot pickle CUDA storage; try pickling a CUDA tensor instead
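
For context, the raising frame in the traceback is torch.multiprocessing's reduce_storage, which deliberately refuses to pickle CUDA storages directly; CUDA tensors, by contrast, are meant to be sent between processes via CUDA IPC handles. As a sanity check that the tensor path itself works (a minimal sketch, not verified against 0.4.1), passing a bare CUDA tensor to a spawned child follows exactly the route the error message suggests:

import torch
import torch.multiprocessing as mp


def worker(t):
    # The child should receive the tensor through a CUDA IPC handle,
    # so this prints without the parent copying data to the host.
    print(t.device, t.sum().item())


if __name__ == '__main__':
    mp.set_start_method('spawn')
    t = torch.ones(10, device='cuda')  # a plain CUDA tensor, not an nn.Parameter
    p = mp.Process(target=worker, args=(t,))
    p.start()
    p.join()

If that works, the failure is specific to pickling the module (or its nn.Parameter attributes), which somewhere falls back to pickling a storage directly.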

System Info

PyTorch version: 0.4.1
Is debug build: No
CUDA used to build PyTorch: 8.0.61

OS: CentOS Linux 7 (Core)
GCC version: (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11)
CMake version: version 3.9.4

Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: 8.0.61
GPU models and configuration: GPU 0: GeForce GTX 1080 Ti
Nvidia driver version: 384.69
cuDNN version: Could not collect

Versions of relevant libraries:
[pip] numpy (1.14.3)
[pip] numpydoc (0.7.0)
[pip] torch (0.4.1)
[pip] torchvision (0.2.1)
[conda] cuda80 1.0 h205658b_0 pytorch
[conda] magma-cuda80 2.3.0 1 pytorch
[conda] pytorch 0.4.1 py36_cuda8.0.61_cudnn7.1.2_1 [cuda80] pytorch
[conda] torchvision 0.2.1 py36_1 pytorch

cc @ezyang @gchanan @zou3519 @ngimel
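
A possible workaround until this is resolved: avoid pickling the nn.Module altogether by rebuilding it inside the child and passing only plain CUDA tensors across the process boundary. This is a sketch under the assumption that detached parameter tensors take the supported CUDA-tensor pickling path; it is not verified against 0.4.1:

import torch
import torch.multiprocessing as mp
import torch.nn as nn


def train(weight, bias):
    # Rebuild the module in the child and copy in the received parameters;
    # only plain CUDA tensors crossed the process boundary.
    model = nn.Linear(10, 1).cuda()
    with torch.no_grad():
        model.weight.copy_(weight)
        model.bias.copy_(bias)
    print(model)
    # do something...


if __name__ == '__main__':
    mp.set_start_method('spawn')
    model = nn.Linear(10, 1).cuda()
    # Pass detached parameter tensors (plain Tensors) instead of the module.
    p = mp.Process(target=train,
                   args=(model.weight.detach(), model.bias.detach()))
    p.start()
    p.join()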

Labels

high priority
module: cuda (Related to torch.cuda, and CUDA support in general)
module: multiprocessing (Related to torch.multiprocessing)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
