Conversation

@ezyang ezyang commented Aug 3, 2018

    Correctly share CUDA Parameters, requires_grad and hooks.
    
    Previously, the following was true:
    
    - If you put a Parameter wrapping a CUDA tensor
      into a multiprocessing queue (or otherwise tried to transfer it),
      this failed, saying that we cannot pickle CUDA storage.
      This is issue #9996 (a repro sketch follows this list).
    
    - If you put a leaf Tensor with requires_grad=True through the
      multiprocessing queue, it would come out the other end as
      requires_grad=False (it should have come out as
      requires_grad=True).  Similarly, backward hooks were lost.
    
    - If you put a non-leaf Tensor with requires_grad=True through
      the multiprocessing queue, it would also come out the other end
      as requires_grad=False.
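
    A minimal repro sketch of the first two items (the process setup and
    variable names here are illustrative, not taken from this PR):

    ```python
    import torch
    import torch.multiprocessing as mp
    from torch.nn import Parameter

    def consumer(q):
        p = q.get()
        t = q.get()
        # Previously, `t` arrived with requires_grad=False and without hooks.
        print(type(p), p.requires_grad, t.requires_grad)

    if __name__ == "__main__":
        ctx = mp.get_context("spawn")
        q = ctx.SimpleQueue()
        proc = ctx.Process(target=consumer, args=(q,))
        proc.start()
        cuda_param = Parameter(torch.randn(2, 2, device="cuda"))
        leaf = torch.randn(2, 2, requires_grad=True)
        q.put(cuda_param)  # previously failed: cannot pickle CUDA storage
        q.put(leaf)        # previously arrived with requires_grad=False
        proc.join()        # keep the sender's tensors alive until the consumer exits
    ```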
    
    The root cause of the first issue was that the reduction for
    Parameter fell back to the superclass (Tensor) implementation in
    __reduce_ex__, which always picks up the non-ForkingPickler
    reduction and therefore doesn't work with CUDA tensors.  So, we
    registered a new ForkingPickler reduction specifically for
    Parameter, and adjusted the code to correctly rewrap a Tensor in
    a Parameter if it was originally a Parameter.
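
    Roughly, the mechanism looks like the sketch below (helper names are
    illustrative; the real reduction lives in torch/multiprocessing/reductions.py
    and differs in detail):

    ```python
    from multiprocessing.reduction import ForkingPickler
    from torch.nn import Parameter

    def rebuild_parameter(data, requires_grad, backward_hooks):
        # Rewrap the shared tensor in a Parameter and restore its metadata.
        param = Parameter(data, requires_grad=requires_grad)
        param._backward_hooks = backward_hooks
        return param

    def reduce_parameter(param):
        # param.data is a plain Tensor, so ForkingPickler recursively applies
        # the CUDA-aware tensor reduction to it; rebuild_parameter then
        # rewraps the result as a Parameter on the receiving side.
        return (rebuild_parameter,
                (param.data, param.requires_grad, param._backward_hooks))

    ForkingPickler.register(Parameter, reduce_parameter)
    ```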
        
    While working on this, we realized that requires_grad and backward
    hooks were not preserved by the ForkingPickler reduction
    implementation.  We fixed the reducer to save these properties.
    However, Adam Paszke pointed out that we shouldn't allow sending
    non-leaf Tensors with requires_grad=True over a multiprocessing
    queue, since we don't actually support autograd across process
    boundaries.  We now throw an error in this case; this may cause
    previously working code to fail, but it is easy enough to fix:
    just detach() the tensor before sending it, as the error message
    explains.
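
    For code that trips over the new error, the fix looks like this
    (variable names are illustrative):

    ```python
    # `non_leaf` has a grad_fn, so sending it directly now raises an error;
    # detach() first to send just the values.
    queue.put(non_leaf.detach())
    ```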
    
    Fixes #9996.

@facebook-github-bot facebook-github-bot left a comment

ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ailzhang ailzhang left a comment

Thanks @ezyang! This fixes the issue for me. The lint test probably just needs a retest.

Correctly share CUDA Parameters, requires_grad and hooks.

Signed-off-by: Edward Z. Yang <[email protected]>
@ezyang ezyang force-pushed the pr/fix-cuda-parameter-sharing branch from e5b61f2 to f247c31 on August 6, 2018 16:36

ezyang commented Aug 6, 2018

@ailzhang @fmassa @apaszke OK, the patch has been updated to fix another bug with multiprocessing requires_grad sharing, and the code is a bit clearer now.

@facebook-github-bot facebook-github-bot left a comment

ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

ezyang added 2 commits August 6, 2018 11:05
Signed-off-by: Edward Z. Yang <[email protected]>
Signed-off-by: Edward Z. Yang <[email protected]>
@facebook-github-bot facebook-github-bot left a comment

ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@colesbury colesbury left a comment

I think you can avoid the `with open(os.devnull, "w")` and just use `ctx.SimpleQueue` with normal try-except behavior.
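
A sketch of the suggested pattern (assuming the check added in this PR raises a RuntimeError; the actual test may differ):

```python
import torch
import torch.multiprocessing as mp

ctx = mp.get_context("spawn")
q = ctx.SimpleQueue()

leaf = torch.randn(5, 5, requires_grad=True)
non_leaf = leaf * 2  # has a grad_fn, i.e. not a leaf

try:
    q.put(non_leaf)
except RuntimeError:
    pass  # expected: non-leaf tensors that require grad can't be sent
```

Because SimpleQueue serializes in the calling process (unlike Queue, which pickles in a background feeder thread), the error can be caught with a plain try-except instead of being redirected to os.devnull.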

for device in devices:
    var0 = Variable(torch.arange(1., 26, device=device).view(5, 5), requires_grad=True)
    var = var0 * 2
    # We can't do the pickling indirectly, e.g., with a queue.put,


devices.append('cuda')
for device in devices:
    for requires_grad in [True, False]:
        var = Variable(torch.arange(1., 26, device=device).view(5, 5), requires_grad=requires_grad)


Signed-off-by: Edward Z. Yang <[email protected]>
@facebook-github-bot facebook-github-bot left a comment

ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

goodlux pushed a commit to goodlux/pytorch that referenced this pull request Aug 15, 2018
Summary:
```
    Correctly share CUDA Parameters, requires_grad and hooks.

    Fixes pytorch#9996.
```
Pull Request resolved: pytorch#10220

Differential Revision: D9160746

Pulled By: ezyang

fbshipit-source-id: a39c0dbc012ba5afc7a9e646da5c7f325b3cf05c
@ezyang ezyang added the merged label Jun 26, 2019