Correctly share CUDA Parameters. #10220
Conversation
facebook-github-bot
left a comment
ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
ailzhang
left a comment
Thanks @ezyang! This fixes the issue for me. The lint test probably just needs a retest.
Force-pushed from e5b61f2 to f247c31
facebook-github-bot
left a comment
ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Signed-off-by: Edward Z. Yang <[email protected]>
Signed-off-by: Edward Z. Yang <[email protected]>
facebook-github-bot
left a comment
ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
colesbury
left a comment
I think you can avoid the `with open(os.devnull, "w")` and just use `ctx.SimpleQueue` with normal try/except behavior.
test/test_multiprocessing.py
Outdated
for device in devices:
    var0 = Variable(torch.arange(1., 26, device=device).view(5, 5), requires_grad=True)
    var = var0 * 2
    # We can't do the pickling indirectly, e.g., with a queue.put,
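For context, a minimal sketch of what colesbury's suggestion above could look like: `ctx.SimpleQueue.put` pickles in the calling process, so the expected failure can be caught with an ordinary try/except instead of silencing output via `os.devnull`. The helper name `serialization_error_message` and the assumption that the new check raises a `RuntimeError` are illustrative, not taken from the PR's actual test code.

```python
import torch
import torch.multiprocessing as mp

def serialization_error_message(tensor):
    """Try to pickle `tensor` onto a SimpleQueue and return the error text, if any."""
    ctx = mp.get_context('spawn')
    q = ctx.SimpleQueue()
    try:
        q.put(tensor)  # SimpleQueue.put pickles synchronously in this process
    except RuntimeError as e:  # assumed error type for the new non-leaf check
        return str(e)
    return None

if __name__ == '__main__':
    leaf = torch.ones(5, 5, requires_grad=True)   # leaf tensor
    non_leaf = leaf * 2                           # non-leaf, requires_grad=True
    print(serialization_error_message(non_leaf))           # expected: an error suggesting detach()
    print(serialization_error_message(non_leaf.detach()))  # expected: None
```

The last call shows the fix the error message points to: `detach()` the non-leaf tensor before sending it.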
test/test_multiprocessing.py
Outdated
devices.append('cuda')
for device in devices:
    for requires_grad in [True, False]:
        var = Variable(torch.arange(1., 26, device=device).view(5, 5), requires_grad=requires_grad)
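The hunk above appears to exercise the `requires_grad`-preservation part of the fix. A minimal sketch of that round trip for plain CPU tensors (the `_echo` helper is hypothetical, not the actual test code; CUDA tensors would additionally require the `spawn` start method, which is used here anyway):

```python
import torch
import torch.multiprocessing as mp

def _echo(inq, outq):
    # Child process: receive a tensor and send it straight back.
    outq.put(inq.get())

if __name__ == '__main__':
    ctx = mp.get_context('spawn')
    inq, outq = ctx.Queue(), ctx.Queue()
    child = ctx.Process(target=_echo, args=(inq, outq))
    child.start()

    leaf = torch.randn(5, 5, requires_grad=True)  # leaf tensor
    inq.put(leaf)
    roundtripped = outq.get()
    child.join()

    # With this PR the flag should survive the trip; previously it was dropped.
    assert roundtripped.requires_grad == leaf.requires_grad
```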
Signed-off-by: Edward Z. Yang <[email protected]>
facebook-github-bot
left a comment
ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary:
```
Correctly share CUDA Parameters, requires_grad and hooks.
Previously, the following was true:
- If you put a Parameter for a CUDA tensor
in a multiprocessing queue (or otherwise tried to transfer it),
this failed, saying that we cannot pickle CUDA storage.
This is issue pytorch#9996.
- If you put a leaf Tensor that requires_grad=True through the
multiprocessing queue, it would come out the other end as
requires_grad=False (It should have come out the other end
as requires_grad=True). Similarly, backwards hooks were
lost.
- If you put a non-leaf Tensor that requires_grad=True through
the multiprocessing queue, it would come out the other end
as requires_grad=False.
The root cause for the first issue was that implementation of
reductions for Parameter used the superclass implementation
(tensor) in __reduce_ex__, but this always picks up the
non-ForkingPickler reduction, which doesn't work with CUDA tensors.
So, we registered a new ForkingPickler specifically for Parameter,
and adjusted the code to correctly rewrap a Tensor in a Parameter
if it was originally a parameter.
While working on this, we realized that requires_grad and backwards
hooks would not be preserved in the ForkingPickler reduction
implementation. We fixed the reducer to save these parameters.
However, Adam Paszke pointed out that we shouldn't allow sending
requires_grad=True, non-leaf Tensors over a multiprocessing
queue, since we don't actually support autograd over process
boundaries. We now throw an error in this case; this may cause
previously working code to fail, but this is easy enough to fix;
just detach() the tensor before sending it. The error message says
so.
Fixes pytorch#9996.
```
Pull Request resolved: pytorch#10220
Differential Revision: D9160746
Pulled By: ezyang
fbshipit-source-id: a39c0dbc012ba5afc7a9e646da5c7f325b3cf05c
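To make the reduction strategy described in the summary concrete, here is a hedged sketch of its general shape. This is not the PR's actual code in torch/multiprocessing/reductions.py: the names `rebuild_parameter` and `reduce_parameter` are illustrative, and the real fix also carries backward hooks, which this sketch omits.

```python
from multiprocessing.reduction import ForkingPickler
import torch
from torch.nn import Parameter

def rebuild_parameter(data, requires_grad):
    # Rewrap the received tensor data as a Parameter, restoring requires_grad.
    return Parameter(data, requires_grad=requires_grad)

def reduce_parameter(param):
    # The inner tensor is pickled by the same ForkingPickler, so a CUDA
    # Parameter goes through the shared-CUDA tensor reduction instead of
    # the default pickle path that rejects CUDA storage.
    return (rebuild_parameter, (param.data, param.requires_grad))

ForkingPickler.register(Parameter, reduce_parameter)
```

With a registration along these lines, putting an `nn.Parameter` that holds a CUDA tensor on a `torch.multiprocessing` queue serializes it through the ForkingPickler reduction rather than failing with "cannot pickle CUDA storage". Note that `torch.multiprocessing` installs its own reducers on import, so this sketch is for illustration only.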