Update multiprocessing note now that shared CUDA tensors are refcounted #19904
Conversation
We should still warn that even this refcounting can't save you if the child process exits!
Sure, I will add a sentence. Do you mean that it can't save you in the sense that
- the refcount is not decremented when the child process exits abnormally, or
- the child process will segfault if the sending process exits?
I added both. Let me know if they aren't what you expected, or are incorrect.
@VitalyFedyunin Could you review this PR?
Yes, @VitalyFedyunin, it would be great if you could take a look. @ssnl poke me about this in a few days if it is still not reviewed by then.
on it
as long as the receiving process retains a copy of the tensor. It is implemented
under the hood but requires users to follow the best practices for the program
to run correctly. For example, the sending process must stay alive as long as
the consumer process has references to the tensor, and the refcounting can not
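A minimal sketch of the best practice this excerpt describes, assuming a single producer/consumer pair; the `consumer` function, the `done` event, and the tensor contents are illustrative and not part of the patch. The sending process stays alive until the receiving process has dropped its reference to the shared CUDA tensor.

```python
import torch
import torch.multiprocessing as mp


def consumer(queue, done):
    t = queue.get()            # receive a handle to the shared CUDA tensor
    print(t.sum().item())      # use it while the sender is still alive
    del t                      # drop the reference (decrements the shared refcount)
    done.set()                 # tell the sender it is now safe to exit


if __name__ == "__main__":
    mp.set_start_method("spawn")        # required when sharing CUDA tensors
    queue, done = mp.Queue(), mp.Event()
    p = mp.Process(target=consumer, args=(queue, done))
    p.start()
    queue.put(torch.ones(4, device="cuda"))
    done.wait()                         # sender outlives the consumer's references
    p.join()
```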
For example, the sending process must stay alive as long as
the consumer process has references to the tensor.
Not sure if it's obvious or worth mentioning, but the sending process must also stay alive as long as there are nonzero elements in the 'outgoing' queue.
Oh interesting. What will happen when there are shared CUDA tensors in queues but the sending process exits? Are those not freed by CUDA?
No, since it is the consumer's code that does the ref-decrement. In theory we could add this corner case, but it would complicate things a lot.
I see. But if the consumer either (1) eventually dies, or (2) gets the tensor from the queue and it gets GC'ed, the memory is still freed, right?
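For the queue-lifetime point raised above, a minimal sketch assuming `torch.multiprocessing` re-exports the standard `JoinableQueue` (as `multiprocessing` does); the two-item loop and the `consumer` function are illustrative. The sender blocks until every tensor it put on the queue has been taken off and released before it exits.

```python
import torch
import torch.multiprocessing as mp


def consumer(queue):
    for _ in range(2):
        t = queue.get()        # take a shared CUDA tensor off the queue
        print(t.mean().item())
        del t                  # release our reference first...
        queue.task_done()      # ...then acknowledge the item


if __name__ == "__main__":
    mp.set_start_method("spawn")
    queue = mp.JoinableQueue()
    p = mp.Process(target=consumer, args=(queue,))
    p.start()
    for _ in range(2):
        queue.put(torch.randn(8, device="cuda"))
    queue.join()               # block until every queued item has been acknowledged
    p.join()                   # only then let the sending process exit
```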
@pytorchbot merge this please
facebook-github-bot left a comment
@soumith is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
The multiprocessing notes were not updated after #16854. (The torch.multiprocessing page was.)