Update multiprocessing note now that shared CUDA tensors are refcounted #19904
Conversation
We should still warn that even this refcounting can't save you if the child process exits!
Sure, I will add a sentence. Do you mean that it can't save you in the sense that
- the refcount is not decremented when the child process exits abnormally, or
- the child process will segfault if the sending process exits?
I added both. Let me know if they aren't what you expected, or are incorrect.
@VitalyFedyunin Could you review this PR?
Yes, @VitalyFedyunin, it would be great if you could take a look. @ssnl poke me about this in a few days if it is still not reviewed by then.
on it
as long as the receiving process retains a copy of the tensor. It is implemented
under the hood but requires users to follow the best practices for the program
to run correctly. For example, the sending process must stay alive as long as
the consumer process has references to the tensor, and the refcounting can not
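A minimal sketch of the best practice this excerpt describes, assuming a single producer/consumer pair; the `consumer` function, the `done` event, and the tensor contents are illustrative and not part of the patch. The sending process stays alive until the receiving process has dropped its reference to the shared CUDA tensor.

```python
import torch
import torch.multiprocessing as mp


def consumer(queue, done):
    t = queue.get()            # receive a handle to the shared CUDA tensor
    print(t.sum().item())      # use it while the sender is still alive
    del t                      # drop the reference (decrements the shared refcount)
    done.set()                 # tell the sender it is now safe to exit


if __name__ == "__main__":
    mp.set_start_method("spawn")        # required when sharing CUDA tensors
    queue, done = mp.Queue(), mp.Event()
    p = mp.Process(target=consumer, args=(queue, done))
    p.start()
    queue.put(torch.ones(4, device="cuda"))
    done.wait()                         # sender outlives the consumer's references
    p.join()
```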
For example, the sending process must stay alive as long as
the consumer process has references to the tensor.
Not sure if it's obvious or worth mentioning, but the sending process must also stay alive as long as there are nonzero elements in the 'outgoing' queue.
Oh interesting. What will happen when there are shared CUDA tensors in queues but the sending process exits? Are those not freed by CUDA?
No, since it is the consumer's code that does the ref-decrement. In theory we could add this corner case, but it would complicate things a lot.
I see. But if the consumer either (1) eventually dies, or (2) gets the tensor from the queue and it gets GC'ed, the memory is still freed, right?
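For the queue-lifetime point raised above, a minimal sketch assuming `torch.multiprocessing` re-exports the standard `JoinableQueue` (as `multiprocessing` does); the two-item loop and the `consumer` function are illustrative. The sender blocks until every tensor it put on the queue has been taken off and released before it exits.

```python
import torch
import torch.multiprocessing as mp


def consumer(queue):
    for _ in range(2):
        t = queue.get()        # take a shared CUDA tensor off the queue
        print(t.mean().item())
        del t                  # release our reference first...
        queue.task_done()      # ...then acknowledge the item


if __name__ == "__main__":
    mp.set_start_method("spawn")
    queue = mp.JoinableQueue()
    p = mp.Process(target=consumer, args=(queue,))
    p.start()
    for _ in range(2):
        queue.put(torch.randn(8, device="cuda"))
    queue.join()               # block until every queued item has been acknowledged
    p.join()                   # only then let the sending process exit
```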
@pytorchbot merge this please
facebook-github-bot left a comment
@soumith is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
The multiprocessing notes were not updated after #16854. (The torch.multiprocessing page was.)