[CUDA12] Autograd engine use current device only #92354
Aidyn-A wants to merge 16 commits into pytorch:master
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/92354
Note: links to docs will display an error until the docs builds have been completed. ✅ No Failures as of commit 9bfd185. (This comment was automatically generated by Dr. CI and updates every 15 minutes.)
Continuing from #94929 (comment)
@albanD do you have any comments on this PR?
albanD left a comment:
This would deserve a more detailed comment about what it does.
But that only helps if you already initialized CUDA on at least one device?
If you're making such a disruptive change, I guess at this point you might as well just move the set_device into the inner loop after we get work:
pytorch/torch/csrc/autograd/engine.cpp, line 516 in b0b5f3c
And make sure that when we don't change the device, this will be super cheap to do?
cc @ngimel
Force-pushed from 53155f0 to 375569a.
@pytorchbot label "topic: not user facing", "ciflow/trunk"
Didn't find following labels among repository labels: topic: not user facing
@pytorchbot label "topic: not user facing" |
Force-pushed from aa3b32d to 76ab85d.
albanD left a comment:
Thanks for the update, the change is at the right place; only a small perf concern.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA: 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
This is a device-agnostic version of #91191. The reason this PR exists is the device-agnostic policy of the autograd engine: the compile-time `USE_CUDA` guard is not supported there, so doing something like https://github.com/pytorch/pytorch/blob/fa1ea9f9bcaa77c1370468059be95ad9b421f500/torch/csrc/autograd/engine.cpp#L351-L357 is not effective. This PR adds a check over the CUDA devices in the device registry so that threads set the same CUDA device.

Pull Request resolved: pytorch/pytorch#92354
Approved by: https://github.com/albanD, https://github.com/ngimel