
[DO NOT MERGE] [CUDA12] Conditionally set device in device guard#91219

Closed
Aidyn-A wants to merge 28 commits into pytorch:master from
Aidyn-A:cuda12_device_guard_conditionally_set_device

Conversation

@Aidyn-A
Copy link
Collaborator

@Aidyn-A Aidyn-A commented Dec 21, 2022

CUDA 12 introduces a behavioral change in cudaSetDevice. In CUDA 11 and earlier, the call only set the device to be used for subsequent kernel launches and memory allocations, without creating a CUDA context. In CUDA 12, the first cudaSetDevice call for a given device creates a CUDA context on that device. See issue #91122.
This PR introduces a workaround for cases like:

import torch
x = torch.randn(1, device="cuda:1")

in which a primary context would be created on both device 0 and device 1, because device 0 is the default one.
The real danger arises in distributed jobs, where every rank resets the device to 0 in the destructor of the CUDA device guard, creating nproc_per_node CUDA contexts on device 0.

@pytorch-bot
Copy link

pytorch-bot bot commented Dec 21, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/91219

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 Failure

As of commit 790d2d9:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@Aidyn-A Aidyn-A marked this pull request as ready for review January 17, 2023 18:52
@Aidyn-A
Copy link
Collaborator Author

Aidyn-A commented Jan 17, 2023

cc @ngimel

@bdhirsh bdhirsh requested a review from ngimel January 18, 2023 12:43
@bdhirsh bdhirsh added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Jan 18, 2023
@lezcano
Copy link
Collaborator

lezcano commented Jan 30, 2023

This is the only PR left to close #91122. PTAL @ngimel

@Aidyn-A Aidyn-A changed the title [CUDA12] Conditionally set device in device guard [DO NOT MERGE] [CUDA12] Conditionally set device in device guard Jan 30, 2023
@Aidyn-A Aidyn-A marked this pull request as draft January 30, 2023 20:36
@Aidyn-A Aidyn-A force-pushed the cuda12_device_guard_conditionally_set_device branch from 32819e7 to f2eaff9 Compare February 9, 2023 22:58
@Aidyn-A
Copy link
Collaborator Author

Aidyn-A commented Feb 15, 2023

The usage of aten in c10 is not preferred. Would it make more sense to use CUDA Driver API in c10 like here #94864?

@jjsjann123
Copy link
Collaborator

Looks like the c10 bzl rule is still setting USE_CUDA, while cmake no longer uses it. Seems like a consistency issue that we should patch up and remove?! https://github.com/pytorch/pytorch/blob/master/c10/cuda/build.bzl#L28

cc'ing @vors

@Aidyn-A Aidyn-A closed this Apr 5, 2023

Labels

open source, triaged, with-ssh


6 participants