Conversation


@guangyey guangyey commented Jul 31, 2024

Stack from ghstack (oldest at bottom):

Motivation

According to [RFC] A device-agnostic Python runtime API design for stream-based accelerators, this PR introduces a device-agnostic runtime API design.
I personally prefer the Simple Version, whose APIs no longer accept the device type as an input argument. Instead, we leverage getAccelerator to fetch the current accelerator, and these APIs remain flexible enough to be extended later to handle scenarios with multiple accelerator types. The design does NOT break the previous design philosophies.
I also believe the torch.accelerator namespace is the better choice: it makes clear to users that the APIs they are calling run on an accelerator rather than on the CPU, which is important. Meanwhile, we can follow a simple set of API design principles (a short illustration follows the list):

  1. Device-agnostic APIs should be placed under the torch.accelerator namespace and not accept a device_type optional parameter.
  2. Device-specific APIs should be placed under device-specific submodules.
  3. APIs required by both CPU and accelerators should be placed under the torch namespace and accept a device_type optional parameter.
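
A few concrete examples of how calls map onto these principles (the specific pairings are my own illustration, assuming a CUDA-capable build, not an exhaustive mapping from the RFC):

import torch

# Principle 1: device-agnostic API, no device_type argument.
torch.accelerator.synchronize()

# Principle 2: device-specific API, under the device submodule.
torch.cuda.synchronize()          # or torch.xpu.synchronize(), etc.

# Principle 3: needed by both CPU and accelerators, under the torch namespace,
# taking a device argument (torch.set_default_device is one plausible example).
torch.set_default_device("cuda")  # also accepts "cpu"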

Also, I list the pros and cons of the Simple Version here:
Pros:

  • torch.accelerator.foo has the same input arguments as torch.xxx.foo, giving a better user experience;
  • it is more concise and makes it easier for developers to write device-agnostic code.

Cons:

  • no obvious drawbacks.

Additional Context

I list the new APIs here:

torch.accelerator.is_available() -> bool:
torch.accelerator.current_accelerator() -> torch.device:
torch.accelerator.device_count() -> int:
torch.accelerator.current_device_idx() -> int:
torch.accelerator.set_device_idx(device: Union[torch.device, str, int, None]) -> None:
torch.accelerator.current_stream(device: Union[torch.device, str, int, None]) -> torch.Stream:
torch.accelerator.set_stream(stream: torch.Stream) -> None:
torch.accelerator.synchronize(device: Union[torch.device, str, int, None]) -> None:

According to the discussion with Alban, we decided to rename set_device to set_device_idx and current_device to current_device_idx to be more explicit. A follow-up PR will add device and stream context managers.
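
For reference, here is a minimal sketch of device-agnostic code written against the APIs listed above (the semantics are assumed from this proposal and may differ from the final merged implementation):

import torch

# Assumes at least one accelerator (e.g. CUDA or XPU) is built and visible.
if torch.accelerator.is_available():
    acc = torch.accelerator.current_accelerator()    # e.g. device(type='cuda')
    print(f"accelerator: {acc}, devices: {torch.accelerator.device_count()}")

    torch.accelerator.set_device_idx(0)              # select device index 0
    assert torch.accelerator.current_device_idx() == 0

    x = torch.randn(1024, 1024, device=acc)
    y = x @ x                                        # runs on the accelerator

    stream = torch.accelerator.current_stream()      # stream of the current device
    torch.accelerator.set_stream(stream)
    torch.accelerator.synchronize()                  # wait for all queued work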

cc @albanD @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10


pytorch-bot bot commented Jul 31, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/132204

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Unrelated Failure

As of commit 202d5de with merge base 0efa590:

NEW FAILURE - The following job has failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

guangyey added a commit that referenced this pull request Jul 31, 2024
ghstack-source-id: 197d7d9
Pull Request resolved: #132204
@guangyey guangyey changed the title from "Introduce a device-agnostic runtime API design" to "[WIP] Introduce a device-agnostic runtime API design" on Jul 31, 2024
@guangyey guangyey marked this pull request as draft July 31, 2024 02:18
guangyey added a commit that referenced this pull request Jul 31, 2024
ghstack-source-id: c800335
Pull Request resolved: #132204
@guangyey guangyey added module: python frontend For issues relating to PyTorch's Python frontend and removed release notes: vulkan release notes category labels Aug 9, 2024
@pytorch-bot pytorch-bot bot added the release notes: vulkan release notes category label Aug 9, 2024

albanD commented Oct 24, 2024

Thanks for taking the time to chime in @rwightman !

These are definitely very valid points! I think I agree with you that a device-capability-based device module is a good global solution.
I think there are 3 pieces here in my mind:

  1. The any-device shared API available on every "device module", as given by torch.get_device_module().
  2. An accelerator-like device shared API
  3. Per-device API (is the hw feature enabled or disabled for example)

I do think there is a need for each layer here. 3) will always be needed, since there is always device-specific functionality we'll need. 2) is needed because we have a whole set of devices that share a common set of capabilities, and having a single concept for them allows us to a) unify the stack for all of them (less code, fewer bugs, better consistency), b) have a common language when talking about each of these devices, and c) expose a simpler PrivateUse1 API for out-of-tree backends that want to opt in. I also think that a lot of the library code we have today (fsdp, ac, offloading, etc.) has a strong "cpu vs accelerator" idea, so explicitly defining these concepts helps.

  1. is even one step further, where we need to enforce consistency across all the devices.

I'm sure you've seen part of the discussion in the issues above about device module vs custom namespace.
My concern with aiming for 1) straight away is that it will take even more alignment work to convince everyone about it. 2) is a good intermediate state that reduces the amount of alignment needed while providing a good chunk of the benefits.
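
To make the three layers concrete, here is a rough sketch (the device choice and the calls under 3) are illustrative assumptions, not part of this PR):

import torch

dev = torch.device("cuda")  # illustrative; could be "xpu", "mps", ...

# 1) Any-device shared API, reached through the per-device module.
mod = torch.get_device_module(dev)   # e.g. torch.cuda
mod.synchronize()

# 2) Accelerator-wide shared API (what this PR introduces).
torch.accelerator.synchronize()

# 3) Device-specific API, only meaningful for that backend.
if dev.type == "cuda":
    major, minor = torch.cuda.get_device_capability()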


@albanD albanD left a comment


Defaulting to cpu is the last contentious point from my point of view.

current_stream
device_count
is_available
set_device_idx
Collaborator


nit: put set_device_idx next to the current_device_idx above

Collaborator Author


done.


m.def("_accelerator_getAccelerator", []() {
// If no accelerator is currently available, return CPU.
return c10::Device(at::getAccelerator(false).value_or(c10::kCPU));
Collaborator


From discussion, it feels like we shouldn't fall back to cpu here and should fail instead.
We can easily add this back later if it feels really needed (going from error -> cpu is not BC-breaking).

Collaborator Author


Sounds good. I also think that at this stage being self-consistent is more important.
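
For illustration, a rough sketch of how the agreed behavior could surface on the Python side (the RuntimeError type is an assumption on my part, not something specified in this thread):

import torch

# Assumed behavior after the change discussed above: on a build with no
# accelerator, current_accelerator() raises instead of returning a CPU device.
try:
    acc = torch.accelerator.current_accelerator()
except RuntimeError:
    acc = torch.device("cpu")  # the caller now chooses the fallback explicitly
print(f"selected device: {acc}")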


@albanD albanD left a comment


The change sounds good. Doc build needs fixing

@guangyey

> The change sounds good. Doc build needs fixing

Thanks very much, the doc build has been fixed.

@guangyey

"Unrelated failure"
@pytorchbot merge -i

@pytorchmergebot

Merge started

Your change will be merged while ignoring the following 1 checks: xpu / win-vs2022-xpu-py3 / build

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information, see the pytorch-bot wiki.

@guangyey

@pytorchbot merge -f "unrelated failure, Macos job was queuing"

@pytorchmergebot

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

return torch._C._accelerator_getAccelerator()


def current_device_idx() -> int:
Collaborator


nit: why not use index instead?


@guangyey guangyey Oct 28, 2024


Thanks @XuehaiPan, I will discuss with Alban to confirm whether it is OK to change it to index.


Labels

  • ciflow/mps - Run MPS tests (subset of trunk)
  • ciflow/rocm - Trigger "default" config CI on ROCm
  • ciflow/trunk - Trigger trunk jobs on your pull request
  • ciflow/xpu - Run XPU CI tasks
  • intel - This tag is for PR from Intel
  • Merged
  • module: python frontend - For issues relating to PyTorch's Python frontend
  • open source
  • release notes: python_frontend - python frontend release notes category

Projects

Status: Done
