Conversation


@guangyey guangyey commented Jul 31, 2024

Stack from ghstack (oldest at bottom):

Motivation

According to [RFC] A device-agnostic Python runtime API design for stream-based accelerators, this PR introduces a device-agnostic runtime API design.
I personally prefer the Simple Version, whose APIs no longer accept the device type as an input argument. Instead, we leverage getAccelerator to fetch the current accelerator, and these APIs remain flexible enough to be extended later to handle scenarios with multiple accelerator types. The design does NOT break the previous design philosophies.
I also believe the torch.accelerator namespace is the better choice: it makes clear to users that the APIs they are calling run on an accelerator rather than on the CPU, which is important. Meanwhile, we can follow a simple set of API design principles (a short illustration follows the list):

  1. Device-agnostic APIs should be placed under the torch.accelerator namespace and not accept a device_type optional parameter.
  2. Device-specific APIs should be placed under device-specific submodules.
  3. APIs required by both CPU and accelerators should be placed under the torch namespace and accept a device_type optional parameter.
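
A few concrete examples of how calls map onto these principles (the specific pairings are my own illustration, assuming a CUDA-capable build, not an exhaustive mapping from the RFC):

import torch

# Principle 1: device-agnostic API, no device_type argument.
torch.accelerator.synchronize()

# Principle 2: device-specific API, under the device submodule.
torch.cuda.synchronize()          # or torch.xpu.synchronize(), etc.

# Principle 3: needed by both CPU and accelerators, under the torch namespace,
# taking a device argument (torch.set_default_device is one plausible example).
torch.set_default_device("cuda")  # also accepts "cpu"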

Also, I list the pros and cons of the Simple Version here:
Pros:

  • torch.accelerator.foo has the same input arguments as torch.xxx.foo, giving a better user experience;
  • it is more concise and makes it easier for developers to write device-agnostic code.

Cons:

  • no obvious drawbacks.

Additional Context

I list the new APIs here:

torch.accelerator.is_available() -> bool:
torch.accelerator.current_accelerator() -> torch.device:
torch.accelerator.device_count() -> int:
torch.accelerator.current_device_idx() -> int:
torch.accelerator.set_device_idx(device: Union[torch.device, str, int, None]) -> None:
torch.accelerator.current_stream(device: Union[torch.device, str, int, None]) -> torch.Stream:
torch.accelerator.set_stream(stream: torch.Stream) -> None:
torch.accelerator.synchronize(device: Union[torch.device, str, int, None]) -> None:

According to the discussion with Alban, we decided to rename set_device to set_device_idx and current_device to current_device_idx to be more explicit. A follow-up PR will add device and stream context managers.
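
For reference, here is a minimal sketch of device-agnostic code written against the APIs listed above (the semantics are assumed from this proposal and may differ from the final merged implementation):

import torch

# Assumes at least one accelerator (e.g. CUDA or XPU) is built and visible.
if torch.accelerator.is_available():
    acc = torch.accelerator.current_accelerator()    # e.g. device(type='cuda')
    print(f"accelerator: {acc}, devices: {torch.accelerator.device_count()}")

    torch.accelerator.set_device_idx(0)              # select device index 0
    assert torch.accelerator.current_device_idx() == 0

    x = torch.randn(1024, 1024, device=acc)
    y = x @ x                                        # runs on the accelerator

    stream = torch.accelerator.current_stream()      # stream of the current device
    torch.accelerator.set_stream(stream)
    torch.accelerator.synchronize()                  # wait for all queued work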

cc @albanD @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10


pytorch-bot bot commented Jul 31, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/132204

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Unrelated Failure

As of commit 202d5de with merge base 0efa590:

NEW FAILURE - The following job has failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

guangyey added a commit that referenced this pull request Jul 31, 2024
ghstack-source-id: 197d7d9
Pull Request resolved: #132204
@guangyey guangyey changed the title from "Introduce a device-agnostic runtime API design" to "[WIP] Introduce a device-agnostic runtime API design" on Jul 31, 2024
@guangyey guangyey marked this pull request as draft July 31, 2024 02:18
guangyey added a commit that referenced this pull request Jul 31, 2024
ghstack-source-id: c800335
Pull Request resolved: #132204
@guangyey guangyey added module: python frontend For issues relating to PyTorch's Python frontend and removed release notes: vulkan release notes category labels Aug 9, 2024
@pytorch-bot pytorch-bot bot added the release notes: vulkan release notes category label Aug 9, 2024

albanD commented Oct 24, 2024

Thanks for taking the time to chime in @rwightman !

These are definitely very valid points! I think I agree with you that a device-capability-based device module is a good global solution.
I think there are 3 pieces here in my mind:

  1. The any-device shared API available on every "device module", as given by torch.get_device_module().
  2. An accelerator-like device shared API
  3. Per-device API (is the hw feature enabled or disabled for example)

I do think there is a need for each layer here. 3) will always be needed, since there is always device-specific functionality we'll need. 2) is needed because we have a whole set of devices that share a common set of capabilities, and having a single concept for them allows us to a) unify the stack for all of them (less code, fewer bugs, better consistency), b) have a common language when talking about each of these devices, and c) expose a simpler PrivateUse1 API for out-of-tree backends that want to opt in. I also think that a lot of the library code we have today (fsdp, ac, offloading, etc.) has a strong "cpu vs accelerator" idea, so explicitly defining these concepts helps.

  1. is even one step further, where we need to enforce consistency across all the devices.

I'm sure you've seen part of the discussion in the issues above about device module vs custom namespace.
My concern with aiming for 1) straight away is that it will take even more alignment work to convince everyone about it. 2) is a good intermediate state that reduces the amount of alignment needed while providing a good chunk of the benefits.
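
To make the three layers concrete, here is a rough sketch (the device choice and the calls under 3) are illustrative assumptions, not part of this PR):

import torch

dev = torch.device("cuda")  # illustrative; could be "xpu", "mps", ...

# 1) Any-device shared API, reached through the per-device module.
mod = torch.get_device_module(dev)   # e.g. torch.cuda
mod.synchronize()

# 2) Accelerator-wide shared API (what this PR introduces).
torch.accelerator.synchronize()

# 3) Device-specific API, only meaningful for that backend.
if dev.type == "cuda":
    major, minor = torch.cuda.get_device_capability()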


@albanD albanD left a comment


Defaulting to cpu is the last contentious point from my point of view.

current_stream
device_count
is_available
set_device_idx
Collaborator


nit: put set_device_idx next to the current_device_idx above

Collaborator Author


done.


m.def("_accelerator_getAccelerator", []() {
// If no accelerator is currently available, return CPU.
return c10::Device(at::getAccelerator(false).value_or(c10::kCPU));
Collaborator


From discussion, it feels like we shouldn't fall back to cpu here and should fail instead.
We can easily add this back later if it feels really needed (going from error -> cpu is not BC-breaking).

Collaborator Author


Sounds good. I also think that at this stage being self-consistent is more important.
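
For illustration, a rough sketch of how the agreed behavior could surface on the Python side (the RuntimeError type is an assumption on my part, not something specified in this thread):

import torch

# Assumed behavior after the change discussed above: on a build with no
# accelerator, current_accelerator() raises instead of returning a CPU device.
try:
    acc = torch.accelerator.current_accelerator()
except RuntimeError:
    acc = torch.device("cpu")  # the caller now chooses the fallback explicitly
print(f"selected device: {acc}")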


@albanD albanD left a comment


The change sounds good. Doc build needs fixing

@guangyey

> The change sounds good. Doc build needs fixing

Thanks very much, the doc build has been fixed.

@guangyey

"Unrelated failure"
@pytorchbot merge -i

@pytorchmergebot

Merge started

Your change will be merged while ignoring the following 1 checks: xpu / win-vs2022-xpu-py3 / build

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information, see the pytorch-bot wiki.

@guangyey

@pytorchbot merge -f "unrelated failure, Macos job was queuing"

@pytorchmergebot

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

return torch._C._accelerator_getAccelerator()


def current_device_idx() -> int:
Collaborator


nit: why not use index instead?


@guangyey guangyey Oct 28, 2024


Thanks @XuehaiPan, I will discuss with Alban to confirm whether it is OK to change it to index.


Labels

  • ciflow/mps - Run MPS tests (subset of trunk)
  • ciflow/rocm - Trigger "default" config CI on ROCm
  • ciflow/trunk - Trigger trunk jobs on your pull request
  • ciflow/xpu - Run XPU CI tasks
  • intel - This tag is for PR from Intel
  • Merged
  • module: python frontend - For issues relating to PyTorch's Python frontend
  • open source
  • release notes: python_frontend - python frontend release notes category

Projects

Status: Done
