Introduce a device-agnostic runtime API design #132204
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/132204. Note: links to docs will display an error until the doc builds have completed.
As of commit 202d5de with merge base 0efa590: ❌ 1 new failure, 1 unrelated failure (FLAKY, likely due to flakiness present on trunk).
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Thanks for taking the time to chime in, @rwightman! These are definitely valid points. I think I agree with you that a capability-based device module is a good global solution.
I do think there is a need for each layer here: 3) will always be needed, as there is always device-specific functionality; 2) is needed because we have a whole set of devices with a shared set of capabilities, and having a single concept for them allows us to a) unify the stack for all of them (less code, fewer bugs, better consistency), b) have a common vocabulary for each of these devices, and c) expose a simpler PrivateUse1 API for out-of-tree backends that want to opt in to this. I also think that a lot of the library code we have today (FSDP, activation checkpointing, offloading, etc.) has a strong "CPU vs. accelerator" notion, so explicitly defining these concepts helps.
I'm sure you've seen part of the discussion in the issues above about device module vs custom namespace. |
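To make the layering above concrete, here is a minimal pure-Python sketch of layer 2 (a shared "accelerator" concept that forwards device-agnostic calls to per-backend modules, layer 3). The registry, module stand-ins, and function names are hypothetical illustrations, not PyTorch's actual implementation:

```python
# Hypothetical sketch: a shared accelerator layer (layer 2) dispatching
# to whichever per-backend device module (layer 3) reports availability.
from types import SimpleNamespace

# Stand-ins for per-backend device modules; in reality these would be
# modules like torch.cuda or an out-of-tree PrivateUse1 backend.
_available_backend = SimpleNamespace(is_available=lambda: True, device_count=lambda: 2)
_unavailable_backend = SimpleNamespace(is_available=lambda: False, device_count=lambda: 0)

_backends = {"cuda": _available_backend, "xpu": _unavailable_backend}

def current_accelerator():
    """Return the name of the first backend reporting availability, else None."""
    for name, mod in _backends.items():
        if mod.is_available():
            return name
    return None

def device_count():
    """Device-agnostic entry point: dispatch to the active backend."""
    name = current_accelerator()
    return _backends[name].device_count() if name else 0

print(current_accelerator(), device_count())  # cuda 2
```

The point of the sketch is that callers like FSDP or offloading code only ever touch the layer-2 functions, so they stay identical across backends.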
albanD
left a comment
The default to cpu is the last contentious point from my point of view.
current_stream
device_count
is_available
set_device_idx
nit: put set_device_idx next to the current_device_idx above
done.
torch/csrc/DeviceAccelerator.cpp
Outdated
m.def("_accelerator_getAccelerator", []() {
  // If no accelerator is currently available, return CPU.
  return c10::Device(at::getAccelerator(false).value_or(c10::kCPU));
From the discussion, it feels like we shouldn't fall back to CPU here and should fail instead.
We can easily add this back later if it feels really needed (going from error -> cpu is not BC-breaking).
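The two behaviors being weighed here can be sketched in plain Python (the helper names below are hypothetical stand-ins for the C++ binding, not real torch APIs):

```python
# Hypothetical sketch of the two behaviors discussed: a value_or(kCPU)-style
# silent fallback vs. failing loudly when no accelerator is available.
_detected_accelerator = None  # pretend detection found no accelerator

def get_accelerator_or_cpu():
    # Fallback behavior: silently degrade to CPU.
    return _detected_accelerator or "cpu"

def get_accelerator_or_raise():
    # Agreed behavior: fail loudly. This can be relaxed to the CPU
    # fallback later without breaking BC (error -> cpu only widens behavior).
    if _detected_accelerator is None:
        raise RuntimeError("No accelerator is currently available")
    return _detected_accelerator

print(get_accelerator_or_cpu())  # cpu
```

Failing loudly keeps the API self-consistent: code that asks for "the accelerator" never silently gets a CPU device it did not expect.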
Sounds good. I also think that at this stage being self-consistent is more important.
albanD
left a comment
The change sounds good. Doc build needs fixing
Thanks very much, the doc has been fixed.
"Unrelated failure"
Merge started. Your change will be merged while ignoring the following 1 check: xpu / win-vs2022-xpu-py3 / build. Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
@pytorchbot merge -f "unrelated failure, Macos job was queuing"
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
return torch._C._accelerator_getAccelerator()

def current_device_idx() -> int:
nit: why not use index instead?
Thanks @XuehaiPan , I will discuss with Alban to confirm if it is OK to change to index.
Stack from ghstack (oldest at bottom):
Motivation
According to the [RFC] "A device-agnostic Python runtime API design for stream-based accelerators", this PR intends to introduce a device-agnostic runtime API design.
I personally prefer the Simple Version APIs that no longer accept the device type as an input argument. It means we will leverage `getAccelerator` to fetch the current accelerator, and it is flexible to expand these APIs to handle multiple types of accelerator scenarios. The design does NOT break the previous design philosophies. I also believe that the namespace `torch.accelerator` is better: it lets users know that the APIs they are calling run on an accelerator rather than the CPU. This is important. Meanwhile, we can follow a simple API design principle:
Also, I list the pros and cons of the Simple Version here:
Pros:
`torch.accelerator.foo` will have the same input arguments as `torch.xxx.foo`, bringing a better user experience;
Cons:
Additional Context
I list the new APIs here:
According to the discussion with Alban, we decided to rename the APIs `set_device` to `set_device_idx` and `current_device` to `current_device_idx` to be more explicit, and will submit another PR to support the device and stream context managers.
cc @albanD @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10
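As a hedged illustration of the renamed index-based surface, here is a toy stand-in for the agreed `set_device_idx` / `current_device_idx` naming (a minimal sketch, not the real `torch.accelerator` module, and the bounds-checking behavior below is an assumption for the example):

```python
# Toy model of the index-based API naming agreed above: the accelerator
# namespace tracks a current device *index*, not a device object.
class AcceleratorState:
    def __init__(self, device_count: int):
        self._count = device_count
        self._idx = 0  # assume device 0 is current by default

    def device_count(self) -> int:
        return self._count

    def current_device_idx(self) -> int:
        return self._idx

    def set_device_idx(self, idx: int) -> None:
        # Hypothetical validation: reject out-of-range indices.
        if not 0 <= idx < self._count:
            raise ValueError(f"invalid device index: {idx}")
        self._idx = idx

accel = AcceleratorState(device_count=2)
accel.set_device_idx(1)
print(accel.current_device_idx())  # 1
```

The `_idx` suffix makes it explicit that these calls operate on an integer index for the current accelerator, mirroring the argument shape of the per-backend `torch.xxx` modules.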