[Intel GPU] Allow XPU device in copy, cdist, index_put_impl #130088
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/130088
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 89c3b89 with merge base 6cbb143.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@ZhiweiYan-96, please add test cases.
Hi @EikanWang, the test case depends on the structured codegen change in
Thanks for the information. @atalman FYI
    device_type = kHIP;
  } else if (iter.device_type(1) == kMPS) {
    device_type = kMPS;
+ } else if (iter.device_type(1) == kXPU) {
Should we make this an accelerator device check?
Hi @albanD, we may need to keep this device check, since this code is intended to use the accelerator kernel once either tensor (src/dst) is on an accelerator device. I think the XPU path should follow the same logic as CUDA/MPS here; otherwise, we may fail to propagate the device into the copy_stub call that follows. If my understanding is wrong, could you please correct me? Thanks.
From a closer look, I retract my statement. The only difference with handling any accelerator here would be that PrivateUse1 would be handled as well. But that means this would force the copy kernel to go through copy_stub for PrivateUse1, as opposed to today, where _copy_from() is called and can be overridden like any other dispatcher op.
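For context, here is a minimal sketch of the device-type selection under discussion, paraphrased from the diff above with the XPU branch included. The helper name pick_copy_device is hypothetical, and the "operand 0 is dst / operand 1 is src" framing is an assumption based on how the comments above describe the copy path, not the exact upstream source.

    #include <ATen/TensorIterator.h>
    #include <c10/core/DeviceType.h>

    // Sketch of how the copy path picks the device used for copy_stub dispatch
    // (hypothetical helper; the real logic is inline in the aten copy implementation).
    static c10::DeviceType pick_copy_device(at::TensorIterator& iter) {
      // Operand 0 is dst, operand 1 is src; default to the dst device.
      c10::DeviceType device_type = iter.device_type(0);
      if (iter.device_type(1) == c10::kCUDA) {
        device_type = c10::kCUDA;
      } else if (iter.device_type(1) == c10::kHIP) {
        device_type = c10::kHIP;
      } else if (iter.device_type(1) == c10::kMPS) {
        device_type = c10::kMPS;
      } else if (iter.device_type(1) == c10::kXPU) {
        // New in this PR: treat XPU like the other accelerators so the
        // device reaches the accelerator copy kernel via copy_stub.
        device_type = c10::kXPU;
      }
      return device_type;
    }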
      at::OptionalIntArrayRef dim, const std::optional<Scalar>& correction_opt,
      bool keepdim, bool take_sqrt) {
-   TORCH_CHECK(self.device().is_cpu() || self.device().is_cuda(),
+   TORCH_CHECK(self.device().is_cpu() || self.device().is_cuda() || self.device().is_xpu(),
Update error message here and below
Thanks for the kind reminder! I will add XPU to the error messages.
I have updated the error message, thank you so much for the comments!
aten/src/ATen/native/ReduceOps.cpp (outdated)
-   TORCH_CHECK(self.device().is_cpu() || self.device().is_cuda(),
-               "std and var only supports tensors on a CPU or CUDA device, but got: ",
+   TORCH_CHECK(self.device().is_cpu() || self.device().is_cuda() || self.device().is_xpu(),
+               "std and var only supports tensors on a CPU or CUDA/XPU device, but got: ",
How about this?
| "std and var only supports tensors on a CPU or CUDA/XPU device, but got: ", | |
| "std and var supports tensors on CPU, CUDA, or XPU devices only, but got: ", |
I have modified the message here, thanks for the suggestions!
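For reference, a minimal sketch of what the reworded check might look like. The wrapper function check_std_var_device is hypothetical, and the trailing self.device().type() argument is assumed from the usual TORCH_CHECK pattern; the final upstream code may differ slightly.

    #include <ATen/core/Tensor.h>
    #include <c10/util/Exception.h>

    // Sketch only: the std/var device check after the suggested rewording.
    // The real check sits inline in aten/src/ATen/native/ReduceOps.cpp.
    static void check_std_var_device(const at::Tensor& self) {
      TORCH_CHECK(
          self.device().is_cpu() || self.device().is_cuda() || self.device().is_xpu(),
          "std and var supports tensors on CPU, CUDA, or XPU devices only, but got: ",
          self.device().type());
    }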
albanD left a comment:
Ok!
Thanks for the update
@pytorchbot merge
Merge failed. Reason: This PR needs a release notes: label. If your changes are user facing, please add one; if not, please add the topic: not user facing label. To add a label, you can comment to pytorchbot, for example @pytorchbot label "topic: not user facing". For more information, see the pytorchbot documentation. Details for Dev Infra team: raised by workflow job.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Motivation
The copy, cdist, and index_put_impl operators use op_stub for runtime dispatching inside the operator. Each of them contains an explicit device list to guarantee correctness, but XPU is not in those lists. This PR makes them accept XPU as a supported device (a usage sketch follows at the end of this description).

Stack from ghstack (oldest at bottom):
cc @gujinghui @EikanWang @fengyuan14 @guangyey
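To illustrate what the change unblocks at the user level, here is a minimal libtorch sketch that exercises copy, cdist, and index_put_ on XPU tensors. It assumes a PyTorch build with the XPU backend and an available XPU device; it is an illustrative sketch, not part of this PR's tests, and the tensor shapes and values are arbitrary.

    #include <ATen/ATen.h>

    int main() {
      // copy: a CPU -> XPU transfer goes through the common copy path
      // whose device handling this PR extends.
      at::Tensor cpu_a = at::rand({4, 8});
      at::Tensor xpu_a = cpu_a.to(at::Device(at::kXPU));

      // cdist: with both inputs on XPU, the operator's device check now passes.
      at::Tensor xpu_b = at::rand({6, 8}, at::device(at::kXPU));
      at::Tensor dist = at::cdist(xpu_a, xpu_b);

      // index_put_: write a row of ones into row 0 of the XPU tensor;
      // the underlying _index_put_impl_ now accepts XPU tensors as well.
      c10::List<std::optional<at::Tensor>> indices;
      indices.push_back(at::zeros({1}, at::device(at::kXPU).dtype(at::kLong)));
      xpu_a.index_put_(indices, at::ones({8}, at::device(at::kXPU)));

      (void)dist;
      return 0;
    }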