[Intel GPU] Dispatch Stub support #130019
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/130019
Note: Links to docs will display an error until the docs builds have been completed.
✅ No failures as of commit d2e8c68 with merge base 6cbb143.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
albanD left a comment:
Sounds good. Only small improvements.
albanD left a comment:
Thanks!
@albanD Thanks for your approval!
@pytorchbot merge
Merge failed. Reason: This PR needs a `release notes:` label. If your changes are user facing and intended to be a part of release notes, please use a label starting with `release notes:`. If not, please add the `topic: not user facing` label. To add a label, you can comment to pytorchbot, for example `@pytorchbot label "topic: not user facing"`. For more information, see the pytorchbot documentation.
Details for Dev Infra team: Raised by workflow job
@pytorchbot label "topic: not user facing"
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 3 jobs have failed; the first few of them are: xpu / linux-jammy-xpu-py3.8 / test (default, 1, 4, linux.idc.xpu), xpu / linux-jammy-xpu-py3.8 / test (default, 2, 4, linux.idc.xpu), trunk / linux-focal-rocm6.1-py3.8 / test (default, 1, 2, linux.rocm.gpu)
Details for Dev Infra team: Raised by workflow job
Met a new CI failure; I am fixing it.
The new failure in CI is caused by the cpp_extension tests. Considering that XPU does not support cpp_extension at present, would it be acceptable to you that we remove the cpp_extension test cases from the XPU CI?
I think it is an alternative to skip the cpp_extension test cases, as Intel GPU has not supported it well yet. @albanD, may I know which approach you prefer? Approach 1: Skip the cpp_extension test cases for Intel GPU. Thanks,
ghstack-source-id: 439dc40 Pull Request resolved: pytorch/pytorch#130019
Skipping the cpp_extensions tests is definitely an option, but that is a widely used feature by third-party repos. So you most likely want to fix that sooner rather than later.
Yes, we plan to support cpp_extension, and a community contributor (not from Intel) has just submitted a PR to support the feature. But we still have some open questions about the solution and need some time to resolve them. So, I'm wondering if we can skip the test cases first, just like ARM, to land this PR, and enable them for Intel GPU in the future once we support cpp_extension. @albanD
For XPU CI, |
Why not? Don't we re-use the torch compilation flag during cpp_extension build usually? I would expect that's how we transfer cuda/rocm/mps related flags?
Yes, you can add a decorator to the test to skip it on xpu for now. That sounds ok.
Yes, we should re-use the torch compilation flags, and we did it in another PR (#131276).
@ZhiweiYan-96, please help refine the PR a little bit.
The failed cpp_extension UTs have been skipped, thanks for the help :)
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
# Motivation
This PR intends to enhance the codegen to allow generating code for the XPU backend. XPU operators currently need to be registered in a hand-written way, so developers have no chance to take advantage of the shared code that handles tensor meta setting (like strides, proxy output, structured kernels). Manually porting code is error-prone and may lead to high maintenance effort. We utilize the backend_whitelist argument in `gen.py` to generate the headers and source code that XPU needs.
# Usage
XPU ops live in `third_party/torch-xpu-ops`; the codegen process is triggered before the compilation of `torch-xpu-ops`. We use the following command to generate XPU operators:
`python -m torchgen.gen --source-path path/to/yaml/of/xpu --install-dir build/xpu --per-operator-headers --static-dispatch-backend --backend-whitelist=XPU`
The diff lies in `backend-whitelist=XPU`; the backend-whitelist key is an existing argument in torchgen. The inputs of `gen.py` are code templates and an operators yaml. We share the same templates as `aten`. A simplified yaml lies in `third_party/torch-xpu-ops`, which only includes the supported xpu operators. This yaml is a copy-and-modify of `native_functions.yaml`: no extra entry is added and the format is the same as the one in `aten`.
# Result
All operator headers are generated in `build/xpu/ATen/ops` independently, which does not affect operators declared/defined by CPU/CUDA or any other backend. XPU operators only include headers in this folder.
# Verification
* In `third_party/torch-xpu-ops`, we migrate all supported kernels to the structured kernels style, where they are registered through `REGISTER_XPU_DISPATCH` or `TORCH_IMPL_FUNC`, and we have UT verification based on `test_ops.py`.

Pull Request resolved: #130082
Approved by: https://github.com/EikanWang, https://github.com/gujinghui, https://github.com/atalman
ghstack dependencies: #130019
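For illustration only, here is a rough sketch (not taken from this PR) of how a kernel in `torch-xpu-ops` could consume one of the generated per-operator headers under `build/xpu/ATen/ops`; the operator (`abs`) and the function name `abs_out_xpu` are hypothetical placeholders:

```cpp
// Hypothetical sketch: an XPU kernel pulls its declaration from a
// codegen-produced per-operator header instead of a hand-maintained one.
#include <ATen/core/Tensor.h>
#include <ATen/ops/abs_native.h>  // generated into build/xpu/ATen/ops for XPU

namespace at::native {

Tensor& abs_out_xpu(const Tensor& self, Tensor& result) {
  // ... launch the SYCL kernel that writes |self|'s absolute value into result ...
  return result;
}

} // namespace at::native
```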
# Motivation
The `copy`, `cdist`, and `index_put_impl` operators use `op_stub` for runtime dispatching inside the operators. Each of them contains an extra device list to ensure accuracy, and XPU is not in it. This PR makes them accept XPU as a supported device.

Pull Request resolved: #130088
Approved by: https://github.com/EikanWang, https://github.com/gujinghui, https://github.com/albanD
ghstack dependencies: #130019, #130082
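As a rough illustration (not the actual diff), the allowlists in question boil down to device checks of the following shape, with `DeviceType::XPU` added alongside the existing backends; the helper name is a placeholder:

```cpp
// Illustrative only: the kind of device allowlist check these operators
// perform before dispatching through their stubs; the PR adds XPU to it.
#include <c10/core/DeviceType.h>

static bool device_supported_for_stub(c10::DeviceType t) {
  return t == c10::DeviceType::CPU ||
         t == c10::DeviceType::CUDA ||
         t == c10::DeviceType::XPU;  // newly accepted device type
}
```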
Motivation
Structured codegen makes it easier to decouple tensor meta setting from kernel implementation. At present, XPU operators need to handle tensor metas in a hand-written way.
We plan to leverage the codegen system to auto-generate structured operators. This PR adds `DispatchStub` support for Intel GPU. Based on that, XPU operators will be able to register kernel functors to operator stubs. This is a prerequisite of PR #130082, where we will modify the codegen system to generate the source files and headers that XPU needs.
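For context, a minimal sketch of the `DispatchStub` pattern this enables; the stub and kernel names below are illustrative placeholders rather than code from this PR:

```cpp
// Illustrative sketch: with XPU support in DispatchStub, an XPU build can plug
// its kernel functor into the same stub that CPU/CUDA builds already use, so
// the shared operator code stays backend-agnostic.
#include <ATen/native/DispatchStub.h>
#include <ATen/native/TensorIterator.h>

namespace at::native {

using example_fn = void (*)(TensorIteratorBase&);
DECLARE_DISPATCH(example_fn, example_stub);  // normally lives in a shared header
DEFINE_DISPATCH(example_stub);               // normally lives in shared native code

static void example_kernel_xpu(TensorIteratorBase& iter) {
  // ... launch the SYCL kernel for the XPU backend ...
}

// XPU-side registration (the macro is referenced in the follow-up codegen PR):
REGISTER_XPU_DISPATCH(example_stub, &example_kernel_xpu);

} // namespace at::native
```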
Stack from ghstack (oldest at bottom):
cc @gujinghui @EikanWang @fengyuan14 @guangyey