[Intel GPU] Support RegisterXPU.cpp codegen and compile for the in-tree XPU structured GEMM OPs. #139025
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/139025
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures
As of commit 3912761 with merge base 2ede4c9.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Attention! native_functions.yaml was changed
If you are adding a new function or defaulted argument to native_functions.yaml, you cannot use it from pre-existing Python frontend code until our FC window passes (two weeks). Split your PR into two PRs: one that adds the new C++ functionality, and one that makes use of it from Python; land them two weeks apart. See https://github.com/pytorch/pytorch/wiki/PyTorch's-Python-Frontend-Backward-and-Forward-Compatibility-Policy#forwards-compatibility-fc for more info.
Caused by:
torchgen/gen.py (Outdated)

```python
if options.backend_whitelist and str(DispatchKey.XPU) in options.backend_whitelist:
    options.xpu = True
```

Invalid code changes.
```python
xpu_in_whitelist = (
    options.backend_whitelist and str(DispatchKey.XPU) in options.backend_whitelist
)
if (not options.xpu) and (not xpu_in_whitelist):
    ignore_keys.add(DispatchKey.XPU)
```

```python
if DispatchKey.XPU in dispatch_keys:
    del dispatch_keys[dispatch_keys.index(DispatchKey.XPU)]
```

Please add some comments to elaborate on the motivation.
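A commented version of these snippets might look like the sketch below. The comments are an illustration of the motivation described in this PR (XPU ops are split between in-tree and out-of-tree registrations), not the exact wording that was committed; `options`, `ignore_keys`, `dispatch_keys`, and `DispatchKey` come from the surrounding torchgen/gen.py context.

```python
# XPU codegen is opt-in: unless XPU was requested explicitly (--xpu) or
# appears in the backend whitelist, ignore the key so that no RegisterXPU.cpp
# output is declared for non-XPU builds.
xpu_in_whitelist = (
    options.backend_whitelist and str(DispatchKey.XPU) in options.backend_whitelist
)
if (not options.xpu) and (not xpu_in_whitelist):
    ignore_keys.add(DispatchKey.XPU)

# XPU kernels are registered from two sources (in-tree native_functions.yaml
# and out-of-tree torch-xpu-ops), so take XPU out of the generic per-backend
# loop and handle it on its own path.
if DispatchKey.XPU in dispatch_keys:
    del dispatch_keys[dispatch_keys.index(DispatchKey.XPU)]
```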
@pytorchbot merge
Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
[AOTI] Introduce an extensibility mechanism for the c shim codegen to make it easy to produce c shims for out-of-tree OP kernels as well. Add c shim for XPU. (#136742)

### Motivation
The current c shim codegen only produces C wrappers for ops registered in `aten/src/ATen/native/native_functions.yaml`. For the same backend, a portion of out-of-tree ops are not registered in that file but are registered externally, for example in `third_party/torch-xpu-ops/yaml/native_functions.yaml`. In this case, the existing codegen cannot extend the already-produced in-tree c shims with shims for the out-of-tree ops.

### Design
To extend the c shim with more ops for a backend from out-of-tree, this PR provides a bool option `--aoti-extend` to indicate that the codegen is extending the c shim from out-of-tree. The generated c shim is stored in the `extend` subdirectory, for example:

```
torch/include/torch/csrc/inductor/aoti_torch/generated/c_shim_xpu.h
torch/include/torch/csrc/inductor/aoti_torch/generated/c_shim_xpu.cpp
torch/include/torch/csrc/inductor/aoti_torch/generated/extend/c_shim_xpu.h
torch/include/torch/csrc/inductor/aoti_torch/generated/extend/c_shim_xpu.cpp
```

Example usage:

```
python -m torchgen.gen --source-path third_party/torch-xpu-ops/yaml/ --xpu --aoti-extend --update-aoti-c-shim
```

- `--xpu`: generate the c shim for XPU.
- `--aoti-extend`: the out-of-tree ops (defined in `third_party/torch-xpu-ops/yaml/native_functions.yaml`) extend the in-tree ops (defined in `aten/src/ATen/native/native_functions.yaml`).
- `--update-aoti-c-shim`: always generate c_shim_xpu.h for the extend c_shim.

Pull Request resolved: #136742
Approved by: https://github.com/EikanWang, https://github.com/desertfire
ghstack dependencies: #139025
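For illustration, the `extend` subdirectory selection described above could be wired up roughly as in the sketch below; the helper name `aoti_shim_output_dir` and its signature are hypothetical, not the actual torchgen internals.

```python
from pathlib import Path

def aoti_shim_output_dir(generated_root: str, aoti_extend: bool) -> Path:
    """Hypothetical helper: pick where c_shim_xpu.{h,cpp} are written.

    With --aoti-extend, out-of-tree shims land in an `extend` subdirectory
    so they never clobber the in-tree c shims generated from
    aten/src/ATen/native/native_functions.yaml.
    """
    root = Path(generated_root)
    return root / "extend" if aoti_extend else root

# Example: aoti_shim_output_dir("generated", aoti_extend=True) -> generated/extend
```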
Before this change, if one builds PyTorch without XPU, the build process will perpetually regenerate code because of a reference to a non-existent file, which makes the autograd codegened files always out of date. See part of the `ninja -d explain torch_cpu` output:

```
ninja explain: output ../torch/csrc/inductor/aoti_torch/generated/c_shim_xpu.cpp doesn't exist
ninja explain: output third_party/kineto/libkineto/CMakeFiles/libkineto_defs.bzl of phony edge with no inputs doesn't exist
ninja explain: third_party/kineto/libkineto/CMakeFiles/libkineto_defs.bzl is dirty
ninja explain: /Users/malfet/git/pytorch/pytorch/torch/csrc/autograd/generated/Functions.cpp is dirty
```

This is a regression introduced by #139025. After this change, incremental rebuilds with no changes cause no build actions:

```
% ninja -j1 -v -d explain -n torch_cpu
ninja explain: output third_party/kineto/libkineto/CMakeFiles/libkineto_defs.bzl of phony edge with no inputs doesn't exist
ninja explain: third_party/kineto/libkineto/CMakeFiles/libkineto_defs.bzl is dirty
ninja: no work to do.
```

Test plan: Wait for at least one XPU build to finish.

Fixes #140432

Pull Request resolved: #140438
Approved by: https://github.com/kit1980, https://github.com/huydhn
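The idea behind the fix can be sketched as below. This is a minimal illustration of the principle, using a hypothetical `aoti_generated_shim_sources` helper; it is not the actual #140438 diff, which lives in the build/codegen integration.

```python
def aoti_generated_shim_sources(enabled_backends: set[str]) -> list[str]:
    """Declare shim sources only for backends the build will actually generate.

    Listing c_shim_xpu.cpp unconditionally makes ninja depend on a file that
    a non-XPU build never produces, so codegen reruns on every build; gating
    it keeps no-change rebuilds at "ninja: no work to do."
    """
    gen = "torch/csrc/inductor/aoti_torch/generated"
    sources = [f"{gen}/c_shim_cpu.cpp"]
    if "XPU" in enabled_backends:
        sources.append(f"{gen}/c_shim_xpu.cpp")
    return sources
```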
[Intel GPU] Support RegisterXPU.cpp codegen and compile for the in-tree XPU structured GEMM ops. (pytorch#139025)

Motivation: There are two parts of ATen ops for XPU: one is in-tree ops, like the GEMM-related ops, and the other is out-of-tree ops in torch-xpu-ops. For the in-tree part, since PyTorch uses native_functions.yaml registration and is equipped with convenient codegen capabilities, we want to take advantage of these benefits as well. At the same time, since AOT Inductor also uses native_functions.yaml to generate c shim wrappers, we also need to enable this mechanism for XPU.

Pull Request resolved: pytorch#139025
Approved by: https://github.com/EikanWang, https://github.com/jansel, https://github.com/desertfire
Stack from ghstack (oldest at bottom):
[Intel GPU] Support RegisterXPU.cpp codegen and compile for the in-tree XPU structured GEMM ops.
Motivation: There are two parts of ATen ops for XPU: one is in-tree ops, like the GEMM-related ops, and the other is out-of-tree ops in torch-xpu-ops. For the in-tree part, since PyTorch uses native_functions.yaml registration and is equipped with convenient codegen capabilities, we want to take advantage of these benefits as well.
At the same time, since AOT Inductor also uses native_functions.yaml to generate c shim wrappers, we also need to enable this mechanism for XPU.
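As a quick sketch of what this enables (assuming a PyTorch build with XPU support and an available XPU device):

```python
import torch

# With this PR, in-tree structured GEMM ops such as addmm dispatch to the
# XPU backend through the codegenned RegisterXPU.cpp registrations.
a = torch.randn(4, 8, device="xpu")
b = torch.randn(8, 16, device="xpu")
bias = torch.zeros(4, 16, device="xpu")
out = torch.addmm(bias, a, b)  # executes via the XPU dispatch key
```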
cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov