
Conversation


@etaf etaf commented Oct 28, 2024

Stack from ghstack (oldest at bottom):

[Intel GPU] Support RegisterXPU.cpp codegen and compile for the in-tree XPU structured GEMM ops.

Motivation: The ATen ops for XPU come in two parts: in-tree ops, such as the GEMM-related ops, and out-of-tree ops that live in torch-xpu-ops. For the in-tree part, since PyTorch registers ops through native_functions.yaml and ships convenient codegen capabilities, we want to take advantage of those benefits as well.
At the same time, since AOT Inductor also uses native_functions.yaml to generate C shim wrappers, we need to enable this mechanism for XPU too.
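For context, a minimal sketch of driving torchgen for the XPU backend, assuming the standard `torchgen.gen` entry point; the exact flags the PyTorch build passes may differ from these:

```python
# Hedged sketch: invoke the in-tree codegen with XPU in the backend whitelist,
# which is what lets RegisterXPU.cpp be generated alongside RegisterCPU.cpp
# and RegisterCUDA.cpp. Run from a PyTorch source checkout.
import subprocess

subprocess.run(
    [
        "python", "-m", "torchgen.gen",
        "--source-path", "aten/src/ATen",  # in-tree native_functions.yaml lives here
        "--backend-whitelist", "XPU",      # restrict generation to the listed backends
    ],
    check=True,
)
```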

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov

@pytorch-bot pytorch-bot bot added the module: cpu CPU specific problem (e.g., perf, algorithm) label Oct 28, 2024

pytorch-bot bot commented Oct 28, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/139025

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 3912761 with merge base 2ede4c9:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@github-actions

Attention! native_functions.yaml was changed

If you are adding a new function or defaulted argument to native_functions.yaml, you cannot use it from pre-existing Python frontend code until our FC window passes (two weeks). Split your PR into two PRs, one which adds the new C++ functionality, and one that makes use of it from Python, and land them two weeks apart. See https://github.com/pytorch/pytorch/wiki/PyTorch's-Python-Frontend-Backward-and-Forward-Compatibility-Policy#forwards-compatibility-fc for more info.


Caused by:

@etaf etaf changed the title from "Support RegisterXPU.cpp codegen and compile for the in-tree XPU structured GEMM OPs." to "[Intel GPU] Support RegisterXPU.cpp codegen and compile for the in-tree XPU structured GEMM OPs." Oct 28, 2024
@etaf etaf added ciflow/xpu Run XPU CI tasks topic: not user facing topic category labels Oct 28, 2024
@etaf etaf marked this pull request as draft October 28, 2024 03:36
@etaf etaf added the suppress-bc-linter Suppresses the failures of API backward-compatibility linter (Lint/bc_linter) label Oct 28, 2024
etaf added 2 commits October 27, 2024 23:08
@etaf etaf marked this pull request as ready for review October 29, 2024 12:29
@etaf etaf requested a review from EikanWang October 29, 2024 12:30
@etaf etaf added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 30, 2024
torchgen/gen.py Outdated
Comment on lines 2856 to 2858
```python
if options.backend_whitelist and str(DispatchKey.XPU) in options.backend_whitelist:
    options.xpu = True
```

Invalid code changes.

Comment on lines 2877 to 2885
```python
xpu_in_whitelist = (
    options.backend_whitelist and str(DispatchKey.XPU) in options.backend_whitelist
)
if (not options.xpu) and (not xpu_in_whitelist):
    ignore_keys.add(DispatchKey.XPU)

    if DispatchKey.XPU in dispatch_keys:
        del dispatch_keys[dispatch_keys.index(DispatchKey.XPU)]
```

Please add some comments to elaborate on the motivation.

etaf added 3 commits October 30, 2024 03:43

etaf commented Nov 9, 2024

@pytorchbot merge

@pytorchmergebot

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

pytorchmergebot pushed a commit that referenced this pull request Nov 9, 2024
[AOTI] Introduce an extensibility mechanism for the c shim codegen to make it easy to produce c shims for out-of-tree OP kernels as well. Add c shim for XPU. (#136742)

### Motivation
The current C shim codegen only produces C wrappers for ops registered in `aten/src/ATen/native/native_functions.yaml`. For the same backend, some out-of-tree ops are not registered in that file but are registered externally, for example in `third_party/torch-xpu-ops/yaml/native_functions.yaml`. In this case, the existing codegen cannot extend the already-generated in-tree C shims with those out-of-tree ops.

### Design
To extend the C shim with more out-of-tree ops for a backend, this PR adds a boolean option `--aoti-extend` to indicate that the codegen is extending the C shim with ops from out-of-tree.
The generated C shim is stored in the `extend` subdirectory, for example:
```
torch/include/torch/csrc/inductor/aoti_torch/generated/c_shim_xpu.h
torch/include/torch/csrc/inductor/aoti_torch/generated/c_shim_xpu.cpp
torch/include/torch/csrc/inductor/aoti_torch/generated/extend/c_shim_xpu.h
torch/include/torch/csrc/inductor/aoti_torch/generated/extend/c_shim_xpu.cpp
```
Example usage:
`python -m torchgen.gen --source-path third_party/torch-xpu-ops/yaml/ --xpu --aoti-extend --update-aoti-c-shim`
`--xpu`: generate the C shim for XPU
`--aoti-extend`: the out-of-tree ops (defined in `third_party/torch-xpu-ops/yaml/native_functions.yaml`) extend the in-tree ops (defined in `aten/src/ATen/native/native_functions.yaml`)
`--update-aoti-c-shim`: always generate c_shim_xpu.h for the extended c_shim
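
As a rough illustration of the directory behavior described above (not the PR's actual implementation; `shim_output_dir`, `base_dir`, and `aoti_extend` are hypothetical names), the codegen can pick the output location from the flag:

```python
import os

def shim_output_dir(base_dir: str, aoti_extend: bool) -> str:
    # Out-of-tree extension shims land in the "extend" subdirectory;
    # in-tree shims stay in the generated root.
    return os.path.join(base_dir, "extend") if aoti_extend else base_dir

base = "torch/include/torch/csrc/inductor/aoti_torch/generated"
print(shim_output_dir(base, False))  # .../generated
print(shim_output_dir(base, True))   # .../generated/extend
```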

Pull Request resolved: #136742
Approved by: https://github.com/EikanWang, https://github.com/desertfire
ghstack dependencies: #139025
malfet added a commit that referenced this pull request Nov 12, 2024
Before this change, if one builds PyTorch without XPU, the build process will perpetually regenerate code because of a reference to a non-existing file, which makes the autograd codegened files always out of date.

This is a regression introduced by #139025.

Fixes #140432
pytorchmergebot pushed a commit that referenced this pull request Nov 12, 2024
Before this change, if one builds PyTorch without XPU, the build process will perpetually regenerate code because of a reference to a non-existing file, which makes the autograd codegened files always out of date. See part of the `ninja -d explain torch_cpu` output:
```
ninja explain: output ../torch/csrc/inductor/aoti_torch/generated/c_shim_xpu.cpp doesn't exist
ninja explain: output third_party/kineto/libkineto/CMakeFiles/libkineto_defs.bzl of phony edge with no inputs doesn't exist
ninja explain: third_party/kineto/libkineto/CMakeFiles/libkineto_defs.bzl is dirty
ninja explain: /Users/malfet/git/pytorch/pytorch/torch/csrc/autograd/generated/Functions.cpp is dirty
```

This is a regression introduced by #139025.

After this change, incremental rebuilds with no changes cause no build actions:
```
% ninja -j1 -v -d explain -n torch_cpu
ninja explain: output third_party/kineto/libkineto/CMakeFiles/libkineto_defs.bzl of phony edge with no inputs doesn't exist
ninja explain: third_party/kineto/libkineto/CMakeFiles/libkineto_defs.bzl is dirty
ninja: no work to do.
```

Test plan: Wait for at least one XPU build to finish.

Fixes #140432

Pull Request resolved: #140438
Approved by: https://github.com/kit1980, https://github.com/huydhn
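
To make the nature of that fix concrete, here is a hedged sketch (hypothetical helper; the real change in #140438 lives in the build scripts, not in Python) of a guard that avoids declaring a never-generated output:

```python
def expected_shim_outputs(backends: set[str]) -> list[str]:
    # Declaring c_shim_xpu.cpp as a codegen output on a non-XPU build makes
    # ninja see a missing output forever ("output ... doesn't exist"), which
    # marks downstream autograd codegen files dirty on every build.
    outputs = ["c_shim_cpu.cpp", "c_shim_cuda.cpp"]
    if "XPU" in backends:
        outputs.append("c_shim_xpu.cpp")
    return outputs

print(expected_shim_outputs({"CPU", "CUDA"}))         # XPU shim not declared
print(expected_shim_outputs({"CPU", "CUDA", "XPU"}))  # XPU shim included
```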
pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
[Intel GPU] Support RegisterXPU.cpp codegen and compile for the in-tree XPU structured GEMM ops. (pytorch#139025)

Pull Request resolved: pytorch#139025
Approved by: https://github.com/EikanWang, https://github.com/jansel, https://github.com/desertfire
pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
[AOTI] Introduce an extensibility mechanism for the c shim codegen to make it easy to produce c shims for out-of-tree OP kernels as well. Add c shim for XPU. (pytorch#136742)

Pull Request resolved: pytorch#136742
Approved by: https://github.com/EikanWang, https://github.com/desertfire
ghstack dependencies: pytorch#139025
pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
Fixes pytorch#140432

Pull Request resolved: pytorch#140438
Approved by: https://github.com/kit1980, https://github.com/huydhn
@github-actions github-actions bot deleted the gh/etaf/53/head branch December 10, 2024 02:12

Labels

ciflow/trunk: Trigger trunk jobs on your pull request
ciflow/xpu: Run XPU CI tasks
Merged
module: cpu: CPU specific problem (e.g., perf, algorithm)
module: inductor
open source
suppress-bc-linter: Suppresses the failures of API backward-compatibility linter (Lint/bc_linter)
topic: not user facing: topic category
