Conversation

@ZhiweiYan-96
Collaborator

@ZhiweiYan-96 ZhiweiYan-96 commented Jul 3, 2024

Motivation

Structured codegen makes it easier to decouple tensor meta setting from kernel implementation. At present, XPU operators have to handle tensor metas in a hand-written way.

We plan to leverage the codegen system to auto-generate structured operators. This PR adds DispatchStub support for Intel GPUs, so that XPU operators can register their kernel functors to operator stubs.

This is a prerequisite of PR #130082, where we will modify the codegen system to generate the source files and headers XPU needs.
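As a rough sketch of what this enables (illustrative only: the operator and kernel names below are made up, and the exact macro signatures should be checked against the PR diff), an XPU backend could plug a kernel functor into a shared stub like this:

```cpp
// Hedged sketch, not code from this PR; names are illustrative.
#include <ATen/native/DispatchStub.h>
#include <ATen/native/TensorIterator.h>

namespace at::native {

// The stub is declared/defined once in the shared ATen sources:
using abs_fn = void (*)(TensorIteratorBase&);
DECLARE_DISPATCH(abs_fn, abs_stub);
DEFINE_DISPATCH(abs_stub);

// An XPU kernel functor, e.g. living in third_party/torch-xpu-ops:
static void abs_kernel_xpu(TensorIteratorBase& iter) {
  // ... launch the SYCL kernel over the iterator ...
}

// With DispatchStub support for Intel GPUs, the backend registers
// its functor into the shared stub:
REGISTER_XPU_DISPATCH(abs_stub, &abs_kernel_xpu);

} // namespace at::native
```

Calling `abs_stub(iter.device_type(), iter)` on an XPU tensor would then dispatch to the registered functor, the same way the existing CPU/CUDA registrations work.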

Stack from ghstack (oldest at bottom):

cc @gujinghui @EikanWang @fengyuan14 @guangyey

@pytorch-bot

pytorch-bot bot commented Jul 3, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/130019

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit d2e8c68 with merge base 6cbb143:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@ZhiweiYan-96 ZhiweiYan-96 added ciflow/xpu Run XPU CI tasks ciflow/trunk Trigger trunk jobs on your pull request labels Jul 5, 2024
@ZhiweiYan-96 ZhiweiYan-96 requested a review from EikanWang July 10, 2024 06:01
@ZhiweiYan-96 ZhiweiYan-96 added the module: xpu Intel XPU related issues label Jul 11, 2024
@ZhiweiYan-96 ZhiweiYan-96 marked this pull request as ready for review July 12, 2024 06:17
@EikanWang EikanWang requested review from albanD, atalman and malfet July 17, 2024 02:21
@albanD (Collaborator) left a comment

Sounds good. Only small improvements.

@albanD (Collaborator) left a comment

Thanks!

@ZhiweiYan-96
Collaborator Author

@albanD Thanks for your approval!

@gujinghui
Collaborator

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge failed

Reason: This PR needs a release notes: label
If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.


@gujinghui
Collaborator

@pytorchbot label "topic: not user facing"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


@ZhiweiYan-96
Collaborator Author

Hit a new CI failure; I am fixing it.

@ZhiweiYan-96
Collaborator Author

ZhiweiYan-96 commented Jul 23, 2024

hi @albanD @EikanWang

The new CI failure is caused by the fact that the CppExtension-related UTs compile source files without -DUSE_XPU, while in the XPU CI flow they link against a libtorch built with that flag. This results in the runtime failure. We verified this by adding extra_cflags=["-g", "-DUSE_XPU"] in the UTs, and they passed.

Considering that XPU does not support cpp_extension at present, would it be acceptable to you if we remove the USE_XPU macro guard in the source file and let the guarded code exist by default, just like CUDA/MPS?
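To illustrate the failure mode (a schematic only, not the actual DispatchStub definition): when a per-backend member is guarded by the macro, a translation unit compiled without -DUSE_XPU disagrees with libtorch about the object layout.

```cpp
// Schematic illustration of the ABI hazard -- not the real PyTorch code.
struct DispatchStubImpl {
  void* cpu_dispatch_ptr;
  void* cuda_dispatch_ptr;
#ifdef USE_XPU
  void* xpu_dispatch_ptr;  // present only when compiled with -DUSE_XPU
#endif
};
// libtorch (built with -DUSE_XPU) and a cpp_extension built without it
// compute different sizeof(DispatchStubImpl), so inline code touching
// the struct misbehaves at runtime.
```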

@EikanWang
Collaborator

I think an alternative is to skip the cpp_extension test cases, since Intel GPU does not support cpp_extension well yet. @albanD, may I know which approach you prefer?

Approach 1: Skip the cpp_extension test cases for Intel GPU.
Approach 2: Remove the macro guard and add it back once cpp_extension is available for Intel GPU.

Thanks,
Eikan

francograndegmailcom pushed a commit to francograndegmailcom/pytorch-pytorch that referenced this pull request Jul 23, 2024
ghstack-source-id: 439dc40
Pull Request resolved: pytorch/pytorch#130019
@albanD
Collaborator

albanD commented Jul 25, 2024

Skipping the cpp_extensions test is definitely an option, but that is a widely used feature in third-party repos. So you most likely want to fix that sooner rather than later.
I'm not sure I understand what is done differently in the USE_XPU flag handling compared to other backends that leads to this, though.

@EikanWang
Collaborator

EikanWang commented Jul 25, 2024

> Skipping the cpp_extensions test is definitely an option, but that is a widely used feature in third-party repos. So you most likely want to fix that sooner rather than later.

Yes, we plan to support cpp_extension, and a community contributor (not from Intel) has just submitted a PR to support the feature. But we still have some open questions about the solution and need some time to resolve them.

So I'm wondering if we can first skip the case, just like ARM, to land this PR, and enable it for Intel GPU in the future once we support it. @albanD
https://github.com/pytorch/pytorch/blob/main/test/test_cpp_extensions_open_device_registration.py#L70

> I'm not sure I understand what is done differently in the USE_XPU flag handling compared to other backends that leads to this, though.

For XPU CI, USE_XPU is True when building all the torch libraries, but it is False in cpp_extension because we do not set it explicitly for XPU. The mismatch causes an ABI compatibility issue. We need to keep USE_XPU consistent, just like:

https://github.com/pytorch/pytorch/blob/main/torch/utils/cpp_extension.py#L235

@albanD
Collaborator

albanD commented Jul 25, 2024

> For XPU CI, USE_XPU is True when building all the torch libraries, but it is False in cpp_extension because we do not set it explicitly for XPU.

Why not? Don't we usually re-use the torch compilation flags during the cpp_extension build? I would expect that's how we transfer the CUDA/ROCm/MPS-related flags.

> So I'm wondering if we can first skip the case, just like ARM, to land this PR, and enable it for Intel GPU in the future once we support it.

Yes, you can add a decorator to the test to skip it on XPU for now. That sounds OK.

@EikanWang
Collaborator

> For XPU CI, USE_XPU is True when building all the torch libraries, but it is False in cpp_extension because we do not set it explicitly for XPU.

> Why not? Don't we usually re-use the torch compilation flags during the cpp_extension build? I would expect that's how we transfer the CUDA/ROCm/MPS-related flags.

Yes, we should re-use the torch compilation flags, and we did so in another PR (#131276), since cpp_extension is an independent feature for XPU.

@EikanWang
Collaborator

@ZhiweiYan-96 , please help refine the PR a little bit.

@ZhiweiYan-96
Collaborator Author

ZhiweiYan-96 commented Jul 29, 2024

The failing cpp_extension UTs have been skipped. Thanks for the help :)

@EikanWang
Collaborator

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


pytorchmergebot pushed a commit that referenced this pull request Jul 31, 2024
# Motivation

This PR enhances the codegen to allow generating code for the XPU backend.

XPU operators currently have to be registered in a hand-written way. Developers have no chance to take advantage of shared code that handles tensor meta setting (strides, proxy output, structured kernels). Manually porting code is error-prone and leads to a high maintenance effort.

We utilize the `backend_whitelist` argument of `gen.py` to generate the headers and source files XPU needs.

# Usage
XPU ops live in `third_party/torch-xpu-ops`; the codegen process is triggered before the compilation of `torch-xpu-ops`.

We use the following command to generate XPU operators:

`python -m torchgen.gen --source-path path/to/yaml/of/xpu --install-dir build/xpu --per-operator-headers --static-dispatch-backend --backend-whitelist=XPU`

The difference lies in `--backend-whitelist=XPU`; `backend-whitelist` is an existing argument in torchgen.

The inputs of `gen.py` are code templates and an operators yaml. We share the same templates as `aten`. A simplified yaml lives in `third_party/torch-xpu-ops`, which only includes the supported XPU operators. This yaml is a copy-and-modify of `native_functions.yaml`: no extra entries are added, and the format is the same as the one in `aten`.

# Result

All operator headers are generated independently in `build/xpu/ATen/ops`, which does not affect operators declared/defined by CPU/CUDA or any other backend. XPU operators only include headers from this folder.

# Verification

* In `third_party/torch-xpu-ops`, we migrated all supported kernels to the structured-kernel style, where they are registered through `REGISTER_XPU_DISPATCH` or `TORCH_IMPL_FUNC` (see the sketch below), and we have UT verification based on `test_ops.py`.
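As an illustration of the `TORCH_IMPL_FUNC` path (a hedged sketch: the operator is hypothetical here, and the real signature comes from the codegen-generated headers):

```cpp
// Hedged sketch; operator name and signature are illustrative.
TORCH_IMPL_FUNC(softshrink_out_xpu)
(const Tensor& self, const Scalar& lambd, const Tensor& result) {
  // The generated meta function has already validated the inputs and
  // sized `result`; only the computation is implemented here.
  // ... launch the SYCL kernel writing into `result` ...
}
```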

Pull Request resolved: #130082
Approved by: https://github.com/EikanWang, https://github.com/gujinghui, https://github.com/atalman
ghstack dependencies: #130019
pytorchmergebot pushed a commit that referenced this pull request Aug 6, 2024
# Motivation
The `copy`, `cdist`, and `index_put_impl` operators use op stubs for runtime dispatching inside the operator. Each contains an explicit device list to guard correctness, and XPU is not in those lists. This PR makes them accept XPU as a supported device.
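Schematically (an illustration, not the actual PyTorch source), the kind of internal check being extended looks like:

```cpp
#include <ATen/core/Tensor.h>

// Schematic sketch: the real checks live inside copy/cdist/index_put_impl.
static void check_supported_device(const at::Tensor& t) {
  TORCH_CHECK(t.is_cpu() || t.is_cuda() || t.is_xpu(),  // XPU newly allowed
              "unsupported device for this operator: ", t.device());
}
```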

Pull Request resolved: #130088
Approved by: https://github.com/EikanWang, https://github.com/gujinghui, https://github.com/albanD
ghstack dependencies: #130019, #130082
@github-actions github-actions bot deleted the gh/ZhiweiYan-96/15/head branch August 29, 2024 02:01