Conversation

@ZhiweiYan-96
Collaborator

@ZhiweiYan-96 ZhiweiYan-96 commented Jul 3, 2024

Motivation

Structured codegen makes it easier to decouple tensor meta setting from kernel implementation. At present, XPU operators have to handle tensor metas in a hand-written way.

We plan to leverage the codegen system to auto-generate structured operators. This PR adds DispatchStub support for Intel GPUs, so that XPU operators can register their kernel functors to operator stubs.

This is a prerequisite of PR #130082, where we will modify the codegen system to generate the source files and headers XPU needs.
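As a rough sketch of what this enables (illustrative only: the operator and kernel names below are made up, and the exact macro signatures should be checked against the PR diff), an XPU backend could plug a kernel functor into a shared stub like this:

```cpp
// Hedged sketch, not code from this PR; names are illustrative.
#include <ATen/native/DispatchStub.h>
#include <ATen/native/TensorIterator.h>

namespace at::native {

// The stub is declared/defined once in the shared ATen sources:
using abs_fn = void (*)(TensorIteratorBase&);
DECLARE_DISPATCH(abs_fn, abs_stub);
DEFINE_DISPATCH(abs_stub);

// An XPU kernel functor, e.g. living in third_party/torch-xpu-ops:
static void abs_kernel_xpu(TensorIteratorBase& iter) {
  // ... launch the SYCL kernel over the iterator ...
}

// With DispatchStub support for Intel GPUs, the backend registers
// its functor into the shared stub:
REGISTER_XPU_DISPATCH(abs_stub, &abs_kernel_xpu);

} // namespace at::native
```

Calling `abs_stub(iter.device_type(), iter)` on an XPU tensor would then dispatch to the registered functor, the same way the existing CPU/CUDA registrations work.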

Stack from ghstack (oldest at bottom):

cc @gujinghui @EikanWang @fengyuan14 @guangyey

@pytorch-bot

pytorch-bot bot commented Jul 3, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/130019

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit d2e8c68 with merge base 6cbb143:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@ZhiweiYan-96 ZhiweiYan-96 added ciflow/xpu Run XPU CI tasks ciflow/trunk Trigger trunk jobs on your pull request labels Jul 5, 2024
@ZhiweiYan-96 ZhiweiYan-96 requested a review from EikanWang July 10, 2024 06:01
@ZhiweiYan-96 ZhiweiYan-96 added the module: xpu Intel XPU related issues label Jul 11, 2024
@ZhiweiYan-96 ZhiweiYan-96 marked this pull request as ready for review July 12, 2024 06:17
@EikanWang EikanWang requested review from albanD, atalman and malfet July 17, 2024 02:21
@albanD (Collaborator) left a comment

Sounds good. Only small improvements.

@albanD (Collaborator) left a comment

Thanks!

@ZhiweiYan-96
Collaborator Author

@albanD Thanks for your approval!

@gujinghui
Collaborator

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge failed

Reason: This PR needs a release notes: label
If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.


@gujinghui
Collaborator

@pytorchbot label "topic: not user facing"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


@ZhiweiYan-96
Collaborator Author

Hit a new CI failure; I am fixing it.

@ZhiweiYan-96
Collaborator Author

ZhiweiYan-96 commented Jul 23, 2024

hi @albanD @EikanWang

The new CI failure is caused by the fact that the CppExtension-related UTs compile source files without -DUSE_XPU, while in the XPU CI flow they link against a libtorch built with that flag. This results in the runtime failure. We verified this by adding extra_cflags=["-g", "-DUSE_XPU"] in the UTs, and they passed.

Considering that XPU does not support cpp_extension at present, would it be acceptable to you if we remove the USE_XPU macro guard in the source file and let the guarded code exist by default, just like CUDA/MPS?
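To illustrate the failure mode (a schematic only, not the actual DispatchStub definition): when a per-backend member is guarded by the macro, a translation unit compiled without -DUSE_XPU disagrees with libtorch about the object layout.

```cpp
// Schematic illustration of the ABI hazard -- not the real PyTorch code.
struct DispatchStubImpl {
  void* cpu_dispatch_ptr;
  void* cuda_dispatch_ptr;
#ifdef USE_XPU
  void* xpu_dispatch_ptr;  // present only when compiled with -DUSE_XPU
#endif
};
// libtorch (built with -DUSE_XPU) and a cpp_extension built without it
// compute different sizeof(DispatchStubImpl), so inline code touching
// the struct misbehaves at runtime.
```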

@EikanWang
Collaborator

I think an alternative is to skip the cpp_extension test cases, since Intel GPU does not support cpp_extension well yet. @albanD, may I know which approach you prefer?

Approach 1: Skip the cpp_extension test cases for Intel GPU.
Approach 2: Remove the macro guard and add it back once cpp_extension is available for Intel GPU.

Thanks,
Eikan

francograndegmailcom pushed a commit to francograndegmailcom/pytorch-pytorch that referenced this pull request Jul 23, 2024
ghstack-source-id: 439dc40
Pull Request resolved: pytorch/pytorch#130019
@albanD
Collaborator

albanD commented Jul 25, 2024

Skipping the cpp_extensions test is definitely an option, but that is a widely used feature in third-party repos. So you most likely want to fix that sooner rather than later.
I'm not sure I understand what is done differently in the USE_XPU flag handling compared to other backends that leads to this, though.

@EikanWang
Collaborator

EikanWang commented Jul 25, 2024

> Skipping the cpp_extensions test is definitely an option, but that is a widely used feature in third-party repos. So you most likely want to fix that sooner rather than later.

Yes, we plan to support cpp_extension, and a community contributor (not from Intel) has just submitted a PR to support the feature. But we still have some open questions about the solution and need some time to resolve them.

So I'm wondering if we can first skip the case, just like ARM, to land this PR, and enable it for Intel GPU in the future once we support it. @albanD
https://github.com/pytorch/pytorch/blob/main/test/test_cpp_extensions_open_device_registration.py#L70

> I'm not sure I understand what is done differently in the USE_XPU flag handling compared to other backends that leads to this, though.

For XPU CI, USE_XPU is True when building all the torch libraries, but it is False in cpp_extension because we do not set it explicitly for XPU. The mismatch causes an ABI compatibility issue. We need to keep USE_XPU consistent, just like:

https://github.com/pytorch/pytorch/blob/main/torch/utils/cpp_extension.py#L235

@albanD
Collaborator

albanD commented Jul 25, 2024

> For XPU CI, USE_XPU is True when building all the torch libraries, but it is False in cpp_extension because we do not set it explicitly for XPU.

Why not? Don't we usually re-use the torch compilation flags during the cpp_extension build? I would expect that's how we transfer the CUDA/ROCm/MPS-related flags.

> So I'm wondering if we can first skip the case, just like ARM, to land this PR, and enable it for Intel GPU in the future once we support it.

Yes, you can add a decorator to the test to skip it on XPU for now. That sounds OK.

@EikanWang
Collaborator

> For XPU CI, USE_XPU is True when building all the torch libraries, but it is False in cpp_extension because we do not set it explicitly for XPU.

> Why not? Don't we usually re-use the torch compilation flags during the cpp_extension build? I would expect that's how we transfer the CUDA/ROCm/MPS-related flags.

Yes, we should re-use the torch compilation flags, and we did so in another PR (#131276), since cpp_extension is an independent feature for XPU.

@EikanWang
Collaborator

@ZhiweiYan-96 , please help refine the PR a little bit.

@ZhiweiYan-96
Collaborator Author

ZhiweiYan-96 commented Jul 29, 2024

The failing cpp_extension UTs have been skipped. Thanks for the help :)

@EikanWang
Collaborator

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


pytorchmergebot pushed a commit that referenced this pull request Jul 31, 2024
# Motivation

This PR enhances the codegen to allow generating code for the XPU backend.

XPU operators currently have to be registered in a hand-written way. Developers have no chance to take advantage of shared code that handles tensor meta setting (strides, proxy output, structured kernels). Manually porting code is error-prone and leads to a high maintenance effort.

We utilize the `backend_whitelist` argument of `gen.py` to generate the headers and source files XPU needs.

# Usage
XPU ops live in `third_party/torch-xpu-ops`; the codegen process is triggered before the compilation of `torch-xpu-ops`.

We use the following command to generate XPU operators:

`python -m torchgen.gen --source-path path/to/yaml/of/xpu --install-dir build/xpu --per-operator-headers --static-dispatch-backend --backend-whitelist=XPU`

The difference lies in `--backend-whitelist=XPU`; `backend-whitelist` is an existing argument in torchgen.

The inputs of `gen.py` are code templates and an operators yaml. We share the same templates as `aten`. A simplified yaml lives in `third_party/torch-xpu-ops`, which only includes the supported XPU operators. This yaml is a copy-and-modify of `native_functions.yaml`: no extra entries are added, and the format is the same as the one in `aten`.

# Result

All operator headers are generated independently in `build/xpu/ATen/ops`, which does not affect operators declared/defined by CPU/CUDA or any other backend. XPU operators only include headers from this folder.

# Verification

* In `third_party/torch-xpu-ops`, we migrated all supported kernels to the structured-kernel style, where they are registered through `REGISTER_XPU_DISPATCH` or `TORCH_IMPL_FUNC` (see the sketch below), and we have UT verification based on `test_ops.py`.
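As an illustration of the `TORCH_IMPL_FUNC` path (a hedged sketch: the operator is hypothetical here, and the real signature comes from the codegen-generated headers):

```cpp
// Hedged sketch; operator name and signature are illustrative.
TORCH_IMPL_FUNC(softshrink_out_xpu)
(const Tensor& self, const Scalar& lambd, const Tensor& result) {
  // The generated meta function has already validated the inputs and
  // sized `result`; only the computation is implemented here.
  // ... launch the SYCL kernel writing into `result` ...
}
```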

Pull Request resolved: #130082
Approved by: https://github.com/EikanWang, https://github.com/gujinghui, https://github.com/atalman
ghstack dependencies: #130019
pytorchmergebot pushed a commit that referenced this pull request Aug 6, 2024
# Motivation
The `copy`, `cdist`, and `index_put_impl` operators use op stubs for runtime dispatching inside the operator. Each contains an explicit device list to guard correctness, and XPU is not in those lists. This PR makes them accept XPU as a supported device.
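Schematically (an illustration, not the actual PyTorch source), the kind of internal check being extended looks like:

```cpp
#include <ATen/core/Tensor.h>

// Schematic sketch: the real checks live inside copy/cdist/index_put_impl.
static void check_supported_device(const at::Tensor& t) {
  TORCH_CHECK(t.is_cpu() || t.is_cuda() || t.is_xpu(),  // XPU newly allowed
              "unsupported device for this operator: ", t.device());
}
```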

Pull Request resolved: #130088
Approved by: https://github.com/EikanWang, https://github.com/gujinghui, https://github.com/albanD
ghstack dependencies: #130019, #130082
@github-actions github-actions bot deleted the gh/ZhiweiYan-96/15/head branch August 29, 2024 02:01