
Conversation

@ZhiweiYan-96
Collaborator

@ZhiweiYan-96 ZhiweiYan-96 commented Jul 4, 2024

Motivation

This PR enhances the codegen so that it can generate code for the XPU backend.

Currently, XPU operators have to be registered by hand. Developers cannot take advantage of the shared code that handles tensor meta setting (strides, proxy outputs, structured kernels). Manually porting this code is error-prone and leads to high maintenance effort.

We use the backend_whitelist argument of gen.py to generate the headers and source files that XPU needs.

Usage

XPU ops live in third_party/torch-xpu-ops; the codegen process is triggered before torch-xpu-ops is compiled.

We use the following command to generate XPU operators:

python -m torchgen.gen --source-path path/to/yaml/of/xpu --install-dir build/xpu --per-operator-headers --static-dispatch-backend --backend-whitelist=XPU

The key difference is backend-whitelist=XPU. The backend-whitelist key is an existing argument in torchgen.

The inputs of gen.py are the code templates and the operators yaml. We share the same templates as ATen. A simplified yaml lives in third_party/torch-xpu-ops and only includes the supported XPU operators. This yaml is a copy-and-modify of native_functions.yaml: no extra entries are added, and the format is the same as the one in ATen.

Result

All operator headers are generated independently in build/xpu/ATen/ops, which does not affect operators declared/defined by CPU/CUDA or any other backend. XPU operators only include headers from this folder.
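
To make the include layout concrete, below is a hypothetical sketch of an XPU kernel translation unit in third_party/torch-xpu-ops. The operator (abs) and the exact generated header name are assumptions for illustration; the point is that only per-operator headers from build/xpu/ATen/ops are pulled in.

```c++
// Hypothetical XPU kernel source file in third_party/torch-xpu-ops.
// It includes only per-operator headers generated into build/xpu/ATen/ops,
// never the monolithic ATen/Functions.h, so CPU/CUDA codegen stays untouched.
#include <ATen/core/Tensor.h>
#include <ATen/ops/abs_native.h>  // generated declaration for this single operator

namespace at::native {
// The XPU implementation and registration of abs would live here.
} // namespace at::native
```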

Verification

  • In third_party/torch-xpu-ops, we migrate all supported kernels to the structured-kernel style, where they are registered through REGISTER_XPU_DISPATCH or TORCH_IMPL_FUNC, and we have UT verification based on test_ops.py; a rough sketch of both registration styles follows below.
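
As an illustration of the two registration styles named in the bullet above (not the actual torch-xpu-ops code): the operator names, kernel bodies, and the exact argument form of REGISTER_XPU_DISPATCH are assumptions.

```c++
#include <ATen/native/DispatchStub.h>
#include <ATen/native/TensorIterator.h>
#include <ATen/native/UnaryOps.h>    // declares abs_stub
#include <ATen/ops/addmm_native.h>   // generated header declaring the structured class
                                     // (the XPU variant would come from build/xpu/ATen/ops)

namespace at::native {

// (1) Structured kernel: torchgen generates the meta/out plumbing, and the
//     backend only supplies the impl body through TORCH_IMPL_FUNC.
TORCH_IMPL_FUNC(addmm_out_xpu)
(const Tensor& self, const Tensor& mat1, const Tensor& mat2,
 const Scalar& beta, const Scalar& alpha, const Tensor& result) {
  // launch the SYCL matmul kernel here
}

// (2) Stub-based kernel: the shared operator dispatches through a DispatchStub,
//     and the XPU kernel functor is attached to that stub.
static void abs_kernel_xpu(TensorIteratorBase& iter) {
  // elementwise SYCL implementation
}
REGISTER_XPU_DISPATCH(abs_stub, &abs_kernel_xpu);

} // namespace at::native
```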

Stack from ghstack (oldest at bottom):

cc @gujinghui @EikanWang @fengyuan14 @guangyey

@pytorch-bot

pytorch-bot bot commented Jul 4, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/130082

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 55aa37a with merge base 6cbb143:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@ZhiweiYan-96 ZhiweiYan-96 added the module: xpu Intel XPU related issues label Jul 4, 2024
@ZhiweiYan-96 ZhiweiYan-96 added ciflow/xpu Run XPU CI tasks ciflow/trunk Trigger trunk jobs on your pull request labels Jul 5, 2024
@ZhiweiYan-96 ZhiweiYan-96 marked this pull request as ready for review July 12, 2024 06:17
@EikanWang EikanWang requested review from albanD, atalman and malfet July 17, 2024 02:22
francograndegmailcom pushed a commit to francograndegmailcom/pytorch-pytorch that referenced this pull request Jul 23, 2024
pytorchmergebot pushed a commit that referenced this pull request Jul 29, 2024
# Motivation
Structured codegen makes it easier to decouple tensor meta setting from kernel implementation. At present, XPU operators have to handle tensor metas in hand-written code.

We plan to leverage the codegen system to auto-generate structured operators. This PR adds `DispatchStub` support for Intel GPUs; based on that, XPU operators can register kernel functors to operator stubs.

This is a prerequisite of PR #130082, where we will modify the codegen system to generate the source files and headers that XPU needs.

Pull Request resolved: #130019
Approved by: https://github.com/EikanWang, https://github.com/gujinghui, https://github.com/albanD
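
For readers unfamiliar with `DispatchStub`, here is a minimal sketch of the pattern this enables, using a made-up stub name (my_unary_stub); the REGISTER_XPU_DISPATCH macro follows the description above, but its exact argument form is an assumption.

```c++
#include <ATen/native/DispatchStub.h>
#include <ATen/native/TensorIterator.h>

namespace at::native {

// Shared, device-agnostic ATen code declares and defines a stub once.
using my_unary_fn = void (*)(TensorIteratorBase&);
DECLARE_DISPATCH(my_unary_fn, my_unary_stub);
DEFINE_DISPATCH(my_unary_stub);

// The generic operator then calls through the stub, e.g.
//   my_unary_stub(iter.device_type(), iter);
// and the stub routes to whichever backend registered a kernel functor.

// With this PR, an XPU kernel can be attached to the stub as well:
static void my_unary_kernel_xpu(TensorIteratorBase& iter) {
  // SYCL implementation would go here
}
REGISTER_XPU_DISPATCH(my_unary_stub, &my_unary_kernel_xpu);

} // namespace at::native
```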
Contributor

@atalman atalman left a comment


lgtm

elif backend_index.dispatch_key == DispatchKey.MPS:
    headers.append("#include <ATen/mps/EmptyTensor.h>")
elif backend_index.dispatch_key == DispatchKey.XPU:
    # XPU specific, this header resides in third_party/torch-xpu-ops
Collaborator


This is not a real folder in this repo?

Collaborator


@albanD , it is not a real folder. FYI: #120891 (review)

@ZhiweiYan-96
Collaborator Author

ZhiweiYan-96 commented Jul 31, 2024

Hi @albanD, thanks a lot for your comments. The header here resides in third_party/torch-xpu-ops, where most XPU operators live. The header is exposed to the PyTorch build process by the cmake change here: https://github.com/pytorch/pytorch/pull/130082/files#diff-c5ee05f1e918772792ff6f2a3f579fc2f182e57b1709fd786ef6dc711fd68b27R1054.

During compilation, both the operators in third_party/torch-xpu-ops and the generated source files can see these headers. We have verified this with many UTs. Here is a log fragment from tests we ran in the torch-xpu-ops repo together with the change in the current PR.

test_ops_xpu.py::TestSelfKwarg::test_self_kwargs PASSED                                                                                                                                                                                          [  0%]
test_ops_xpu.py::TestCommonXPU::test_compare_cpu_H_xpu_float32 PASSED                                                                                                                                                                            [  0%]
test_ops_xpu.py::TestCommonXPU::test_compare_cpu_T_xpu_float32 PASSED                                                                                                                                                                            [  0%]
test_ops_xpu.py::TestCommonXPU::test_compare_cpu___getitem___xpu_float32

@ZhiweiYan-96
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@ZhiweiYan-96
Collaborator Author

Hi @albanD, I will merge this PR first. If you need further changes, please share your suggestions and I will append commits to refactor accordingly. Thanks!

@EikanWang
Collaborator

@ZhiweiYan-96, please add documentation describing the usage introduced by this PR.

pytorchmergebot pushed a commit that referenced this pull request Aug 6, 2024
# Motivation
The `copy`, `cdist`, and `index_put_impl` operators use `op_stub` for runtime dispatching inside the operators. They carry an explicit device list to ensure accuracy, and XPU was not in it. This PR makes them accept XPU as a supported device.

Pull Request resolved: #130088
Approved by: https://github.com/EikanWang, https://github.com/gujinghui, https://github.com/albanD
ghstack dependencies: #130019, #130082
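
For context, below is a hedged sketch of the style of device allow-list such operators use; the helper name and message are illustrative, not the literal ATen code this commit touches.

```c++
#include <ATen/core/Tensor.h>
#include <c10/util/Exception.h>

// Illustrative only: a device allow-list check in the style used inside ops
// such as cdist. Previously only CPU and CUDA were accepted; adding kXPU lets
// the op_stub dispatch reach the XPU kernel.
static void check_supported_device(const at::Tensor& t) {
  const auto dev = t.device().type();
  TORCH_CHECK(dev == at::kCPU || dev == at::kCUDA || dev == at::kXPU,
              "this operator only supports CPU, CUDA and XPU devices, got: ", dev);
}
```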
@github-actions github-actions bot deleted the gh/ZhiweiYan-96/16/head branch August 31, 2024 02:00
pytorchmergebot pushed a commit that referenced this pull request Sep 5, 2024
This PR is a supplement to #130082. The previous PR #130082 fulfills the basic functionality of the codegen, but we found that it fails to handle the device-sameness check in many UTs. The current PR adds XPU device guard code generation.

With the current PR, the code snippet in `RegisterXPU.cpp` is as follows; the device guard is now generated correctly.
```c++
namespace {
at::Tensor & wrapper_XPU_Tensor_float_out_normal_out(const at::Tensor & mean, double std, ::std::optional<at::Generator> generator, at::Tensor & out) {
  std::optional<Device> common_device = std::nullopt;
(void)common_device; // Suppress unused variable warning
  c10::impl::check_and_update_common_device(common_device, out, "wrapper_XPU_Tensor_float_out_normal_out", "out");
  c10::impl::check_and_update_common_device(common_device, mean, "wrapper_XPU_Tensor_float_out_normal_out", "mean");
  const OptionalDeviceGuard device_guard(device_of(out));
  return at::native::normal_out(mean, std, generator, out);
}
} // anonymous namespace
```
By contrast, without this change, the generated code is:
```c++
namespace {
at::Tensor & wrapper_XPU_Tensor_float_out_normal_out(const at::Tensor & mean, double std, ::std::optional<at::Generator> generator, at::Tensor & out) {
    // No device check
  // DeviceGuard omitted
  return at::native::normal_out(mean, std, generator, out);
}
} // anonymous namespace
```

Pull Request resolved: #133980
Approved by: https://github.com/EikanWang, https://github.com/malfet
Chao1Han pushed a commit to Chao1Han/pytorch that referenced this pull request Sep 20, 2024

Labels

  • ciflow/trunk: Trigger trunk jobs on your pull request
  • ciflow/xpu: Run XPU CI tasks
  • Merged
  • module: xpu: Intel XPU related issues
  • open source
  • topic: not user facing (topic category)

Projects

Status: Done


8 participants