[Inductor XPU GEMM] Step 9/N: Support generating XPU cutlass gemm kernel by etaf · Pull Request #161939 · pytorch/pytorch

etaf · 2025-09-02T03:10:05Z

Stack from ghstack (oldest at bottom):

This PR implements the CUTLASS XPU backend kernel generation as proposed in RFC #160175. It reuses most of the CUTLASS CUDA kernel generation code, with only minor adjustments made to handle XPU-specific code generation.
The test cases are enabled in #161940.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben


pytorch-bot · 2025-09-02T03:10:08Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/161939

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (4 Unrelated Failures)

As of commit 21b0c92 with merge base 98a4d7b:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

inductor / unit-test / inductor-test / test (inductor, 1, 2, linux.g5.4xlarge.nvidia.gpu) (gh) (similar failure)
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lu_cuda_float32

UNSTABLE - The following jobs are marked as unstable, possibly due to flakiness on trunk:

inductor / inductor-cpu-test / test (cpu_inductor_torchbench, 2, 2, linux.2xlarge.amx, unstable) (gh) (#174929)
pytorch_CycleGAN_and_pix2pix
inductor / inductor-test / test (inductor_torchbench, 2, 2, linux.g5.4xlarge.nvidia.gpu, unstable) (gh) (#174919)
pytorch_CycleGAN_and_pix2pix
inductor / inductor-test-cuda13 / test (inductor_torchbench, 2, 2, linux.g5.4xlarge.nvidia.gpu) (gh) (#174930)
pytorch_CycleGAN_and_pix2pix

This comment was automatically generated by Dr. CI and updates every 15 minutes.


[Inductor XPU GEMM] Step 9/N: Support generating XPU cutlass gemm kernel.

4124804

[ghstack-poisoned]

pytorch-bot bot added ciflow/inductor module: inductor labels Sep 2, 2025

pytorchbot added the open source label Sep 2, 2025

etaf changed the title ~~[Inductor XPU GEMM] Step 9/N: Support generating XPU cutlass gemm~~ [Inductor XPU GEMM] Step 9/N: Support generating XPU cutlass gemm kernel Sep 3, 2025

etaf added the topic: not user facing topic category label Sep 3, 2025

etaf marked this pull request as draft September 3, 2025 03:11

etaf mentioned this pull request Sep 1, 2025

[RFC] Enable cutlass to support Intel GPU into PyTorch Inductor. #160175

Open

11 tasks

etaf added 10 commits September 4, 2025 08:53

etaf added 3 commits November 23, 2025 07:16

tiendatngcs pushed a commit to tiendatngcs/pytorch-Dec25 that referenced this pull request Dec 10, 2025

[Inductor XPU GEMM] Step 9/N: Support generating XPU cutlass gemm kernel.

fa4bd8a

ghstack-source-id: aeda23a Pull Request resolved: pytorch/pytorch#161939

etaf added 3 commits January 7, 2026 02:37

etaf added a commit to etaf/pytorch-inductor-xpu that referenced this pull request Jan 12, 2026

[Inductor XPU GEMM] Step 9/N: Support generating XPU cutlass gemm kernel.

5f98301

ghstack-source-id: 615d21f Pull Request resolved: pytorch#161939

Update

a698897

[ghstack-poisoned]

etaf added a commit to etaf/pytorch-inductor-xpu that referenced this pull request Jan 16, 2026

[Inductor XPU GEMM] Step 9/N: Support generating XPU cutlass gemm kernel.

2d0c9b7

ghstack-source-id: d42bae6 Pull Request resolved: pytorch#161939

SergeyTyshkevich pushed a commit to SergeyTyshkevich/chart2 that referenced this pull request Jan 19, 2026

[Inductor XPU GEMM] Step 9/N: Support generating XPU cutlass gemm kernel.

d5753b5

ghstack-source-id: d42bae6 Pull Request resolved: pytorch/pytorch#161939

etaf added 3 commits January 26, 2026 17:37

etaf mentioned this pull request Jan 29, 2026

[xpu][feature][Inductor] Support epilogue fusion for sycl-tla backend. #173779

Draft

etaf added a commit to etaf/pytorch-inductor-xpu that referenced this pull request Jan 29, 2026

[Inductor XPU GEMM] Step 9/N: Support generating XPU cutlass gemm kernel.

f8e5373

ghstack-source-id: 1e0ec7d Pull Request resolved: pytorch#161939

etaf added a commit to etaf/pytorch-inductor-xpu that referenced this pull request Feb 11, 2026

[Inductor XPU GEMM] Step 9/N: Support generating XPU cutlass gemm kernel.

7f10b78

ghstack-source-id: 09b2dab Pull Request resolved: pytorch#161939

etaf added a commit to etaf/pytorch-inductor-xpu that referenced this pull request Feb 12, 2026

[Inductor XPU GEMM] Step 9/N: Support generating XPU cutlass gemm kernel.

5719f86

ghstack-source-id: 030c597 Pull Request resolved: pytorch#161939

etaf mentioned this pull request Feb 13, 2026

[xpu][feature] Enable Inductor sycl-tla standalone runner. #174958

Draft

etaf added 3 commits February 13, 2026 17:26

etaf added a commit to etaf/pytorch-inductor-xpu that referenced this pull request Feb 15, 2026

[Inductor XPU GEMM] Step 9/N: Support generating XPU cutlass gemm kernel.

100e016

ghstack-source-id: edd90fd Pull Request resolved: pytorch#161939

etaf added a commit to etaf/pytorch-inductor-xpu that referenced this pull request Feb 15, 2026

[Inductor XPU GEMM] Step 9/N: Support generating XPU cutlass gemm kernel.

44d1f04

ghstack-source-id: edd90fd Pull Request resolved: pytorch#161939

etaf added a commit to etaf/pytorch-inductor-xpu that referenced this pull request Feb 16, 2026

[Inductor XPU GEMM] Step 9/N: Support generating XPU cutlass gemm kernel.

64316e9

ghstack-source-id: 74a6550 Pull Request resolved: pytorch#161939

jansel approved these changes Feb 16, 2026

etaf wants to merge 41 commits into gh/etaf/167/base from gh/etaf/167/head
