[Inductor XPU GEMM] Step 9/N: Support generating XPU cutlass gemm kernel #161939

Open
etaf wants to merge 41 commits into gh/etaf/167/base from gh/etaf/167/head
etaf (Collaborator) commented Sep 2, 2025

pytorch-bot (bot) commented Sep 2, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/161939

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (4 Unrelated Failures)

As of commit 21b0c92 with merge base 98a4d7b:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

UNSTABLE - The following jobs are marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

…ss gemm"

kernel.

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov coconutruben

[ghstack-poisoned]
@etaf etaf changed the title from "[Inductor XPU GEMM] Step 9/N: Support generating XPU cutlass gemm" to "[Inductor XPU GEMM] Step 9/N: Support generating XPU cutlass gemm kernel" Sep 3, 2025
…ss gemm kernel"

kernel.

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov coconutruben

[ghstack-poisoned]
@etaf etaf added the topic: not user facing topic category label Sep 3, 2025
@etaf etaf marked this pull request as draft September 3, 2025 03:11
etaf added 10 commits September 4, 2025 08:53
…ss gemm kernel"


This PR implements kernel generation for the CUTLASS XPU backend, as proposed in RFC #160175. It reuses most of the CUTLASS CUDA kernel generation code, with only minor adjustments for XPU-specific code generation.



cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov coconutruben

[ghstack-poisoned]
etaf added 3 commits November 23, 2025 07:16
tiendatngcs pushed a commit to tiendatngcs/pytorch-Dec25 that referenced this pull request Dec 10, 2025
etaf added 3 commits January 7, 2026 02:37
etaf added a commit to etaf/pytorch-inductor-xpu that referenced this pull request Jan 12, 2026
[ghstack-poisoned]
etaf added a commit to etaf/pytorch-inductor-xpu that referenced this pull request Jan 16, 2026
SergeyTyshkevich pushed a commit to SergeyTyshkevich/chart2 that referenced this pull request Jan 19, 2026
etaf added 3 commits January 26, 2026 17:37
etaf added a commit to etaf/pytorch-inductor-xpu that referenced this pull request Jan 29, 2026
etaf added a commit to etaf/pytorch-inductor-xpu that referenced this pull request Feb 11, 2026
etaf added a commit to etaf/pytorch-inductor-xpu that referenced this pull request Feb 12, 2026
etaf added 3 commits February 13, 2026 17:26
etaf added a commit to etaf/pytorch-inductor-xpu that referenced this pull request Feb 15, 2026
etaf added a commit to etaf/pytorch-inductor-xpu that referenced this pull request Feb 15, 2026
etaf added a commit to etaf/pytorch-inductor-xpu that referenced this pull request Feb 16, 2026
4 participants