[Inductor XPU GEMM] Step 9/N: Support generating XPU cutlass gemm kernel #161939
Open: etaf wants to merge 41 commits into gh/etaf/167/base
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/161939
✅ You can merge normally! (4 unrelated failures) As of commit 21b0c92 with merge base 98a4d7b:
FLAKY: one job failed, but this was likely due to flakiness present on trunk.
UNSTABLE: some jobs are marked as unstable, possibly due to flakiness on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This was referenced Aug 15, 2025
tiendatngcs pushed a commit to tiendatngcs/pytorch-Dec25 that referenced this pull request on Dec 10, 2025 (ghstack-source-id: aeda23a)
etaf added a commit to etaf/pytorch-inductor-xpu that referenced this pull request on Jan 12, 2026 (ghstack-source-id: 615d21f)
etaf added a commit to etaf/pytorch-inductor-xpu that referenced this pull request on Jan 16, 2026 (ghstack-source-id: d42bae6)
SergeyTyshkevich pushed a commit to SergeyTyshkevich/chart2 that referenced this pull request on Jan 19, 2026 (ghstack-source-id: d42bae6)
etaf added a commit to etaf/pytorch-inductor-xpu that referenced this pull request on Jan 29, 2026 (ghstack-source-id: 1e0ec7d)
etaf added a commit to etaf/pytorch-inductor-xpu that referenced this pull request on Feb 11, 2026 (ghstack-source-id: 09b2dab)
etaf added a commit to etaf/pytorch-inductor-xpu that referenced this pull request on Feb 12, 2026 (ghstack-source-id: 030c597)
etaf added a commit to etaf/pytorch-inductor-xpu that referenced this pull request on Feb 15, 2026 (ghstack-source-id: edd90fd)
etaf added a commit to etaf/pytorch-inductor-xpu that referenced this pull request on Feb 16, 2026 (ghstack-source-id: 74a6550)
jansel approved these changes on Feb 16, 2026.
Stack from ghstack (oldest at bottom):
This PR implements CUTLASS kernel generation for the XPU backend, as proposed in RFC #160175. It reuses most of the existing CUTLASS CUDA kernel generation code, with only minor adjustments for XPU-specific code generation.
The test cases are enabled in #161940.
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben