
[Inductor XPU GEMM] Step 3/N: Refactor CUDATemplate to CUTLASSTemplate. #160686

Closed
etaf wants to merge 39 commits into gh/etaf/157/base from gh/etaf/157/head
etaf (Collaborator) commented Aug 14, 2025

pytorch-bot bot commented Aug 14, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/160686

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (4 Unrelated Failures)

As of commit 9eab51d with merge base fe964c3:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

…LASSTemplate."

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov coconutruben

[ghstack-poisoned]
@etaf etaf changed the title [Inductor XPU GEMM] Step 4/N Refactor CUDATempalte to CUTLASSTemplate. [Inductor XPU GEMM] Step 4/N Refactor CUDATempalte to CUTLASSTemplate. Aug 14, 2025
@etaf etaf changed the title [Inductor XPU GEMM] Step 4/N Refactor CUDATempalte to CUTLASSTemplate. [Inductor XPU GEMM] Step 4/N: Refactor CUDATempalte to CUTLASSTemplate. Aug 14, 2025
etaf added 3 commits August 15, 2025 02:57
@etaf etaf changed the title [Inductor XPU GEMM] Step 4/N: Refactor CUDATempalte to CUTLASSTemplate. [Inductor XPU GEMM] Step 3/N: Refactor CUDATempalte to CUTLASSTemplate. Sep 2, 2025
@etaf etaf added topic: not user facing topic category and removed topic: not user facing topic category labels Sep 2, 2025
…LASSTemplate."

This PR is the third step of #160175. It refactors `CUDATemplate` into `CUTLASSTemplate` and introduces a `device_type` attribute, which allows handling minor differences between the CUDA and XPU backends.

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov coconutruben

[ghstack-poisoned]
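
To illustrate the idea in the description, here is a minimal sketch of a template class that carries a `device_type` attribute and uses it to pick backend-specific details. This is not the code from the PR: the class name `CUTLASSTemplateSketch`, the `header()` method, and the per-backend include lists are all hypothetical, chosen only to show how one class could serve both CUDA and XPU.

```python
# Hypothetical per-backend settings. The actual CUDA/XPU differences
# handled by the PR are not shown in this thread; these are assumptions.
_BACKEND_HEADERS = {
    "cuda": ["cuda_runtime.h"],
    "xpu": ["sycl/sycl.hpp"],
}


class CUTLASSTemplateSketch:
    """Illustrative stand-in, NOT the Inductor CUTLASSTemplate class."""

    def __init__(self, name: str, device_type: str = "cuda") -> None:
        if device_type not in _BACKEND_HEADERS:
            raise ValueError(f"unsupported device_type: {device_type!r}")
        self.name = name
        # One attribute distinguishes the backends, so no subclass per device is needed.
        self.device_type = device_type

    def header(self) -> str:
        """Return the include lines for the selected backend."""
        return "\n".join(
            f"#include <{h}>" for h in _BACKEND_HEADERS[self.device_type]
        )


if __name__ == "__main__":
    for dev in ("cuda", "xpu"):
        print(dev, "->", CUTLASSTemplateSketch("gemm", dev).header())
```

Keeping the backend choice in a single attribute means shared logic stays in one class, and only the small device-specific lookups branch on `device_type`.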
@etaf etaf marked this pull request as ready for review September 3, 2025 01:06
etaf added 4 commits November 21, 2025 03:28
tiendatngcs pushed a commit to tiendatngcs/pytorch-Dec25 that referenced this pull request Dec 10, 2025
etaf added 3 commits January 7, 2026 02:37
etaf added a commit to etaf/pytorch-inductor-xpu that referenced this pull request Jan 12, 2026
etaf added a commit to etaf/pytorch-inductor-xpu that referenced this pull request Jan 16, 2026
SergeyTyshkevich pushed a commit to SergeyTyshkevich/chart2 that referenced this pull request Jan 19, 2026
etaf added 3 commits January 26, 2026 17:37
etaf added a commit to etaf/pytorch-inductor-xpu that referenced this pull request Jan 29, 2026
etaf (Collaborator, Author) commented Feb 6, 2026

@pytorchbot merge

pytorch-bot bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label Feb 6, 2026

pytorchmergebot (Collaborator) commented

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

pytorchmergebot pushed a commit that referenced this pull request Feb 9, 2026
…160687)

This PR is the fourth step of #160175. It refactors the following:
CUDAKernel → CUTLASSKernel
CUDATemplateKernel → CUTLASSTemplateKernel
CUDATemplateCaller → CUTLASSTemplateCaller
CUDATemplateBuffer → CUTLASSTemplateBuffer

Pull Request resolved: #160687
Approved by: https://github.com/EikanWang, https://github.com/mlazos
ghstack dependencies: #160685, #160686
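
For out-of-tree code that still refers to the old class names, the renames listed in the commit message above can be applied mechanically. The following is a minimal sketch under that assumption: `apply_renames` is a hypothetical helper, not part of either PR, and it does a plain-text pass rather than a syntax-aware rewrite.

```python
import re

# Old-to-new class names, copied from the commit message above.
RENAMES = {
    "CUDAKernel": "CUTLASSKernel",
    "CUDATemplateKernel": "CUTLASSTemplateKernel",
    "CUDATemplateCaller": "CUTLASSTemplateCaller",
    "CUDATemplateBuffer": "CUTLASSTemplateBuffer",
}

# \b anchors restrict each match to a whole identifier, so longer names
# that merely contain an old name are left untouched.
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, RENAMES)) + r")\b")


def apply_renames(source: str) -> str:
    """Replace every old class name in `source` with its new name."""
    return _PATTERN.sub(lambda m: RENAMES[m.group(1)], source)


if __name__ == "__main__":
    print(apply_renames("class MyKernel(CUDATemplateKernel): pass"))
```

Because this works on raw text, it also rewrites matches inside strings and comments; review the result before committing it.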
radeksm pushed a commit to radeksm/pytorch that referenced this pull request Feb 20, 2026
…e. (pytorch#160686)

This PR is the third step of pytorch#160175. It refactors `CUDATemplate` into `CUTLASSTemplate` and introduces a `device_type` attribute, which allows handling minor differences between the CUDA and XPU backends.

Pull Request resolved: pytorch#160686
Approved by: https://github.com/EikanWang, https://github.com/mlazos
ghstack dependencies: pytorch#160685
radeksm pushed a commit to radeksm/pytorch that referenced this pull request Feb 20, 2026
…ytorch#160687)

This PR is the fourth step of pytorch#160175. It refactors the following:
CUDAKernel → CUTLASSKernel
CUDATemplateKernel → CUTLASSTemplateKernel
CUDATemplateCaller → CUTLASSTemplateCaller
CUDATemplateBuffer → CUTLASSTemplateBuffer

Pull Request resolved: pytorch#160687
Approved by: https://github.com/EikanWang, https://github.com/mlazos
ghstack dependencies: pytorch#160685, pytorch#160686