
Conversation

@etaf etaf commented Sep 19, 2025

Stack from ghstack (oldest at bottom):

On the Inductor XPU backend, threads_per_warp is not always 32. For Intel GEMM Triton kernels, it can be 16. This information must be preserved for XPU so that the Cpp wrapper can launch the kernel with the correct configuration.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben
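To make the failure mode concrete, here is a minimal sketch (not the actual Inductor code; the function name is illustrative) of why the cpp wrapper must know threads_per_warp when sizing the launch:

```python
# Hedged sketch: the work-group size for a Triton kernel launch is
# num_warps * threads_per_warp. Hard-coding the CUDA-style default of 32
# produces the wrong size for Intel GEMM kernels compiled with 16.
def workgroup_size(num_warps: int, threads_per_warp: int = 32) -> int:
    return num_warps * threads_per_warp

# Example: 8 warps on an Intel GEMM kernel (threads_per_warp == 16)
print(workgroup_size(8, 16))  # 128, the intended launch size
print(workgroup_size(8))      # 256, wrong if the default 32 is assumed
```

This is why the value recorded by the Triton compiler has to be carried through to the wrapper rather than assumed.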

…rnel

for launching kernel correctly in cpp wrapper.

[ghstack-poisoned]

pytorch-bot bot commented Sep 19, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/163315

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 51c2cfc with merge base ed3438f:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

etaf added a commit that referenced this pull request Sep 19, 2025

ghstack-source-id: 54b133b
Pull Request resolved: #163315
@etaf etaf added the ciflow/xpu Run XPU CI tasks label Sep 19, 2025
@etaf etaf changed the title from "[Inductor][Intel GPU] Save threads_per_warp from Triton compiled kernel" to "[Inductor][Intel GPU] Save threads_per_warp from Triton compiled kernel for launching kernel correctly in cpp wrapper." Sep 19, 2025
    void** params,
-   sycl::queue* queuePtr) {
+   sycl::queue* queuePtr,
+   uint32_t threadsPerWarp) {
Contributor:

This order looks better to me.

uint32_t gridX,
uint32_t gridY,
uint32_t gridZ,
uint32_t numWarps,
uint32_t threadsPerWarp,
uint32_t sharedMemory,
void** params,

Collaborator Author:

Yes, the order is better, but the code generation of these parameters is shared with CUDA, so I would prefer to keep threadsPerWarp as an extra parameter here.

@etaf etaf added the ciflow/trunk Trigger trunk jobs on your pull request label Sep 19, 2025
@etaf etaf requested review from desertfire and jansel September 19, 2025 08:40

etaf commented Sep 19, 2025

Hi @jansel @desertfire, could you please take a look at this PR when you have time? We’d like to get this fix cherry-picked into the 2.9 release. Thanks!

# can launch the kernel with the correct configuration.
threads_per_warp = 32
if hasattr(launcher.bin.metadata, "threads_per_warp"):
threads_per_warp = launcher.bin.metadata.threads_per_warp
Contributor:

Nit: you can drop the if-check and use getattr with 32 as the default value.

Collaborator Author:

@desertfire : Thank you very much for your suggestion. I’ve simplified this piece of code into a single line.
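The suggested simplification can be sketched as follows; `launcher` here is a hand-built stand-in for the real Triton launcher object, not PyTorch's actual class:

```python
from types import SimpleNamespace

# Stand-in for launcher.bin.metadata as produced by the Triton compiler;
# an XPU GEMM kernel may carry threads_per_warp == 16.
launcher = SimpleNamespace(
    bin=SimpleNamespace(metadata=SimpleNamespace(threads_per_warp=16))
)

# Instead of:
#     threads_per_warp = 32
#     if hasattr(launcher.bin.metadata, "threads_per_warp"):
#         threads_per_warp = launcher.bin.metadata.threads_per_warp
# a single getattr with 32 as the fallback does the same thing:
threads_per_warp = getattr(launcher.bin.metadata, "threads_per_warp", 32)
print(threads_per_warp)  # 16 here; 32 when the metadata lacks the attribute
```

The three-argument form of `getattr` returns the default when the attribute is absent, so the behavior is identical to the hasattr branch.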

…compiled kernel for launching kernel correctly in cpp wrapper."



[ghstack-poisoned]
etaf added a commit that referenced this pull request Sep 19, 2025

ghstack-source-id: c56d814
Pull Request resolved: #163315
@EikanWang (Collaborator):
@pytorchbot merge

@pytorchmergebot:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.


etaf commented Sep 19, 2025

@pytorchbot cherry-pick --onto release/2.9 -c "Critical - Critical fixes to new features"


pytorch-bot bot commented Sep 19, 2025

❌ 🤖 pytorchbot command failed:

@pytorchbot cherry-pick: error: argument -c/--classification: invalid choice: 'Critical - Critical fixes to new features' (choose from 'regression', 'critical', 'fixnewfeature', 'docs', 'release')

usage: @pytorchbot cherry-pick --onto ONTO [--fixes FIXES] -c
                               {regression,critical,fixnewfeature,docs,release}

Try @pytorchbot --help for more info.


etaf commented Sep 19, 2025

@pytorchbot cherry-pick --onto release/2.9 -c fixnewfeature

pytorchbot pushed a commit that referenced this pull request Sep 20, 2025
…rnel for launching kernel correctly in cpp wrapper. (#163315)


Pull Request resolved: #163315
Approved by: https://github.com/EikanWang, https://github.com/desertfire

(cherry picked from commit 9f8a311)
@pytorchbot:
Cherry picking #163315

The cherry pick PR is at #163388 and it is recommended to link a fixnewfeature cherry pick PR with an issue. The following tracker issues are updated:


mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
…rnel for launching kernel correctly in cpp wrapper. (pytorch#163315)


Pull Request resolved: pytorch#163315
Approved by: https://github.com/EikanWang, https://github.com/desertfire
cleonard530 pushed a commit to cleonard530/pytorch that referenced this pull request Sep 22, 2025
…rnel for launching kernel correctly in cpp wrapper. (pytorch#163315)


Pull Request resolved: pytorch#163315
Approved by: https://github.com/EikanWang, https://github.com/desertfire
atalman pushed a commit that referenced this pull request Sep 26, 2025
…rnel for launching kernel correctly in cpp wrapper. (#163388)

[Inductor][Intel GPU] Save `threads_per_warp` from Triton compiled kernel for launching kernel correctly in cpp wrapper. (#163315)


Pull Request resolved: #163315
Approved by: https://github.com/EikanWang, https://github.com/desertfire

(cherry picked from commit 9f8a311)

Co-authored-by: xinan.lin <[email protected]>
dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request Sep 26, 2025
…rnel for launching kernel correctly in cpp wrapper. (pytorch#163315)


Pull Request resolved: pytorch#163315
Approved by: https://github.com/EikanWang, https://github.com/desertfire
@github-actions github-actions bot deleted the gh/etaf/170/head branch October 20, 2025 02:17