[inductor] leverage template stacking in V.choices.get_mm_configs #161350

coconutruben · 2025-08-23T02:58:16Z

Stack from ghstack (oldest at bottom):

why

- now everything is in place to just gather templates and run
  V.choices.get_mm_configs once per op
- this enables any overrides inside V.choices.get_mm_configs to
  have a full view of the options for an op, not just for
  one template

what

- replace multiple calls to V.choices.get_mm_configs with
  calls that gather the active templates, and then use those
  in a single call
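A minimal sketch of the call-shape change described above, not the actual Inductor code. `Template`, `Choice`, `Choices`, `CappedChoices`, `lower_per_template`, and `lower_stacked` are hypothetical stand-ins invented for illustration; only the method name `get_mm_configs` comes from this PR, and its real signature is not shown here.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Template:
    """Hypothetical stand-in for a kernel template (not Inductor's class)."""

    name: str
    configs: tuple[str, ...]


@dataclass(frozen=True)
class Choice:
    """Hypothetical (template, config) pair produced for an op."""

    template: str
    config: str


class Choices:
    """Hypothetical stand-in for the object behind V.choices."""

    def __init__(self) -> None:
        self.calls = 0  # how many times get_mm_configs was invoked

    def get_mm_configs(self, templates: Sequence[Template], op_name: str) -> list[Choice]:
        # Assumed signature: the real method takes different arguments.
        self.calls += 1
        return [Choice(t.name, c) for t in templates for c in t.configs]


class CappedChoices(Choices):
    """Example override: keep at most `limit` choices per call."""

    def __init__(self, limit: int) -> None:
        super().__init__()
        self.limit = limit

    def get_mm_configs(self, templates: Sequence[Template], op_name: str) -> list[Choice]:
        return super().get_mm_configs(templates, op_name)[: self.limit]


def lower_per_template(choices: Choices, active: Sequence[Template], op_name: str) -> list[Choice]:
    # Old shape: one call per template, so an override only ever sees one template.
    out: list[Choice] = []
    for template in active:
        out.extend(choices.get_mm_configs([template], op_name))
    return out


def lower_stacked(choices: Choices, active: Sequence[Template], op_name: str) -> list[Choice]:
    # New shape: gather the active templates first, then make a single call.
    return choices.get_mm_configs(list(active), op_name)
```

With an override such as `CappedChoices(2)`, the per-template shape applies the cap to each template separately, while the stacked shape applies it once to the whole op. That is the "full view" the description refers to.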

testing

python3 -bb -m pytest test/inductor/test_max_autotune.py -v

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov

Differential Revision: D81520571

# why

- now everything is in place to just gather templates and run the V.choices.get_mm_configs once per op
- enables any overrides inside V.choices.get_mm_configs to have a full view of the options for an op, not just for one template

# what

- replace multiple calls to V.choices.get_mm_configs with calls to gather the active templates, and then using those in a single call
- note: mm and addmm had an optimization where if there is no max-autotune, we use a FlexibleLayout. This optimization is now temporarily gone, and coming back in the next commit, as a generic optimization for everything that uses V.choices.get_mm_configs

# testing

```
python3 -bb -m pytest test/inductor/test_max_autotune.py -v
```

[ghstack-poisoned]

pytorch-bot · 2025-08-23T02:58:20Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/161350

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit b4f77b0 with merge base 468c1f9:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.


…62293)

# why

- eventually we want all templates to go through this
- we're exposing this through diode as a sort of interface/API
- avoid later renaming

# what

- rename get_mm_configs to get_template_configs
- rename _finalize_mm_configs to _finalize_template_configs

# testing

- lintrunner
- ci

Differential Revision: [D81820641](https://our.internmc.facebook.com/intern/diff/D81820641)

Pull Request resolved: #162293

Approved by: https://github.com/eellison

ghstack dependencies: #161351, #161350

# why

enable caching/overriding/filtering based on src hash later

# what

- KernelTemplate has a src_hash that is None by default
- sha256 on TritonTemplate of the template src code
- None on ExternKernelChoice to have same API

# testing

n/a (not in use in this change)

Differential Revision: [D81821149](https://our.internmc.facebook.com/intern/diff/D81821149)

Pull Request resolved: #161468

Approved by: https://github.com/eellison

ghstack dependencies: #161351, #161350, #162293
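The `src_hash` idea in the commit above can be sketched as follows. This is an illustration under assumptions, not the classes from `torch._inductor`: the `*Sketch` class names and the `source` attribute are hypothetical, and hashing the UTF-8 bytes and storing the hex digest is an assumed detail. The commit message only says sha256 of the template source.

```python
import hashlib
from typing import Optional


class KernelTemplateSketch:
    """Hypothetical base class: src_hash defaults to None."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.src_hash: Optional[str] = None


class TritonTemplateSketch(KernelTemplateSketch):
    """Hypothetical template with source text; hashes that text with sha256."""

    def __init__(self, name: str, source: str) -> None:
        super().__init__(name)
        self.source = source
        # Assumption: hash the UTF-8 bytes and keep the hex digest.
        self.src_hash = hashlib.sha256(source.encode("utf-8")).hexdigest()


class ExternKernelChoiceSketch:
    """Hypothetical extern choice: no template source, so src_hash is None.

    The attribute exists only so callers can read `.src_hash` uniformly.
    """

    def __init__(self, name: str) -> None:
        self.name = name
        self.src_hash: Optional[str] = None
```

A caller could then key a cache or a filter on `src_hash` and treat `None` as "no source to compare", which matches the stated goal of enabling caching/overriding/filtering later.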

…torch#161350)

# why

- now everything is in place to just gather templates and run the V.choices.get_mm_configs once per op
- enables any overrides inside V.choices.get_mm_configs to have a full view of the options for an op, not just for one template

# what

- replace multiple calls to V.choices.get_mm_configs with calls to gather the active templates, and then using those in a single call

# testing

```
python3 -bb -m pytest test/inductor/test_max_autotune.py -v
```

Differential Revision: [D81520571](https://our.internmc.facebook.com/intern/diff/D81520571)

Pull Request resolved: pytorch#161350

Approved by: https://github.com/eellison, https://github.com/jansel

ghstack dependencies: pytorch#161351

…figs (pytorch#161350)"

This reverts commit 623e623.

Reverted pytorch#161350 on behalf of https://github.com/huydhn due to Check with @coconutruben and the internal failures look real ([comment](pytorch#161351 (comment)))

# why

- now everything is in place to just gather templates and run the V.choices.get_mm_configs once per op
- enables any overrides inside V.choices.get_mm_configs to have a full view of the options for an op, not just for one template

# what

- replace multiple calls to V.choices.get_mm_configs with calls to gather the active templates, and then using those in a single call
- turn on TRITON for FlexibleLayout again. This is designed to work when running max-autotune but for some reason e.g. through overriding, we end up autotuning only over ATEN choices. In that case, we still want to use a FlexibleLayout instead of making it Fixed

# testing

```
python3 -bb -m pytest test/inductor/test_max_autotune.py -v
```

ghstack-source-id: fe05a43

Pull Request resolved: pytorch/pytorch#161350
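One possible reading of the layout note above, sketched in isolation. `LayoutKind` and `pick_layout` are hypothetical names, and the backend strings are placeholders. The rule "fall back to Fixed whenever a non-ATen choice survives" is an assumption inferred from the wording "instead of making it Fixed", not something the commit message spells out.

```python
from enum import Enum
from typing import Iterable


class LayoutKind(Enum):
    """Hypothetical labels for the two layout kinds named in the commit."""

    FLEXIBLE = "flexible"
    FIXED = "fixed"


def pick_layout(choice_backends: Iterable[str]) -> LayoutKind:
    """Keep the layout flexible when only ATen choices survived.

    Assumption: any other backend among the choices forces a fixed layout,
    and an empty choice list is treated as fixed as well.
    """
    backends = set(choice_backends)
    if backends and backends <= {"ATEN"}:
        return LayoutKind.FLEXIBLE
    return LayoutKind.FIXED
```

For example, if an override filters the candidates for an op down to ATen only, `pick_layout(["ATEN", "ATEN"])` returns `LayoutKind.FLEXIBLE`, while a mix such as `["ATEN", "TRITON"]` returns `LayoutKind.FIXED` under the assumption stated above.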

pytorch-bot bot added ciflow/inductor module: inductor labels Aug 23, 2025

coconutruben added the topic: not user facing topic category label Aug 23, 2025

coconutruben added 2 commits August 22, 2025 20:05

pytorchmergebot closed this in a326ef3 Sep 12, 2025

github-actions bot deleted the gh/coconutruben/51/head branch October 13, 2025 02:15

Lucaskabela mentioned this pull request Nov 4, 2025

[Inductor] No longer throw error in bmm out_dtype lowering due to tem… #166922

Merged

5 participants
