[inductor] FlexibleLayout for ExternKernelChoice for mms #161351
Closed
# why
- if we only use ExternKernelChoice we're not doing any codegen
- if we're not doing any codegen, we can use a FlexibleLayout here, and give deeper passes more chances to change it

# what
- if all the kernel template choices (KTC) use an ExternKernelChoice template, we switch to a FlexibleLayout before generating the choice
- add a test to make sure that works as intended (FlexibleLayout for only extern, and FixedLayout if Triton is involved)
- caveats: because CPP, CUTLASS, and CK are not using V.choices.get_mm_configs yet, we turn off the optimization if any of those backends is in use. This will be relaxed once they support this too

# testing
```
python3 -bb -m pytest test/inductor/test_max_autotune.py -v
```
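The layout decision described above can be sketched as follows. This is a minimal, self-contained illustration and not Inductor's implementation: `Choice`, `pick_layout`, the `is_extern` flag, and the `unsupported_backend_enabled` parameter are hypothetical names, and the two layout classes are toy dataclasses that only borrow the names FixedLayout and FlexibleLayout. Treating an empty choice list as "keep the fixed layout" is also an assumption.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple, Union


# Toy stand-ins; they are NOT Inductor's real layout classes.
@dataclass(frozen=True)
class FixedLayout:
    size: Tuple[int, ...]
    stride: Tuple[int, ...]


@dataclass(frozen=True)
class FlexibleLayout:
    size: Tuple[int, ...]  # strides are left open for later passes to decide


@dataclass(frozen=True)
class Choice:
    name: str
    is_extern: bool  # True when the template is extern-only (no codegen)


def pick_layout(
    choices: Sequence[Choice],
    size: Tuple[int, ...],
    stride: Tuple[int, ...],
    unsupported_backend_enabled: bool = False,
) -> Union[FixedLayout, FlexibleLayout]:
    """Return a flexible layout only when every choice is extern.

    `unsupported_backend_enabled` models the caveat that the optimization
    is switched off when CPP, CUTLASS, or CK is in use.
    """
    only_extern = bool(choices) and all(c.is_extern for c in choices)
    if only_extern and not unsupported_backend_enabled:
        return FlexibleLayout(size)
    # Any codegen-based choice (e.g. Triton) pins the layout.
    return FixedLayout(size, stride)
```

With these toy types, a list holding only an extern choice yields a `FlexibleLayout`, while adding a Triton-style choice or enabling one of the unsupported backends falls back to `FixedLayout`, which mirrors the two cases the new test is described as covering.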
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/161351
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures
As of commit e08c924 with merge base 468c1f9.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
coconutruben added a commit that referenced this pull request on Aug 23, 2025
ghstack-source-id: 725eb11
Pull Request resolved: #161351
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request on Sep 22, 2025
…torch#162293)

# why
- eventually we want all templates to go through this
- we're exposing this through diode as a sort of interface/API
- avoid later renaming

# what
- rename get_mm_configs to get_template_configs
- rename _finalize_mm_configs to _finalize_template_configs

# testing
- lintrunner
- ci

Differential Revision: [D81820641](https://our.internmc.facebook.com/intern/diff/D81820641)
Pull Request resolved: pytorch#162293
Approved by: https://github.com/eellison
ghstack dependencies: pytorch#161351, pytorch#161350
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request on Sep 22, 2025
…figs (pytorch#162293)"

This reverts commit 30191fc.
Reverted pytorch#162293 on behalf of https://github.com/huydhn due to Check with @coconutruben and the internal failures look real ([comment](pytorch#161351 (comment)))
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request on Sep 22, 2025
…figs (pytorch#161350)"

This reverts commit 623e623.
Reverted pytorch#161350 on behalf of https://github.com/huydhn due to Check with @coconutruben and the internal failures look real ([comment](pytorch#161351 (comment)))
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request on Sep 22, 2025
…orch#161351)"

This reverts commit f08487a.
Reverted pytorch#161351 on behalf of https://github.com/huydhn due to Check with @coconutruben and the internal failures look real ([comment](pytorch#161351 (comment)))
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request on Sep 22, 2025
)

# why
- if we only use ExternKernelChoice we're not doing any codegen
- if we're not doing any codegen, we can use a FlexibleLayout here, and provide deeper passes more chances to change it

# what
- if all the kernel template choices (KTC) are with a ExternKernelChoice template, we switch to a FlexibleLayout before generating the choice
- add a test to make sure that works as intended (FlexibleLayout for only extern, and FixedLayout if Triton is involved)
- caveats:
  - because CPP, CUTLASS, and CK are not using V.choices.get_mm_configs yet, we turn off the optimization if either of those backends are in use. This will be relaxed once they support this too
  - because Triton templates are still using their own calls (not a single call) to get_mm_configs, it's also turned off there. The next diff unifies Triton + ATEN to a single call to get_mm_configs and that in turn allows the optimization there too

# testing
```
python3 -bb -m pytest test/inductor/test_max_autotune.py -v
```

Differential Revision: [D81520584](https://our.internmc.facebook.com/intern/diff/D81520584)
Pull Request resolved: pytorch#161351
Approved by: https://github.com/eellison, https://github.com/jansel
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request on Sep 22, 2025
…torch#161350)

# why
- now everything is in place to just gather templates and run the V.choices.get_mm_configs once per op
- enables any overrides inside V.choices.get_mm_configs to have a full view of the options for an op, not just for one template

# what
- replace multiple calls to V.choices.get_mm_configs with calls to gather the active templates, and then using those in a single call

# testing
```
python3 -bb -m pytest test/inductor/test_max_autotune.py -v
```

Differential Revision: [D81520571](https://our.internmc.facebook.com/intern/diff/D81520571)
Pull Request resolved: pytorch#161350
Approved by: https://github.com/eellison, https://github.com/jansel
ghstack dependencies: pytorch#161351
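A rough sketch of why a single call per op helps: when every active template is gathered first, an override can see all candidates for the op at once. Everything here is hypothetical (`CONFIGS_BY_TEMPLATE`, `EXTERN_TEMPLATES`, `gather_active_templates`, `get_configs`, `prefer_extern`) and does not reflect the real signature or behavior of V.choices.get_mm_configs.

```python
from typing import Callable, Dict, List, Optional, Tuple

Candidate = Tuple[str, Dict[str, int]]  # (template name, config kwargs)

# Hypothetical config tables, for illustration only.
CONFIGS_BY_TEMPLATE: Dict[str, List[Dict[str, int]]] = {
    "aten_mm": [{}],
    "triton_mm": [
        {"BLOCK_M": 64, "BLOCK_N": 64},
        {"BLOCK_M": 128, "BLOCK_N": 64},
    ],
}
EXTERN_TEMPLATES = {"aten_mm"}


def gather_active_templates(use_aten: bool, use_triton: bool) -> List[str]:
    """Collect the templates that are enabled for this op."""
    templates = []
    if use_aten:
        templates.append("aten_mm")
    if use_triton:
        templates.append("triton_mm")
    return templates


def get_configs(
    templates: List[str],
    op: str,
    override: Optional[Callable[[str, List[Candidate]], List[Candidate]]] = None,
) -> List[Candidate]:
    """One call per op: expand all templates, then let one override filter."""
    candidates = [
        (name, cfg)
        for name in templates
        for cfg in CONFIGS_BY_TEMPLATE.get(name, [])
    ]
    if override is not None:
        candidates = override(op, candidates)
    return candidates


def prefer_extern(op: str, candidates: List[Candidate]) -> List[Candidate]:
    """Example override that needs the full view: keep extern if present."""
    extern = [c for c in candidates if c[0] in EXTERN_TEMPLATES]
    return extern or candidates
```

An override like `prefer_extern` cannot be written if each template requests its configs in a separate call, because no single call would know whether an extern candidate exists for the same op.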
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request on Sep 22, 2025

# why
enable caching/overriding/filtering based on src hash later

# what
- KernelTemplate has a src_hash that is None by default
- sha256 on TritonTemplate of the template src code
- None on ExternKernelChoice to have same API

# testing
n/a (not in use in this change)

Differential Revision: [D81821149](https://our.internmc.facebook.com/intern/diff/D81821149)
Pull Request resolved: pytorch#161468
Approved by: https://github.com/eellison
ghstack dependencies: pytorch#161351, pytorch#161350, pytorch#162293
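The src_hash idea in the commit message above can be illustrated with a small sketch. The `Toy*` classes are hypothetical stand-ins rather than the real KernelTemplate, TritonTemplate, or ExternKernelChoice, and hashing the UTF-8 bytes of the source string is an assumption about how the digest might be computed.

```python
import hashlib
from typing import Optional


class ToyKernelTemplate:
    """Base class: src_hash is None unless a subclass sets it."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.src_hash: Optional[str] = None


class ToyTritonTemplate(ToyKernelTemplate):
    """Template with source code; src_hash is a sha256 hex digest of it."""

    def __init__(self, name: str, source: str) -> None:
        super().__init__(name)
        self.source = source
        # Assumption: hash the UTF-8 encoded template source.
        self.src_hash = hashlib.sha256(source.encode("utf-8")).hexdigest()


class ToyExternKernelChoice(ToyKernelTemplate):
    """Extern kernels have no template source, so src_hash stays None."""
```

Because the digest depends only on the source text, two templates with identical source share a hash, which is the property a later cache or filter keyed on src_hash would rely on.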
Khanaksahu pushed a commit to Khanaksahu/pytorch-fork that referenced this pull request on Nov 17, 2025
# why
- if we only use ExternKernelChoice we're not doing any codegen
- if we're not doing any codegen, we can use a FlexibleLayout
  here, and provide deeper passes more chances to change it

# what
- if all the kernel template choices (KTC) are with a ExternKernelChoice
  template, we switch to a FlexibleLayout before generating the choice
- add a test to make sure that works as intended (FlexibleLayout for
  only extern, and FixedLayout if Triton is involved)
- caveats:
  - because CPP, CUTLASS, and CK are not using
    V.choices.get_mm_configs yet, we turn off the optimization
    if either of those backends are in use. This will be relaxed
    once they support this too
  - because Triton templates are still using their own calls
    (not a single call) to get_mm_configs, it's also turned
    off there. The next diff unifies Triton + ATEN to a single
    call to get_mm_configs and that in turn allows the optimization
    there too

# testing
```
python3 -bb -m pytest test/inductor/test_max_autotune.py -v
python3 -bb -m pytest test/inductor/test_torchinductor.py::TritonCodeGenTests::test_donated_buffer_inplace_gpt
```
ghstack-source-id: 93a4209
Pull Request resolved: pytorch/pytorch#161351
Labels: ci-no-td (Do not run TD on this PR), ciflow/inductor, ciflow/trunk (Trigger trunk jobs on your pull request), Merged, module: inductor, Reverted, topic: not user facing (topic category)
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov
Differential Revision: D81520584