Skip to content

Conversation

@coconutruben
Copy link
Contributor

@coconutruben coconutruben commented Aug 23, 2025

Stack from ghstack (oldest at bottom):

why

  • if we only use ExternKernelChoice we're not doing any codegen
  • if we're not doing any codegen, we can use a FlexibleLayout
    here, and provide deeper passes more chances to change it

what

  • if all the kernel template choices (KTC) are with a ExternKernelChoice
    template, we switch to a FlexibleLayout before generating the choice

  • add a test to make sure that works as intended (FlexibleLayout for
    only extern, and FixedLayout if Triton is involved)

  • caveats:

    • because CPP, CUTLASS, and CK are not using
      V.choices.get_mm_configs yet, we turn off the optimization
      if either of those backends are in use. This will be relaxed
      once they support this too
    • because Triton templates are still using their own calls
      (not a single call) to get_mm_configs, it's also turned
      off there. The next diff unifies Triton + ATEN to a single
      call to get_mm_configs and that in turn allows the optimization
      there too

testing

python3 -bb -m pytest test/inductor/test_max_autotune.py -v

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov

Differential Revision: D81520584

\# why

- if we only use ExternKernelChoice we're not doing any codegen
- if we're not doing any codegen, we can use a FlexibleLayout
  here, and provide deeper passes more chances to change it

\# what

- if all the kernel template choices (KTC) are with a ExternKernelChoice
  template, we switch to a FlexibleLayout before generating the choice
- add a test to make sure that works as intended (FlexibleLayout for
  only extern, and FixedLayout if Triton is involved)

- caveats: because CPP, CUTLASS, and CK are not using
  V.choices.get_mm_configs yet, we turn off the optimization
  if either of those backends are in use. This will be relaxed
  once they support this too

\# testing

```
python3 -bb -m pytest test/inductor/test_max_autotune.py -v
```

[ghstack-poisoned]
@pytorch-bot
Copy link

pytorch-bot bot commented Aug 23, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/161351

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit e08c924 with merge base 468c1f9 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

coconutruben added a commit that referenced this pull request Aug 23, 2025
\# why

- if we only use ExternKernelChoice we're not doing any codegen
- if we're not doing any codegen, we can use a FlexibleLayout
  here, and provide deeper passes more chances to change it

\# what

- if all the kernel template choices (KTC) are with a ExternKernelChoice
  template, we switch to a FlexibleLayout before generating the choice
- add a test to make sure that works as intended (FlexibleLayout for
  only extern, and FixedLayout if Triton is involved)

- caveats: because CPP, CUTLASS, and CK are not using
  V.choices.get_mm_configs yet, we turn off the optimization
  if either of those backends are in use. This will be relaxed
  once they support this too

\# testing

```
python3 -bb -m pytest test/inductor/test_max_autotune.py -v
```

ghstack-source-id: 725eb11
Pull Request resolved: #161351
@coconutruben coconutruben added the topic: not user facing topic category label Aug 23, 2025
\# why

- if we only use ExternKernelChoice we're not doing any codegen
- if we're not doing any codegen, we can use a FlexibleLayout
  here, and provide deeper passes more chances to change it

\# what

- if all the kernel template choices (KTC) are with a ExternKernelChoice
  template, we switch to a FlexibleLayout before generating the choice
- add a test to make sure that works as intended (FlexibleLayout for
  only extern, and FixedLayout if Triton is involved)

- caveats: because CPP, CUTLASS, and CK are not using
  V.choices.get_mm_configs yet, we turn off the optimization
  if either of those backends are in use. This will be relaxed
  once they support this too

\# testing

```
python3 -bb -m pytest test/inductor/test_max_autotune.py -v
```

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov

[ghstack-poisoned]
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
…torch#162293)

# why

- eventually we want all templates to go through this
- we're exposing this through diode as a sort of interface/API
- avoid later renaming

# what

- rename get_mm_configs to get_template_configs
- rename _finalize_mm_configs to _finalize_template_configs

# testing

- lintrunner
- ci

Differential Revision: [D81820641](https://our.internmc.facebook.com/intern/diff/D81820641)
Pull Request resolved: pytorch#162293
Approved by: https://github.com/eellison
ghstack dependencies: pytorch#161351, pytorch#161350
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
)

# why

- if we only use ExternKernelChoice we're not doing any codegen
- if we're not doing any codegen, we can use a FlexibleLayout
  here, and provide deeper passes more chances to change it

# what

- if all the kernel template choices (KTC) are with a ExternKernelChoice
  template, we switch to a FlexibleLayout before generating the choice
- add a test to make sure that works as intended (FlexibleLayout for
  only extern, and FixedLayout if Triton is involved)

- caveats:
    - because CPP, CUTLASS, and CK are not using
       V.choices.get_mm_configs yet, we turn off the optimization
       if either of those backends are in use. This will be relaxed
       once they support this too
    - because Triton templates are still using their own calls
       (not a single call) to get_mm_configs, it's also turned
       off there. The next diff unifies Triton + ATEN to a single
       call to get_mm_configs and that in turn allows the optimization
       there too

# testing

```
python3 -bb -m pytest test/inductor/test_max_autotune.py -v
```

Differential Revision: [D81520584](https://our.internmc.facebook.com/intern/diff/D81520584)
Pull Request resolved: pytorch#161351
Approved by: https://github.com/eellison, https://github.com/jansel
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
…torch#161350)

# why

- now everything is in place to just gather templates and run
  the V.choices.get_mm_configs once per op
- enables any overrides inside V.choices.get_mm_configs to
  have a full view of the options for an op, not just for
  one template

# what

- replace multiple calls to V.choices.get_mm_configs with
  calls to gather the active templates, and then using those
  in a single call

# testing

```
python3 -bb -m pytest test/inductor/test_max_autotune.py -v
```

Differential Revision: [D81520571](https://our.internmc.facebook.com/intern/diff/D81520571)
Pull Request resolved: pytorch#161350
Approved by: https://github.com/eellison, https://github.com/jansel
ghstack dependencies: pytorch#161351
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
…torch#162293)

# why

- eventually we want all templates to go through this
- we're exposing this through diode as a sort of interface/API
- avoid later renaming

# what

- rename get_mm_configs to get_template_configs
- rename _finalize_mm_configs to _finalize_template_configs

# testing

- lintrunner
- ci

Differential Revision: [D81820641](https://our.internmc.facebook.com/intern/diff/D81820641)
Pull Request resolved: pytorch#162293
Approved by: https://github.com/eellison
ghstack dependencies: pytorch#161351, pytorch#161350
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
# why

enable caching/overriding/filtering based on src hash later

# what

- KernelTemplate has a src_hash that is None by default
- sha256 on TritonTemplate of the template src code
- None on ExternKernelChoice to have same API

# testing

n/a (not in use in this change)

Differential Revision: [](https://our.internmc.facebook.com/intern/diff/)

Differential Revision: [D81821149](https://our.internmc.facebook.com/intern/diff/D81821149)
Pull Request resolved: pytorch#161468
Approved by: https://github.com/eellison
ghstack dependencies: pytorch#161351, pytorch#161350, pytorch#162293
cleonard530 pushed a commit to cleonard530/pytorch that referenced this pull request Sep 22, 2025
)

# why

- if we only use ExternKernelChoice we're not doing any codegen
- if we're not doing any codegen, we can use a FlexibleLayout
  here, and provide deeper passes more chances to change it

# what

- if all the kernel template choices (KTC) are with a ExternKernelChoice
  template, we switch to a FlexibleLayout before generating the choice
- add a test to make sure that works as intended (FlexibleLayout for
  only extern, and FixedLayout if Triton is involved)

- caveats:
    - because CPP, CUTLASS, and CK are not using
       V.choices.get_mm_configs yet, we turn off the optimization
       if either of those backends are in use. This will be relaxed
       once they support this too
    - because Triton templates are still using their own calls
       (not a single call) to get_mm_configs, it's also turned
       off there. The next diff unifies Triton + ATEN to a single
       call to get_mm_configs and that in turn allows the optimization
       there too

# testing

```
python3 -bb -m pytest test/inductor/test_max_autotune.py -v
```

Differential Revision: [D81520584](https://our.internmc.facebook.com/intern/diff/D81520584)
Pull Request resolved: pytorch#161351
Approved by: https://github.com/eellison, https://github.com/jansel
cleonard530 pushed a commit to cleonard530/pytorch that referenced this pull request Sep 22, 2025
…torch#161350)

# why

- now everything is in place to just gather templates and run
  the V.choices.get_mm_configs once per op
- enables any overrides inside V.choices.get_mm_configs to
  have a full view of the options for an op, not just for
  one template

# what

- replace multiple calls to V.choices.get_mm_configs with
  calls to gather the active templates, and then using those
  in a single call

# testing

```
python3 -bb -m pytest test/inductor/test_max_autotune.py -v
```

Differential Revision: [D81520571](https://our.internmc.facebook.com/intern/diff/D81520571)
Pull Request resolved: pytorch#161350
Approved by: https://github.com/eellison, https://github.com/jansel
ghstack dependencies: pytorch#161351
cleonard530 pushed a commit to cleonard530/pytorch that referenced this pull request Sep 22, 2025
…torch#162293)

# why

- eventually we want all templates to go through this
- we're exposing this through diode as a sort of interface/API
- avoid later renaming

# what

- rename get_mm_configs to get_template_configs
- rename _finalize_mm_configs to _finalize_template_configs

# testing

- lintrunner
- ci

Differential Revision: [D81820641](https://our.internmc.facebook.com/intern/diff/D81820641)
Pull Request resolved: pytorch#162293
Approved by: https://github.com/eellison
ghstack dependencies: pytorch#161351, pytorch#161350
cleonard530 pushed a commit to cleonard530/pytorch that referenced this pull request Sep 22, 2025
cleonard530 pushed a commit to cleonard530/pytorch that referenced this pull request Sep 22, 2025
cleonard530 pushed a commit to cleonard530/pytorch that referenced this pull request Sep 22, 2025
cleonard530 pushed a commit to cleonard530/pytorch that referenced this pull request Sep 22, 2025
)

# why

- if we only use ExternKernelChoice we're not doing any codegen
- if we're not doing any codegen, we can use a FlexibleLayout
  here, and provide deeper passes more chances to change it

# what

- if all the kernel template choices (KTC) are with a ExternKernelChoice
  template, we switch to a FlexibleLayout before generating the choice
- add a test to make sure that works as intended (FlexibleLayout for
  only extern, and FixedLayout if Triton is involved)

- caveats:
    - because CPP, CUTLASS, and CK are not using
       V.choices.get_mm_configs yet, we turn off the optimization
       if either of those backends are in use. This will be relaxed
       once they support this too
    - because Triton templates are still using their own calls
       (not a single call) to get_mm_configs, it's also turned
       off there. The next diff unifies Triton + ATEN to a single
       call to get_mm_configs and that in turn allows the optimization
       there too

# testing

```
python3 -bb -m pytest test/inductor/test_max_autotune.py -v
```

Differential Revision: [D81520584](https://our.internmc.facebook.com/intern/diff/D81520584)
Pull Request resolved: pytorch#161351
Approved by: https://github.com/eellison, https://github.com/jansel
cleonard530 pushed a commit to cleonard530/pytorch that referenced this pull request Sep 22, 2025
…torch#161350)

# why

- now everything is in place to just gather templates and run
  the V.choices.get_mm_configs once per op
- enables any overrides inside V.choices.get_mm_configs to
  have a full view of the options for an op, not just for
  one template

# what

- replace multiple calls to V.choices.get_mm_configs with
  calls to gather the active templates, and then using those
  in a single call

# testing

```
python3 -bb -m pytest test/inductor/test_max_autotune.py -v
```

Differential Revision: [D81520571](https://our.internmc.facebook.com/intern/diff/D81520571)
Pull Request resolved: pytorch#161350
Approved by: https://github.com/eellison, https://github.com/jansel
ghstack dependencies: pytorch#161351
cleonard530 pushed a commit to cleonard530/pytorch that referenced this pull request Sep 22, 2025
…torch#162293)

# why

- eventually we want all templates to go through this
- we're exposing this through diode as a sort of interface/API
- avoid later renaming

# what

- rename get_mm_configs to get_template_configs
- rename _finalize_mm_configs to _finalize_template_configs

# testing

- lintrunner
- ci

Differential Revision: [D81820641](https://our.internmc.facebook.com/intern/diff/D81820641)
Pull Request resolved: pytorch#162293
Approved by: https://github.com/eellison
ghstack dependencies: pytorch#161351, pytorch#161350
cleonard530 pushed a commit to cleonard530/pytorch that referenced this pull request Sep 22, 2025
# why

enable caching/overriding/filtering based on src hash later

# what

- KernelTemplate has a src_hash that is None by default
- sha256 on TritonTemplate of the template src code
- None on ExternKernelChoice to have same API

# testing

n/a (not in use in this change)

Differential Revision: [](https://our.internmc.facebook.com/intern/diff/)

Differential Revision: [D81821149](https://our.internmc.facebook.com/intern/diff/D81821149)
Pull Request resolved: pytorch#161468
Approved by: https://github.com/eellison
ghstack dependencies: pytorch#161351, pytorch#161350, pytorch#162293
dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request Sep 26, 2025
)

# why

- if we only use ExternKernelChoice we're not doing any codegen
- if we're not doing any codegen, we can use a FlexibleLayout
  here, and provide deeper passes more chances to change it

# what

- if all the kernel template choices (KTC) are with a ExternKernelChoice
  template, we switch to a FlexibleLayout before generating the choice
- add a test to make sure that works as intended (FlexibleLayout for
  only extern, and FixedLayout if Triton is involved)

- caveats:
    - because CPP, CUTLASS, and CK are not using
       V.choices.get_mm_configs yet, we turn off the optimization
       if either of those backends are in use. This will be relaxed
       once they support this too
    - because Triton templates are still using their own calls
       (not a single call) to get_mm_configs, it's also turned
       off there. The next diff unifies Triton + ATEN to a single
       call to get_mm_configs and that in turn allows the optimization
       there too

# testing

```
python3 -bb -m pytest test/inductor/test_max_autotune.py -v
```

Differential Revision: [D81520584](https://our.internmc.facebook.com/intern/diff/D81520584)
Pull Request resolved: pytorch#161351
Approved by: https://github.com/eellison, https://github.com/jansel
dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request Sep 26, 2025
…torch#161350)

# why

- now everything is in place to just gather templates and run
  the V.choices.get_mm_configs once per op
- enables any overrides inside V.choices.get_mm_configs to
  have a full view of the options for an op, not just for
  one template

# what

- replace multiple calls to V.choices.get_mm_configs with
  calls to gather the active templates, and then using those
  in a single call

# testing

```
python3 -bb -m pytest test/inductor/test_max_autotune.py -v
```

Differential Revision: [D81520571](https://our.internmc.facebook.com/intern/diff/D81520571)
Pull Request resolved: pytorch#161350
Approved by: https://github.com/eellison, https://github.com/jansel
ghstack dependencies: pytorch#161351
dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request Sep 26, 2025
…torch#162293)

# why

- eventually we want all templates to go through this
- we're exposing this through diode as a sort of interface/API
- avoid later renaming

# what

- rename get_mm_configs to get_template_configs
- rename _finalize_mm_configs to _finalize_template_configs

# testing

- lintrunner
- ci

Differential Revision: [D81820641](https://our.internmc.facebook.com/intern/diff/D81820641)
Pull Request resolved: pytorch#162293
Approved by: https://github.com/eellison
ghstack dependencies: pytorch#161351, pytorch#161350
dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request Sep 26, 2025
dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request Sep 26, 2025
dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request Sep 26, 2025
dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request Sep 26, 2025
)

# why

- if we only use ExternKernelChoice we're not doing any codegen
- if we're not doing any codegen, we can use a FlexibleLayout
  here, and provide deeper passes more chances to change it

# what

- if all the kernel template choices (KTC) are with a ExternKernelChoice
  template, we switch to a FlexibleLayout before generating the choice
- add a test to make sure that works as intended (FlexibleLayout for
  only extern, and FixedLayout if Triton is involved)

- caveats:
    - because CPP, CUTLASS, and CK are not using
       V.choices.get_mm_configs yet, we turn off the optimization
       if either of those backends are in use. This will be relaxed
       once they support this too
    - because Triton templates are still using their own calls
       (not a single call) to get_mm_configs, it's also turned
       off there. The next diff unifies Triton + ATEN to a single
       call to get_mm_configs and that in turn allows the optimization
       there too

# testing

```
python3 -bb -m pytest test/inductor/test_max_autotune.py -v
```

Differential Revision: [D81520584](https://our.internmc.facebook.com/intern/diff/D81520584)
Pull Request resolved: pytorch#161351
Approved by: https://github.com/eellison, https://github.com/jansel
dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request Sep 26, 2025
…torch#161350)

# why

- now everything is in place to just gather templates and run
  the V.choices.get_mm_configs once per op
- enables any overrides inside V.choices.get_mm_configs to
  have a full view of the options for an op, not just for
  one template

# what

- replace multiple calls to V.choices.get_mm_configs with
  calls to gather the active templates, and then using those
  in a single call

# testing

```
python3 -bb -m pytest test/inductor/test_max_autotune.py -v
```

Differential Revision: [D81520571](https://our.internmc.facebook.com/intern/diff/D81520571)
Pull Request resolved: pytorch#161350
Approved by: https://github.com/eellison, https://github.com/jansel
ghstack dependencies: pytorch#161351
dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request Sep 26, 2025
…torch#162293)

# why

- eventually we want all templates to go through this
- we're exposing this through diode as a sort of interface/API
- avoid later renaming

# what

- rename get_mm_configs to get_template_configs
- rename _finalize_mm_configs to _finalize_template_configs

# testing

- lintrunner
- ci

Differential Revision: [D81820641](https://our.internmc.facebook.com/intern/diff/D81820641)
Pull Request resolved: pytorch#162293
Approved by: https://github.com/eellison
ghstack dependencies: pytorch#161351, pytorch#161350
dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request Sep 26, 2025
# why

enable caching/overriding/filtering based on src hash later

# what

- KernelTemplate has a src_hash that is None by default
- sha256 on TritonTemplate of the template src code
- None on ExternKernelChoice to have same API

# testing

n/a (not in use in this change)

Differential Revision: [](https://our.internmc.facebook.com/intern/diff/)

Differential Revision: [D81821149](https://our.internmc.facebook.com/intern/diff/D81821149)
Pull Request resolved: pytorch#161468
Approved by: https://github.com/eellison
ghstack dependencies: pytorch#161351, pytorch#161350, pytorch#162293
@github-actions github-actions bot deleted the gh/coconutruben/52/head branch October 13, 2025 02:15
Khanaksahu pushed a commit to Khanaksahu/pytorch-fork that referenced this pull request Nov 17, 2025
\# why

- if we only use ExternKernelChoice we're not doing any codegen
- if we're not doing any codegen, we can use a FlexibleLayout
  here, and provide deeper passes more chances to change it

\# what

- if all the kernel template choices (KTC) are with a ExternKernelChoice
  template, we switch to a FlexibleLayout before generating the choice
- add a test to make sure that works as intended (FlexibleLayout for
  only extern, and FixedLayout if Triton is involved)

- caveats:
  - because CPP, CUTLASS, and CK are not using
    V.choices.get_mm_configs yet, we turn off the optimization
    if either of those backends are in use. This will be relaxed
    once they support this too
  - because Triton templates are still using their own calls
    (not a single call) to get_mm_configs, it's also turned
    off there. The next diff unifies Triton + ATEN to a single
    call to get_mm_configs and that in turn allows the optimization
    there too

\# testing

```
python3 -bb -m pytest test/inductor/test_max_autotune.py -v
python3 -bb -m pytest test/inductor/test_torchinductor.py::TritonCodeGenTests::test_donated_buffer_inplace_gpt
```

ghstack-source-id: 93a4209
Pull Request resolved: pytorch/pytorch#161351
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci-no-td Do not run TD on this PR ciflow/inductor ciflow/trunk Trigger trunk jobs on your pull request Merged module: inductor Reverted topic: not user facing topic category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants