Conversation
@coconutruben coconutruben commented Aug 26, 2025

Stack from ghstack (oldest at bottom):

why

  • addmm aten running with an expanded version of the bias, rather than
    the regular bias, sometimes causes numeric differences
  • to avoid this for now, we make addmm aten use inp vs inp_expanded
    depending on whether we're in max-autotune or not, matching the
    previous logic
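A minimal way to observe the comparison described above (not a reproducer of the divergence itself) is to call addmm with the plain 1-D bias and again with the same bias pre-expanded to the output shape. The results usually match, but per this PR they can sometimes diverge, which is why the ATen path prefers the unexpanded bias outside of max-autotune:

```python
import torch

torch.manual_seed(0)
m, k, n = 4, 3, 5
bias = torch.randn(n)       # unexpanded 1-D bias ("inp")
a = torch.randn(m, k)
b = torch.randn(k, n)

# 1-D bias, broadcast internally by addmm
out_plain = torch.addmm(bias, a, b)
# pre-expanded (m, n) broadcast view ("inp_expanded")
out_expanded = torch.addmm(bias.expand(m, n), a, b)

assert out_plain.shape == out_expanded.shape == (m, n)
print(torch.allclose(out_plain, out_expanded))
```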

what

  • pass the unexpanded bias (inp)
  • let the templates (heuristics) that need it expanded (ATen when not
    in max-autotune, Triton always) expand it themselves
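The split above can be sketched as follows; `choose_bias`, `backend`, and `use_max_autotune` are illustrative names for this sketch, not the actual Inductor heuristics API:

```python
import torch

def choose_bias(inp: torch.Tensor, m: int, n: int,
                backend: str, use_max_autotune: bool) -> torch.Tensor:
    """Pick which view of the bias a template should receive.

    Triton templates always want the broadcast (m, n) view; the ATen
    addmm path only gets it under max-autotune, and otherwise keeps
    the original unexpanded bias to sidestep the numeric differences.
    """
    if backend == "triton" or use_max_autotune:
        return inp.expand(m, n)  # zero-copy broadcast view
    return inp

bias = torch.randn(8)
assert choose_bias(bias, 4, 8, "triton", False).shape == (4, 8)
assert choose_bias(bias, 4, 8, "aten", True).shape == (4, 8)
assert choose_bias(bias, 4, 8, "aten", False).shape == (8,)
```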

testing

```
python3 -bb -m pytest test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_debug_printer_codegen_cpu
```

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @jataylo @chenyang78

Differential Revision: [D81520581](https://our.internmc.facebook.com/intern/diff/D81520581)

# why

- addmm aten running with an expanded version of the bias, rather than
  the regular bias, sometimes causes numeric differences
- to avoid this for now, we make addmm aten use inp vs inp_expanded
  depending on whether we're in max-autotune or not, matching the
  previous logic

# what

- expand KernelInputs to also store views of specific nodes, by names
- use that view (inp, the unexpanded version) in the heuristics to
  adjust it depending on whether we're in max-autotune or not
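A rough sketch of the KernelInputs extension described above; the class and method names here are toy illustrations, not the actual torch/_inductor API:

```python
class KernelInputs:
    """Toy model of Inductor's kernel-input container, extended to
    keep named alternate views of specific nodes (e.g. the unexpanded
    bias under the name "inp" next to the expanded inp_expanded node).
    """

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self._views = {}  # name -> alternate view of one of the nodes

    def add_named_view(self, name, view):
        self._views[name] = view

    def named_view(self, name, default=None):
        # Heuristics look the view up by name and fall back to the
        # regular node when no alternate view was registered.
        return self._views.get(name, default)

inputs = KernelInputs(["mat1", "mat2", "inp_expanded"])
inputs.add_named_view("inp", "inp_unexpanded")
assert inputs.named_view("inp") == "inp_unexpanded"
assert inputs.named_view("mat1", "mat1") == "mat1"
```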

# testing

```
python3 -bb -m pytest test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_debug_printer_codegen_cpu
```

[ghstack-poisoned]
@pytorch-bot

pytorch-bot bot commented Aug 26, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/161534

Note: Links to docs will display an error until the docs builds have been completed.

❌ 5 New Failures, 7 Cancelled Jobs

As of commit 66e7441 with merge base d25c35d:


This comment was automatically generated by Dr. CI and updates every 15 minutes.

This was referenced Aug 26, 2025
@coconutruben coconutruben added the topic: not user facing topic category label Aug 26, 2025
@coconutruben (Contributor Author)

@coconutruben has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

# why

- addmm aten running with an expanded version of the bias, rather than
  the regular bias, sometimes causes numeric differences
- to avoid this for now, we make addmm aten use inp vs inp_expanded
  depending on whether we're in max-autotune or not, matching the
  previous logic

# what

- drop the expanded view (use inp rather than inp_expanded) when not
  running in max-autotune

# testing

```
python3 -bb -m pytest test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_debug_printer_codegen_cpu
```

@jansel (Contributor) left a comment:

Failing tests?

coconutruben added a commit that referenced this pull request Sep 13, 2025
# why

- addmm aten running with an expanded version of the bias, rather than
  the regular bias, sometimes causes numeric differences
- to avoid this for now, we make addmm aten use inp vs inp_expanded
  depending on whether we're in max-autotune or not, matching the
  previous logic

# what

- drop the expanded view (use inp rather than inp_expanded) when not
  running in max-autotune for addmm aten

# testing

```
python3 -bb -m pytest test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_debug_printer_codegen_cpu
```

ghstack-source-id: 4399549
Pull Request resolved: #161534
coconutruben added a commit that referenced this pull request Sep 13, 2025
ghstack-source-id: 208c906
Pull Request resolved: #161534
@coconutruben coconutruben marked this pull request as draft September 18, 2025 17:13
@github-actions

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the Stale label Nov 17, 2025
Khanaksahu pushed a commit to Khanaksahu/pytorch-fork that referenced this pull request Nov 17, 2025
ghstack-source-id: 1182f39
Pull Request resolved: pytorch/pytorch#161534
@pytorch-bot pytorch-bot bot added ciflow/b200 ciflow/h100 ciflow/rocm Trigger "default" config CI on ROCm labels Dec 4, 2025
5 participants