Conversation
@coconutruben coconutruben commented Aug 26, 2025

Stack from ghstack (oldest at bottom):

why

  • addmm aten running with an expanded version of the bias, rather than
    the regular bias, sometimes causes numeric differences
  • to avoid this for now, we make addmm aten use inp vs inp_expanded
    depending on whether we're in max-autotune or not, matching the
    previous logic
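A minimal way to observe the comparison described above (not a reproducer of the divergence itself) is to call addmm with the plain 1-D bias and again with the same bias pre-expanded to the output shape. The results usually match, but per this PR they can sometimes diverge, which is why the ATen path prefers the unexpanded bias outside of max-autotune:

```python
import torch

torch.manual_seed(0)
m, k, n = 4, 3, 5
bias = torch.randn(n)       # unexpanded 1-D bias ("inp")
a = torch.randn(m, k)
b = torch.randn(k, n)

# 1-D bias, broadcast internally by addmm
out_plain = torch.addmm(bias, a, b)
# pre-expanded (m, n) broadcast view ("inp_expanded")
out_expanded = torch.addmm(bias.expand(m, n), a, b)

assert out_plain.shape == out_expanded.shape == (m, n)
print(torch.allclose(out_plain, out_expanded))
```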

what

  • pass the unexpanded bias (inp)
  • let the templates (heuristics) that need it expanded (ATen when not
    in max-autotune, Triton always) expand it themselves
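The split above can be sketched as follows; `choose_bias`, `backend`, and `use_max_autotune` are illustrative names for this sketch, not the actual Inductor heuristics API:

```python
import torch

def choose_bias(inp: torch.Tensor, m: int, n: int,
                backend: str, use_max_autotune: bool) -> torch.Tensor:
    """Pick which view of the bias a template should receive.

    Triton templates always want the broadcast (m, n) view; the ATen
    addmm path only gets it under max-autotune, and otherwise keeps
    the original unexpanded bias to sidestep the numeric differences.
    """
    if backend == "triton" or use_max_autotune:
        return inp.expand(m, n)  # zero-copy broadcast view
    return inp

bias = torch.randn(8)
assert choose_bias(bias, 4, 8, "triton", False).shape == (4, 8)
assert choose_bias(bias, 4, 8, "aten", True).shape == (4, 8)
assert choose_bias(bias, 4, 8, "aten", False).shape == (8,)
```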

testing

```
python3 -bb -m pytest test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_debug_printer_codegen_cpu
```

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @jataylo @chenyang78

Differential Revision: [D81520581](https://our.internmc.facebook.com/intern/diff/D81520581)

# why

- addmm aten running with an expanded version of the bias, rather than
  the regular bias, sometimes causes numeric differences
- to avoid this for now, we make addmm aten use inp vs inp_expanded
  depending on whether we're in max-autotune or not, matching the
  previous logic

# what

- expand KernelInputs to also store views of specific nodes, by names
- use that view (inp, the unexpanded version) in the heuristics to
  adjust it depending on whether we're in max-autotune or not
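A rough sketch of the KernelInputs extension described above; the class and method names here are toy illustrations, not the actual torch/_inductor API:

```python
class KernelInputs:
    """Toy model of Inductor's kernel-input container, extended to
    keep named alternate views of specific nodes (e.g. the unexpanded
    bias under the name "inp" next to the expanded inp_expanded node).
    """

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self._views = {}  # name -> alternate view of one of the nodes

    def add_named_view(self, name, view):
        self._views[name] = view

    def named_view(self, name, default=None):
        # Heuristics look the view up by name and fall back to the
        # regular node when no alternate view was registered.
        return self._views.get(name, default)

inputs = KernelInputs(["mat1", "mat2", "inp_expanded"])
inputs.add_named_view("inp", "inp_unexpanded")
assert inputs.named_view("inp") == "inp_unexpanded"
assert inputs.named_view("mat1", "mat1") == "mat1"
```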

# testing

```
python3 -bb -m pytest test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_debug_printer_codegen_cpu
```

[ghstack-poisoned]
@pytorch-bot

pytorch-bot bot commented Aug 26, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/161534

Note: Links to docs will display an error until the docs builds have been completed.

❌ 5 New Failures, 7 Cancelled Jobs

As of commit 66e7441 with merge base d25c35d:


This comment was automatically generated by Dr. CI and updates every 15 minutes.

This was referenced Aug 26, 2025
@coconutruben coconutruben added the topic: not user facing topic category label Aug 26, 2025
@coconutruben (Contributor Author)

@coconutruben has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

# why

- addmm aten running with an expanded version of the bias, rather than
  the regular bias, sometimes causes numeric differences
- to avoid this for now, we make addmm aten use inp vs inp_expanded
  depending on whether we're in max-autotune or not, matching the
  previous logic

# what

- drop the expanded view (use inp rather than inp_expanded) when not
  running in max-autotune

# testing

```
python3 -bb -m pytest test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_debug_printer_codegen_cpu
```

@jansel (Contributor) left a comment:

Failing tests?

coconutruben added a commit that referenced this pull request Sep 13, 2025
# why

- addmm aten running with an expanded version of the bias, rather than
  the regular bias, sometimes causes numeric differences
- to avoid this for now, we make addmm aten use inp vs inp_expanded
  depending on whether we're in max-autotune or not, matching the
  previous logic

# what

- drop the expanded view (use inp rather than inp_expanded) when not
  running in max-autotune for addmm aten

# testing

```
python3 -bb -m pytest test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_debug_printer_codegen_cpu
```

ghstack-source-id: 4399549
Pull Request resolved: #161534
coconutruben added a commit that referenced this pull request Sep 13, 2025
ghstack-source-id: 208c906
Pull Request resolved: #161534
@coconutruben coconutruben marked this pull request as draft September 18, 2025 17:13
@github-actions

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the Stale label Nov 17, 2025
Khanaksahu pushed a commit to Khanaksahu/pytorch-fork that referenced this pull request Nov 17, 2025
ghstack-source-id: 1182f39
Pull Request resolved: pytorch/pytorch#161534
@pytorch-bot pytorch-bot bot added ciflow/b200 ciflow/h100 ciflow/rocm Trigger "default" config CI on ROCm labels Dec 4, 2025
5 participants