Skip to content

Conversation

@zou3519
Copy link
Contributor

@zou3519 zou3519 commented Aug 23, 2024

Stack from ghstack (oldest at bottom):

The pattern matcher runs DCE and remove_noop_ops on the replacement
graph by default. Previously we had a switch for the DCE. This PR
changes that switch to also control if we run remove_noop_ops.

The context was that there is silent incorrectness with
auto_functionalized. We use the Pattern matcher to decompose
auto_functionalized into a mutable op + clones; remove_noop_ops were
deleting the clones.

Future: can try #134363

Test Plan:

  • new test. I wasn't able to produce a silently incorrect example so I
    settled for asserting that clones still exist in the post-grad graph.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang

…ment graph

The pattern matcher runs DCE and remove_noop_ops on the replacement
graph by default. Previously we had a switch for the DCE. This PR
changes that switch to also control if we run remove_noop_ops.

The context was that there is silent incorrectness with
auto_functionalized. We use the Pattern matcher to decompose
auto_functionalized into a mutable op + clones; remove_noop_ops were
deleting the clones.

Future: can try #134363

Test Plan:
- new test. I wasn't able to produce a silently incorrect example so I
  settled for asserting that clones still exist in the post-grad graph.

[ghstack-poisoned]
@pytorch-bot
Copy link

pytorch-bot bot commented Aug 23, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/134364

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 2b244cf with merge base 92151c8 (image):

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

zou3519 added a commit that referenced this pull request Aug 23, 2024
…ment graph

The pattern matcher runs DCE and remove_noop_ops on the replacement
graph by default. Previously we had a switch for the DCE. This PR
changes that switch to also control if we run remove_noop_ops.

The context was that there is silent incorrectness with
auto_functionalized. We use the Pattern matcher to decompose
auto_functionalized into a mutable op + clones; remove_noop_ops were
deleting the clones.

Future: can try #134363

Test Plan:
- new test. I wasn't able to produce a silently incorrect example so I
  settled for asserting that clones still exist in the post-grad graph.

ghstack-source-id: a4f2ade
Pull Request resolved: #134364
@zou3519 zou3519 requested review from Chillee and eellison August 23, 2024 21:33
@zou3519 zou3519 added the ciflow/trunk Trigger trunk jobs on your pull request label Aug 23, 2024
Copy link
Contributor

@eellison eellison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully no uses of dce=False out there..

@zou3519 zou3519 added the topic: not user facing topic category label Aug 24, 2024
@zou3519
Copy link
Contributor Author

zou3519 commented Aug 24, 2024

@pytorchbot merge -f "rocm infinite queue"

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

pytorchmergebot pushed a commit that referenced this pull request Aug 26, 2024
…kip comments and to skip empty lines (#134248)

I had a night mare rewriting tests in test_misc.py specifically :
1. graphs can have comments that refers to my files "/lsakka/.." we really dont care about comments add option to ignore comments.
2. empty lines added when EXPECTTEST_ACCEPT=1  are changed with linter causing tests to fail or linter fail!
add flag to ignore empty lines.
3. EXPECTTEST_ACCEPT fails when the text have some not readable characters. those should not effect comparing strings, also those causes weird diffs comments when tests fails. I removed ansi_escape chars #133045

this is used in

Pull Request resolved: #134248
Approved by: https://github.com/aorenste
ghstack dependencies: #133639, #134364
pytorchmergebot pushed a commit that referenced this pull request Aug 27, 2024
…ctional (#134466)

Fixes #134372

The triton_kernel_wrapper_functional lowering was causing problems (it
was generating small kernels with nans in it, probably from realizing
aten.empty nodes. Instead of having its own manual lowering, we change
triton_kernel_wrapper_functional to go the same route as
auto_functionalized where we decompose the node into clone + mutation
nodes.

Test Plan:
- new test
- existing tests

Pull Request resolved: #134466
Approved by: https://github.com/oulgen, https://github.com/eellison
ghstack dependencies: #134364
pytorchmergebot pushed a commit that referenced this pull request Aug 27, 2024
…zed (#134490)

We say Node a is fusible into node b if node b is an auto_functionalized
node that may reinplace node a later on.

This PR also changes aten.empty to be recomputable w.r.t the Partitioner
(it is, like aten.zeros, cheap to recompute and fusible into other ops).

Fixes #134468

Test Plan:
- new test

Pull Request resolved: #134490
Approved by: https://github.com/Chillee
ghstack dependencies: #134364, #134466
pytorchmergebot pushed a commit that referenced this pull request Aug 27, 2024
…134491)

mutated arguments to triton kernels are fusible into the triton kernel.

Test Plan:
- new test

Pull Request resolved: #134491
Approved by: https://github.com/Chillee
ghstack dependencies: #134364, #134466, #134490
Chao1Han pushed a commit to Chao1Han/pytorch that referenced this pull request Sep 20, 2024
…ctional (pytorch#134466)

Fixes pytorch#134372

The triton_kernel_wrapper_functional lowering was causing problems (it
was generating small kernels with nans in it, probably from realizing
aten.empty nodes. Instead of having its own manual lowering, we change
triton_kernel_wrapper_functional to go the same route as
auto_functionalized where we decompose the node into clone + mutation
nodes.

Test Plan:
- new test
- existing tests

Pull Request resolved: pytorch#134466
Approved by: https://github.com/oulgen, https://github.com/eellison
ghstack dependencies: pytorch#134364
Chao1Han pushed a commit to Chao1Han/pytorch that referenced this pull request Sep 20, 2024
…zed (pytorch#134490)

We say Node a is fusible into node b if node b is an auto_functionalized
node that may reinplace node a later on.

This PR also changes aten.empty to be recomputable w.r.t the Partitioner
(it is, like aten.zeros, cheap to recompute and fusible into other ops).

Fixes pytorch#134468

Test Plan:
- new test

Pull Request resolved: pytorch#134490
Approved by: https://github.com/Chillee
ghstack dependencies: pytorch#134364, pytorch#134466
Chao1Han pushed a commit to Chao1Han/pytorch that referenced this pull request Sep 20, 2024
…ytorch#134491)

mutated arguments to triton kernels are fusible into the triton kernel.

Test Plan:
- new test

Pull Request resolved: pytorch#134491
Approved by: https://github.com/Chillee
ghstack dependencies: pytorch#134364, pytorch#134466, pytorch#134490
Chao1Han pushed a commit to Chao1Han/pytorch that referenced this pull request Sep 26, 2024
…kip comments and to skip empty lines (pytorch#134248)

I had a night mare rewriting tests in test_misc.py specifically :
1. graphs can have comments that refers to my files "/lsakka/.." we really dont care about comments add option to ignore comments.
2. empty lines added when EXPECTTEST_ACCEPT=1  are changed with linter causing tests to fail or linter fail!
add flag to ignore empty lines.
3. EXPECTTEST_ACCEPT fails when the text have some not readable characters. those should not effect comparing strings, also those causes weird diffs comments when tests fails. I removed ansi_escape chars pytorch#133045

this is used in

Pull Request resolved: pytorch#134248
Approved by: https://github.com/aorenste
ghstack dependencies: pytorch#133639, pytorch#134364
Chao1Han pushed a commit to Chao1Han/pytorch that referenced this pull request Sep 26, 2024
…ctional (pytorch#134466)

Fixes pytorch#134372

The triton_kernel_wrapper_functional lowering was causing problems (it
was generating small kernels with nans in it, probably from realizing
aten.empty nodes. Instead of having its own manual lowering, we change
triton_kernel_wrapper_functional to go the same route as
auto_functionalized where we decompose the node into clone + mutation
nodes.

Test Plan:
- new test
- existing tests

Pull Request resolved: pytorch#134466
Approved by: https://github.com/oulgen, https://github.com/eellison
ghstack dependencies: pytorch#134364
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants