[inductor] Fix needs_fixed_stride_order silent incorrectness #133639

zou3519 · 2024-08-15T22:25:14Z

Stack from ghstack (oldest at bottom):

-> [inductor] Fix needs_fixed_stride_order silent incorrectness #133639

The approach is option 2 of what Elias suggested in the comment
thread:

We require tensors to have the correct stride at usage. This may
involve a clone; if there was a clone and then a mutation into it
then we copy_ back the result of the mutation.

The reason why I went this approach was because it was the easiest and
Inductor already works really hard to remove additional clones/copy_.

There are some cases that this doesn't generate efficient code for; for
example, if the tensor is a view, we don't change the base of the view
to have the right stride order, instead we do a clone.
The view case isn't very common so I'm ignoring it for now but we could
improve this in the future.

Test Plan:

tests

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang

Fixes #128084 The approach is two-fold (based on what Elias suggested in the comment thread): 1. We realize Tensors and ensure they are the right stride. 2. If we can't ensure they have the right stride (if the tensors are already in an incorrect stride with FixedLayout, or if there's a view), then we require the correct stride on use via a clone and then copy_ back the result of the mutation. We can improve this based on use cases (if the tensor is a view, then we can require the base of the view to have the right stride order). The view case isn't very common so I'm ignoring it for now. Test Plan: - tests [ghstack-poisoned]

pytorch-bot · 2024-08-15T22:25:17Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/133639

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit d6b81f3 with merge base 92151c8 ():

NEW FAILURE - The following job has failed:

trunk / linux-focal-rocm6.1-py3.8 / test (default, 1, 2, linux.rocm.gpu) (gh)
inductor/test_torchinductor.py::GPUTests::test_mutable_custom_op_fixed_layout2_cuda

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Fixes #128084 The approach is two-fold (based on what Elias suggested in the comment thread): 1. We realize Tensors and ensure they are the right stride. 2. If we can't ensure they have the right stride (if the tensors are already in an incorrect stride with FixedLayout, or if there's a view), then we require the correct stride on use via a clone and then copy_ back the result of the mutation. We can improve this based on use cases (if the tensor is a view, then we can require the base of the view to have the right stride order). The view case isn't very common so I'm ignoring it for now. Also, inductor is really good at removing unnecessary copy_: I couldn't find a case where it the output code had it. Test Plan: - tests [ghstack-poisoned]

Fixes #128084 The approach is two-fold (based on what Elias suggested in the comment thread): 1. We realize Tensors and ensure they are the right stride. 2. If we can't ensure they have the right stride (if the tensors are already in an incorrect stride with FixedLayout, or if there's a view), then we require the correct stride on use via a clone and then copy_ back the result of the mutation. We can improve this based on use cases (if the tensor is a view, then we can require the base of the view to have the right stride order). The view case isn't very common so I'm ignoring it for now. Also, inductor is really good at removing unnecessary copy_: I couldn't find a case where it the output code had it. Test Plan: - tests ghstack-source-id: e69e99e Pull Request resolved: #133639

Fixes #128084 The approach is option 2 of what Elias suggested in the comment thread: - We require tensors to have the correct stride at usage. This may involve a clone; if there was a clone and then a mutation into it then we copy_ back the result of the mutation. The reason why I went this approach was because it was the easiest and Inductor already works really hard to remove additional clones/copy_. There are some cases that this doesn't generate efficient code for; for example, if the tensor is a view, we don't change the base of the view to have the right stride order, instead we do a clone. The view case isn't very common so I'm ignoring it for now but we could improve this in the future. Test Plan: - tests [ghstack-poisoned]

Fixes #128084 The approach is option 2 of what Elias suggested in the comment thread: - We require tensors to have the correct stride at usage. This may involve a clone; if there was a clone and then a mutation into it then we copy_ back the result of the mutation. The reason why I went this approach was because it was the easiest and Inductor already works really hard to remove additional clones/copy_. There are some cases that this doesn't generate efficient code for; for example, if the tensor is a view, we don't change the base of the view to have the right stride order, instead we do a clone. The view case isn't very common so I'm ignoring it for now but we could improve this in the future. Test Plan: - tests ghstack-source-id: 3de3d62 Pull Request resolved: #133639

Fixes #128084 The approach is option 2 of what Elias suggested in the comment thread: - We require tensors to have the correct stride at usage. This may involve a clone; if there was a clone and then a mutation into it then we copy_ back the result of the mutation. The reason why I went this approach was because it was the easiest and Inductor already works really hard to remove additional clones/copy_. There are some cases that this doesn't generate efficient code for; for example, if the tensor is a view, we don't change the base of the view to have the right stride order, instead we do a clone. The view case isn't very common so I'm ignoring it for now but we could improve this in the future. Test Plan: - tests [ghstack-poisoned]

Fixes #128084 The approach is option 2 of what Elias suggested in the comment thread: - We require tensors to have the correct stride at usage. This may involve a clone; if there was a clone and then a mutation into it then we copy_ back the result of the mutation. The reason why I went this approach was because it was the easiest and Inductor already works really hard to remove additional clones/copy_. There are some cases that this doesn't generate efficient code for; for example, if the tensor is a view, we don't change the base of the view to have the right stride order, instead we do a clone. The view case isn't very common so I'm ignoring it for now but we could improve this in the future. Test Plan: - tests ghstack-source-id: 78ec0b2 Pull Request resolved: #133639

Fixes #128084 The approach is option 2 of what Elias suggested in the comment thread: - We require tensors to have the correct stride at usage. This may involve a clone; if there was a clone and then a mutation into it then we copy_ back the result of the mutation. The reason why I went this approach was because it was the easiest and Inductor already works really hard to remove additional clones/copy_. There are some cases that this doesn't generate efficient code for; for example, if the tensor is a view, we don't change the base of the view to have the right stride order, instead we do a clone. The view case isn't very common so I'm ignoring it for now but we could improve this in the future. Test Plan: - tests [ghstack-poisoned]

Fixes #128084 The approach is option 2 of what Elias suggested in the comment thread: - We require tensors to have the correct stride at usage. This may involve a clone; if there was a clone and then a mutation into it then we copy_ back the result of the mutation. The reason why I went this approach was because it was the easiest and Inductor already works really hard to remove additional clones/copy_. There are some cases that this doesn't generate efficient code for; for example, if the tensor is a view, we don't change the base of the view to have the right stride order, instead we do a clone. The view case isn't very common so I'm ignoring it for now but we could improve this in the future. Test Plan: - tests ghstack-source-id: 7fee9b5 Pull Request resolved: #133639

Fixes #128084 The approach is option 2 of what Elias suggested in the comment thread: - We require tensors to have the correct stride at usage. This may involve a clone; if there was a clone and then a mutation into it then we copy_ back the result of the mutation. The reason why I went this approach was because it was the easiest and Inductor already works really hard to remove additional clones/copy_. There are some cases that this doesn't generate efficient code for; for example, if the tensor is a view, we don't change the base of the view to have the right stride order, instead we do a clone. The view case isn't very common so I'm ignoring it for now but we could improve this in the future. Test Plan: - tests cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang [ghstack-poisoned]

Fixes #128084 The approach is option 2 of what Elias suggested in the comment thread: - We require tensors to have the correct stride at usage. This may involve a clone; if there was a clone and then a mutation into it then we copy_ back the result of the mutation. The reason why I went this approach was because it was the easiest and Inductor already works really hard to remove additional clones/copy_. There are some cases that this doesn't generate efficient code for; for example, if the tensor is a view, we don't change the base of the view to have the right stride order, instead we do a clone. The view case isn't very common so I'm ignoring it for now but we could improve this in the future. Test Plan: - tests ghstack-source-id: 8b7a94a Pull Request resolved: #133639

eellison

looks great !

eellison · 2024-08-21T01:50:14Z

test/inductor/test_torchinductor_codegen_dynamic_shapes.py

    "test_conv2d_channels_last_dynamic_shapes": TestFailure(("cpu",)),
    "test_conv3d_dynamic_shapes": TestFailure(("cpu",)),
    "test_conv3d_channels_last_dynamic_shapes": TestFailure(("cpu",)),
+    "test_mutable_custom_op_fixed_layout2_dynamic_shapes": TestFailure(("cpu",)),


still failing on cpu?

Yes, I think it's because inductor doesn't generate any for loops for this code on CPU?

torch/_inductor/graph.py

eellison · 2024-08-21T02:01:24Z

torch/_inductor/graph.py

+        # lowering an op that is FallbackKernel for the first time will automatically
+        # generate the lowering so that the following isn't called multiple times.
+        if torch._C.Tag.needs_fixed_stride_order in op.tags:
+            torch._inductor.lowering.add_layout_constraint(op, constrain_to_fx_strides)


where is the logic that prevents this from being called twice ? tried to find and failed

I'll try to refactor this to be clearer and/or adjust the comment.

The first time we call maybe_add_custom_op_layout_constraints with a custom op foo:

there's no lowering for the custom op

directly after this function, in run_node, we do GraphLowering.call_function (

pytorch/torch/_inductor/graph.py

Line 1281 in 2bffbe0

result = super().run_node(n)

)

This generates a lowering for the custom op (make_fallback in

pytorch/torch/_inductor/graph.py

Line 1023 in 2bffbe0

make_fallback(target, layout_constraint)

)

The second time we call maybe_add_custom_op_layout_constraints with a custom op foo:

there's a lowering for the custom op, so the early return prevents us from adding the layout constraint twice.

Note that this was how the code worked previously (we would generate the lowering and add the layout constraint once on the first usage of the custom op) but I needed to hoist this logic out because resolving layout constraints happens in run_node

I ended up adding new logic to register the layout constraint once to make it clearer what is going on. Relying on the lowering being populated seemed like too much mental overload

torch/_inductor/graph.py

Fixes #128084 The approach is option 2 of what Elias suggested in the comment thread: - We require tensors to have the correct stride at usage. This may involve a clone; if there was a clone and then a mutation into it then we copy_ back the result of the mutation. The reason why I went this approach was because it was the easiest and Inductor already works really hard to remove additional clones/copy_. There are some cases that this doesn't generate efficient code for; for example, if the tensor is a view, we don't change the base of the view to have the right stride order, instead we do a clone. The view case isn't very common so I'm ignoring it for now but we could improve this in the future. Test Plan: - tests cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang [ghstack-poisoned]

Fixes #128084 The approach is option 2 of what Elias suggested in the comment thread: - We require tensors to have the correct stride at usage. This may involve a clone; if there was a clone and then a mutation into it then we copy_ back the result of the mutation. The reason why I went this approach was because it was the easiest and Inductor already works really hard to remove additional clones/copy_. There are some cases that this doesn't generate efficient code for; for example, if the tensor is a view, we don't change the base of the view to have the right stride order, instead we do a clone. The view case isn't very common so I'm ignoring it for now but we could improve this in the future. Test Plan: - tests ghstack-source-id: d7a1952 Pull Request resolved: #133639

zou3519 · 2024-08-21T22:52:39Z

@pytorchbot merge -f "rocm infinite queue"

pytorchmergebot · 2024-08-23T13:48:07Z

@zou3519 your PR has been successfully reverted.

…ytorch#133639)" This reverts commit 8604c0a. Reverted pytorch#133639 on behalf of https://github.com/jeanschmidt due to Broke internal fbgemm signals, see [D61670495](https://www.internalfb.com/diff/D61670495) ([comment](pytorch#133639 (comment)))

zou3519 · 2024-08-23T17:06:04Z

@pytorchbot merge -f "this just needs an internal-only change (D61722869), going to force merge this to avoid needing to go through another development cycle"

pytorchmergebot · 2024-08-23T17:07:46Z

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status
here

…#133639) Fixes pytorch#128084 The approach is option 2 of what Elias suggested in the comment thread: - We require tensors to have the correct stride at usage. This may involve a clone; if there was a clone and then a mutation into it then we copy_ back the result of the mutation. The reason why I went this approach was because it was the easiest and Inductor already works really hard to remove additional clones/copy_. There are some cases that this doesn't generate efficient code for; for example, if the tensor is a view, we don't change the base of the view to have the right stride order, instead we do a clone. The view case isn't very common so I'm ignoring it for now but we could improve this in the future. Test Plan: - tests Pull Request resolved: pytorch#133639 Approved by: https://github.com/eellison

…ment graph (#134364) The pattern matcher runs DCE and remove_noop_ops on the replacement graph by default. Previously we had a switch for the DCE. This PR changes that switch to also control if we run remove_noop_ops. The context was that there is silent incorrectness with auto_functionalized. We use the Pattern matcher to decompose auto_functionalized into a mutable op + clones; remove_noop_ops were deleting the clones. Future: can try #134363 Test Plan: - new test. I wasn't able to produce a silently incorrect example so I settled for asserting that clones still exist in the post-grad graph. Pull Request resolved: #134364 Approved by: https://github.com/eellison ghstack dependencies: #133639

…kip comments and to skip empty lines (#134248) I had a night mare rewriting tests in test_misc.py specifically : 1. graphs can have comments that refers to my files "/lsakka/.." we really dont care about comments add option to ignore comments. 2. empty lines added when EXPECTTEST_ACCEPT=1 are changed with linter causing tests to fail or linter fail! add flag to ignore empty lines. 3. EXPECTTEST_ACCEPT fails when the text have some not readable characters. those should not effect comparing strings, also those causes weird diffs comments when tests fails. I removed ansi_escape chars #133045 this is used in Pull Request resolved: #134248 Approved by: https://github.com/aorenste ghstack dependencies: #133639, #134364

By default, Inductor is allowed to manipulate the layout (strides+storage offset) of input tensors to custom operators. We want to change it so that the default is that Inductor should respect the stride order of input tensors to custom operators. This PR adds a config to toggle the behavior, in the next PR up we'll change the default. We also make the following changes: - We add a new operator Tag (flexible_layout), which means that inductor is allowed to manipulate the layout. When we flip the default, users can specify they want the old behavior by using this tag. This is a reland of #126986, which was previously reverted due to silent incorrectness. We've since fixed the silent incorrectness (#133639) Test Plan: - new test [ghstack-poisoned]

By default, Inductor is allowed to manipulate the layout (strides+storage offset) of input tensors to custom operators. We want to change it so that the default is that Inductor should respect the stride order of input tensors to custom operators. This PR adds a config to toggle the behavior, in the next PR up we'll change the default. We also make the following changes: - We add a new operator Tag (flexible_layout), which means that inductor is allowed to manipulate the layout. When we flip the default, users can specify they want the old behavior by using this tag. This is a reland of #126986, which was previously reverted due to silent incorrectness. We've since fixed the silent incorrectness (#133639) Test Plan: - new test Pull Request resolved: #135238 Approved by: https://github.com/albanD

By default, Inductor is allowed to manipulate the layout (strides+storage offset) of input tensors to custom operators. We want to change it so that the default is that Inductor should respect the stride order of input tensors to custom operators. This PR adds a config to toggle the behavior, in the next PR up we'll change the default. We also make the following changes: - We add a new operator Tag (flexible_layout), which means that inductor is allowed to manipulate the layout. When we flip the default, users can specify they want the old behavior by using this tag. This is a reland of pytorch/pytorch#126986, which was previously reverted due to silent incorrectness. We've since fixed the silent incorrectness (pytorch/pytorch#133639) Test Plan: - new test ghstack-source-id: ce6c52d Pull Request resolved: pytorch/pytorch#135238

By default, Inductor is allowed to manipulate the layout (strides+storage offset) of input tensors to custom operators. We want to change it so that the default is that Inductor should respect the stride order of input tensors to custom operators. This PR adds a config to toggle the behavior, in the next PR up we'll change the default. We also make the following changes: - We add a new operator Tag (flexible_layout), which means that inductor is allowed to manipulate the layout. When we flip the default, users can specify they want the old behavior by using this tag. This is a reland of pytorch#126986, which was previously reverted due to silent incorrectness. We've since fixed the silent incorrectness (pytorch#133639) Test Plan: - new test Pull Request resolved: pytorch#135238 Approved by: https://github.com/albanD

…ment graph (pytorch#134364) The pattern matcher runs DCE and remove_noop_ops on the replacement graph by default. Previously we had a switch for the DCE. This PR changes that switch to also control if we run remove_noop_ops. The context was that there is silent incorrectness with auto_functionalized. We use the Pattern matcher to decompose auto_functionalized into a mutable op + clones; remove_noop_ops were deleting the clones. Future: can try pytorch#134363 Test Plan: - new test. I wasn't able to produce a silently incorrect example so I settled for asserting that clones still exist in the post-grad graph. Pull Request resolved: pytorch#134364 Approved by: https://github.com/eellison ghstack dependencies: pytorch#133639

…kip comments and to skip empty lines (pytorch#134248) I had a night mare rewriting tests in test_misc.py specifically : 1. graphs can have comments that refers to my files "/lsakka/.." we really dont care about comments add option to ignore comments. 2. empty lines added when EXPECTTEST_ACCEPT=1 are changed with linter causing tests to fail or linter fail! add flag to ignore empty lines. 3. EXPECTTEST_ACCEPT fails when the text have some not readable characters. those should not effect comparing strings, also those causes weird diffs comments when tests fails. I removed ansi_escape chars pytorch#133045 this is used in Pull Request resolved: pytorch#134248 Approved by: https://github.com/aorenste ghstack dependencies: pytorch#133639, pytorch#134364

zou3519 mentioned this pull request Aug 15, 2024

[Inductor] short-term fix for needs_fixed_stride_order silent incorrectness #133452

Closed

pytorch-bot bot added ciflow/inductor module: inductor labels Aug 15, 2024

zou3519 added the ci-no-td Do not run TD on this PR label Aug 16, 2024

zou3519 added the keep-going Don't stop on first failure, keep running tests until the end label Aug 20, 2024

zou3519 requested review from FindHao and eellison August 20, 2024 14:08

eellison approved these changes Aug 21, 2024

View reviewed changes

zou3519 added the ciflow/trunk Trigger trunk jobs on your pull request label Aug 21, 2024

zou3519 added the release notes: inductor label Aug 21, 2024

pytorchmergebot added the Reverted label Aug 23, 2024

pytorchmergebot reopened this Aug 23, 2024

pytorchmergebot added the merging label Aug 23, 2024

pytorchmergebot closed this in afd081c Aug 23, 2024

pytorchmergebot removed the merging label Aug 23, 2024

zou3519 mentioned this pull request Aug 23, 2024

Add option to skip functional passes in the pattern matcher's replacement graph #134364

Closed

This was referenced Aug 25, 2024

Remove ansi escape chars in assertExpectedInline and add options to skip comments and to skip empty lines #134248

Closed

Redesign custom op functionlaization for better re-inplace #134409

Closed

laithsakka mentioned this pull request Aug 29, 2024

Handle non existing base in inference mode for autp_functionalized_v2 #134786

Closed

zou3519 mentioned this pull request Sep 5, 2024

Add Inductor config for default stride behavior #135238

Closed

[inductor] Fix needs_fixed_stride_order silent incorrectness #133639

[inductor] Fix needs_fixed_stride_order silent incorrectness #133639

Uh oh!

Conversation

zou3519 commented Aug 15, 2024 • edited by pytorch-bot bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pytorch-bot bot commented Aug 15, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/133639

❌ 1 New Failure

Uh oh!

eellison left a comment

Choose a reason for hiding this comment

Uh oh!

eellison Aug 21, 2024

Choose a reason for hiding this comment

Uh oh!

zou3519 Aug 21, 2024

Choose a reason for hiding this comment

Uh oh!

Uh oh!

eellison Aug 21, 2024

Choose a reason for hiding this comment

Uh oh!

zou3519 Aug 21, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

zou3519 Aug 21, 2024

Choose a reason for hiding this comment

Uh oh!

Uh oh!

zou3519 commented Aug 21, 2024

Uh oh!

pytorchmergebot commented Aug 23, 2024

Uh oh!

zou3519 commented Aug 23, 2024

Uh oh!

pytorchmergebot commented Aug 23, 2024

Merge started

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

zou3519 commented Aug 15, 2024 •

edited by pytorch-bot bot

Loading

pytorch-bot bot commented Aug 15, 2024 •

edited

Loading

zou3519 Aug 21, 2024 •

edited

Loading