[inductor] Fix needs_fixed_stride_order silent incorrectness #133639
Conversation
Fixes #128084

The approach is two-fold (based on what Elias suggested in the comment thread):
1. We realize tensors and ensure they have the right stride.
2. If we can't ensure they have the right stride (if the tensors already have an incorrect stride with FixedLayout, or if there's a view), then we require the correct stride on use via a clone, and then copy_ the result of the mutation back.

We can improve this based on use cases (if the tensor is a view, we could require the base of the view to have the right stride order). The view case isn't very common, so I'm ignoring it for now.

Test Plan:
- tests

[ghstack-poisoned]
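The clone-then-copy_-back idea above can be sketched in plain Python (an illustrative model only, not Inductor's actual code; lists stand in for tensor storage, and `stride_order` / `run_mutable_op` are hypothetical names):

```python
def stride_order(strides):
    # Dimensions ordered from slowest- to fastest-varying, e.g.
    # contiguous (12, 4, 1) -> [0, 1, 2]; channels-last-ish (12, 1, 4) -> [0, 2, 1].
    return sorted(range(len(strides)), key=lambda d: strides[d], reverse=True)

def run_mutable_op(buf, strides, required_order, op):
    """Run a mutating `op` on `buf`, guaranteeing the op sees the required
    stride order. If the order already matches, mutate in place; otherwise
    mutate a clone in the required order, then copy_ the result back."""
    if stride_order(strides) == required_order:
        op(buf)            # strides already correct: mutate directly
        return
    clone = list(buf)      # clone materialized in the required stride order
    op(clone)              # the custom op mutates the clone
    buf[:] = clone         # copy_ the mutation back into the original buffer
```

The copy_ back is what preserves the mutation's visibility to the caller; as the description notes, Inductor is then free to eliminate clone/copy_ pairs that turn out to be unnecessary.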
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/133639
Note: links to docs will display an error until the docs builds have completed.
❌ 1 New Failure — as of commit d6b81f3 with merge base 92151c8, the following job has failed.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Update: Inductor is really good at removing unnecessary copy_ calls; I couldn't find a case where the output code still had one.
Fixes #128084

The approach is option 2 of what Elias suggested in the comment thread:
- We require tensors to have the correct stride at usage. This may involve a clone; if there was a clone and then a mutation into it, we copy_ the result of the mutation back.

I went with this approach because it was the easiest, and Inductor already works really hard to remove additional clones/copy_ calls. There are some cases where this doesn't generate efficient code; for example, if the tensor is a view, we don't change the base of the view to have the right stride order; instead, we do a clone. The view case isn't very common, so I'm ignoring it for now, but we could improve this in the future.

Test Plan:
- tests

[ghstack-poisoned]
eellison left a comment:
Looks great!
"test_conv2d_channels_last_dynamic_shapes": TestFailure(("cpu",)),
"test_conv3d_dynamic_shapes": TestFailure(("cpu",)),
"test_conv3d_channels_last_dynamic_shapes": TestFailure(("cpu",)),
"test_mutable_custom_op_fixed_layout2_dynamic_shapes": TestFailure(("cpu",)),
Still failing on CPU?
Yes, I think it's because inductor doesn't generate any for loops for this code on CPU?
torch/_inductor/graph.py (Outdated)
# lowering an op that is FallbackKernel for the first time will automatically
# generate the lowering so that the following isn't called multiple times.
if torch._C.Tag.needs_fixed_stride_order in op.tags:
    torch._inductor.lowering.add_layout_constraint(op, constrain_to_fx_strides)
Where is the logic that prevents this from being called twice? I tried to find it and failed.
I'll try to refactor this to be clearer and/or adjust the comment.

The first time we call maybe_add_custom_op_layout_constraints with a custom op foo:
- there's no lowering for the custom op yet;
- directly after this function, in run_node, we call GraphLowering.call_function (torch/_inductor/graph.py, line 1281 in 2bffbe0: `result = super().run_node(n)`), which generates a lowering for the custom op (torch/_inductor/graph.py, line 1023 in 2bffbe0: `make_fallback(target, layout_constraint)`).

The second time we call maybe_add_custom_op_layout_constraints with the custom op foo:
- there's already a lowering for the custom op, so the early return prevents us from adding the layout constraint twice.

Note that this is how the code worked previously (we would generate the lowering and add the layout constraint once, on the first usage of the custom op), but I needed to hoist this logic out because resolving layout constraints happens in run_node.
I ended up adding new logic to register the layout constraint exactly once, to make it clearer what is going on. Relying on the lowering being populated seemed like too much mental overhead.
@pytorchbot merge -f "rocm infinite queue"

@zou3519 your PR has been successfully reverted.
Revert "[inductor] Fix needs_fixed_stride_order silent incorrectness (pytorch#133639)". This reverts commit 8604c0a. Reverted pytorch#133639 on behalf of https://github.com/jeanschmidt because it broke internal fbgemm signals; see [D61670495](https://www.internalfb.com/diff/D61670495) ([comment](pytorch#133639 (comment)))
@pytorchbot merge -f "this just needs an internal-only change (D61722869), going to force merge this to avoid needing to go through another development cycle"

Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
[inductor] Fix needs_fixed_stride_order silent incorrectness (pytorch#133639). Pull Request resolved: pytorch#133639. Approved by: https://github.com/eellison
…ment graph (#134364): The pattern matcher runs DCE and remove_noop_ops on the replacement graph by default. Previously we had a switch for the DCE; this PR changes that switch to also control whether we run remove_noop_ops. The context is that there was silent incorrectness with auto_functionalized: we use the pattern matcher to decompose auto_functionalized into a mutable op + clones, and remove_noop_ops was deleting the clones. Future: can try #134363. Test Plan: a new test. I wasn't able to produce a silently incorrect example, so I settled for asserting that clones still exist in the post-grad graph. Pull Request resolved: #134364. Approved by: https://github.com/eellison. ghstack dependencies: #133639
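The silent incorrectness being guarded against can be shown with a plain-Python analogy (illustrative only; lists stand in for tensors and an in-place write stands in for the mutable op):

```python
def mutable_op(buf):
    buf[0] = 0                   # mutates its argument in place

def functionalized(x):
    out = list(x)                # the clone inserted by the decomposition
    mutable_op(out)              # the mutation lands in the clone
    return out                   # caller's x is untouched

def after_bad_noop_removal(x):
    mutable_op(x)                # clone deleted as a "no-op": x is clobbered
    return x
```

The clone looks redundant to a noop-removal pass, but deleting it changes observable behavior: the caller's input is silently mutated.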
…kip comments and to skip empty lines (#134248): I had a nightmare rewriting tests in test_misc.py. Specifically:
1. Graphs can have comments that refer to my files ("/lsakka/…"); we really don't care about comments, so add an option to ignore them.
2. Empty lines added when EXPECTTEST_ACCEPT=1 get changed by the linter, causing tests or the linter to fail; add a flag to ignore empty lines.
3. EXPECTTEST_ACCEPT fails when the text has some unreadable characters. Those should not affect string comparison, and they cause weird diffs when tests fail, so I removed the ansi_escape chars (#133045).
Pull Request resolved: #134248. Approved by: https://github.com/aorenste. ghstack dependencies: #133639, #134364
By default, Inductor is allowed to manipulate the layout (strides + storage offset) of input tensors to custom operators. We want to change the default so that Inductor respects the stride order of input tensors to custom operators. This PR adds a config to toggle the behavior; in the next PR up, we'll change the default. We also add a new operator tag (flexible_layout), which means Inductor is allowed to manipulate the layout. When we flip the default, users can opt back into the old behavior by using this tag. This is a reland of #126986, which was previously reverted due to silent incorrectness; we've since fixed the silent incorrectness (#133639). Test Plan: new test. [ghstack-poisoned]
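The decision the config and tags drive can be sketched as follows (hypothetical function name and plain-string tags; a model of the described behavior, not PyTorch's actual API):

```python
FLEXIBLE_LAYOUT = "flexible_layout"
NEEDS_FIXED_STRIDE_ORDER = "needs_fixed_stride_order"

def may_manipulate_layout(op_tags, respect_strides_by_default):
    """Decide whether Inductor may change an input tensor's layout for an op."""
    if NEEDS_FIXED_STRIDE_ORDER in op_tags:
        return False             # the op explicitly requires its FX strides
    if FLEXIBLE_LAYOUT in op_tags:
        return True              # the op opts in to layout manipulation
    return not respect_strides_by_default   # the config controls untagged ops
```

With this structure, flipping `respect_strides_by_default` only changes the behavior of untagged ops, which is what makes the staged rollout (config first, default flip next) safe.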
Pull Request resolved: #135238. Approved by: https://github.com/albanD
Stack from ghstack (oldest at bottom):
Fixes #128084
The approach is option 2 of what Elias suggested in the comment thread:
- We require tensors to have the correct stride at usage. This may involve a clone; if there was a clone and then a mutation into it, we copy_ the result of the mutation back.

I went with this approach because it was the easiest, and Inductor already works really hard to remove additional clones/copy_ calls.

There are some cases where this doesn't generate efficient code; for example, if the tensor is a view, we don't change the base of the view to have the right stride order; instead, we do a clone. The view case isn't very common, so I'm ignoring it for now, but we could improve this in the future.

Test Plan:
- tests

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang