Skip to content

Conversation

@zou3519
Copy link
Contributor

@zou3519 zou3519 commented Aug 15, 2024

Stack from ghstack (oldest at bottom):

Fixes #128084

The approach is option 2 of what Elias suggested in the comment
thread:

  • We require tensors to have the correct stride at usage. This may
    involve a clone; if there was a clone and then a mutation into it
    then we copy_ back the result of the mutation.

The reason why I went this approach was because it was the easiest and
Inductor already works really hard to remove additional clones/copy_.

There are some cases that this doesn't generate efficient code for; for
example, if the tensor is a view, we don't change the base of the view
to have the right stride order, instead we do a clone.
The view case isn't very common so I'm ignoring it for now but we could
improve this in the future.

Test Plan:

  • tests

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang

Fixes #128084

The approach is two-fold (based on what Elias suggested in the comment
thread):
1. We realize Tensors and ensure they are the right stride.
2. If we can't ensure they have the right stride (if the tensors are
   already in an incorrect stride with FixedLayout, or if there's a
   view), then we require the correct stride on use via a clone and then
   copy_ back the result of the mutation.

We can improve this based on use cases (if the tensor is a view, then we
can require the base of the view to have the right stride order). The
view case isn't very common so I'm ignoring it for now.

Test Plan:
- tests

[ghstack-poisoned]
@pytorch-bot
Copy link

pytorch-bot bot commented Aug 15, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/133639

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit d6b81f3 with merge base 92151c8 (image):

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@zou3519 zou3519 added the ci-no-td Do not run TD on this PR label Aug 16, 2024
Fixes #128084

The approach is two-fold (based on what Elias suggested in the comment
thread):
1. We realize Tensors and ensure they are the right stride.
2. If we can't ensure they have the right stride (if the tensors are
   already in an incorrect stride with FixedLayout, or if there's a
   view), then we require the correct stride on use via a clone and then
   copy_ back the result of the mutation.

We can improve this based on use cases (if the tensor is a view, then we
can require the base of the view to have the right stride order). The
view case isn't very common so I'm ignoring it for now.

Also, inductor is really good at removing unnecessary copy_: I couldn't
find a case where it the output code had it.

Test Plan:
- tests

[ghstack-poisoned]
zou3519 added a commit that referenced this pull request Aug 16, 2024
Fixes #128084

The approach is two-fold (based on what Elias suggested in the comment
thread):
1. We realize Tensors and ensure they are the right stride.
2. If we can't ensure they have the right stride (if the tensors are
   already in an incorrect stride with FixedLayout, or if there's a
   view), then we require the correct stride on use via a clone and then
   copy_ back the result of the mutation.

We can improve this based on use cases (if the tensor is a view, then we
can require the base of the view to have the right stride order). The
view case isn't very common so I'm ignoring it for now.

Also, inductor is really good at removing unnecessary copy_: I couldn't
find a case where it the output code had it.

Test Plan:
- tests

ghstack-source-id: e69e99e
Pull Request resolved: #133639
Fixes #128084

The approach is option 2 of what Elias suggested in the comment
thread:
- We require tensors to have the correct stride at usage. This may
  involve a clone; if there was a clone and then a mutation into it
  then we copy_ back the result of the mutation.

The reason why I went this approach was because it was the easiest and
Inductor already works really hard to remove additional clones/copy_.

There are some cases that this doesn't generate efficient code for; for
example, if the tensor is a view, we don't change the base of the view
to have the right stride order, instead we do a clone.
The view case isn't very common so I'm ignoring it for now but we could
improve this in the future.

Test Plan:
- tests

[ghstack-poisoned]
zou3519 added a commit that referenced this pull request Aug 16, 2024
Fixes #128084

The approach is option 2 of what Elias suggested in the comment
thread:
- We require tensors to have the correct stride at usage. This may
  involve a clone; if there was a clone and then a mutation into it
  then we copy_ back the result of the mutation.

The reason why I went this approach was because it was the easiest and
Inductor already works really hard to remove additional clones/copy_.

There are some cases that this doesn't generate efficient code for; for
example, if the tensor is a view, we don't change the base of the view
to have the right stride order, instead we do a clone.
The view case isn't very common so I'm ignoring it for now but we could
improve this in the future.

Test Plan:
- tests

ghstack-source-id: 3de3d62
Pull Request resolved: #133639
Fixes #128084

The approach is option 2 of what Elias suggested in the comment
thread:
- We require tensors to have the correct stride at usage. This may
  involve a clone; if there was a clone and then a mutation into it
  then we copy_ back the result of the mutation.

The reason why I went this approach was because it was the easiest and
Inductor already works really hard to remove additional clones/copy_.

There are some cases that this doesn't generate efficient code for; for
example, if the tensor is a view, we don't change the base of the view
to have the right stride order, instead we do a clone.
The view case isn't very common so I'm ignoring it for now but we could
improve this in the future.

Test Plan:
- tests

[ghstack-poisoned]
zou3519 added a commit that referenced this pull request Aug 16, 2024
Fixes #128084

The approach is option 2 of what Elias suggested in the comment
thread:
- We require tensors to have the correct stride at usage. This may
  involve a clone; if there was a clone and then a mutation into it
  then we copy_ back the result of the mutation.

The reason why I went this approach was because it was the easiest and
Inductor already works really hard to remove additional clones/copy_.

There are some cases that this doesn't generate efficient code for; for
example, if the tensor is a view, we don't change the base of the view
to have the right stride order, instead we do a clone.
The view case isn't very common so I'm ignoring it for now but we could
improve this in the future.

Test Plan:
- tests

ghstack-source-id: 78ec0b2
Pull Request resolved: #133639
Fixes #128084

The approach is option 2 of what Elias suggested in the comment
thread:
- We require tensors to have the correct stride at usage. This may
  involve a clone; if there was a clone and then a mutation into it
  then we copy_ back the result of the mutation.

The reason why I went this approach was because it was the easiest and
Inductor already works really hard to remove additional clones/copy_.

There are some cases that this doesn't generate efficient code for; for
example, if the tensor is a view, we don't change the base of the view
to have the right stride order, instead we do a clone.
The view case isn't very common so I'm ignoring it for now but we could
improve this in the future.

Test Plan:
- tests

[ghstack-poisoned]
Fixes #128084

The approach is option 2 of what Elias suggested in the comment
thread:
- We require tensors to have the correct stride at usage. This may
  involve a clone; if there was a clone and then a mutation into it
  then we copy_ back the result of the mutation.

The reason why I went this approach was because it was the easiest and
Inductor already works really hard to remove additional clones/copy_.

There are some cases that this doesn't generate efficient code for; for
example, if the tensor is a view, we don't change the base of the view
to have the right stride order, instead we do a clone.
The view case isn't very common so I'm ignoring it for now but we could
improve this in the future.

Test Plan:
- tests

[ghstack-poisoned]
zou3519 added a commit that referenced this pull request Aug 19, 2024
Fixes #128084

The approach is option 2 of what Elias suggested in the comment
thread:
- We require tensors to have the correct stride at usage. This may
  involve a clone; if there was a clone and then a mutation into it
  then we copy_ back the result of the mutation.

The reason why I went this approach was because it was the easiest and
Inductor already works really hard to remove additional clones/copy_.

There are some cases that this doesn't generate efficient code for; for
example, if the tensor is a view, we don't change the base of the view
to have the right stride order, instead we do a clone.
The view case isn't very common so I'm ignoring it for now but we could
improve this in the future.

Test Plan:
- tests

ghstack-source-id: 7fee9b5
Pull Request resolved: #133639
@zou3519 zou3519 added the keep-going Don't stop on first failure, keep running tests until the end label Aug 20, 2024
@zou3519 zou3519 requested review from FindHao and eellison August 20, 2024 14:08
Fixes #128084

The approach is option 2 of what Elias suggested in the comment
thread:
- We require tensors to have the correct stride at usage. This may
  involve a clone; if there was a clone and then a mutation into it
  then we copy_ back the result of the mutation.

The reason why I went this approach was because it was the easiest and
Inductor already works really hard to remove additional clones/copy_.

There are some cases that this doesn't generate efficient code for; for
example, if the tensor is a view, we don't change the base of the view
to have the right stride order, instead we do a clone.
The view case isn't very common so I'm ignoring it for now but we could
improve this in the future.

Test Plan:
- tests

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang

[ghstack-poisoned]
zou3519 added a commit that referenced this pull request Aug 20, 2024
Fixes #128084

The approach is option 2 of what Elias suggested in the comment
thread:
- We require tensors to have the correct stride at usage. This may
  involve a clone; if there was a clone and then a mutation into it
  then we copy_ back the result of the mutation.

The reason why I went this approach was because it was the easiest and
Inductor already works really hard to remove additional clones/copy_.

There are some cases that this doesn't generate efficient code for; for
example, if the tensor is a view, we don't change the base of the view
to have the right stride order, instead we do a clone.
The view case isn't very common so I'm ignoring it for now but we could
improve this in the future.

Test Plan:
- tests

ghstack-source-id: 8b7a94a
Pull Request resolved: #133639
Copy link
Contributor

@eellison eellison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great !

"test_conv2d_channels_last_dynamic_shapes": TestFailure(("cpu",)),
"test_conv3d_dynamic_shapes": TestFailure(("cpu",)),
"test_conv3d_channels_last_dynamic_shapes": TestFailure(("cpu",)),
"test_mutable_custom_op_fixed_layout2_dynamic_shapes": TestFailure(("cpu",)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still failing on cpu?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think it's because inductor doesn't generate any for loops for this code on CPU?

Comment on lines 1258 to 1261
# lowering an op that is FallbackKernel for the first time will automatically
# generate the lowering so that the following isn't called multiple times.
if torch._C.Tag.needs_fixed_stride_order in op.tags:
torch._inductor.lowering.add_layout_constraint(op, constrain_to_fx_strides)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is the logic that prevents this from being called twice ? tried to find and failed

Copy link
Contributor Author

@zou3519 zou3519 Aug 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try to refactor this to be clearer and/or adjust the comment.

The first time we call maybe_add_custom_op_layout_constraints with a custom op foo:

The second time we call maybe_add_custom_op_layout_constraints with a custom op foo:

  • there's a lowering for the custom op, so the early return prevents us from adding the layout constraint twice.

Note that this was how the code worked previously (we would generate the lowering and add the layout constraint once on the first usage of the custom op) but I needed to hoist this logic out because resolving layout constraints happens in run_node

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up adding new logic to register the layout constraint once to make it clearer what is going on. Relying on the lowering being populated seemed like too much mental overload

Fixes #128084

The approach is option 2 of what Elias suggested in the comment
thread:
- We require tensors to have the correct stride at usage. This may
  involve a clone; if there was a clone and then a mutation into it
  then we copy_ back the result of the mutation.

The reason why I went this approach was because it was the easiest and
Inductor already works really hard to remove additional clones/copy_.

There are some cases that this doesn't generate efficient code for; for
example, if the tensor is a view, we don't change the base of the view
to have the right stride order, instead we do a clone.
The view case isn't very common so I'm ignoring it for now but we could
improve this in the future.

Test Plan:
- tests

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang

[ghstack-poisoned]
@zou3519 zou3519 added the ciflow/trunk Trigger trunk jobs on your pull request label Aug 21, 2024
Fixes #128084

The approach is option 2 of what Elias suggested in the comment
thread:
- We require tensors to have the correct stride at usage. This may
  involve a clone; if there was a clone and then a mutation into it
  then we copy_ back the result of the mutation.

The reason why I went this approach was because it was the easiest and
Inductor already works really hard to remove additional clones/copy_.

There are some cases that this doesn't generate efficient code for; for
example, if the tensor is a view, we don't change the base of the view
to have the right stride order, instead we do a clone.
The view case isn't very common so I'm ignoring it for now but we could
improve this in the future.

Test Plan:
- tests

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang

[ghstack-poisoned]
zou3519 added a commit that referenced this pull request Aug 21, 2024
Fixes #128084

The approach is option 2 of what Elias suggested in the comment
thread:
- We require tensors to have the correct stride at usage. This may
  involve a clone; if there was a clone and then a mutation into it
  then we copy_ back the result of the mutation.

The reason why I went this approach was because it was the easiest and
Inductor already works really hard to remove additional clones/copy_.

There are some cases that this doesn't generate efficient code for; for
example, if the tensor is a view, we don't change the base of the view
to have the right stride order, instead we do a clone.
The view case isn't very common so I'm ignoring it for now but we could
improve this in the future.

Test Plan:
- tests

ghstack-source-id: d7a1952
Pull Request resolved: #133639
@zou3519
Copy link
Contributor Author

zou3519 commented Aug 21, 2024

@pytorchbot merge -f "rocm infinite queue"

@pytorchmergebot
Copy link
Collaborator

@zou3519 your PR has been successfully reverted.

@zou3519
Copy link
Contributor Author

zou3519 commented Aug 23, 2024

@pytorchbot merge -f "this just needs an internal-only change (D61722869), going to force merge this to avoid needing to go through another development cycle"

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

pytorchmergebot pushed a commit to mayank31398/pytorch that referenced this pull request Aug 23, 2024
…#133639)

Fixes pytorch#128084

The approach is option 2 of what Elias suggested in the comment
thread:
- We require tensors to have the correct stride at usage. This may
  involve a clone; if there was a clone and then a mutation into it
  then we copy_ back the result of the mutation.

The reason why I went this approach was because it was the easiest and
Inductor already works really hard to remove additional clones/copy_.

There are some cases that this doesn't generate efficient code for; for
example, if the tensor is a view, we don't change the base of the view
to have the right stride order, instead we do a clone.
The view case isn't very common so I'm ignoring it for now but we could
improve this in the future.

Test Plan:
- tests

Pull Request resolved: pytorch#133639
Approved by: https://github.com/eellison
pytorchmergebot pushed a commit that referenced this pull request Aug 24, 2024
…ment graph (#134364)

The pattern matcher runs DCE and remove_noop_ops on the replacement
graph by default. Previously we had a switch for the DCE. This PR
changes that switch to also control if we run remove_noop_ops.

The context was that there is silent incorrectness with
auto_functionalized. We use the Pattern matcher to decompose
auto_functionalized into a mutable op + clones; remove_noop_ops were
deleting the clones.

Future: can try #134363

Test Plan:
- new test. I wasn't able to produce a silently incorrect example so I
  settled for asserting that clones still exist in the post-grad graph.

Pull Request resolved: #134364
Approved by: https://github.com/eellison
ghstack dependencies: #133639
pytorchmergebot pushed a commit that referenced this pull request Aug 26, 2024
…kip comments and to skip empty lines (#134248)

I had a night mare rewriting tests in test_misc.py specifically :
1. graphs can have comments that refers to my files "/lsakka/.." we really dont care about comments add option to ignore comments.
2. empty lines added when EXPECTTEST_ACCEPT=1  are changed with linter causing tests to fail or linter fail!
add flag to ignore empty lines.
3. EXPECTTEST_ACCEPT fails when the text have some not readable characters. those should not effect comparing strings, also those causes weird diffs comments when tests fails. I removed ansi_escape chars #133045

this is used in

Pull Request resolved: #134248
Approved by: https://github.com/aorenste
ghstack dependencies: #133639, #134364
zou3519 added a commit that referenced this pull request Sep 5, 2024
By default, Inductor is allowed to manipulate the layout
(strides+storage offset) of input tensors to custom operators.

We want to change it so that the default is that Inductor should respect
the stride order of input tensors to custom operators.

This PR adds a config to toggle the behavior, in the next PR up we'll
change the default. We also make the following changes:
- We add a new operator Tag (flexible_layout), which means that
inductor is allowed to manipulate the layout. When we flip the default,
users can specify they want the old behavior by using this tag.

This is a reland of #126986,
which was previously reverted due to silent incorrectness. We've since
fixed the silent incorrectness
(#133639)

Test Plan:
- new test

[ghstack-poisoned]
pytorchmergebot pushed a commit that referenced this pull request Sep 6, 2024
By default, Inductor is allowed to manipulate the layout
(strides+storage offset) of input tensors to custom operators.

We want to change it so that the default is that Inductor should respect
the stride order of input tensors to custom operators.

This PR adds a config to toggle the behavior, in the next PR up we'll
change the default. We also make the following changes:
- We add a new operator Tag (flexible_layout), which means that
inductor is allowed to manipulate the layout. When we flip the default,
users can specify they want the old behavior by using this tag.

This is a reland of #126986,
which was previously reverted due to silent incorrectness. We've since
fixed the silent incorrectness
(#133639)

Test Plan:
- new test

Pull Request resolved: #135238
Approved by: https://github.com/albanD
enter-ctrl9 pushed a commit to enter-ctrl9/pytorch11 that referenced this pull request Sep 15, 2024
By default, Inductor is allowed to manipulate the layout
(strides+storage offset) of input tensors to custom operators.

We want to change it so that the default is that Inductor should respect
the stride order of input tensors to custom operators.

This PR adds a config to toggle the behavior, in the next PR up we'll
change the default. We also make the following changes:
- We add a new operator Tag (flexible_layout), which means that
inductor is allowed to manipulate the layout. When we flip the default,
users can specify they want the old behavior by using this tag.

This is a reland of pytorch/pytorch#126986,
which was previously reverted due to silent incorrectness. We've since
fixed the silent incorrectness
(pytorch/pytorch#133639)

Test Plan:
- new test

ghstack-source-id: ce6c52d
Pull Request resolved: pytorch/pytorch#135238
Chao1Han pushed a commit to Chao1Han/pytorch that referenced this pull request Sep 20, 2024
By default, Inductor is allowed to manipulate the layout
(strides+storage offset) of input tensors to custom operators.

We want to change it so that the default is that Inductor should respect
the stride order of input tensors to custom operators.

This PR adds a config to toggle the behavior, in the next PR up we'll
change the default. We also make the following changes:
- We add a new operator Tag (flexible_layout), which means that
inductor is allowed to manipulate the layout. When we flip the default,
users can specify they want the old behavior by using this tag.

This is a reland of pytorch#126986,
which was previously reverted due to silent incorrectness. We've since
fixed the silent incorrectness
(pytorch#133639)

Test Plan:
- new test

Pull Request resolved: pytorch#135238
Approved by: https://github.com/albanD
Chao1Han pushed a commit to Chao1Han/pytorch that referenced this pull request Sep 26, 2024
…ment graph (pytorch#134364)

The pattern matcher runs DCE and remove_noop_ops on the replacement
graph by default. Previously we had a switch for the DCE. This PR
changes that switch to also control if we run remove_noop_ops.

The context was that there is silent incorrectness with
auto_functionalized. We use the Pattern matcher to decompose
auto_functionalized into a mutable op + clones; remove_noop_ops were
deleting the clones.

Future: can try pytorch#134363

Test Plan:
- new test. I wasn't able to produce a silently incorrect example so I
  settled for asserting that clones still exist in the post-grad graph.

Pull Request resolved: pytorch#134364
Approved by: https://github.com/eellison
ghstack dependencies: pytorch#133639
Chao1Han pushed a commit to Chao1Han/pytorch that referenced this pull request Sep 26, 2024
…kip comments and to skip empty lines (pytorch#134248)

I had a night mare rewriting tests in test_misc.py specifically :
1. graphs can have comments that refers to my files "/lsakka/.." we really dont care about comments add option to ignore comments.
2. empty lines added when EXPECTTEST_ACCEPT=1  are changed with linter causing tests to fail or linter fail!
add flag to ignore empty lines.
3. EXPECTTEST_ACCEPT fails when the text have some not readable characters. those should not effect comparing strings, also those causes weird diffs comments when tests fails. I removed ansi_escape chars pytorch#133045

this is used in

Pull Request resolved: pytorch#134248
Approved by: https://github.com/aorenste
ghstack dependencies: pytorch#133639, pytorch#134364
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci-no-td Do not run TD on this PR ciflow/inductor ciflow/trunk Trigger trunk jobs on your pull request keep-going Don't stop on first failure, keep running tests until the end Merged module: inductor release notes: inductor Reverted

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants