
Conversation

@ssnl (Collaborator) commented Jun 27, 2018

  1. Optimize for the permute case: since we sort the strides, we can check whether the sizes & strides match up to some permutation (assuming that the strides are unique).

  2. Optimize for the expand case:

    1. Expanded input: expanding dim i to size[i] just multiplies the number of times an address is referenced in the input by size[i], so we can simply divide the gradient by \prod_{expanded dim i} inp_size[i].
    2. Expanded output: use a single sum over multiple dimensions rather than one sum per dimension (same for squeeze); a rough sketch of this is below.
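
As a rough illustration of 2.2 (not code from this patch; the shapes and dim indices here are made up), summing once over all expanded dims instead of once per dim looks like:

```py
import torch

# Hypothetical example: the output geometry gained two expanded leading dims.
grad = torch.randn(10, 7, 2, 3)   # gradient w.r.t. the expanded output
expanded_dims = (0, 1)            # made-up indices of the expanded dims

# One reduction over all expanded dims at once, instead of one sum() per dim.
grad_inp = grad.sum(dim=expanded_dims)
print(grad_inp.shape)             # torch.Size([2, 3])
```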

I added a test asserting that combinations of permute, (un)squeeze, expand, and transpose do not allocate the large storage-like tensor. It tests the internal behavior of as_strided_backward and will need to be updated if further optimizations are added for other cases, but I can't find a better way to verify that we are not copying and are instead only returning a view of the gradient.
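
For reference, a minimal sketch (not the actual test added in this PR) of the kind of check this is getting at: with the permute fast path in place, the gradient coming out of as_strided_backward would be a re-strided view of the incoming gradient rather than a copy backed by a fresh storage-sized buffer.

```py
import torch

x = torch.randn(3, 4, 5, requires_grad=True)
# A pure permutation expressed via as_strided, so backward goes through
# as_strided_backward rather than permute's own backward.
y = x.as_strided((5, 3, 4), (x.stride(2), x.stride(0), x.stride(1)))

gy = torch.randn_like(y)
gx, = torch.autograd.grad(y, x, gy)

# If the fast path kicks in, gx shares gy's storage (no large allocation);
# otherwise it is an independent copy.
print(gx.data_ptr() == gy.data_ptr())
```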

@facebook-github-bot (Contributor) left a comment

@ssnl has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ezyang (Contributor) commented Jun 27, 2018

@pytorchbot retest this please

@ezyang (Contributor) commented Jun 28, 2018

This is a pretty complex patch. Maybe try to get someone to pair review it with you in person.

@facebook-github-bot (Contributor) left a comment

@ssnl has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ssnl ssnl changed the title Speed up permute and expand cases in as_strided_backward [wip] Speed up permute and expand cases in as_strided_backward Jun 28, 2018
@ezyang (Contributor) commented Nov 5, 2018

You can't claim a speed-up without perf numbers!


ezyang previously approved these changes Nov 5, 2018
@zdevito zdevito removed their request for review February 13, 2019 01:22
@gchanan gchanan removed their request for review February 28, 2019 16:19
@pytorchbot pytorchbot added the module: autograd and module: internals labels Jun 7, 2019
zou3519 previously requested changes Jun 7, 2019
@zou3519 (Contributor) left a comment

Test failure looks real

Jun 07 15:04:29 ======================================================================
Jun 07 15:04:29 ERROR: test_as_strided (__main__.TestAutograd)
Jun 07 15:04:29 ----------------------------------------------------------------------
Jun 07 15:04:29 Traceback (most recent call last):
Jun 07 15:04:29   File "test_autograd.py", line 2623, in test_as_strided
Jun 07 15:04:29     test(torch.randn(2, 3), lambda x: x.expand(10, 2, 3), [2, 3], [3, 1], 0)
Jun 07 15:04:29   File "test_autograd.py", line 2593, in test
Jun 07 15:04:29     gradcheck(closure, [x])
Jun 07 15:04:29   File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/autograd/gradcheck.py", line 289, in gradcheck
Jun 07 15:04:29     'numerical:%s\nanalytical:%s\n' % (i, j, n, a))
Jun 07 15:04:29   File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/autograd/gradcheck.py", line 227, in fail_test
Jun 07 15:04:29     raise RuntimeError(msg)
Jun 07 15:04:29 RuntimeError: Jacobian mismatch for output 0 with respect to input 0,
Jun 07 15:04:29 numerical:tensor([[1.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
Jun 07 15:04:29         [0.0000, 1.0000, 0.0000, 0.0000, 0.0000, 0.0000],
Jun 07 15:04:29         [0.0000, 0.0000, 1.0000, 0.0000, 0.0000, 0.0000],
Jun 07 15:04:29         [0.0000, 0.0000, 0.0000, 1.0000, 0.0000, 0.0000],
Jun 07 15:04:29         [0.0000, 0.0000, 0.0000, 0.0000, 1.0000, 0.0000],
Jun 07 15:04:29         [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 1.0000]])
Jun 07 15:04:29 analytical:tensor([[10.,  0.,  0.,  0.,  0.,  0.],
Jun 07 15:04:29         [ 0., 10.,  0.,  0.,  0.,  0.],
Jun 07 15:04:29         [ 0.,  0., 10.,  0.,  0.,  0.],
Jun 07 15:04:29         [ 0.,  0.,  0., 10.,  0.,  0.],
Jun 07 15:04:29         [ 0.,  0.,  0.,  0., 10.,  0.],
Jun 07 15:04:29         [ 0.,  0.,  0.,  0.,  0., 10.]])

@ssnl (Collaborator, Author) commented Jun 7, 2019

This is still a WIP, but I plan to finish it when I have time.

@zou3519 zou3519 dismissed stale reviews from ezyang and themself June 7, 2019 17:41

wip PR

facebook-github-bot pushed a commit that referenced this pull request Oct 31, 2019
Summary:
Currently, `reshape` does an `as_strided` when the geometry is viewable. However, `as_strided` backward is not very optimized and cannot always detect such cases. Improvements are planned in #8965, and I will finish it some day. But the current situation is that, in these cases, backward through `reshape` copies the gradient while a simple `view` does not. This is unnecessary.

Notably, this affects `flatten` and a whole bunch of other ops implemented on top of `reshape`.

```py
In [15]: x = torch.randn(3, 4, requires_grad=True)

In [16]: y = x.reshape(x.shape)

In [17]: assert y._base is not None

In [18]: gy = torch.randn_like(y)

In [20]: gx = torch.autograd.grad(y, x, gy)[0]

In [21]: gx
Out[21]:
tensor([[ 0.2189,  0.3396, -0.1108,  1.7703],
        [ 1.0737, -0.1222,  1.0765, -1.3363],
        [-1.3798, -0.2950,  0.0800,  0.2501]])

In [22]: gx._base  # not gy
Out[22]:
tensor([ 0.2189,  0.3396, -0.1108,  1.7703,  1.0737, -0.1222,  1.0765, -1.3363,
        -1.3798, -0.2950,  0.0800,  0.2501])

In [23]: gy.zero_()
Out[23]:
tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])

In [24]: gx  # not sharing storage with gy
Out[24]:
tensor([[ 0.2189,  0.3396, -0.1108,  1.7703],
        [ 1.0737, -0.1222,  1.0765, -1.3363],
        [-1.3798, -0.2950,  0.0800,  0.2501]])

# but everything is optimized with view, which should be equivalent to reshape in this case
In [25]: y = x.view(x.shape)

In [26]: assert y._base is not None

In [27]: gy = torch.randn_like(y)

In [28]: gx = torch.autograd.grad(y, x, gy)[0]

In [29]: gx
Out[29]:
tensor([[-2.4463,  1.1446,  0.1501,  0.1212],
        [-1.1125,  1.4661,  0.9092, -0.2153],
        [-0.1937, -0.3381, -1.3883, -0.7329]])

In [30]: gy.zero_()
Out[30]:
tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])

In [31]: gx  # sharing storage with gy
Out[31]:
tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])

```
Pull Request resolved: #28901

Differential Revision: D18240868

Pulled By: ezyang

fbshipit-source-id: 28fdaa0c7014a9dae6731dfe8b67784d38fc27f0
zdevito pushed a commit to zdevito/ATen that referenced this pull request Oct 31, 2019
@pytorchbot (Collaborator)

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
Stale pull requests will automatically be closed 30 days after being marked Stale

@pytorchbot pytorchbot added Stale and removed Stale labels Apr 12, 2022
@github-actions (Contributor)

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the Stale label Jun 11, 2022
@github-actions github-actions bot closed this Jul 11, 2022

Labels

cla signed, module: autograd, module: internals, open source, Stale
