[wip] Speed up permute and expand cases in as_strided_backward #8965
Conversation
facebook-github-bot
left a comment
@ssnl has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@pytorchbot retest this please
This is a pretty complex patch. Maybe try to get someone to pair review it with you in person.
facebook-github-bot
left a comment
@ssnl has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
You can't claim a speed-up without perf numbers!
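For context, a minimal micro-benchmark sketch of the kind of numbers being asked for. Everything here is an illustrative assumption rather than part of this PR: the shapes, the iteration count, and the choice of timing the backward of an `as_strided` that is a pure permutation of the input geometry.

```py
import timeit
import torch

x = torch.randn(64, 128, 256, requires_grad=True)

def permute_case():
    # as_strided with a permuted geometry of x; with the proposed optimization
    # this backward could avoid the generic storage-sized scatter-add path.
    y = x.as_strided([256, 64, 128], [1, 32768, 256])
    torch.autograd.grad(y, x, torch.ones_like(y))

# Run before and after the patch and compare the timings.
print(timeit.timeit(permute_case, number=100))
```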
zou3519
left a comment
Test failure looks real
Jun 07 15:04:29 ======================================================================
Jun 07 15:04:29 ERROR: test_as_strided (__main__.TestAutograd)
Jun 07 15:04:29 ----------------------------------------------------------------------
Jun 07 15:04:29 Traceback (most recent call last):
Jun 07 15:04:29 File "test_autograd.py", line 2623, in test_as_strided
Jun 07 15:04:29 test(torch.randn(2, 3), lambda x: x.expand(10, 2, 3), [2, 3], [3, 1], 0)
Jun 07 15:04:29 File "test_autograd.py", line 2593, in test
Jun 07 15:04:29 gradcheck(closure, [x])
Jun 07 15:04:29 File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/autograd/gradcheck.py", line 289, in gradcheck
Jun 07 15:04:29 'numerical:%s\nanalytical:%s\n' % (i, j, n, a))
Jun 07 15:04:29 File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/autograd/gradcheck.py", line 227, in fail_test
Jun 07 15:04:29 raise RuntimeError(msg)
Jun 07 15:04:29 RuntimeError: Jacobian mismatch for output 0 with respect to input 0,
Jun 07 15:04:29 numerical:tensor([[1.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
Jun 07 15:04:29 [0.0000, 1.0000, 0.0000, 0.0000, 0.0000, 0.0000],
Jun 07 15:04:29 [0.0000, 0.0000, 1.0000, 0.0000, 0.0000, 0.0000],
Jun 07 15:04:29 [0.0000, 0.0000, 0.0000, 1.0000, 0.0000, 0.0000],
Jun 07 15:04:29 [0.0000, 0.0000, 0.0000, 0.0000, 1.0000, 0.0000],
Jun 07 15:04:29 [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 1.0000]])
Jun 07 15:04:29 analytical:tensor([[10., 0., 0., 0., 0., 0.],
Jun 07 15:04:29 [ 0., 10., 0., 0., 0., 0.],
Jun 07 15:04:29 [ 0., 0., 10., 0., 0., 0.],
Jun 07 15:04:29 [ 0., 0., 0., 10., 0., 0.],
Jun 07 15:04:29 [ 0., 0., 0., 0., 10., 0.],
Jun 07 15:04:29 [ 0., 0., 0., 0., 0., 10.]])
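For reference, a minimal reproduction sketch of that failing case. It assumes the test closure is essentially `as_strided` applied after the `expand` (the exact helper in `test_autograd.py` may differ), and uses double precision as `gradcheck` expects.

```py
import torch
from torch.autograd import gradcheck

x = torch.randn(2, 3, dtype=torch.double, requires_grad=True)

def closure(t):
    # expand to (10, 2, 3), then take a plain (2, 3) view of the storage
    return t.expand(10, 2, 3).as_strided([2, 3], [3, 1], 0)

# Each input element reaches the output exactly once, so the Jacobian should
# be the identity. On the branch under review the analytical Jacobian comes
# out 10x too large (the gradient is summed over all 10 expanded copies),
# which is the mismatch shown in the log above; a correct backward passes.
gradcheck(closure, [x])
```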
This is WIP still. But I plan to finish it when I have time.
Summary: Currently, `reshape` does an `as_strided` when the geometry is viewable. However, `as_strided` backward is not very optimized and cannot always detect such cases. Improvements are planned at #8965, and I will finish it some day. But the current situation is that, in these cases, backward through `reshape` will copy the gradient while a simple `view` will not. This is unnecessary. Notably this affects `flatten` and a whole bunch of other ops implemented on top of `reshape`.

```py
In [15]: x = torch.randn(3, 4, requires_grad=True)

In [16]: y = x.reshape(x.shape)

In [17]: assert y._base is not None

In [18]: gy = torch.randn_like(y)

In [20]: gx = torch.autograd.grad(y, x, gy)[0]

In [21]: gx
Out[21]:
tensor([[ 0.2189,  0.3396, -0.1108,  1.7703],
        [ 1.0737, -0.1222,  1.0765, -1.3363],
        [-1.3798, -0.2950,  0.0800,  0.2501]])

In [22]: gx._base  # not gy
Out[22]:
tensor([ 0.2189,  0.3396, -0.1108,  1.7703,  1.0737, -0.1222,  1.0765, -1.3363,
        -1.3798, -0.2950,  0.0800,  0.2501])

In [23]: gy.zero_()
Out[23]:
tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])

In [24]: gx  # not sharing storage with gy
Out[24]:
tensor([[ 0.2189,  0.3396, -0.1108,  1.7703],
        [ 1.0737, -0.1222,  1.0765, -1.3363],
        [-1.3798, -0.2950,  0.0800,  0.2501]])

# but everything is optimized with view, which should be equivalent to reshape in this case
In [25]: y = x.view(x.shape)

In [26]: assert y._base is not None

In [27]: gy = torch.randn_like(y)

In [28]: gx = torch.autograd.grad(y, x, gy)[0]

In [29]: gx
Out[29]:
tensor([[-2.4463,  1.1446,  0.1501,  0.1212],
        [-1.1125,  1.4661,  0.9092, -0.2153],
        [-0.1937, -0.3381, -1.3883, -0.7329]])

In [30]: gy.zero_()
Out[30]:
tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])

In [31]: gx  # sharing storage with gy
Out[31]:
tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])
```

Pull Request resolved: #28901
Differential Revision: D18240868
Pulled By: ezyang
fbshipit-source-id: 28fdaa0c7014a9dae6731dfe8b67784d38fc27f0
Looks like this PR hasn't been updated in a while, so we're going to go ahead and mark this as Stale.
Optimize for the permute case: since we sort the strides, we can check whether the sizes & strides match via some permutation (assuming that the strides are unique).

Optimize for the expand case: expanding dim `i` to `size[i]` just multiplies the number of times each address is referred to in the input by `size[i]`, so we can simply divide the gradient by `\prod_{expanded dim i} inp_size[i]` and `sum` over multiple dimensions at once, rather than doing a `sum` for each one (same for `squeeze`).

I added a test that asserts combinations of permute, (un)squeeze, expand, and transpose won't allocate the large storage-like tensor. Although it tests the internal behavior of `as_strided_backward` and will need to be modified if further optimizations are done for other cases, I can't find a better way to test that we are actually not copying but only returning a view of the gradient.
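To make the two detections above concrete, here is a rough, hypothetical sketch in Python. It is only an illustration under stated assumptions, not the ATen implementation: `is_pure_permutation` and `expanded_count` are made-up helper names, and the size/stride tuples are assumed to describe the same underlying storage.

```py
import torch

def is_pure_permutation(inp_size, inp_stride, out_size, out_stride):
    # After sorting by stride, geometries that differ only by a permutation of
    # dimensions yield identical (stride, size) pairs (assumes unique strides).
    return sorted(zip(inp_stride, inp_size)) == sorted(zip(out_stride, out_size))

def expanded_count(inp_size, inp_stride):
    # Each stride-0 (expanded) dim multiplies the number of times every address
    # is referred to by that dim's size, so the total factor is the product
    # over expanded dims.
    count = 1
    for sz, st in zip(inp_size, inp_stride):
        if st == 0:
            count *= sz
    return count

# A transpose is a pure permutation of a (2, 3) tensor with strides (3, 1):
assert is_pure_permutation((2, 3), (3, 1), (3, 2), (1, 3))

# x.expand(10, 2, 3) refers to every address 10 times:
assert expanded_count((10, 2, 3), (0, 3, 1)) == 10

# Summing over several expanded dims in one call instead of one `sum` per dim:
g = torch.randn(10, 4, 2, 3)
assert torch.allclose(g.sum(dim=(0, 1)), g.sum(0).sum(0))
```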