
Conversation

@apaszke (Contributor) commented Jan 30, 2019

Fixes #16577.

This greatly improves the memory efficiency of certain ops like Dropout2d. Previously, they were implemented as input * mask where mask never requires_grad, but we didn't use that knowledge in the forward pass, and (in the case of an in-place dropout) kept input.clone() for the backward pass, even though it would simply be ignored.

This patch tries to address this situation by emitting guards for stores like this, but only if they are as simple as checking whether a single value requires_grad.
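
To make the effect concrete, here is a rough Python-level sketch of the Dropout2d-style pattern described above (not taken from the PR's test suite; it assumes a CUDA device, and the exact numbers depend on the caching allocator):

import torch

x = torch.randn(1024, 1024, device="cuda", requires_grad=True)
mask = (torch.rand(1024, 1024, device="cuda") > 0.5).float()  # never requires grad

base_mem = torch.cuda.memory_allocated()
y = (x + 0).mul_(mask)   # dropout-style `input * mask`, applied in place
end_mem = torch.cuda.memory_allocated()

# Only `y` itself should account for the growth; before this patch an extra
# clone of the input was also kept alive for backward, doubling the cost.
print("extra bytes kept alive:", end_mem - base_mem)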

Interestingly, the same optimizations apply to methods like bmm, baddbmm, etc., but not to mm or addmm, because of how their derivatives are defined. Apparently they unnecessarily use mat1 to compute the derivative of mat1, just to improve the error message in case mat1 was sparse. I'd like to apply this optimization to that case, but I don't want to lose the nicer error message, so if anyone has any ideas for solutions, please let me know...

Full list of operators affected by this patch:

  • _nnpack_spatial_convolution
  • addbmm
  • addcdiv
  • addcmul
  • addmv
  • addr
  • baddbmm
  • bmm
  • cross
  • div
  • dot
  • fmod
  • ger
  • index_add_
  • mul
  • mv
  • scatter_add_

@ssnl (Collaborator) commented Jan 31, 2019

I'd like to apply this optimization to that case, but I don't want to lose the nicer error message, so if anyone has any ideas for solutions, please let me know...

Idea: save TensorGeometry(mat1) instead, and augment TensorGeometry to include sparsity if needed.
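
For illustration, a hypothetical Python-level analogue of that idea (the real change would live in the C++ derivative definitions; the class name and error text here are made up) could save only the metadata needed for the error message instead of mat1 itself:

import torch

class MmSavingOnlyGeometry(torch.autograd.Function):
    @staticmethod
    def forward(ctx, mat1, mat2):
        # Keep just what the backward error check needs: sparsity and shape.
        ctx.mat1_is_sparse = mat1.is_sparse
        ctx.mat1_shape = tuple(mat1.shape)
        ctx.save_for_backward(mat2)      # mat1 itself is never saved
        return torch.mm(mat1, mat2)

    @staticmethod
    def backward(ctx, grad_out):
        (mat2,) = ctx.saved_tensors
        if ctx.mat1_is_sparse:
            # The nicer error message survives without holding on to mat1.
            raise RuntimeError(
                "gradient for a sparse mat1 of shape "
                f"{ctx.mat1_shape} is not supported")
        grad_mat1 = torch.mm(grad_out, mat2.t())
        # The gradient for mat2 would need mat1, so this sketch only
        # differentiates with respect to mat1.
        return grad_mat1, None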

@ssnl (Collaborator) commented Jan 31, 2019

This looks awesome. A before & after comparison of the generated code would be great! :)

@apaszke (Contributor, Author) commented Jan 31, 2019

Oh yes, I forgot to show an example change. Here's the case of mul_ (the one we use for in-place dropout).

Old code:

Tensor & VariableType::mul_(Tensor & self, const Tensor & other) const {
  ...
  if (compute_requires_grad( self, other )) {
    grad_fn = std::shared_ptr<MulBackward0>(new MulBackward0(), deleteFunction);
    grad_fn->set_next_edges(collect_next_edges( self, other ));
    grad_fn->self_ = SavedVariable(self.clone(), false);
    grad_fn->other_ = SavedVariable(other, false);
  }
  ...
}

New code:

Tensor & VariableType::mul_(Tensor & self, const Tensor & other) const {
  ...
  if (compute_requires_grad( self, other )) {
    grad_fn = std::shared_ptr<MulBackward0>(new MulBackward0(), deleteFunction);
    grad_fn->set_next_edges(collect_next_edges( self, other ));
    if (grad_fn->should_compute_output(1)) {
      grad_fn->self_ = SavedVariable(self.clone(), false);
    }
    if (grad_fn->should_compute_output(0)) {
      grad_fn->other_ = SavedVariable(other, false);
    }
  }
  ...
}
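
For mul, the gradient w.r.t. self is grad * other and the gradient w.r.t. other is grad * self, so (as the guards above suggest) the cloned self only has to be stored when the gradient for other will actually be computed. A small illustrative check of the end result, not part of the PR's test suite:

import torch

x = torch.randn(4, 4, requires_grad=True)
mask = (torch.rand(4, 4) > 0.5).float()   # requires_grad is False

y = (x + 0).mul_(mask)    # in-place multiply on a non-leaf, as in-place dropout does
y.sum().backward()

# d(y)/d(x) is just `mask`, so backward never needs the (cloned) input when
# `mask` doesn't require grad, which is exactly what the new guard skips saving.
assert torch.equal(x.grad, mask)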

@apaszke (Contributor, Author) commented Jan 31, 2019

TensorGeometry seems like a nice idea, I'll try that in a follow-up patch!

@apaszke (Contributor, Author) commented Jan 31, 2019

The failures are either timeouts or data loader multiprocessing tests, which I doubt are affected by this change.

@gchanan (Contributor) commented Jan 31, 2019

I'm guessing this doesn't also magically fix #15115, but I'm going to check anyway.

@gchanan (Contributor) commented Jan 31, 2019

Oh, it appears it does fix #15115.

# In the end the memory usage should remain equal, because neither of
# (x + 2) and ((x + 2) * m) should be kept alive for backward, while the
# previous allocation of z had the same size as the current one.
self.assertEqual(base_mem, end_mem)
Review comment (Contributor):

Can you add a test for #15115? It's probably more robust than this one because it doesn't rely on memory allocations, and it actually tests something different (i.e. you could solve #15115 by saving but not unpacking, whereas you can't solve the memory issue that way).

Reply (Contributor, Author):

Sure, I'll add that test, but I still think this one is robust and tests an important thing. The reason I took up this patch is that we were using on the order of gigabytes more memory for CNNs that used in-place Dropout2d (which is effectively input * mask), and that should never happen!

Reply (Contributor, Author):

Also, note that this patch doesn't change the unpacking behavior. So we will still run all the unpacks, except that variables that weren't saved will unpack as undefined tensors (which is fine, because they won't be used anyway).

Reply (Contributor):

Right, I'm not saying one test or the other is better; I'm saying they test different things and both should be tested.

@facebook-github-bot left a comment:
@gchanan has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@gchanan (Contributor) commented Feb 1, 2019

I don't know why there are failing tests.

@apaszke (Contributor, Author) commented Feb 1, 2019

I don't see any tests failing. Where are they? Can you rerun them? It used to be green.

@gchanan (Contributor) commented Feb 1, 2019

I see 4 failing tests:

  • ci/circleci: binary_linux_conda_2.7_cpu_build and ci/circleci: binary_linux_conda_3.6_cu90_build, which I believe are bogus but should be fixed by "Allow USE_NINJA to be toggled by an env variable" (#16665).
  • pr/py2-clang7-rocmdeb-ubuntu16.04 and pr/py2-devtoolset7-rocmrpm-centos7.5, which seem to time out and which I haven't seen on other PRs.

@facebook-github-bot left a comment:
@gchanan has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@gchanan (Contributor) commented Feb 4, 2019

Trying rebase again.

@gchanan (Contributor) commented Feb 4, 2019

@apaszke this has consistently timed out on the ROCm builds, across a few rebases. Can you take a look?

@apaszke (Contributor, Author) commented Feb 4, 2019

How can I debug this? The patch adds a bit of code to VariableType.cpp, so if the ROCm compiler is extremely slow for some reason, we might have started exceeding the time limit if it was already close...

@gchanan (Contributor) commented Feb 5, 2019

@apaszke sorry, I should have said ROCm tests (the error message says "build", but it fails during a test).

Here is the latest example:

 18:04:13 test_zeros_like (test_sparse.TestCudaUncoalescedSparse) ... Build timed out (after 120 minutes). Marking the build as failed.

@apaszke (Contributor, Author) commented Feb 5, 2019

Still, do we have instructions that would let me reproduce a ROCm build?

@bddppq (Contributor) commented Feb 9, 2019

@pytorchbot retest this please

@facebook-github-bot left a comment:
@gchanan has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

pearu pushed a commit to Quansight/pytorch that referenced this pull request Feb 12, 2019
Summary: (same as the PR description above)

Pull Request resolved: pytorch#16583

Differential Revision: D13900881

Pulled By: gchanan

fbshipit-source-id: dd0aeb2ab58c4b6aa95b37b46d3255b3e014291c

Successfully merging this pull request may close these issues: Dropout2d uses a lot of memory.