grad detach_ only when it has grad_fn in zero_grad call by zhaojuanmao · Pull Request #41283 · pytorch/pytorch

zhaojuanmao · 2020-07-10T22:30:12Z

Stack from ghstack:

#41283 grad detach_ only when it has grad_fn in zero_grad call

In optimizer.zero_grad(), detach_() helps avoid a memory leak only when the grad has a grad_fn. This PR adds a check so that zero_grad() calls grad.detach_() only when the grad has a grad_fn.

Differential Revision: D22487315
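
To illustrate the description above, here is a minimal sketch of when a grad carries a grad_fn at all. It assumes a toy parameter and loss (both hypothetical, not taken from this PR): a backward pass with create_graph=True leaves the grad attached to the autograd graph, while a plain backward pass does not.

```python
import torch

# Hypothetical toy parameter; not part of the PR.
p = torch.nn.Parameter(torch.tensor([1.0, 2.0]))

# create_graph=True keeps the graph of the backward pass alive, so the
# accumulated grad is itself attached to that graph through grad_fn.
(p ** 3).sum().backward(create_graph=True)
had_graph = p.grad.grad_fn is not None

# detach_() drops the reference to the graph in place, which is what lets
# the graph be freed once nothing else refers to it.
p.grad.detach_()

# A plain backward pass produces a grad with no grad_fn, so there is
# nothing for detach_() to release.
q = torch.nn.Parameter(torch.tensor([1.0, 2.0]))
(q ** 3).sum().backward()
plain_has_graph = q.grad.grad_fn is not None
```

In the second case the grad holds no graph, which is the situation where the PR skips the detach_() call.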

dr-ci · 2020-07-11T00:05:31Z

💊 CI failures summary and remediations

As of commit c1decbd (more details on the Dr. CI page):

1/2 failures introduced in this PR
1/2 tentatively recognized as flaky ❄️

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

pytorch_macos_10_13_py3_test (1/1)

Step: "Test"

Jul 28 16:06:31 RuntimeError: test_dataloader failed!

Jul 28 16:06:31 Generated XML report: test-reports/python-unittest/TEST-TestNamedTupleDataLoader-20200728155239.xml 
Jul 28 16:06:31 Generated XML report: test-reports/python-unittest/TEST-TestTensorDataset-20200728155239.xml 
Jul 28 16:06:31 Generated XML report: test-reports/python-unittest/TEST-TestCustomPinFn-20200728155239.xml 
Jul 28 16:06:31 Generated XML report: test-reports/python-unittest/TEST-TestSetAffinity-20200728155239.xml 
Jul 28 16:06:31 Generated XML report: test-reports/python-unittest/TEST-TestStringDataLoader-20200728155239.xml 
Jul 28 16:06:31 Traceback (most recent call last): 
Jul 28 16:06:31   File "test/run_test.py", line 744, in <module> 
Jul 28 16:06:31     main() 
Jul 28 16:06:31   File "test/run_test.py", line 733, in main 
Jul 28 16:06:31     raise RuntimeError(err) 
Jul 28 16:06:31 RuntimeError: test_dataloader failed! 
Jul 28 16:06:31 + cleanup 
Jul 28 16:06:31 + retcode=1 
Jul 28 16:06:31 + set +x

❄️ 1 failure tentatively classified as flaky

but reruns have not yet been triggered to confirm:

binary_windows_libtorch_3_7_cpu_release_build (1/1)

Step: "Checkout code" ❄️

Writing SSH key for checkout to id_rsa

Creating .ssh directory
Adding the following entries to known_hosts:
github.com ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAq2A7hRGmdnm9tUDbO9IDSwBK6TbQa+PXYPCPy6rbTrTtw7PHkccKrpp0yVhp5HdEIcKr6pLlVDBfOLX9QUsyCOV0wzfjIJNlGEYsdlLJizHhbn2mUjvSAHQqZETYP81eFzLQNnPHt4EVVUh7VfDESU84KezmD5QlWpXLmvU31/yMf+Se8xhHTvKSCZIFImWwoG6mbUoWf9nzpIoaSjB+weqqUUmpaaasXVal72J+UX2B+2RPW3RcT0eOzQgqlJL3RKrTJvdsjE3JEAvGq3lGHSZXy28G3skua2SmVi/w4yCE6gbODqnTWlg7+wC604ydGXA8VJiS5ap43JXiUFFAaQ==
bitbucket.org ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAubiN81eDcafrgMeLzaFPsw2kNvEcqTKl/VqLat/MaB33pZy0y3rJZtnqwR2qOOvbwKZYKiEO1O6VqNEBxKvJJelCq0dTXWT5pbO2gDXC6h6QDXCaHo6pOHGPUy+YBaGQRGuSusMEASYiWunYN0vCAI8QaXnWMXNMdFP3jHAJH0eDsoiGnLPBlBp4TNm6rYI74nMzgz3B9IikW4WVK+dc8KZJZWYjAuORU3jc1c/NPskD2ASinf8v3xnfXeukU0sJ5N6m5E8VLjObPEO+mN2t/FZTMZLiFqPWc/ALSqnMnnhwrNi2rbfg/rd/IpL8Le3pSBne8+seeFVBoGqzHM9yXw==

Writing SSH key for checkout to id_rsa

This comment has been revised 13 times.

albanD · 2020-07-13T13:38:11Z

torch/optim/optimizer.py

                if p.grad is not None:
-                    p.grad.detach_()
+                    if p.grad.grad_fn is not None:
+                        p.grad.detach_()


On top of removing .grad_fn, detach_() also clears the .requires_grad flag. I think you want to add a p.grad.requires_grad_(False) for the case where there is no grad_fn.
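
The suggestion above can be sketched as follows. This is an illustration of the logic under discussion, not necessarily the code that landed; `zero_grad_sketch` is a hypothetical helper name, and the two parameters are made-up examples.

```python
import torch

def zero_grad_sketch(params):
    """Hypothetical helper illustrating the reviewed logic; not the merged code."""
    for p in params:
        if p.grad is None:
            continue
        if p.grad.grad_fn is not None:
            # Drop the graph reference; this also clears requires_grad.
            p.grad.detach_()
        else:
            # No graph to drop, but the flag may still be set on a leaf grad.
            p.grad.requires_grad_(False)
        p.grad.zero_()

# Case 1: a leaf grad with requires_grad=True and no grad_fn.
p = torch.nn.Parameter(torch.ones(3))
p.grad = torch.ones(3, requires_grad=True)
zero_grad_sketch([p])

# Case 2: a grad that is a view of a larger buffer.
buf = torch.ones(6)
q = torch.nn.Parameter(torch.ones(3))
q.grad = buf[:3]
zero_grad_sketch([q])
```

In the second case the grad has no grad_fn, so the sketch never calls detach_() on the view and simply zeroes the slice of the buffer it points to.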

Make both variable.grad() and the grad in the distautograd context point to the bucket buffer in DDP to save memory. In this case, each grad is a view of the bucket buffer tensors; to make that compatible with optimizer.zero_grad(), we made the changes in #41283. Note also that we cannot make variable.grad() point to the bucket buffer at construction time, because we want to keep grad undefined for unused parameters. Differential Revision: [D22707857](https://our.internmc.facebook.com/intern/diff/D22707857/)
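
As a rough picture of what "grad is a view of the bucket buffer" means, the following sketch uses plain tensors rather than DDP itself. The parameter shapes, the `bucket` name, and the fill value are all assumptions made for illustration.

```python
import torch

# Hypothetical stand-ins for two model parameters (6 and 4 elements).
params = [
    torch.nn.Parameter(torch.randn(2, 3)),
    torch.nn.Parameter(torch.randn(4)),
]

# One flat buffer large enough to hold every gradient.
bucket = torch.zeros(sum(p.numel() for p in params))

offset = 0
for p in params:
    n = p.numel()
    # Each grad is a reshaped slice of the bucket, so it shares the bucket's
    # storage instead of allocating its own.
    p.grad = bucket[offset:offset + n].view_as(p)
    offset += n

# Writing into the bucket (for example, a reduced gradient) is immediately
# visible through every p.grad.
bucket.fill_(1.0)
```

Because the grads alias the bucket, no second copy of the gradients is kept, which is where the memory saving comes from.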

albanD

I feel like this should work fine.
Make sure that all CI is green in case we missed something.

zhaojuanmao · 2020-07-28T22:08:37Z

CI is green, landing

…emory usage" reland #41954. Add one argument to the DDP API to enable or disable letting grads point to views. When it is disabled, the behavior is the same as DDP right now; when it is enabled, both variable.grad() and the grad in the distautograd context point to the bucket buffer in DDP to save memory. In this case, each grad is a view of the bucket buffer tensors; to make that compatible with optimizer.zero_grad(), we made the changes in #41283. Note also that we cannot make variable.grad() point to the bucket buffer at construction time, because we want to keep grad undefined for unused parameters. Differential Revision: [D23588186](https://our.internmc.facebook.com/intern/diff/D23588186/)

zhaojuanmao requested review from albanD, ezyang, mrshenli and pritamdamania87 July 10, 2020 22:34

albanD reviewed Jul 13, 2020

ezyang removed their request for review July 14, 2020 01:00

zhaojuanmao requested a review from albanD July 23, 2020 22:23

zhaojuanmao mentioned this pull request Jul 23, 2020

Make grad point to bucket buffer in DDP to save memory usage #41954

Closed

albanD approved these changes Jul 24, 2020

zhaojuanmao requested a review from apaszke as a code owner July 24, 2020 23:50

mcarilli mentioned this pull request Oct 5, 2020

Add zero_grad(set_to_none=True) #42754

Closed

zhaojuanmao wants to merge 6 commits into gh/zhaojuanmao/48/base from gh/zhaojuanmao/48/head

4 participants