grad detach_ only when it has grad_fn in zero_grad call#41283

Closed
zhaojuanmao wants to merge 6 commits into gh/zhaojuanmao/48/base from gh/zhaojuanmao/48/head
Conversation

@zhaojuanmao
Contributor

@zhaojuanmao zhaojuanmao commented Jul 10, 2020

Stack from ghstack:

In optimizer.zero_grad(), detach_() is only needed to avoid a memory leak when the grad has a grad_fn. Add a check so that zero_grad() calls grad.detach_() only when the grad has a grad_fn.

Differential Revision: D22487315
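
To illustrate the condition in the description, here is a minimal sketch of when a gradient carries a grad_fn. It assumes a recent PyTorch build; the helper name `grad_has_graph` and the toy tensors are hypothetical and not part of this PR.

```python
import torch


def grad_has_graph(create_graph: bool) -> bool:
    """Run one backward pass and report whether w.grad has a grad_fn."""
    w = torch.ones(3, requires_grad=True)
    loss = (w ** 2).sum()
    # With create_graph=True the gradient is itself recorded in the autograd
    # graph, so it has a grad_fn and keeps that graph alive.
    loss.backward(create_graph=create_graph)
    return w.grad.grad_fn is not None


# Plain backward: the accumulated grad is a leaf with no graph attached.
plain = grad_has_graph(create_graph=False)

# Double-backward style: the grad references the graph until it is detached.
w = torch.ones(3, requires_grad=True)
(w ** 2).sum().backward(create_graph=True)
before = w.grad.grad_fn is not None
w.grad.detach_()  # drops grad_fn, releasing the grad's reference to the graph
after = w.grad.grad_fn is not None

print(plain, before, after)
```

Only the `create_graph=True` case gives the grad a graph to release, which is the case the new check is meant to single out.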

zhaojuanmao added a commit that referenced this pull request Jul 10, 2020
ghstack-source-id: 107562167
Pull Request resolved: #41283
@dr-ci

dr-ci bot commented Jul 11, 2020

💊 CI failures summary and remediations

As of commit c1decbd (more details on the Dr. CI page):



🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_macos_10_13_py3_test (1/1)

Step: "Test"

Jul 28 16:06:31 RuntimeError: test_dataloader failed!
Jul 28 16:06:31 Generated XML report: test-reports/python-unittest/TEST-TestNamedTupleDataLoader-20200728155239.xml 
Jul 28 16:06:31 Generated XML report: test-reports/python-unittest/TEST-TestTensorDataset-20200728155239.xml 
Jul 28 16:06:31 Generated XML report: test-reports/python-unittest/TEST-TestCustomPinFn-20200728155239.xml 
Jul 28 16:06:31 Generated XML report: test-reports/python-unittest/TEST-TestSetAffinity-20200728155239.xml 
Jul 28 16:06:31 Generated XML report: test-reports/python-unittest/TEST-TestStringDataLoader-20200728155239.xml 
Jul 28 16:06:31 Traceback (most recent call last): 
Jul 28 16:06:31   File "test/run_test.py", line 744, in <module> 
Jul 28 16:06:31     main() 
Jul 28 16:06:31   File "test/run_test.py", line 733, in main 
Jul 28 16:06:31     raise RuntimeError(err) 
Jul 28 16:06:31 RuntimeError: test_dataloader failed! 
Jul 28 16:06:31 + cleanup 
Jul 28 16:06:31 + retcode=1 
Jul 28 16:06:31 + set +x 

❄️ 1 failure tentatively classified as flaky

but reruns have not yet been triggered to confirm:

See CircleCI build binary_windows_libtorch_3_7_cpu_release_build (1/1)

Step: "Checkout code" ❄️

Writing SSH key for checkout to id_rsa
Creating .ssh directory
Adding the following entries to known_hosts:
github.com ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAq2A7hRGmdnm9tUDbO9IDSwBK6TbQa+PXYPCPy6rbTrTtw7PHkccKrpp0yVhp5HdEIcKr6pLlVDBfOLX9QUsyCOV0wzfjIJNlGEYsdlLJizHhbn2mUjvSAHQqZETYP81eFzLQNnPHt4EVVUh7VfDESU84KezmD5QlWpXLmvU31/yMf+Se8xhHTvKSCZIFImWwoG6mbUoWf9nzpIoaSjB+weqqUUmpaaasXVal72J+UX2B+2RPW3RcT0eOzQgqlJL3RKrTJvdsjE3JEAvGq3lGHSZXy28G3skua2SmVi/w4yCE6gbODqnTWlg7+wC604ydGXA8VJiS5ap43JXiUFFAaQ==
bitbucket.org ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAubiN81eDcafrgMeLzaFPsw2kNvEcqTKl/VqLat/MaB33pZy0y3rJZtnqwR2qOOvbwKZYKiEO1O6VqNEBxKvJJelCq0dTXWT5pbO2gDXC6h6QDXCaHo6pOHGPUy+YBaGQRGuSusMEASYiWunYN0vCAI8QaXnWMXNMdFP3jHAJH0eDsoiGnLPBlBp4TNm6rYI74nMzgz3B9IikW4WVK+dc8KZJZWYjAuORU3jc1c/NPskD2ASinf8v3xnfXeukU0sJ5N6m5E8VLjObPEO+mN2t/FZTMZLiFqPWc/ALSqnMnnhwrNi2rbfg/rd/IpL8Le3pSBne8+seeFVBoGqzHM9yXw==

Writing SSH key for checkout to id_rsa

This comment was automatically generated by Dr. CI.

This comment has been revised 13 times.

 if p.grad is not None:
-    p.grad.detach_()
+    if p.grad.grad_fn is not None:
+        p.grad.detach_()
Collaborator

On top of removing .grad_fn, detach_() also clears the .requires_grad flag. I think you want to add a p.grad.requires_grad_(False) for the case where there is no grad_fn.
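
A minimal sketch of the guarded logic with the suggested `requires_grad_(False)` fallback might look like the following. The function name `zero_grad_sketch` and the hand-built grad are hypothetical; this illustrates the suggestion and is not necessarily the code that landed.

```python
import torch


def zero_grad_sketch(params):
    """Zero gradients in place, detaching only grads that have a grad_fn."""
    for p in params:
        if p.grad is None:
            continue
        if p.grad.grad_fn is not None:
            # Grad is part of a graph: detach_() clears grad_fn and requires_grad.
            p.grad.detach_()
        else:
            # No graph to drop, but the flag may still be set on a leaf grad.
            p.grad.requires_grad_(False)
        p.grad.zero_()


p = torch.nn.Parameter(torch.ones(3))
# Hypothetical case: a leaf grad that requires grad but has no grad_fn.
p.grad = torch.full((3,), 2.0, requires_grad=True)
zero_grad_sketch([p])
print(p.grad, p.grad.requires_grad)
```

Without the `else` branch, the in-place `zero_()` would run on a leaf tensor that still requires grad, which autograd rejects.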

@ezyang ezyang removed their request for review July 14, 2020 01:00
zhaojuanmao added a commit that referenced this pull request Jul 15, 2020
Pull Request resolved: #41283

ghstack-source-id: 107825131
zhaojuanmao added a commit that referenced this pull request Jul 23, 2020
Pull Request resolved: #41283

ghstack-source-id: 108377840
@zhaojuanmao zhaojuanmao requested a review from albanD July 23, 2020 22:23
zhaojuanmao added a commit that referenced this pull request Jul 23, 2020
Make both variable.grad() and the grad in the distautograd context point to the bucket buffer in DDP to reduce memory usage.
In this case, grad is a view of the bucket buffer tensors. To make it compatible with optimizer.zero_grad(), we
made changes in #41283.

Also note that we cannot make variable.grad() point to the bucket buffer at construction time, because we want to
keep grad undefined for unused parameters.

Differential Revision: [D22707857](https://our.internmc.facebook.com/intern/diff/D22707857/)

[ghstack-poisoned]
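
To see how a grad that is a view of a bucket buffer interacts with zero_grad(), consider the following minimal sketch. It does not use DDP: the flat tensor `bucket` and the manual assignment to `p.grad` are hypothetical stand-ins for DDP's bucket buffer. The failure of detach_() on a view is the behaviour expected from recent PyTorch releases, not something stated in this thread.

```python
import torch

# Hypothetical stand-in for a DDP bucket: one flat buffer holding gradients.
bucket = torch.zeros(6)
p = torch.nn.Parameter(torch.ones(2, 3))

# Point the parameter's grad at a view of the buffer instead of a separate tensor.
p.grad = bucket.view(2, 3)
bucket.fill_(1.0)  # writing to the bucket is visible through p.grad
shares_storage = p.grad.data_ptr() == bucket.data_ptr()

# An unconditional detach_() is expected to fail here, since views cannot be
# detached in place; catch the error so the sketch keeps running either way.
try:
    p.grad.detach_()
    detach_failed = False
except RuntimeError:
    detach_failed = True

# The guarded path skips detach_() because the view has no grad_fn.
if p.grad.grad_fn is not None:
    p.grad.detach_()
p.grad.zero_()  # zeroes the bucket too, since the memory is shared

print(shares_storage, detach_failed, bucket.sum().item())
```

Because the view has no grad_fn, the guarded path never calls detach_() on it, so zeroing goes straight through to the shared buffer.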
zhaojuanmao added a commit that referenced this pull request Jul 23, 2020
ghstack-source-id: 108384696
Pull Request resolved: #41954
Collaborator

@albanD albanD left a comment

I feel like this should work fine.
Make sure that all CI is green in case we missed something.

zhaojuanmao added a commit that referenced this pull request Jul 24, 2020
zhaojuanmao added a commit that referenced this pull request Jul 24, 2020
Pull Request resolved: #41954

ghstack-source-id: 108461312
@zhaojuanmao zhaojuanmao requested a review from apaszke as a code owner July 24, 2020 23:50
zhaojuanmao added a commit that referenced this pull request Jul 24, 2020
Pull Request resolved: #41283

ghstack-source-id: 108492744
zhaojuanmao added a commit that referenced this pull request Jul 25, 2020
zhaojuanmao added a commit that referenced this pull request Jul 25, 2020
Pull Request resolved: #41954

ghstack-source-id: 108498988
zhaojuanmao added a commit that referenced this pull request Jul 27, 2020
Pull Request resolved: #41283

ghstack-source-id: 108577794
zhaojuanmao added a commit that referenced this pull request Jul 27, 2020
zhaojuanmao added a commit that referenced this pull request Jul 27, 2020
Pull Request resolved: #41954

ghstack-source-id: 108579189
@zhaojuanmao
Contributor Author

CI is green, landing

zhaojuanmao added a commit that referenced this pull request Sep 8, 2020
…emory usage"

reland #41954

Add an argument to the DDP API to enable or disable letting grads point to views. When it is disabled, behavior is the same as DDP today; when it is enabled, both variable.grad() and the grad in the distautograd context point to the bucket buffer in DDP to reduce memory usage.
In this case, grad is a view of the bucket buffer tensors. To make it compatible with optimizer.zero_grad(), we
made changes in #41283.

Also note that we cannot make variable.grad() point to the bucket buffer at construction time, because we want to
keep grad undefined for unused parameters.

Differential Revision: [D23588186](https://our.internmc.facebook.com/intern/diff/D23588186/)

[ghstack-poisoned]
zhaojuanmao added a commit that referenced this pull request Sep 8, 2020
Pull Request resolved: #44344

ghstack-source-id: 111631800
zhaojuanmao added a commit that referenced this pull request Sep 10, 2020
zhaojuanmao added a commit that referenced this pull request Sep 10, 2020
zhaojuanmao added a commit that referenced this pull request Sep 10, 2020
zhaojuanmao added a commit that referenced this pull request Sep 10, 2020
Pull Request resolved: #44344

ghstack-source-id: 111827320
zhaojuanmao added a commit that referenced this pull request Sep 16, 2020
zhaojuanmao added a commit that referenced this pull request Sep 16, 2020
[test all]
Pull Request resolved: #44344

ghstack-source-id: 112194326

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D23588186/)!
zhaojuanmao added a commit that referenced this pull request Sep 16, 2020
zhaojuanmao added a commit that referenced this pull request Sep 17, 2020
zhaojuanmao added a commit that referenced this pull request Sep 17, 2020
[test all]
Pull Request resolved: #44344

ghstack-source-id: 112244977
zhaojuanmao added a commit that referenced this pull request Sep 23, 2020
…in DDP to save memory usage"

zhaojuanmao added a commit that referenced this pull request Sep 23, 2020
zhaojuanmao added a commit that referenced this pull request Sep 23, 2020
zhaojuanmao added a commit that referenced this pull request Sep 23, 2020
zhaojuanmao added a commit that referenced this pull request Sep 23, 2020
[test all]
Pull Request resolved: #44344

ghstack-source-id: 112705673
zhaojuanmao added a commit that referenced this pull request Sep 23, 2020
zhaojuanmao added a commit that referenced this pull request Sep 23, 2020
zhaojuanmao added a commit that referenced this pull request Sep 23, 2020
[test all]
Pull Request resolved: #44344

ghstack-source-id: 112730565
zhaojuanmao added a commit that referenced this pull request Sep 23, 2020
zhaojuanmao added a commit that referenced this pull request Sep 23, 2020
zhaojuanmao added a commit that referenced this pull request Sep 23, 2020
[test all]
Pull Request resolved: #44344

ghstack-source-id: 112760412
zhaojuanmao added a commit that referenced this pull request Sep 24, 2020
zhaojuanmao added a commit that referenced this pull request Sep 24, 2020

4 participants