
Make grad point to bucket buffer in DDP to save memory usage#41954

Closed
zhaojuanmao wants to merge 8 commits into gh/zhaojuanmao/49/base from gh/zhaojuanmao/49/head

@zhaojuanmao
Contributor

@zhaojuanmao zhaojuanmao commented Jul 23, 2020

Stack from ghstack:

Make both variable.grad() and the grad in the distautograd context point to the bucket buffer in DDP to save memory.
In this case, each grad is a view of a bucket buffer tensor. To keep this compatible with optimizer.zero_grad(), we
made changes in #41283.

Also note that we cannot make variable.grad() point to the bucket buffer at construction time, because we want to
keep grad undefined for unused parameters.
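
A minimal sketch of the "grad as a view of the bucket buffer" idea, using only the Python standard library rather than PyTorch. The names `bucket`, `grad_a`, and `grad_b` are illustrative assumptions, not identifiers from this PR; `memoryview` slices stand in for tensor views.

```python
from array import array

# A flat "bucket" holding gradients for two hypothetical parameters
# with 3 and 2 elements respectively.
bucket = array("f", [0.0] * 5)
flat = memoryview(bucket)

# Each gradient is a zero-copy view (slice) of the bucket.
grad_a = flat[0:3]
grad_b = flat[3:5]

# Writing through a view updates the bucket in place, so no separate
# gradient storage and no copy into the bucket is needed.
grad_a[0] = 1.5
grad_b[1] = -2.0
print(bucket.tolist())

# Zeroing in place keeps the view attached to the bucket; rebinding
# grad_a to a new object would detach it instead.
for i in range(len(grad_a)):
    grad_a[i] = 0.0
print(bucket.tolist())
```

The point of the sketch is only the aliasing: the gradient and the communication buffer share one allocation, which is where the memory saving comes from.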

Test Plans:
For the roberta_base model with ~1GB of parameters, peak memory dropped by ~1GB (8250MB -> 7183MB). Per-iteration latency went from 0.982s to 0.909s, an 8% speedup; will rerun a few times to confirm the speedup.
For a resnet model with ~97M parameters, peak memory dropped by ~100MB (3089MB -> 2988MB). Per-iteration latency did not change (0.122s -> 0.123s).
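
As a quick arithmetic check of the figures above, the following sketch recomputes the savings and the speedup. The numbers are copied from the test plan; the helper names are made up for illustration, and "speedup" is taken here to mean the ratio of old to new per-iteration latency, which is an assumption about how the 8% was derived.

```python
# Figures copied from the test plan: (before, after).
roberta_peak_mb = (8250, 7183)
roberta_latency_s = (0.982, 0.909)
resnet_peak_mb = (3089, 2988)
resnet_latency_s = (0.122, 0.123)


def saved(before_after):
    """Absolute reduction between the two measurements."""
    before, after = before_after
    return before - after


def speedup_pct(before_after):
    """Speedup in percent, computed as old latency / new latency - 1."""
    before, after = before_after
    return (before / after - 1.0) * 100.0


roberta_saved_mb = saved(roberta_peak_mb)
resnet_saved_mb = saved(resnet_peak_mb)
roberta_speedup = speedup_pct(roberta_latency_s)
resnet_speedup = speedup_pct(resnet_latency_s)

print(roberta_saved_mb, resnet_saved_mb)
print(round(roberta_speedup, 1), round(resnet_speedup, 1))
```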

Differential Revision: D22707857

zhaojuanmao added a commit that referenced this pull request Jul 23, 2020

ghstack-source-id: 108384696
Pull Request resolved: #41954
@dr-ci

dr-ci bot commented Jul 23, 2020

💊 CI failures summary and remediations

As of commit 2082a5a (more details on the Dr. CI page):



🚧 1 fixed upstream failure:

These were probably caused by upstream breakages that were already fixed.

Please rebase on the viable/strict branch.

Since your merge base is older than viable/strict, run these commands:

git fetch https://github.com/pytorch/pytorch viable/strict
git rebase FETCH_HEAD



ci.pytorch.org: 1 failed



zhaojuanmao added a commit that referenced this pull request Jul 24, 2020
Pull Request resolved: #41954

ghstack-source-id: 108461312

Differential Revision: [D22707857](https://our.internmc.facebook.com/intern/diff/D22707857/)
zhaojuanmao added a commit that referenced this pull request Jul 25, 2020
Pull Request resolved: #41954

ghstack-source-id: 108498988

Differential Revision: [D22707857](https://our.internmc.facebook.com/intern/diff/D22707857/)
zhaojuanmao added a commit that referenced this pull request Jul 27, 2020
Pull Request resolved: #41954

ghstack-source-id: 108579189

Differential Revision: [D22707857](https://our.internmc.facebook.com/intern/diff/D22707857/)
@pritamdamania87
Contributor

Would be nice if we could also share some memory saving results as part of the PR description.

@rohan-varma rohan-varma self-requested a review July 31, 2020 00:40
zhaojuanmao added a commit that referenced this pull request Aug 4, 2020
Pull Request resolved: #41954

Differential Revision: [D22707857](https://our.internmc.facebook.com/intern/diff/D22707857/)
ghstack-source-id: 109198898
Collaborator

@albanD albanD left a comment


Looks mostly good.
I would add a comment here as well:

// However, that accumulation is sometimes in place and sometimes not,
// which may break user code.
as DDP now relies on the fact that AccumulateGrad always changes the .grad in place when it exists (when there is no double backward), even if the variable and the .grad don't have the same layout!

@zhaojuanmao
Contributor Author

@albanD thanks for your review. I will add comments in accumulate_grad.h. But note that if grads are mutated in place, DDP can save memory; if grads are somehow mutated out of place, DDP cannot save memory, but it still works as it does today (it checks whether grad is an alias of the bucket buffer and, if not, copies grad into the bucket buffer).
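
The fallback described in this comment can be sketched roughly as follows, again with standard-library buffers rather than tensors. `finalize_grad` is a hypothetical helper, and the alias test is simplified to object identity, which is an assumption and not how the reducer itself checks aliasing.

```python
from array import array


def finalize_grad(grad, bucket_view):
    """Return True if a copy was needed, False if grad already aliases the bucket.

    Hypothetical helper: `grad` and `bucket_view` are memoryviews standing in
    for a parameter's gradient and its slot in the bucket buffer.
    """
    if grad is bucket_view:
        return False  # in-place case: the gradient already lives in the bucket
    bucket_view[:] = grad  # out-of-place case: fall back to copying
    return True


bucket = array("f", [0.0, 0.0, 0.0])
slot = memoryview(bucket)

aliased = slot  # gradient that was accumulated in place
slot[0] = 1.0
copied_1 = finalize_grad(aliased, slot)

# Gradient that was replaced out of place by some other code path.
fresh = memoryview(array("f", [4.0, 5.0, 6.0]))
copied_2 = finalize_grad(fresh, slot)

print(copied_1, copied_2, bucket.tolist())
```

Both paths leave the bucket with the correct values; only the first avoids the extra allocation and copy, which is why a silent switch to the second path would be hard to notice.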



Test Plans:
For the roberta_base model with 1GB of parameters, this diff saves about 1GB of memory.
Without this diff, peak allocated memory during the DDP training loop is 8250 MB;
with this diff, it is 7182 MB.

Differential Revision: [D22707857](https://our.internmc.facebook.com/intern/diff/D22707857/)

zhaojuanmao added a commit that referenced this pull request Aug 12, 2020
Pull Request resolved: #41954
ghstack-source-id: 109704136

Differential Revision: [D22707857](https://our.internmc.facebook.com/intern/diff/D22707857/)
@albanD
Collaborator

albanD commented Aug 12, 2020

Yes, I agree it will still give the right result, but it will defeat the purpose of this optimization (silently!). So it might be hard to detect that changes there actually break this optimization. Hence my request to add a comment there to make people aware that this can happen.

zhaojuanmao added a commit that referenced this pull request Sep 10, 2020
…emory usage"

reland #41954

Add an argument to the DDP API to enable or disable letting grads point to views. When it is disabled, behavior is the same as DDP today; when it is enabled, both variable.grad() and the grad in the distautograd context point to the bucket buffer in DDP to save memory.
In this case, each grad is a view of a bucket buffer tensor. To keep this compatible with optimizer.zero_grad(), we
made changes in #41283.

Also note that we cannot make variable.grad() point to the bucket buffer at construction time, because we want to
keep grad undefined for unused parameters.
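
A toy illustration of the opt-in behavior, under stated assumptions: `ToyBucket` and its `grads_as_views` flag are hypothetical names for this sketch, not the DDP API (PyTorch's DDP constructor exposes an option named `gradient_as_bucket_view`, which this toy does not call). Gradients start as `None` so that unused parameters stay undefined in both modes.

```python
from array import array


class ToyBucket:
    """Toy model of one bucket; `grads_as_views` is a hypothetical flag name."""

    def __init__(self, sizes, grads_as_views=False):
        self.grads_as_views = grads_as_views
        self.buffer = array("f", [0.0] * sum(sizes))
        flat = memoryview(self.buffer)
        offsets = [sum(sizes[:i]) for i in range(len(sizes))]
        self.slots = [flat[o:o + n] for o, n in zip(offsets, sizes)]
        # Grads start undefined (None) so unused parameters stay that way.
        self.grads = [None] * len(sizes)

    def accumulate(self, index, values):
        """Record a gradient for parameter `index`."""
        if self.grads_as_views:
            self.slots[index][:] = array("f", values)
            self.grads[index] = self.slots[index]  # grad aliases the bucket
        else:
            self.grads[index] = array("f", values)  # separate storage
            self.slots[index][:] = self.grads[index]  # extra copy into bucket

    def extra_grad_bytes(self):
        """Bytes held by gradients outside the bucket buffer."""
        return sum(g.itemsize * len(g) for g in self.grads if isinstance(g, array))


off = ToyBucket([2, 3, 1], grads_as_views=False)
on = ToyBucket([2, 3, 1], grads_as_views=True)
for b in (off, on):
    b.accumulate(0, [1.0, 2.0])
    b.accumulate(1, [3.0, 4.0, 5.0])
    # Parameter 2 is unused in this iteration, so its grad stays None.

print(off.extra_grad_bytes(), on.extra_grad_bytes())
print(on.grads[2])
```

With the flag off, every used parameter holds a second copy of its gradient; with it on, the only gradient storage is the bucket buffer.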

Differential Revision: [D23588186](https://our.internmc.facebook.com/intern/diff/D23588186/)

zhaojuanmao added a commit that referenced this pull request Sep 10, 2020
Pull Request resolved: #44344

ghstack-source-id: 111827320

Differential Revision: [D23588186](https://our.internmc.facebook.com/intern/diff/D23588186/)
zhaojuanmao added a commit that referenced this pull request Sep 14, 2020
…ing of second iteration"

As part of relanding PR #41954, this refactoring moves the rebuild_buckets call from the end of the first iteration to the beginning of the second iteration.
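
The change in call ordering can be pictured with a small sketch. `ToyReducer` and its methods are hypothetical stand-ins and do not mirror the real reducer; the sketch only shows a one-time rebuild being triggered at the start of the second forward pass instead of at the end of the first backward pass.

```python
class ToyReducer:
    """Toy stand-in; method and attribute names are hypothetical."""

    def __init__(self):
        self.iteration = 0
        self.rebuilt = False
        self.log = []

    def _rebuild_buckets_if_needed(self):
        # Rebuild at most once, and only after the first iteration has finished.
        if self.iteration == 1 and not self.rebuilt:
            self.rebuilt = True
            self.log.append("rebuild")

    def forward(self):
        self._rebuild_buckets_if_needed()  # checked at the start of each forward
        self.log.append(f"forward {self.iteration}")

    def backward(self):
        self.log.append(f"backward {self.iteration}")
        self.iteration += 1  # the rebuild no longer happens here


r = ToyReducer()
for _ in range(3):
    r.forward()
    r.backward()
print(r.log)
```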

Differential Revision: [D23583017](https://our.internmc.facebook.com/intern/diff/D23583017/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D23583017/)!

zhaojuanmao added a commit that referenced this pull request Sep 14, 2020
…nd iteration

Pull Request resolved: #44326

ghstack-source-id: 112011490

Differential Revision: [D23583017](https://our.internmc.facebook.com/intern/diff/D23583017/)

zhaojuanmao added a commit that referenced this pull request Sep 14, 2020
As part of relanding PR #41954, this refactor separates initialize_bucket_views and populate_bucket_views_out, as they do different things and are called from different call sites.

Differential Revision: [D23583347](https://our.internmc.facebook.com/intern/diff/D23583347/)

zhaojuanmao added a commit that referenced this pull request Sep 14, 2020
Pull Request resolved: #44330

ghstack-source-id: 112022404

Differential Revision: [D23583347](https://our.internmc.facebook.com/intern/diff/D23583347/)
facebook-github-bot pushed a commit that referenced this pull request Sep 15, 2020
…nd iteration (#44326)

Summary:
Pull Request resolved: #44326

As part of relanding PR #41954, this refactoring moves the rebuild_buckets call from the end of the first iteration to the beginning of the second iteration.
ghstack-source-id: 112011490

Test Plan: unit tests

Reviewed By: mrshenli

Differential Revision: D23583017

fbshipit-source-id: ef67f79437a820d9b5699b651803622418499a83
zhaojuanmao added a commit that referenced this pull request Sep 16, 2020
zhaojuanmao added a commit that referenced this pull request Sep 16, 2020
zhaojuanmao added a commit that referenced this pull request Sep 16, 2020
[test all]
Pull Request resolved: #44330

ghstack-source-id: 112185672

Differential Revision: [D23583347](https://our.internmc.facebook.com/intern/diff/D23583347/)
zhaojuanmao added a commit that referenced this pull request Sep 16, 2020
…g of second iteration

[test all]

Update for relanding: in ddp.join(), _rebuild_buckets was also moved from the end of backward to the beginning of forward.

As part of relanding PR #41954, this refactoring moves the rebuild_buckets call from the end of the first iteration to the beginning of the second iteration.
ghstack-source-id: 112011490

Differential Revision: [D23735185](https://our.internmc.facebook.com/intern/diff/D23735185/)

zhaojuanmao added a commit that referenced this pull request Sep 16, 2020
…to beginning of second iteration"

zhaojuanmao added a commit that referenced this pull request Sep 16, 2020
…emory usage"

zhaojuanmao added a commit that referenced this pull request Sep 16, 2020
[test all]
Pull Request resolved: #44344

ghstack-source-id: 112194326

Differential Revision: [D23588186](https://our.internmc.facebook.com/intern/diff/D23588186/)

zhaojuanmao added a commit that referenced this pull request Sep 16, 2020
…emory usage"

zhaojuanmao added a commit that referenced this pull request Sep 16, 2020
zhaojuanmao added a commit that referenced this pull request Sep 16, 2020
zhaojuanmao added a commit that referenced this pull request Sep 16, 2020
…to beginning of second iteration"

zhaojuanmao added a commit that referenced this pull request Sep 17, 2020
zhaojuanmao added a commit that referenced this pull request Sep 17, 2020
zhaojuanmao added a commit that referenced this pull request Sep 17, 2020
[test all]
Pull Request resolved: #44330

ghstack-source-id: 112243783

Differential Revision: [D23583347](https://our.internmc.facebook.com/intern/diff/D23583347/)
zhaojuanmao added a commit that referenced this pull request Sep 17, 2020
…to beginning of second iteration"

zhaojuanmao added a commit that referenced this pull request Sep 17, 2020
…g of second iteration

Pull Request resolved: #44798

ghstack-source-id: 112244222

Differential Revision: [D23735185](https://our.internmc.facebook.com/intern/diff/D23735185/)


6 participants