-
Notifications
You must be signed in to change notification settings - Fork 26.3k
[DDP] Allow await of custom buffer reduction in backward #64515
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
For performance reasons, we would like to ensure that we can await user collectives as part of custom buffer reduction in parallel to other work. As a result, add support to return futures from custom buffer hooks and await those futures at end of backwards pass. Also added some docs to clarify how to use these APIs. Differential Revision: [D30757761](https://our.internmc.facebook.com/intern/diff/D30757761/) [ghstack-poisoned]
🔗 Helpful links
💊 CI failures summary and remediationsAs of commit 9a08564 (more details on the Dr. CI page): 💚 💚 Looks good so far! There are no failures yet. 💚 💚 This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.Please report bugs/suggestions to the (internal) Dr. CI Users group. |
For performance reasons, we would like to ensure that we can await user collectives as part of custom buffer reduction in parallel to other work. As a result, add support to return futures from custom buffer hooks and await those futures at end of backwards pass. Also added some docs to clarify how to use these APIs. Differential Revision: [D30757761](https://our.internmc.facebook.com/intern/diff/D30757761/) ghstack-source-id: 137398451 Pull Request resolved: #64515
For performance reasons, we would like to ensure that we can await user collectives as part of custom buffer reduction in parallel to other work. As a result, add support to return futures from custom buffer hooks and await those futures at end of backwards pass. Also added some docs to clarify how to use these APIs. Differential Revision: [D30757761](https://our.internmc.facebook.com/intern/diff/D30757761/) cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski @SciPioneer @H-Huang mrzzd cbalioglu gcramer23 [ghstack-poisoned]
Pull Request resolved: #64515 For performance reasons, we would like to ensure that we can await user collectives as part of custom buffer reduction in parallel to other work. As a result, add support to return futures from custom buffer hooks and await those futures at end of backwards pass. Also added some docs to clarify how to use these APIs. ghstack-source-id: 137526311 Differential Revision: [D30757761](https://our.internmc.facebook.com/intern/diff/D30757761/)
torch/csrc/distributed/c10d/init.cpp
Outdated
| }, | ||
| py::call_guard<py::gil_scoped_release>()) | ||
| .def("get_backward_stats", &::c10d::Reducer::get_backward_stats) | ||
| .def("_install_futures", [](::c10d::Reducer& reducer, const std::vector<std::shared_ptr<jit::PythonFutureWrapper>>& futs) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: _install_post_backward_futures
| } | ||
|
|
||
| void Reducer::install_futures(c10::List<c10::intrusive_ptr<c10::ivalue::Future>> futs) { | ||
| installed_futures_ = std::move(futs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we append futs here instead of moving? it is possible that install_futures() was called multiple times and we need to wait all these installed futures?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great idea!
| # Note that when custom hooks return futures, this comparison is not expected to work when hook run location is pre-forward pass. This is because the hook does async communication | ||
| # and forward pass modifies the buffer without appropriate synchronization. Therefore, if returning futures from custom buffer hooks, it is advised to set hook run location to post | ||
| # forward. | ||
| if return_futures and hook_run_location == hook_post_fwd: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hook_run_location does not have to be 'hook_post_fwd'?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It actually does, if you see the comments on L8017. Basically if we run the hook before forward and return futures, then the user must perform appropriate synchronization in their forward pass for buffers to be equal. This is because the forward pass would use buffers without appropriate synchronization with the NCCL stream if we are returning futures.
For the test we can manually change it to synchronize, but I think the best bet here is to document this and let users know they must explicitly synchronize if they want to use return_futures and hook_run_location == hook_pre_fwd (this is already documented on 1289).
For performance reasons, we would like to ensure that we can await user collectives as part of custom buffer reduction in parallel to other work. As a result, add support to return futures from custom buffer hooks and await those futures at end of backwards pass. Also added some docs to clarify how to use these APIs. Differential Revision: [D30757761](https://our.internmc.facebook.com/intern/diff/D30757761/) cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski @SciPioneer @H-Huang mrzzd cbalioglu gcramer23 [ghstack-poisoned]
Pull Request resolved: #64515 For performance reasons, we would like to ensure that we can await user collectives as part of custom buffer reduction in parallel to other work. As a result, add support to return futures from custom buffer hooks and await those futures at end of backwards pass. Also added some docs to clarify how to use these APIs. ghstack-source-id: 138095560 Differential Revision: [D30757761](https://our.internmc.facebook.com/intern/diff/D30757761/)
For performance reasons, we would like to ensure that we can await user collectives as part of custom buffer reduction in parallel to other work. As a result, add support to return futures from custom buffer hooks and await those futures at end of backwards pass. Also added some docs to clarify how to use these APIs. Differential Revision: [D30757761](https://our.internmc.facebook.com/intern/diff/D30757761/) cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski @SciPioneer @H-Huang mrzzd cbalioglu gcramer23 [ghstack-poisoned]
Pull Request resolved: #64515 For performance reasons, we would like to ensure that we can await user collectives as part of custom buffer reduction in parallel to other work. As a result, add support to return futures from custom buffer hooks and await those futures at end of backwards pass. Also added some docs to clarify how to use these APIs. ghstack-source-id: 138326912 Differential Revision: [D30757761](https://our.internmc.facebook.com/intern/diff/D30757761/)
For performance reasons, we would like to ensure that we can await user collectives as part of custom buffer reduction in parallel to other work. As a result, add support to return futures from custom buffer hooks and await those futures at end of backwards pass. Also added some docs to clarify how to use these APIs. Differential Revision: [D30757761](https://our.internmc.facebook.com/intern/diff/D30757761/) cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski @SciPioneer @H-Huang mrzzd cbalioglu gcramer23 [ghstack-poisoned]
For performance reasons, we would like to ensure that we can await user collectives as part of custom buffer reduction in parallel to other work. As a result, add support to return futures from custom buffer hooks and await those futures at end of backwards pass. Also added some docs to clarify how to use these APIs. Differential Revision: [D30757761](https://our.internmc.facebook.com/intern/diff/D30757761/) cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski @SciPioneer @H-Huang mrzzd cbalioglu gcramer23 [ghstack-poisoned]
Pull Request resolved: #64515 For performance reasons, we would like to ensure that we can await user collectives as part of custom buffer reduction in parallel to other work. As a result, add support to return futures from custom buffer hooks and await those futures at end of backwards pass. Also added some docs to clarify how to use these APIs. ghstack-source-id: 138406398 Differential Revision: [D30757761](https://our.internmc.facebook.com/intern/diff/D30757761/)
For performance reasons, we would like to ensure that we can await user collectives as part of custom buffer reduction in parallel to other work. As a result, add support to return futures from custom buffer hooks and await those futures at end of backwards pass. Also added some docs to clarify how to use these APIs. Differential Revision: [D30757761](https://our.internmc.facebook.com/intern/diff/D30757761/) cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski @SciPioneer @H-Huang mrzzd cbalioglu gcramer23 [ghstack-poisoned]
Pull Request resolved: #64515 For performance reasons, we would like to ensure that we can await user collectives as part of custom buffer reduction in parallel to other work. As a result, add support to return futures from custom buffer hooks and await those futures at end of backwards pass. Also added some docs to clarify how to use these APIs. ghstack-source-id: 138564329 Differential Revision: [D30757761](https://our.internmc.facebook.com/intern/diff/D30757761/)
For performance reasons, we would like to ensure that we can await user collectives as part of custom buffer reduction in parallel to other work. As a result, add support to return futures from custom buffer hooks and await those futures at end of backwards pass. Also added some docs to clarify how to use these APIs. Differential Revision: [D30757761](https://our.internmc.facebook.com/intern/diff/D30757761/) cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski @SciPioneer @H-Huang mrzzd cbalioglu gcramer23 [ghstack-poisoned]
Pull Request resolved: #64515 For performance reasons, we would like to ensure that we can await user collectives as part of custom buffer reduction in parallel to other work. As a result, add support to return futures from custom buffer hooks and await those futures at end of backwards pass. Also added some docs to clarify how to use these APIs. ghstack-source-id: 138655662 Differential Revision: [D30757761](https://our.internmc.facebook.com/intern/diff/D30757761/)
For performance reasons, we would like to ensure that we can await user collectives as part of custom buffer reduction in parallel to other work. As a result, add support to return futures from custom buffer hooks and await those futures at end of backwards pass. Also added some docs to clarify how to use these APIs. Differential Revision: [D30757761](https://our.internmc.facebook.com/intern/diff/D30757761/) cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski @SciPioneer @H-Huang mrzzd cbalioglu gcramer23 [ghstack-poisoned]
For performance reasons, we would like to ensure that we can await user collectives as part of custom buffer reduction in parallel to other work. As a result, add support to return futures from custom buffer hooks and await those futures at end of backwards pass. Also added some docs to clarify how to use these APIs. Differential Revision: [D30757761](https://our.internmc.facebook.com/intern/diff/D30757761/) cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski @SciPioneer @H-Huang mrzzd cbalioglu gcramer23 [ghstack-poisoned]
Pull Request resolved: #64515 For performance reasons, we would like to ensure that we can await user collectives as part of custom buffer reduction in parallel to other work. As a result, add support to return futures from custom buffer hooks and await those futures at end of backwards pass. Also added some docs to clarify how to use these APIs. ghstack-source-id: 138707331 Differential Revision: [D30757761](https://our.internmc.facebook.com/intern/diff/D30757761/)
For performance reasons, we would like to ensure that we can await user collectives as part of custom buffer reduction in parallel to other work. As a result, add support to return futures from custom buffer hooks and await those futures at end of backwards pass. Also added some docs to clarify how to use these APIs. Differential Revision: [D30757761](https://our.internmc.facebook.com/intern/diff/D30757761/) cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski @SciPioneer @H-Huang mrzzd cbalioglu gcramer23 [ghstack-poisoned]
For performance reasons, we would like to ensure that we can await user collectives as part of custom buffer reduction in parallel to other work. As a result, add support to return futures from custom buffer hooks and await those futures at end of backwards pass. Also added some docs to clarify how to use these APIs. Differential Revision: [D30757761](https://our.internmc.facebook.com/intern/diff/D30757761/) cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski @SciPioneer @H-Huang mrzzd cbalioglu gcramer23 [ghstack-poisoned]
Pull Request resolved: #64515 For performance reasons, we would like to ensure that we can await user collectives as part of custom buffer reduction in parallel to other work. As a result, add support to return futures from custom buffer hooks and await those futures at end of backwards pass. Also added some docs to clarify how to use these APIs. ghstack-source-id: 138793803 Differential Revision: [D30757761](https://our.internmc.facebook.com/intern/diff/D30757761/)
|
This pull request has been merged in 7170434. |
Stack from ghstack:
For performance reasons, we would like to ensure that we can await
user collectives as part of custom buffer reduction in parallel to other work.
As a result, add support to return futures from custom buffer hooks and await
those futures at end of backwards pass.
Also added some docs to clarify how to use these APIs.
Differential Revision: D30757761
cc @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @agolynski @SciPioneer @H-Huang @mrzzd @cbalioglu @gcramer23