-
Notifications
You must be signed in to change notification settings - Fork 4.7k
[Bugfix][comm]: Expose GradBucket in deepspeed.comm API #7400
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Signed-off-by: Vensenmu <[email protected]>
03a4aa5 to
52bade2
Compare
tohtana
approved these changes
Jun 30, 2025
Collaborator
tohtana
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Thank you, @Flink-ddd!
Contributor
|
Looks good. Thanks for the quick fix! |
Contributor
Author
lpnpcs
pushed a commit
to lpnpcs/DeepSpeed
that referenced
this pull request
Jul 30, 2025
This PR fixes an omission in the `deepspeed.comm` API where `GradBucket` was not exposed, despite the package aiming for full compatibility with `torch.distributed`. ##The Problem As reported in issue deepspeedai#7393, when a user replaces `torch.distributed` with `deepspeed.comm`, they expect all public APIs to be available. However, attempting to access `deepspeed.comm.GradBucket` (for example, when using it as a type hint for DDP communication hooks) results in an `AttributeError`. ##The Solution This change resolves the issue by importing `GradBucket` directly from `torch.distributed` into the `deepspeed/comm/comm.py` file, making it part of the public `deepspeed.comm` namespace. A `# noqa: F401` comment has been added to the import line. This is necessary to bypass the `flake8` linter's "imported but unused" check, as the specific purpose of this import is to expose the symbol to the end-user, not for it to be used within the `comm.py` file itself. ##How This Was Tested The fix was verified with a local test script that confirms `deepspeed.comm.GradBucket` can now be accessed correctly and is identical to `torch.distributed.GradBucket`. The pre-commit hooks now pass successfully. ##Related run test Screenshout <img width="1250" alt="Screenshot 2025-06-30 at 22 41 10" src="https://github.com/user-attachments/assets/cadf18e1-9d1a-4164-a5ff-0b3e6804ac48" /> ##Related Issue Fixes deepspeedai#7393 Signed-off-by: Vensenmu <[email protected]> Co-authored-by: Masahiro Tanaka <[email protected]>
mauryaavinash95
pushed a commit
to DataStates/DeepSpeed
that referenced
this pull request
Oct 4, 2025
This PR fixes an omission in the `deepspeed.comm` API where `GradBucket` was not exposed, despite the package aiming for full compatibility with `torch.distributed`. ##The Problem As reported in issue deepspeedai#7393, when a user replaces `torch.distributed` with `deepspeed.comm`, they expect all public APIs to be available. However, attempting to access `deepspeed.comm.GradBucket` (for example, when using it as a type hint for DDP communication hooks) results in an `AttributeError`. ##The Solution This change resolves the issue by importing `GradBucket` directly from `torch.distributed` into the `deepspeed/comm/comm.py` file, making it part of the public `deepspeed.comm` namespace. A `# noqa: F401` comment has been added to the import line. This is necessary to bypass the `flake8` linter's "imported but unused" check, as the specific purpose of this import is to expose the symbol to the end-user, not for it to be used within the `comm.py` file itself. ##How This Was Tested The fix was verified with a local test script that confirms `deepspeed.comm.GradBucket` can now be accessed correctly and is identical to `torch.distributed.GradBucket`. The pre-commit hooks now pass successfully. ##Related run test Screenshout <img width="1250" alt="Screenshot 2025-06-30 at 22 41 10" src="https://github.com/user-attachments/assets/cadf18e1-9d1a-4164-a5ff-0b3e6804ac48" /> ##Related Issue Fixes deepspeedai#7393 Signed-off-by: Vensenmu <[email protected]> Co-authored-by: Masahiro Tanaka <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR fixes an omission in the
deepspeed.commAPI whereGradBucketwas not exposed, despite the package aiming for full compatibility withtorch.distributed.The Problem
As reported in issue #7393, when a user replaces
torch.distributedwithdeepspeed.comm, they expect all public APIs to be available. However, attempting to accessdeepspeed.comm.GradBucket(for example, when using it as a type hint for DDP communication hooks) results in anAttributeError.The Solution
This change resolves the issue by importing
GradBucketdirectly fromtorch.distributedinto thedeepspeed/comm/comm.pyfile, making it part of the publicdeepspeed.commnamespace.A
# noqa: F401comment has been added to the import line. This is necessary to bypass theflake8linter's "imported but unused" check, as the specific purpose of this import is to expose the symbol to the end-user, not for it to be used within thecomm.pyfile itself.How This Was Tested
The fix was verified with a local test script that confirms
deepspeed.comm.GradBucketcan now be accessed correctly and is identical totorch.distributed.GradBucket. The pre-commit hooks now pass successfully.Related run test Screenshout
Related Issue
Fixes #7393