
@Flink-ddd
Contributor

@Flink-ddd Flink-ddd commented Jun 30, 2025

This PR fixes an omission in the deepspeed.comm API where GradBucket was not exposed, despite the package aiming for full compatibility with torch.distributed.

The Problem

As reported in issue #7393, when a user replaces torch.distributed with deepspeed.comm, they expect all public APIs to be available. However, attempting to access deepspeed.comm.GradBucket (for example, when using it as a type hint for DDP communication hooks) results in an AttributeError.

The Solution

This change resolves the issue by importing GradBucket directly from torch.distributed into the deepspeed/comm/comm.py file, making it part of the public deepspeed.comm namespace.

A # noqa: F401 comment has been added to the import line. It suppresses flake8's "imported but unused" warning, which would otherwise fire because the import exists only to expose the symbol to end users, not to be used within comm.py itself.
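The re-export pattern described above can be sketched without PyTorch or DeepSpeed installed. This is only an illustration: `upstream_dist` and `facade_comm` are hypothetical stand-ins for `torch.distributed` and `deepspeed/comm/comm.py`, and the placeholder `GradBucket` class is not the real PyTorch type.

```python
import sys
import types

# Hypothetical stand-in for the upstream package (torch.distributed in the PR).
upstream = types.ModuleType("upstream_dist")


class GradBucket:
    """Placeholder class; the real GradBucket is provided by PyTorch."""


upstream.GradBucket = GradBucket
sys.modules["upstream_dist"] = upstream

# Hypothetical stand-in for the facade module (deepspeed/comm/comm.py in the PR).
# The import is never used inside the facade, hence the noqa marker for flake8.
facade = types.ModuleType("facade_comm")
facade_source = "from upstream_dist import GradBucket  # noqa: F401\n"
exec(facade_source, facade.__dict__)
sys.modules["facade_comm"] = facade

import facade_comm
import upstream_dist

# The facade exposes the very same class object, not a copy or a wrapper.
same_object = facade_comm.GradBucket is upstream_dist.GradBucket
print(same_object)
```

Because the facade binds the name to the same object, code that type-hints against the facade's `GradBucket` stays interchangeable with code that uses the upstream name.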

How This Was Tested

The fix was verified with a local test script that confirms deepspeed.comm.GradBucket can now be accessed correctly and is identical to torch.distributed.GradBucket. The pre-commit hooks now pass successfully.
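The local test script itself is not included in the PR, so the following is a minimal sketch of what such a check might look like, not the author's actual script. The helper name `is_same_export` is an assumption made for illustration; the check is skipped when `deepspeed` or `torch` is not installed.

```python
import importlib

_MISSING = object()  # sentinel so a missing attribute is not confused with None


def is_same_export(facade_name: str, upstream_name: str, symbol: str) -> bool:
    """Return True if `symbol` exists in both modules and is the same object."""
    facade = importlib.import_module(facade_name)
    upstream = importlib.import_module(upstream_name)
    exported = getattr(facade, symbol, _MISSING)
    original = getattr(upstream, symbol, _MISSING)
    return exported is not _MISSING and exported is original


if __name__ == "__main__":
    try:
        ok = is_same_export("deepspeed.comm", "torch.distributed", "GradBucket")
        print("deepspeed.comm.GradBucket is torch.distributed.GradBucket:", ok)
    except ImportError as exc:
        # Neither package is required to load this sketch.
        print("check skipped:", exc)
```

Using `getattr` with a sentinel turns the `AttributeError` described in the issue into a plain `False` result, which is easier to assert on in a test.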

Related Test Run Screenshot

Screenshot 2025-06-30 at 22 41 10

Related Issue

Fixes #7393

@Flink-ddd Flink-ddd requested a review from GuanhuaWang as a code owner June 30, 2025 14:54
@Flink-ddd Flink-ddd force-pushed the fix/gradbucket-api branch from 03a4aa5 to 52bade2 Compare June 30, 2025 14:57
Collaborator

@tohtana tohtana left a comment

Looks good to me. Thank you, @Flink-ddd!

@tohtana tohtana enabled auto-merge (squash) June 30, 2025 18:06
@alexk101
Contributor

Looks good. Thanks for the quick fix!

@tohtana tohtana merged commit e6324af into deepspeedai:master Jun 30, 2025
9 checks passed
@Flink-ddd
Contributor Author

Hi @tohtana, awesome! Thank you for the quick review and merge. Happy to contribute. Hi @alexk101, you're welcome! And thank you for the clear bug report, which made it easy to fix.

lpnpcs pushed a commit to lpnpcs/DeepSpeed that referenced this pull request Jul 30, 2025
This PR fixes an omission in the `deepspeed.comm` API where `GradBucket`
was not exposed, despite the package aiming for full compatibility with
`torch.distributed`.

## The Problem

As reported in issue deepspeedai#7393, when a user replaces `torch.distributed`
with `deepspeed.comm`, they expect all public APIs to be available.
However, attempting to access `deepspeed.comm.GradBucket` (for example,
when using it as a type hint for DDP communication hooks) results in an
`AttributeError`.

## The Solution

This change resolves the issue by importing `GradBucket` directly from
`torch.distributed` into the `deepspeed/comm/comm.py` file, making it
part of the public `deepspeed.comm` namespace.

A `# noqa: F401` comment has been added to the import line. This is
necessary to bypass the `flake8` linter's "imported but unused" check,
as the specific purpose of this import is to expose the symbol to the
end-user, not for it to be used within the `comm.py` file itself.

## How This Was Tested

The fix was verified with a local test script that confirms
`deepspeed.comm.GradBucket` can now be accessed correctly and is
identical to `torch.distributed.GradBucket`. The pre-commit hooks now
pass successfully.

## Related Test Run Screenshot

<img width="1250" alt="Screenshot 2025-06-30 at 22 41 10"
src="https://github.com/user-attachments/assets/cadf18e1-9d1a-4164-a5ff-0b3e6804ac48"
/>

## Related Issue
Fixes deepspeedai#7393

Signed-off-by: Vensenmu <[email protected]>
Co-authored-by: Masahiro Tanaka <[email protected]>
mauryaavinash95 pushed a commit to DataStates/DeepSpeed that referenced this pull request Oct 4, 2025
@Flink-ddd Flink-ddd changed the title fix(comm): Expose GradBucket in deepspeed.comm API [Bugfix][comm]: Expose GradBucket in deepspeed.comm API Nov 29, 2025

Development

Successfully merging this pull request may close these issues.

[BUG] DeepSpeed GradBucket

3 participants