@therealnaveenkamal
Contributor

Fixes #7653

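Extra-large params were recorded under param.dtype, but the reducer looks them up by comm_dtype. When the two differ (for example, with communication_data_type: bf16), this check in the reducer misses the recorded params: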

if comm_dtype in self.extra_large_param_to_reduce:
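
For readers who want to see the mismatch in isolation, here is a minimal sketch. It is not DeepSpeed code: `ToyReducer`, `record_by_param_dtype`, `record_by_comm_dtype`, and `pending` are hypothetical names, dtypes are plain strings, and keying the record by `comm_dtype` is only an assumption about how the two sides can be made to agree. Only `extra_large_param_to_reduce` and `comm_dtype` come from the line above.

```python
# Toy model of the key mismatch. Hypothetical stand-in, not DeepSpeed's class.
class ToyReducer:
    def __init__(self, comm_dtype):
        self.comm_dtype = comm_dtype
        # Maps a dtype key to the extra-large param waiting to be reduced.
        self.extra_large_param_to_reduce = {}

    def record_by_param_dtype(self, param_name, param_dtype):
        # Mismatched behavior: the entry is keyed by the param's own dtype.
        self.extra_large_param_to_reduce[param_dtype] = param_name

    def record_by_comm_dtype(self, param_name):
        # Assumed remedy: key the entry by the dtype the lookup uses.
        self.extra_large_param_to_reduce[self.comm_dtype] = param_name

    def pending(self):
        # Same shape as the lookup quoted in this PR description.
        comm_dtype = self.comm_dtype
        if comm_dtype in self.extra_large_param_to_reduce:
            return self.extra_large_param_to_reduce[comm_dtype]
        return None


mismatched = ToyReducer(comm_dtype="bf16")
mismatched.record_by_param_dtype("big_weight", "fp32")

consistent = ToyReducer(comm_dtype="bf16")
consistent.record_by_comm_dtype("big_weight")
```

In this toy, `mismatched.pending()` finds nothing because the entry sits under `"fp32"`, while `consistent.pending()` returns the recorded param.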

cc @sfc-gh-truwase

Collaborator

@tohtana tohtana left a comment

Thank you for the fix!

@tohtana tohtana enabled auto-merge (squash) October 31, 2025 03:35
@tohtana tohtana merged commit 3e64f49 into deepspeedai:master Oct 31, 2025
13 checks passed
rraminen pushed a commit to rraminen/DeepSpeed that referenced this pull request Dec 1, 2025

Development

Successfully merging this pull request may close these issues.

[BUG] ZeRO-1 RuntimeError when using communication_data_type: bf16

2 participants