Conversation

@rraminen rraminen (Contributor) commented Oct 23, 2025

This PR fixes an issue in `deepspeed/runtime/fp16/fused_optimizer.py` where the gradient overflow handling logic exited the function too early, producing incorrect forward-pass and loss calculations in certain FP16 training scenarios.

The `return self.overflow` and `self.timers.log(OVERFLOW_TIMERS)` calls are now moved inside the `if self.overflow:` block, so the function returns early only when an actual overflow is detected.
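The control flow of the fix can be sketched as follows. This is a minimal illustration with hypothetical names (`FusedOptimizerSketch`, `log_overflow_timers`, and `stepped` are stand-ins, not the actual DeepSpeed API), showing why an unconditional early return skipped the parameter update even when no overflow occurred:

```python
OVERFLOW_TIMERS = ["overflow_check"]  # placeholder timer names, not the real constants


class FusedOptimizerSketch:
    """Simplified stand-in for the fused FP16 optimizer's step logic."""

    def __init__(self, overflow):
        self.overflow = overflow
        self.stepped = False  # records whether the parameter update ran

    def log_overflow_timers(self):
        # stand-in for self.timers.log(OVERFLOW_TIMERS)
        pass

    def step(self):
        if self.overflow:
            # Fixed behavior: log the overflow timers and return early
            # ONLY when an overflow was actually detected. Before the fix,
            # these two lines sat outside the if-block, so every call
            # returned here and the update below never executed.
            self.log_overflow_timers()
            return self.overflow
        # No overflow: proceed with the real optimizer step.
        self.stepped = True
        return self.overflow


opt = FusedOptimizerSketch(overflow=False)
opt.step()
print(opt.stepped)  # prints True: the update now runs when there is no overflow
```

With the buggy placement, `stepped` would remain `False` in every call, which matches the reported symptom of wrong forward-pass and loss values.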

Origin of the error: 889f0ea

cc: @jithunnair-amd

@eternalNight (Contributor)

Thanks! This should fix #7632.

@rraminen (Contributor, Author)

Hi @tjruwase, could you please help in reviewing this PR?

@rraminen (Contributor, Author)

Hi @tohtana, could you please help in reviewing this PR?

@tohtana tohtana (Collaborator) left a comment

@rraminen Sorry for my late response! Approved it.

@tohtana tohtana enabled auto-merge (squash) October 31, 2025 01:40
@tohtana tohtana merged commit d56e847 into deepspeedai:master Oct 31, 2025
12 checks passed
rraminen added a commit to rraminen/DeepSpeed that referenced this pull request Dec 1, 2025
…edai#7645)

Co-authored-by: Olatunji Ruwase <[email protected]>
Signed-off-by: rraminen <[email protected]>
