Deduplicate fp32 weights under torch autocast and ZeRO3 #7651

eternalNight · 2025-10-27T01:40:51Z

When torch autocast is enabled, model weights are already in fp32 and can be directly updated by the optimizer with fp32 gradients. It is a waste of accelerator memory to keep another copy, also in fp32, as the master weight.
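
To put a rough number on the duplicate copy: fp32 uses 4 bytes per element, and ZeRO-3 partitions parameters across ranks, so a redundant fp32 master copy costs about 4 × N / W bytes per rank for N parameters on W ranks. The sketch below only does that arithmetic. The 7B-parameter model and the 8 ranks are hypothetical numbers chosen for illustration, not measurements from this PR, and the helper name is made up.

```python
FP32_BYTES = 4  # size of one fp32 element


def master_copy_bytes_per_rank(num_params: int, world_size: int) -> int:
    """Approximate bytes per rank held by a redundant fp32 master copy.

    Assumes parameters are partitioned evenly across ranks (ZeRO-3 style)
    and ignores padding and allocator overhead.
    """
    return FP32_BYTES * num_params // world_size


# Hypothetical example: 7e9 parameters sharded across 8 ranks.
saved = master_copy_bytes_per_rank(7_000_000_000, 8)
print(f"{saved / 2**30:.2f} GiB per rank")
```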

This PR uses aliases to the so-called "fp16" params as the master weights to save memory. The optimization applies only when no optimizer offloading (to either CPU or NVMe) or swapping mechanism is enabled.
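
The difference between aliasing and copying can be shown without any framework. The following is a minimal sketch using the standard-library `array` and `memoryview` types as stand-ins for tensors; it is not the DeepSpeed implementation, and `make_master_weights` and `can_alias` are hypothetical names that simply mirror the conditions stated above.

```python
from array import array


def can_alias(autocast_enabled: bool, optimizer_offloaded: bool,
              swapping_enabled: bool) -> bool:
    # Hypothetical predicate: alias only under autocast, and only when
    # neither optimizer offloading nor swapping is in use.
    return autocast_enabled and not optimizer_offloaded and not swapping_enabled


def make_master_weights(params: array, alias: bool):
    """Return the buffer the optimizer will update.

    alias=True  -> a view sharing storage with params (no extra memory)
    alias=False -> an independent copy (the usual master-weight layout)
    """
    return memoryview(params) if alias else array(params.typecode, params)


params = array("f", [1.0, 2.0, 3.0])  # stand-in for fp32 model weights

aliased = make_master_weights(params, alias=can_alias(True, False, False))
aliased[0] = 0.5  # in-place "optimizer step" on the master weights
# params[0] now reads 0.5 too: both names refer to the same storage.

copied = make_master_weights(params, alias=can_alias(False, False, False))
copied[1] = 9.0  # updates only the separate copy
# params[1] is unchanged: a copy would have to be written back separately.
```

With PyTorch tensors, one way to check for the same sharing would be to compare `Tensor.data_ptr()` of the parameter and of its master weight.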

Using https://gist.github.com/eternalNight/3c2cf8c703f1e9e7742d3b7f9e1edae3 (which enables torch autocast) as an example, the memory profile of the training startup phase is as follows:

With this PR, the master weights are no longer instantiated:

This is also true when DeepCompile is enabled:

When torch autocast is disabled, the master weights are preserved:

Signed-off-by: Junjie Mao <[email protected]>

tohtana

Thank you @eternalNight! I actually encountered this issue and was wondering how to fix it.
This is definitely a significant improvement.

eternalNight requested review from tjruwase and tohtana as code owners October 27, 2025 01:40

tohtana approved these changes Oct 27, 2025

tohtana merged commit 706f6e8 into deepspeedai:master Oct 27, 2025
15 of 17 checks passed

eternalNight mentioned this pull request Oct 28, 2025

[REQUEST] Improve GPU memory utilization with ZeRO3 + deepcompile + torch autocast + activation checkpointing #7577

Closed

4 tasks
