disables ZeRO checkpoint loading path when stage=0 #7586
Merged
Fixes #7571
When ZeRO is disabled (stage 0) and bf16 is enabled, the current guard sets `load_zero_checkpoint=True`, which leads to `_load_zero_checkpoint` and `_restore_from_bit16_weights()` being called even though no ZeRO state exists.

This PR removes the `self.bfloat16_enabled()` condition so that `load_zero_checkpoint` is tied strictly to `self.zero_optimization()`.

- Stage 0 (BF16/FP16/FP32): cleanly skips the ZeRO checkpoint path.
- Stage ≥ 1: loads ZeRO partitioned optimizer state as before.
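
For reviewers, the behavior change can be sketched as a small truth table. This is a minimal illustration, not the engine code: `guard_before` and `guard_after` are hypothetical helper names, and the old guard is assumed to be an OR of the ZeRO and bf16 conditions, inferred from the description above rather than copied from the source.

```python
# Hypothetical stand-ins for the guard that decides whether the
# ZeRO checkpoint loading path runs. Not the actual DeepSpeed code.

def guard_before(zero_enabled: bool, bf16_enabled: bool) -> bool:
    # Assumed old behavior: bf16 alone was enough to enter the ZeRO path.
    return zero_enabled or bf16_enabled


def guard_after(zero_enabled: bool, bf16_enabled: bool) -> bool:
    # Behavior described by this PR: only ZeRO being enabled matters.
    return zero_enabled


if __name__ == "__main__":
    print("stage  bf16   before  after")
    for stage in (0, 1):
        for bf16 in (False, True):
            zero_enabled = stage > 0  # stage 0 means ZeRO is disabled
            before = guard_before(zero_enabled, bf16)
            after = guard_after(zero_enabled, bf16)
            print(f"{stage:<6} {bf16!s:<6} {before!s:<7} {after!s}")
```

Under these assumptions, the only row that changes is stage 0 with bf16 enabled, which is the case reported in #7571.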
cc @sfc-gh-truwase