
Conversation

@delock (Collaborator) commented Sep 11, 2025

The original Muon optimizer PR (#7509) requires the user to explicitly set the `use_muon` flag on each parameter from `model.parameters()`, as shown in this test: https://github.com/deepspeedai/DeepSpeed/blob/master/tests/unit/ops/muon/test_muon.py#L27 (a sketch of that flow follows).
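For contrast, a minimal sketch of that pre-PR flow. The attribute name `use_muon` comes from the PR description; the tagging heuristic (Muon for 2D weight matrices only) is an assumption for illustration, not the linked test's exact logic:

```python
import torch

model = torch.nn.Linear(16, 16)  # stand-in model for illustration

# Pre-PR flow: every parameter must be tagged by hand before
# deepspeed.initialize() is called.
for param in model.parameters():
    # Assumption for illustration: apply Muon to 2D weight matrices,
    # and leave biases, norms, etc. to the fallback optimizer.
    param.use_muon = (param.ndim == 2)
```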

This PR integrates the setting of `use_muon` into DeepSpeed before engine initialization, which makes the Muon optimizer easier to use: the user only needs to change the optimizer in `config.json` from `AdamW` to `Muon`, with no code changes (see the sketch below). It resolves issue #7552.
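A minimal sketch of the post-PR flow, assuming only the config change is needed; the hyperparameter names under `params` are assumptions for illustration, not taken from this PR:

```python
import torch
import deepspeed

# Post-PR flow: selecting Muon in the DeepSpeed config should be enough;
# no per-parameter use_muon flags are set in user code.
ds_config = {
    "train_batch_size": 8,
    "optimizer": {
        "type": "Muon",
        "params": {"lr": 1e-3},  # hyperparameter names assumed for illustration
    },
}

model = torch.nn.Linear(16, 16)  # stand-in model for illustration

engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```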

@delock delock changed the title from "Make Muon optimizer easier to use" to "Make Muon optimizer easier to enable" Sep 11, 2025
@PKUWZP (Collaborator) commented Sep 13, 2025

@delock Can you take care of the unit test and make sure it passes?

@delock (Collaborator, Author) commented Sep 13, 2025

> @delock Can you take care of the unit test and make sure it passes?

Yes, let me check UT. Thanks for reminding!

@delock (Collaborator, Author) commented Sep 13, 2025

Looks like nv-mii failed with an internet connection issue, and modal-torch-latest has been failing on master recently. The failure in modal-torch-latest is a DeepCompile failure; is it a known one? @tohtana

@tohtana (Collaborator) commented Sep 13, 2025

Hi @delock
Sorry for the issue. #7558 fixed it.

@delock (Collaborator, Author) commented Sep 14, 2025

> Hi @delock Sorry for the issue. #7558 fixed it.

Thanks @tohtana

@delock (Collaborator, Author) commented Sep 14, 2025

Hi @loadams, the remaining failure in nv-mii might be a file system failure, probably a permissions problem or a missing file:

OSError: [Errno 5] Input/output error: '/blob/hf_home/token'

@delock (Collaborator, Author) commented Sep 16, 2025

Hi @sfc-gh-truwase, it looks like the nv-mii workflow has been failing on master every day for a while. Should we ignore this workflow and merge this PR? Thanks!

@delock delock enabled auto-merge (squash) September 17, 2025 11:38
@delock delock disabled auto-merge September 17, 2025 11:38
@delock (Collaborator, Author) commented Sep 17, 2025

Hi @sfc-gh-truwase, this PR is ready to be merged. @PKUWZP, note that after this PR the Muon optimizer can be used directly, without any flag setting.

@sfc-gh-truwase sfc-gh-truwase merged commit 2585881 into master Sep 17, 2025
13 of 15 checks passed
@sfc-gh-truwase sfc-gh-truwase deleted the gma/muon_improv branch September 17, 2025 13:52
@PKUWZP (Collaborator) commented Sep 17, 2025

@delock This is awesome, thanks for doing this.

mauryaavinash95 pushed a commit to DataStates/DeepSpeed that referenced this pull request Oct 4, 2025

@DongmingShenDS commented

Thanks. I have a question though: if we only set up Muon in config.json, how should we specify the learning rates used for Muon and AdamW? It seems all params will share the same LR, which can be problematic.
