
Conversation

@delock (Collaborator) commented Sep 11, 2025

The original Muon optimizer PR (#7509) requires the user to explicitly set the `use_muon` flag on each parameter from `model.parameters()`, as shown in this test: https://github.com/deepspeedai/DeepSpeed/blob/master/tests/unit/ops/muon/test_muon.py#L27 (a sketch of that flow follows).
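For contrast, a minimal sketch of that pre-PR flow. The attribute name `use_muon` comes from the PR description; the tagging heuristic (Muon for 2D weight matrices only) is an assumption for illustration, not the linked test's exact logic:

```python
import torch

model = torch.nn.Linear(16, 16)  # stand-in model for illustration

# Pre-PR flow: every parameter must be tagged by hand before
# deepspeed.initialize() is called.
for param in model.parameters():
    # Assumption for illustration: apply Muon to 2D weight matrices,
    # and leave biases, norms, etc. to the fallback optimizer.
    param.use_muon = (param.ndim == 2)
```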

This PR integrates the setting of `use_muon` into DeepSpeed before engine initialization, which makes the Muon optimizer easier to use: the user only needs to change the optimizer in `config.json` from `AdamW` to `Muon`, with no code changes (see the sketch below). It resolves issue #7552.
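A minimal sketch of the post-PR flow, assuming only the config change is needed; the hyperparameter names under `params` are assumptions for illustration, not taken from this PR:

```python
import torch
import deepspeed

# Post-PR flow: selecting Muon in the DeepSpeed config should be enough;
# no per-parameter use_muon flags are set in user code.
ds_config = {
    "train_batch_size": 8,
    "optimizer": {
        "type": "Muon",
        "params": {"lr": 1e-3},  # hyperparameter names assumed for illustration
    },
}

model = torch.nn.Linear(16, 16)  # stand-in model for illustration

engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```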

@delock delock changed the title from "Make Muon optimizer easier to use" to "Make Muon optimizer easier to enable" Sep 11, 2025
@PKUWZP (Collaborator) commented Sep 13, 2025

@delock Can you take care of the unit test and make sure it passes?

@delock (Collaborator, Author) commented Sep 13, 2025

> @delock Can you take care of the unit test and make sure it passes?

Yes, let me check UT. Thanks for reminding!

@delock (Collaborator, Author) commented Sep 13, 2025

Looks like nv-mii failed with an internet connection issue, and modal-torch-latest has been failing on master recently. The failure in modal-torch-latest is a DeepCompile failure; is it a known one? @tohtana

@tohtana (Collaborator) commented Sep 13, 2025

Hi @delock
Sorry for the issue. #7558 fixed it.

@delock (Collaborator, Author) commented Sep 14, 2025

> Hi @delock Sorry for the issue. #7558 fixed it.

Thanks @tohtana

@delock (Collaborator, Author) commented Sep 14, 2025

Hi @loadams, the remaining failure in nv-mii might be a file system failure, probably a permissions problem or a missing file:

OSError: [Errno 5] Input/output error: '/blob/hf_home/token'

@delock (Collaborator, Author) commented Sep 16, 2025

Hi @sfc-gh-truwase, it looks like the nv-mii workflow has been failing on master every day for a while. Should we ignore this workflow and merge this PR? Thanks!

@delock delock enabled auto-merge (squash) September 17, 2025 11:38
@delock delock disabled auto-merge September 17, 2025 11:38
@delock (Collaborator, Author) commented Sep 17, 2025

Hi @sfc-gh-truwase, this PR is ready to be merged. @PKUWZP, note that after this PR the Muon optimizer can be used directly, without any flag setting.

@sfc-gh-truwase sfc-gh-truwase merged commit 2585881 into master Sep 17, 2025
13 of 15 checks passed
@sfc-gh-truwase sfc-gh-truwase deleted the gma/muon_improv branch September 17, 2025 13:52
@PKUWZP (Collaborator) commented Sep 17, 2025

@delock This is awesome, thanks for doing this.

mauryaavinash95 pushed a commit to DataStates/DeepSpeed that referenced this pull request Oct 4, 2025

@DongmingShenDS commented

Thanks. I have a question though: if we only set up Muon in config.json, how should we specify the learning rates used for Muon and AdamW? It seems all params will share the same LR, which can be problematic.
