Make Muon optimizer easier to enable #7555

Conversation
Signed-off-by: Ma, Guokai <[email protected]>
Force-pushed from 3439b3a to 6592071
@delock Can you take care of the unit test and make sure it passes?

Yes, let me check the UT. Thanks for the reminder!
Looks like

Hi @loadams the remaining failure in

Hi @sfc-gh-truwase, it looks
Hi @sfc-gh-truwase, this PR is ready to be merged. @PKUWZP note that after this PR, the Muon optimizer can be used directly without setting flags.
@delock This is awesome, thanks for doing this. |
The original Muon optimizer PR (deepspeedai#7509) requires the user to explicitly set the `use_muon` flag on `model.parameters()`, as shown in the test https://github.com/deepspeedai/DeepSpeed/blob/master/tests/unit/ops/muon/test_muon.py#L27. This PR integrates the setting of `use_muon` into DeepSpeed before engine initialization, which makes the Muon optimizer easier to use: the user only needs to change the optimizer in `config.json` from `AdamW` to `Muon`, with no code changes. It resolves issue deepspeedai#7552.

---------

Signed-off-by: Ma, Guokai <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
Co-authored-by: Masahiro Tanaka <[email protected]>
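For concreteness, here is a minimal sketch of the before/after usage. The 2-D-weight heuristic in the commented-out loop and the config values (`train_batch_size`, `lr`) are illustrative assumptions, not copied from the linked test.

```python
import torch
import deepspeed

model = torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.Linear(16, 4))

# Old style (before this PR): the caller had to mark parameters by hand
# before engine init. Assumption: Muon handles 2-D weight matrices and
# everything else falls back to AdamW, as the linked unit test suggests.
#
#     for p in model.parameters():
#         p.use_muon = p.ndim >= 2

# New style (this PR): select Muon purely via the DeepSpeed config; the
# flag setting happens inside DeepSpeed before engine initialization.
ds_config = {
    "train_batch_size": 8,
    "optimizer": {
        "type": "Muon",
        "params": {"lr": 1e-3},
    },
}
model_engine, optimizer, _, _ = deepspeed.initialize(model=model,
                                                     model_parameters=model.parameters(),
                                                     config=ds_config)
```

As with any config-specified optimizer, this is meant to run under a DeepSpeed launcher; only the `"type"` string changes relative to an `AdamW` config.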
Thanks. I have a question though. If we only set up Muon in `config.json`, how do we specify the LRs used for Muon and AdamW? It seems all params will share the same LR, which can be problematic.
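To make the concern concrete, here is a hedged sketch of the per-group split one might want with a client-constructed optimizer; the torch-style param-group layout and the LR values are illustrative assumptions, not part of this PR.

```python
import torch

model = torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.Linear(16, 4))

# Mark 2-D weights for Muon, following the per-parameter flag convention.
for p in model.parameters():
    p.use_muon = p.ndim >= 2

muon_params = [p for p in model.parameters() if p.use_muon]
other_params = [p for p in model.parameters() if not p.use_muon]

# With a client-constructed optimizer, torch-style param groups give each
# branch its own LR (the values here are illustrative, not recommendations):
param_groups = [
    {"params": muon_params, "lr": 2e-2},   # Muon-updated matrices
    {"params": other_params, "lr": 3e-4},  # AdamW fallback (biases, etc.)
]
```

A config-only path would presumably need an equivalent way to express this split, which is what the question above is getting at.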