No Muon optimizer for embedding and lm_head layers #7641

delock · 2025-10-22T07:34:34Z

This PR follows the suggestion in this article https://kellerjordan.github.io/posts/muon/#empirical-considerations that non-hidden layers ('embedding' and 'lm_head') need to be excluded from the Muon optimizer. It checks each parameter name for 'embed' and 'lm_head' and does not apply the 'use_muon' attribute if either string is present in the name.

Signed-off-by: Guokai Ma <[email protected]>
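The name-based filter described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not DeepSpeed's actual code: the helpers is_muon_candidate and mark_muon_params, the EXCLUDED_SUBSTRINGS tuple, and the FakeParam stand-in are all hypothetical. It assumes parameters arrive as (name, parameter) pairs, as PyTorch's model.named_parameters() yields, and that a parameter opts into Muon by carrying use_muon = True.

```python
from typing import Iterable, Tuple

# Substrings that mark non-hidden layers, taken from the PR description.
# (Hypothetical constant name; not DeepSpeed's identifier.)
EXCLUDED_SUBSTRINGS = ("embed", "lm_head")


def is_muon_candidate(name: str) -> bool:
    """Return True unless the parameter name contains an excluded substring."""
    return not any(s in name for s in EXCLUDED_SUBSTRINGS)


def mark_muon_params(named_params: Iterable[Tuple[str, object]]) -> None:
    """Set use_muon = True only on parameters that pass the name filter.

    Excluded parameters are left untouched (no use_muon attribute is added),
    matching the behaviour described in the PR. Hypothetical helper.
    """
    for name, param in named_params:
        if is_muon_candidate(name):
            setattr(param, "use_muon", True)


class FakeParam:
    """Hypothetical stand-in for torch.nn.Parameter so the sketch runs without PyTorch."""


params = {
    "model.embed_tokens.weight": FakeParam(),
    "model.layers.0.mlp.up_proj.weight": FakeParam(),
    "lm_head.weight": FakeParam(),
}
mark_muon_params(params.items())
for name, p in params.items():
    print(name, getattr(p, "use_muon", False))
# prints:
# model.embed_tokens.weight False
# model.layers.0.mlp.up_proj.weight True
# lm_head.weight False
```

Because the check is a plain substring match, any parameter whose name merely contains 'embed' (a positional embedding, for example) is also excluded. The linked article additionally recommends AdamW for scalar and vector parameters; the sketch covers only the name check this PR describes.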

delock requested review from loadams and tjruwase as code owners October 22, 2025 07:34

delock force-pushed the gma/auto_muon branch from 6d331b3 to a59a452

filter out embed layer and lm_head layer from Muon optimizer

a59a452

Signed-off-by: Guokai Ma <[email protected]>

sfc-gh-truwase approved these changes Oct 22, 2025


sfc-gh-truwase merged commit 67b365a into master Oct 22, 2025
12 checks passed

sfc-gh-truwase deleted the gma/auto_muon branch October 22, 2025 14:40
