Improve leaf module interface (enable via config, relax matching criteria, add document, etc.) #7604
Signed-off-by: Masahiro Tanaka <[email protected]>
…eria, add document, etc.) (deepspeedai#7604)

This PR improves the usability of the leaf module feature. Here are the changes:

- Allow enabling the leaf module via both the DeepSpeed config and APIs.
- Relax matching criteria to support class-based matching.
- Support multiple ways of specifying the target module: class, class name (with or without package name), module name, or suffix.
- Add documentation to the training guide, including config snippets and explanations of default behavior.
- Add default classes (e.g., Mixtral, Qwen2/Qwen3) that automatically enable the leaf module feature. (Welcoming requests to add more classes)

---------

Signed-off-by: Masahiro Tanaka <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
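The matching options listed above (class, class name with or without package name, module name, or suffix) can be illustrated with a small standalone sketch. This is not DeepSpeed's implementation: `is_leaf`, `qualified_name`, and the stand-in `CustomMoEBlock` class are hypothetical names introduced only to show how such relaxed matching might behave.

```python
# Hypothetical illustration of relaxed leaf-module matching.
# None of these helpers are part of DeepSpeed; they only mirror the
# matching options described in the PR summary.


class CustomMoEBlock:
    """Stand-in for a user-defined layer class (hypothetical)."""


def qualified_name(cls):
    # Class name including its package/module path.
    return f"{cls.__module__}.{cls.__qualname__}"


def is_leaf(name, module, classes=(), names=(), name_suffixes=()):
    """Return True if the module matches any of the given criteria."""
    cls = type(module)
    for spec in classes:
        if isinstance(spec, type):
            # Class-based matching: the module is an instance of the class.
            if isinstance(module, spec):
                return True
        elif spec in (cls.__name__, qualified_name(cls)):
            # String matching: class name with or without package name.
            return True
    if name in names:
        # Exact module-name matching.
        return True
    # Suffix matching on the module name.
    return any(name.endswith(suffix) for suffix in name_suffixes)


block = CustomMoEBlock()
print(is_leaf("transformer.layers.0.experts", block, classes=[CustomMoEBlock]))
print(is_leaf("transformer.layers.3.attn", block, name_suffixes=["experts"]))
```

In this sketch a single matching criterion is enough to mark a module as a leaf, which reflects the idea that the different ways of specifying the target are alternatives rather than requirements to be combined.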
"leaf_module": {
  "classes": ["my_package.layers.CustomMoEBlock"],
  "names": ["transformer.layers.0.experts"],
  "name_suffixes": ["experts"]
This part of the doc is somewhat confusing: a reader who jumps here directly, without reading the API section, may believe that all 3 entries are needed.
Too bad JSON doesn't allow comments. But perhaps the first paragraph could start with: "While the example shows all 3 leaf_module keys, typically you will use just one of these"?
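To make the reviewer's point concrete, here is a minimal sketch of a config that sets only one of the three `leaf_module` keys. It is written as a Python dict so that comments are possible. Placing `leaf_module` under `zero_optimization` with stage 3 is an assumption made for this sketch and should be checked against the training guide.

```python
import json

# Assumption: "leaf_module" lives under "zero_optimization" (ZeRO stage 3).
# Only one key is set here; "classes" or "names" could be used instead.
ds_config = {
    "zero_optimization": {
        "stage": 3,
        "leaf_module": {
            # Match every module whose name ends with "experts".
            "name_suffixes": ["experts"],
        },
    },
}

# Serialize to JSON, e.g. for writing to a config file.
config_json = json.dumps(ds_config, indent=2)
print(config_json)
```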
Update document of leaf module config as suggested [here](#7604 (comment)). Signed-off-by: Masahiro Tanaka <[email protected]>
…eria, add document, etc.) (deepspeedai#7604)

This PR improves the usability of the leaf module feature. Here are the changes:

- Allow enabling the leaf module via both the DeepSpeed config and APIs.
- Relax matching criteria to support class-based matching.
- Support multiple ways of specifying the target module: class, class name (with or without package name), module name, or suffix.
- Add documentation to the training guide, including config snippets and explanations of default behavior.
- Add default classes (e.g., Mixtral, Qwen2/Qwen3) that automatically enable the leaf module feature. (Welcoming requests to add more classes)

---------

Signed-off-by: Masahiro Tanaka <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
Signed-off-by: Ma, Liangliang <[email protected]>
Update document of leaf module config as suggested [here](deepspeedai#7604 (comment)). Signed-off-by: Masahiro Tanaka <[email protected]> Signed-off-by: Ma, Liangliang <[email protected]>
This PR improves the usability of the leaf module feature.
Here are the changes: