Conversation
Signed-off-by: Karen Mosoyan <[email protected]>
Pull request overview
This pull request restores MOE (Mixture of Experts) weight conversion functionality for the LiquidAI LFM2-8B-A1B model. The code handles the conversion of expert weights where each expert's weights are stored as separate tensors in the model state dict, requiring expansion of the {channel} placeholder pattern to iterate through all expert indices.
Changes:
- Added special handling for the `lfm2_moe` model type to expand `{channel}` placeholder patterns across all experts
- Implements iteration through `num_experts` to convert individual expert weight tensors (w1, w2, w3)
- Includes defensive handling for missing experts or invalid configuration
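The expansion described above can be sketched as follows. This is a hypothetical illustration of the `{channel}` placeholder mechanism, not the converter's actual API; `state_dict`, `model_config`, and the tensor values are stand-ins.

```python
# Illustrative sketch: expand a '{channel}' pattern into one concrete
# tensor name per expert, then look each one up in the state dict.
pattern = 'feed_forward.experts.{channel}.w1.weight'

model_config = {'num_experts': 4}  # assumed config field, per the PR description
state_dict = {
    pattern.replace('{channel}', str(i)): f'tensor_{i}'
    for i in range(model_config['num_experts'])
}

num_experts = int(model_config.get('num_experts', 0))
expanded = [pattern.replace('{channel}', str(i)) for i in range(num_experts)]
tensors = [state_dict[name] for name in expanded if name in state_dict]
```

Each expert's weights live under their own key (`feed_forward.experts.0.w1.weight`, `feed_forward.experts.1.w1.weight`, ...), which is why a single literal pattern lookup cannot find them.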
```python
if matched_any_channel:
    found = True
    break
```
After the MOE expert pattern handling (lines 332-351), a 'continue' statement should be added to skip the regular pattern matching at line 353, regardless of whether any channels matched. When a pattern starts with 'feed_forward.experts.{channel}.', it should only be processed through the channel expansion logic. Without the continue, if no channels match, the code falls through to line 353 and attempts to look up a tensor name containing the literal '{channel}' placeholder, which will never exist in the state_dict. While this doesn't cause incorrect behavior (the lookup simply fails), it's inefficient and inconsistent with how similar special-case patterns are handled elsewhere in the code.
```diff
 if model_type_str == 'lfm2_moe' and pattern.startswith('feed_forward.experts.{channel}.'):
+    continue
```
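The control-flow issue the reviewer describes can be shown with a minimal sketch. The names (`state_dict`, `patterns`, `num_experts`) are illustrative stand-ins, not the converter's real variables:

```python
# Sketch of the reviewer's point: after handling a '{channel}' pattern via
# expansion, `continue` must skip the regular lookup below, because a name
# containing the literal '{channel}' can never exist in the state dict.
state_dict = {'feed_forward.experts.0.w1.weight': 'w1_e0'}
patterns = ['feed_forward.experts.{channel}.w1.weight', 'embed_tokens.weight']
model_type_str = 'lfm2_moe'
num_experts = 1

matched = []
for pattern in patterns:
    if model_type_str == 'lfm2_moe' and pattern.startswith('feed_forward.experts.{channel}.'):
        for channel_idx in range(num_experts):
            name = pattern.replace('{channel}', str(channel_idx))
            if name in state_dict:
                matched.append(name)
        continue  # skip the literal lookup regardless of whether any expert matched
    # regular pattern matching: reached only for non-expert patterns
    if pattern in state_dict:
        matched.append(pattern)
```

Without the `continue`, the expert pattern would also hit the literal lookup, which always fails; harmless, but wasted work and inconsistent with the other special-case patterns.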
```python
num_channels = int(model_config.get('num_experts', 0))
if num_channels <= 0:
    continue
matched_any_channel = False
for channel_idx in range(num_channels):
```
The variable is named 'num_channels' but it's storing the number of experts from 'num_experts' config field. For clarity and consistency with the source configuration field name, this should be named 'num_experts' instead. While 'channel' is used as a generic placeholder in patterns for various multi-instance weights (experts, q/k/v splits, etc.), using the domain-specific term 'num_experts' in the code would make it clearer what this value represents.
```diff
-num_channels = int(model_config.get('num_experts', 0))
-if num_channels <= 0:
-    continue
-matched_any_channel = False
-for channel_idx in range(num_channels):
+num_experts = int(model_config.get('num_experts', 0))
+if num_experts <= 0:
+    continue
+matched_any_channel = False
+for channel_idx in range(num_experts):
```
Reproducing the problem on `main`:

```shell
cactus clean
cactus test --model LiquidAI/LFM2-8B-A1B
```

On `main`, this fails to convert the MOE expert weights correctly; on this branch it runs correctly.