
added back moe weight conversion #468

Merged
HenryNdubuaku merged 1 commit into main from karen/moe-converter-fix on Feb 27, 2026
Conversation

@kar-m
Collaborator

@kar-m kar-m commented Feb 26, 2026

Reproducing the problem on main:

cactus clean
cactus test --model LiquidAI/LFM2-8B-A1B

This fails to convert the MOE expert weights correctly. On this branch, the same commands run correctly.

Copilot AI review requested due to automatic review settings February 26, 2026 23:10
Contributor

Copilot AI left a comment


Pull request overview

This pull request restores MOE (Mixture of Experts) weight conversion functionality for the LiquidAI LFM2-8B-A1B model. The code handles the conversion of expert weights where each expert's weights are stored as separate tensors in the model state dict, requiring expansion of the {channel} placeholder pattern to iterate through all expert indices.

Changes:

  • Added special handling for the lfm2_moe model type to expand {channel} placeholder patterns across all experts
  • Implemented iteration through num_experts to convert each expert's individual weight tensors (w1, w2, w3)
  • Included defensive handling for missing experts or an invalid configuration
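The expansion described above can be sketched roughly as follows. This is a hedged illustration, not the actual converter code: the function name `expand_expert_pattern` and the surrounding call structure are assumptions, while identifiers like `pattern`, `model_config`, and `num_experts` mirror the diff excerpts in this PR.

```python
# Sketch of expanding a '{channel}' placeholder pattern across all experts.
# The function name and call structure are hypothetical; the identifiers
# follow the diff excerpts quoted in this pull request.

def expand_expert_pattern(pattern, model_config, state_dict):
    """Substitute each expert index for '{channel}' in the pattern and
    collect the matching tensors from the model state dict."""
    num_experts = int(model_config.get('num_experts', 0))
    if num_experts <= 0:
        return {}  # defensive: missing or invalid configuration

    tensors = {}
    for channel_idx in range(num_experts):
        name = pattern.replace('{channel}', str(channel_idx))
        if name in state_dict:  # some experts may be absent
            tensors[name] = state_dict[name]
    return tensors

# Example: each expert's weights are stored under their own key.
state_dict = {
    'feed_forward.experts.0.w1': 'tensor-0-w1',
    'feed_forward.experts.1.w1': 'tensor-1-w1',
}
config = {'num_experts': 2}
found = expand_expert_pattern('feed_forward.experts.{channel}.w1',
                              config, state_dict)
```

Here `found` ends up with one entry per expert whose tensor exists in the state dict, which matches the per-expert storage layout the description mentions.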


if matched_any_channel:
    found = True
    break


Copilot AI Feb 26, 2026


After the MOE expert pattern handling (lines 332-351), a 'continue' statement should be added to skip the regular pattern matching at line 353, regardless of whether any channels matched. When a pattern starts with 'feed_forward.experts.{channel}.', it should only be processed through the channel expansion logic. Without the continue, if no channels match, the code falls through to line 353 and attempts to look up a tensor name containing the literal '{channel}' placeholder, which will never exist in the state_dict. While this doesn't cause incorrect behavior (the lookup simply fails), it's inefficient and inconsistent with how similar special-case patterns are handled elsewhere in the code.

Suggested change

if model_type_str == 'lfm2_moe' and pattern.startswith('feed_forward.experts.{channel}.'):
    continue

Copilot uses AI. Check for mistakes.
Comment on lines +333 to +338

num_channels = int(model_config.get('num_experts', 0))
if num_channels <= 0:
    continue

matched_any_channel = False
for channel_idx in range(num_channels):

Copilot AI Feb 26, 2026


The variable is named 'num_channels' but it's storing the number of experts from 'num_experts' config field. For clarity and consistency with the source configuration field name, this should be named 'num_experts' instead. While 'channel' is used as a generic placeholder in patterns for various multi-instance weights (experts, q/k/v splits, etc.), using the domain-specific term 'num_experts' in the code would make it clearer what this value represents.

Suggested change

num_channels = int(model_config.get('num_experts', 0))
if num_channels <= 0:
    continue
matched_any_channel = False
for channel_idx in range(num_channels):

becomes:

num_experts = int(model_config.get('num_experts', 0))
if num_experts <= 0:
    continue
matched_any_channel = False
for channel_idx in range(num_experts):

Copilot uses AI. Check for mistakes.
@HenryNdubuaku HenryNdubuaku merged commit f341f40 into main Feb 27, 2026
5 of 6 checks passed
HenryNdubuaku pushed a commit to cattermelon1234/cactus that referenced this pull request Feb 27, 2026
cattermelon1234 pushed a commit to cattermelon1234/cactus that referenced this pull request Feb 28, 2026