
added moe support for lfm #374

Merged

HenryNdubuaku merged 4 commits into main from karen/lfm2moe on Feb 23, 2026

Conversation

@kar-m
Collaborator

@kar-m kar-m commented Feb 20, 2026

No description provided.

Signed-off-by: Karen Mosoyan <[email protected]>
@kar-m kar-m marked this pull request as ready for review February 20, 2026 07:58
Copilot AI review requested due to automatic review settings February 20, 2026 07:58
Contributor

Copilot AI left a comment


Pull request overview

This PR adds Mixture of Experts (MoE) support for the LFM2 architecture, enabling inference of LFM2 models with sparse MoE layers (such as LiquidAI/LFM2-8B-A1B). The implementation follows the existing LFM2Model design while adding MoE-specific routing and expert-computation logic.

Changes:

  • Added LFM2MoEModel C++ class that extends the base Model class with MoE routing and expert selection
  • Implemented moe_expert_apply graph operation for efficient sparse expert computation
  • Extended Python converter to handle MoE weight patterns and special loading for lfm2_moe models that may not work with standard HuggingFace loading
  • Added MoE-specific configuration parameters (num_experts, num_experts_per_tok, moe_intermediate_dim, etc.)
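The routing scheme described in this PR (sigmoid router plus a selection bias, top-k expert choice, optionally normalized routing weights with a configurable scaling factor, and per-expert gated SwiGLU) can be sketched as a dense NumPy reference. This is an illustrative model of the math only, not the cactus C++ kernel; the function name `moe_forward` and the representation of experts as `(w1, w3, w2)` tuples are assumptions, while the parameter names `norm_topk_prob` and `routed_scaling_factor` come from the config fields listed above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def silu(x):
    return x * sigmoid(x)

def moe_forward(h, router_w, expert_bias, experts, top_k,
                norm_topk_prob=True, routed_scaling_factor=1.0):
    """Dense reference sketch of a sparse MoE layer.

    h:           [tokens, hidden] activations
    router_w:    [num_experts, hidden] router weights
    expert_bias: [num_experts] bias used only for expert *selection*
    experts:     list of (w1, w3, w2) weight tuples per expert
    """
    logits = h @ router_w.T                       # [tokens, num_experts]
    scores = sigmoid(logits)                      # routing scores
    selection = scores + expert_bias              # bias affects selection only
    topk_idx = np.argsort(-selection, axis=-1)[:, :top_k]

    out = np.zeros_like(h)
    for t in range(h.shape[0]):
        w = scores[t, topk_idx[t]]                # unbiased scores as weights
        if norm_topk_prob:
            w = w / (w.sum() + 1e-20)             # normalize over chosen experts
        w = w * routed_scaling_factor
        for k, e in enumerate(topk_idx[t]):
            w1, w3, w2 = experts[e]
            gated = silu(h[t] @ w1.T) * (h[t] @ w3.T)   # gated SwiGLU
            out[t] += w[k] * (gated @ w2.T)
    return out
```

Per the file summary below, the actual `compute_moe_expert_apply_node` gathers only the tokens routed to each expert and runs compact matmuls; this sketch loops per token for clarity instead.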

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated no comments.

Summary per file:

  • python/src/weight_patterns.py: Adds MoE weight patterns with {channel} placeholders for expert indices (router, expert_bias, and per-expert w1/w3/w2 weights)
  • python/src/converter.py: Implements channel-based pattern expansion for MoE experts; adds special handling for lfm2_moe to bypass HuggingFace model loading and work directly with the raw state_dict
  • python/src/config_utils.py: Extracts MoE-specific configuration (num_experts_per_tok, norm_topk_prob, use_expert_bias, routed_scaling_factor) using the consistent cfg_get pattern
  • python/src/cli.py: Adds a _load_raw_hf_state_dict helper and a _RawModelWrapper class to support lfm2_moe models that may have tokenizer or loading issues
  • cactus/models/model_lfm2moe.cpp: Implements LFM2MoEModel with build_mlp supporting both dense and MoE layers; includes routing logic with sigmoid + bias, top-k selection, and per-expert gated SwiGLU
  • cactus/models/model.h: Declares the LFM2MoEModel class with an ExpertWeights structure and LayerWeights containing moe_router, moe_expert_bias, and a vector of expert weights
  • cactus/graph/graph_ops_nn.cpp: Implements compute_moe_expert_apply_node with sparse token selection, compact matmul, and normalized routing weights with configurable scaling
  • cactus/graph/graph_execute.cpp: Adds MOE_EXPERT_APPLY to the execution dispatch and widens profiling column widths to accommodate longer operation names
  • cactus/graph/graph_builder.cpp: Adds the moe_expert_apply builder with comprehensive shape validation for the accumulator, hidden state, routing probs, top-k indices, and expert weights
  • cactus/graph/graph.h: Adds the MOE_EXPERT_APPLY OpType and a normalize_routing flag to OpParams; declares the moe_expert_apply graph builder function
  • cactus/engine/engine_model.cpp: Adds MoE config parsing and factory logic to instantiate LFM2MoEModel when num_experts/moe_intermediate_dim/num_experts_per_tok are non-zero
  • cactus/engine/engine.h: Adds MoE configuration fields to the Config struct with appropriate defaults
  • README.md: Adds LiquidAI/LFM2-8B-A1B to the supported models list
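The {channel}-placeholder expansion mentioned for python/src/weight_patterns.py and python/src/converter.py can be illustrated with a minimal sketch. The converter's real logic may differ, and the example pattern string below is hypothetical; only the `{channel}` placeholder convention comes from this PR.

```python
def expand_channel_patterns(pattern, num_experts):
    """Expand a weight-name pattern containing a '{channel}' placeholder
    into one concrete name per expert index; patterns without the
    placeholder pass through unchanged."""
    if "{channel}" not in pattern:
        return [pattern]
    return [pattern.format(channel=i) for i in range(num_experts)]
```

For example, a pattern like `"layers.0.experts.{channel}.w1.weight"` with 4 experts would expand to four concrete tensor names, one per expert.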


@rshemet
Collaborator

rshemet commented Feb 20, 2026

super excited about this one

Signed-off-by: HenryNdubuaku <[email protected]>
Signed-off-by: HenryNdubuaku <[email protected]>
Signed-off-by: HenryNdubuaku <[email protected]>
@HenryNdubuaku HenryNdubuaku merged commit 93f49c2 into main Feb 23, 2026
1 of 2 checks passed
Mandark-droid added a commit to Mandark-droid/cactus that referenced this pull request Feb 23, 2026
Addresses review feedback on PR cactus-compute#383: refactors the Qwen3MoeForCausalLM
implementation to use the generalized moe_layer graph operation introduced
in PR cactus-compute#374, instead of a custom per-expert loop.

Key changes:
- build_mlp uses gb->moe_layer() matching the LFM2MoEModel pattern
- WeightNodeIDs uses ExpertWeights struct (w1/w3/w2) consistent with upstream
- Weight file naming follows upstream convention (moe_expert_ prefix)
- MoE detection via config fields (num_experts > 0) like LFM2, no new enum
- Attention uses INT8 KV cache with attention_int8_hybrid (standard path)
- QK normalization per-head before RoPE (Qwen3 architecture requirement)

Bug fixes included:
- Fix greedy sampling (temperature=0) to use pure argmax regardless of top_p/top_k
- Move token_history to file scope with clear_sample_history() to prevent
  cross-model sampling contamination
- Add FP32 softmax support for MoE router weight precision
- Fix config parsing to strip \r\n from values

Python conversion:
- Detect Qwen3MoeForCausalLM model type
- Per-expert SwiGLU weight extraction (fused and individual tensor formats)
- FP16 auto-promotion for MoE router weights

Signed-off-by: Mandark-droid <[email protected]>

https://claude.ai/code/session_01SFkVPWXCCtTTpmMwsj274A
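The greedy-sampling fix described in the commit above (temperature = 0 short-circuits to pure argmax before any top-k/top-p filtering is applied) can be sketched as follows. This is a hypothetical standalone illustration, not the cactus sampler; the function name and filtering details are assumptions.

```python
import numpy as np

def sample_token(logits, temperature, top_k=0, top_p=1.0, rng=None):
    """Sample a token id from logits; temperature == 0 means greedy."""
    if temperature == 0.0:
        # The fix: greedy decoding ignores top_k/top_p entirely.
        return int(np.argmax(logits))

    probs = np.exp((logits - logits.max()) / temperature)
    probs = probs / probs.sum()

    if top_k > 0:
        cutoff = np.sort(probs)[-top_k]           # k-th largest probability
        probs = np.where(probs >= cutoff, probs, 0.0)

    if top_p < 1.0:
        order = np.argsort(-probs)                # descending by probability
        csum = np.cumsum(probs[order])
        drop = order[csum > top_p][1:]            # keep the token that crosses top_p
        probs[drop] = 0.0

    probs = probs / probs.sum()
    rng = rng if rng is not None else np.random.default_rng()
    return int(rng.choice(len(probs), p=probs))
```

Before such a fix, aggressive top_p/top_k settings could filter the argmax token out even at temperature 0, making "greedy" decoding non-greedy.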
ncylich pushed a commit that referenced this pull request Feb 24, 2026
* added moe support for lfm

Signed-off-by: Karen Mosoyan <[email protected]>

* optimisations

Signed-off-by: HenryNdubuaku <[email protected]>

* cleanup

Signed-off-by: HenryNdubuaku <[email protected]>

* generalise moe op

Signed-off-by: HenryNdubuaku <[email protected]>

---------

Signed-off-by: Karen Mosoyan <[email protected]>
Signed-off-by: HenryNdubuaku <[email protected]>
Co-authored-by: HenryNdubuaku <[email protected]>
cattermelon1234 pushed a commit to cattermelon1234/cactus that referenced this pull request Feb 28, 2026