
Metis-HOME: Hybrid Optimized Mixture-of-Experts for Multimodal Reasoning

Paper: [arXiv:2510.20519](https://arxiv.org/abs/2510.20519)

💡 Overview

Current multimodal reasoning models face a critical dilemma: they often "overthink" on simple tasks (inefficiency) and suffer from general capability degradation when optimized for reasoning.

We introduce Metis-HOME (Hybrid Optimized Mixture-of-Experts), a novel framework that enables a "Hybrid Thinking" paradigm. By structuring the original dense model (Qwen2.5-VL-7B) into two distinct expert branches—a Thinking Expert for complex reasoning and a Non-Thinking Expert for rapid inference—controlled by a lightweight router, Metis-HOME effectively resolves the reasoning-vs-generalization trade-off.
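To make the split concrete, here is a minimal, illustrative PyTorch sketch of the hybrid layer (the module names, mean-pooling, and hard top-1 routing are our assumptions for illustration, not the released implementation):

```python
# Illustrative sketch only: two expert branches cloned from one dense layer,
# with a lightweight binary router dispatching each query to exactly one branch.
import copy
import torch
import torch.nn as nn

class HybridThinkingRouter(nn.Module):
    """Lightweight binary router: picks the thinking ("System 2") or
    non-thinking ("System 1") branch for each query."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.score = nn.Linear(hidden_size, 2)  # logits: [non-thinking, thinking]

    def forward(self, query_repr: torch.Tensor) -> torch.Tensor:
        # query_repr: (batch, hidden) pooled multimodal query representation
        return self.score(query_repr).argmax(dim=-1)  # hard top-1 routing

class HybridExpertLayer(nn.Module):
    """Two expert branches initialized from the same dense layer; the router
    sends each query down exactly one of them."""
    def __init__(self, dense_mlp: nn.Module, hidden_size: int):
        super().__init__()
        self.thinking_expert = copy.deepcopy(dense_mlp)
        self.non_thinking_expert = copy.deepcopy(dense_mlp)
        self.router = HybridThinkingRouter(hidden_size)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq, hidden); pool over tokens for a per-query decision
        route = self.router(hidden.mean(dim=1))
        out = torch.empty_like(hidden)
        think = route == 1
        if think.any():
            out[think] = self.thinking_expert(hidden[think])
        if (~think).any():
            out[~think] = self.non_thinking_expert(hidden[~think])
        return out

# Toy usage: clone a dense MLP into two experts and route a small batch
layer = HybridExpertLayer(
    nn.Sequential(nn.Linear(64, 256), nn.GELU(), nn.Linear(256, 64)), hidden_size=64
)
print(layer(torch.randn(4, 10, 64)).shape)  # torch.Size([4, 10, 64])
```

If the router is trained in its own stage (as the multi-stage strategy below suggests), the non-differentiable argmax is not an obstacle; this toy version only shows the dispatch logic.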

Figure: Metis-HOME framework overview.

✨ Highlights

  • 🧠 Hybrid Thinking Paradigm: Explicitly decouples "System 1" (fast, intuitive) and "System 2" (slow, deliberative) reasoning within a unified multimodal MoE architecture.

  • 🔄 Router Mechanism: A lightweight, trainable router dynamically dispatches queries by complexity, avoiding computational waste on simple tasks such as OCR or captioning.

  • 🚀 Performance:

    • +6.9% improvement on reasoning benchmarks (MathVista, etc.) compared to the baseline.
    • ~1% gain on general benchmarks, reversing the degradation trend observed in other reasoning-specialized models.
  • 🛠️ Efficient Training: A multi-stage strategy combining reinforcement learning (RL) for reasoning enhancement and mixed supervised fine-tuning (SFT) for expert specialization; a rough router-training sketch follows this list.
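The staged RL + mixed-SFT recipe itself is beyond a README snippet, but as a rough, hypothetical illustration of how a lightweight complexity router can be fit (reusing `HybridThinkingRouter` from the sketch above; the binary labels and cross-entropy loss are our assumptions):

```python
# Hypothetical router fitting as binary classification: complex queries get
# label 1 (thinking expert), simple ones (e.g. OCR/captioning) get label 0.
# The actual staged training recipe is described in the paper.
import torch
import torch.nn.functional as F

def router_training_step(router, optimizer, query_reprs, needs_thinking):
    """One illustrative optimization step on pooled query representations.

    query_reprs:    (batch, hidden) float tensor
    needs_thinking: (batch,) long tensor of 0/1 labels
    """
    logits = router.score(query_reprs)           # (batch, 2) raw logits
    loss = F.cross_entropy(logits, needs_thinking)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```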

📊 Results

Thinking Ratio

As shown in the following figure, the thinking ratio analysis of Metis-HOME reveals adaptive routing behavior:

  • High ratios (78%–98%) on reasoning-heavy benchmarks (WeMath, MathVision, etc.), indicating effective use of the thinking expert for multi-step inference.
  • Low ratios (2%–5%) on general benchmarks (MMBench, OCRBench), showing preference for the non-thinking expert.

This aligns with our design: deliberate reasoning for complex tasks and fast inference for simple ones, improving computational efficiency.

Figure: thinking ratio of Metis-HOME across reasoning and general benchmarks.
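The thinking ratio itself is just the fraction of a benchmark's queries that the router sends to the thinking expert; a minimal sketch of the bookkeeping (the function name and data layout are illustrative assumptions):

```python
# Thinking ratio = (#queries routed to the thinking expert) / (#queries),
# computed per benchmark from the router's per-query decisions.
from collections import defaultdict

def thinking_ratio(route_decisions):
    """route_decisions: iterable of (benchmark_name, routed_to_thinking)."""
    counts = defaultdict(lambda: [0, 0])  # benchmark -> [thinking, total]
    for bench, to_thinking in route_decisions:
        counts[bench][0] += int(to_thinking)
        counts[bench][1] += 1
    return {b: t / n for b, (t, n) in counts.items()}

# Example: heavy thinking on a math benchmark, almost none on an OCR one
decisions = [("MathVision", True)] * 9 + [("MathVision", False)] \
          + [("OCRBench", True)] + [("OCRBench", False)] * 24
print(thinking_ratio(decisions))  # {'MathVision': 0.9, 'OCRBench': 0.04}
```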

Benchmarks

The first six score columns are reasoning benchmarks, followed by their average; the last column is the average over the general benchmarks.

| Model | MathVista | MathVision | MathVerse | DynaMath | WeMath | LogicVista | Reasoning Avg. | General Avg. |
|---|---|---|---|---|---|---|---|---|
| *Proprietary Models* | | | | | | | | |
| Gemini-2.0-Pro | 71.3 | 48.1 | 67.3 | 43.3 | 56.5 | 53.2 | 56.6 | 73.3 |
| Gemini-2.0-Flash | 70.4 | 43.6 | 47.8 | 42.1 | 47.4 | 52.3 | 50.6 | 72.6 |
| Claude 3.7 Sonnet | 66.8 | 41.9 | 46.7 | 39.7 | 49.3 | 58.2 | 50.4 | 70.1 |
| ChatGPT-4o | 60.0 | 31.2 | 40.6 | 34.5 | 45.8 | 52.8 | 44.2 | 72.0 |
| *Open-source Models* | | | | | | | | |
| LLaVA-OneVision-72B | 67.1 | 25.3 | 27.2 | 15.6 | 32.0 | 40.9 | 34.7 | 68.0 |
| Kimi-VL-A3B-Instruct | 66.0 | 21.8 | 34.1 | 18.0 | 32.3 | 42.7 | 35.8 | 69.1 |
| InternVL3-8B | 70.5 | 30.0 | 38.5 | 25.7 | 39.5 | 44.5 | 41.4 | 73.6 |
| VL-Rethinker-7B | 75.5 | 29.3 | 47.2 | 25.4 | 37.8 | 47.0 | 43.7 | 68.3 |
| Metis-RISE-7B | 75.8 | 28.7 | 51.0 | 27.7 | 45.2 | 49.7 | 46.4 | 68.4 |
| Baseline | 67.4 | 26.2 | 41.1 | 20.2 | 34.5 | 45.6 | 39.2 | 70.3 |
| Baseline+RL | 72.8 | 28.7 | 46.8 | 26.2 | 43.3 | 46.5 | 44.0 | 67.2 |
| Metis-HOME | 76.0 | 29.5 | 47.7 | 26.4 | 45.6 | 51.5 | 46.1 | 71.2 |

🔍 Usage Example

You can use the demo inference script in the examples folder:

```bash
python examples/demo_inference.py
```
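Since Metis-HOME is built on Qwen2.5-VL-7B, inference should follow the standard Qwen2.5-VL pipeline in `transformers`; the sketch below is a hypothetical illustration (the checkpoint path, image path, and prompt are placeholders, and the actual demo script may differ):

```python
# Hypothetical sketch following the standard Qwen2.5-VL inference pipeline;
# "path/to/Metis-HOME" and the sample image are placeholders.
import torch
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from qwen_vl_utils import process_vision_info  # pip install qwen-vl-utils

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "path/to/Metis-HOME", torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained("path/to/Metis-HOME")

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "examples/sample.png"},  # placeholder image
        {"type": "text", "text": "Solve the problem shown in the image."},
    ],
}]
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)

generated = model.generate(**inputs, max_new_tokens=512)
trimmed = generated[:, inputs.input_ids.shape[1]:]  # drop the prompt tokens
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```

Under the hybrid-thinking design, the same call should transparently produce either a long reasoning trace or a direct answer, depending on where the router sends the query.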

📌 Acknowledgement

We sincerely appreciate LLaMA-Factory and MM-EUREKA for providing the reference training frameworks.

📖 Citation

```bibtex
@article{lan2025metis,
  title={Metis-HOME: Hybrid Optimized Mixture-of-Experts for Multimodal Reasoning},
  author={Lan, Xiaohan and Liu, Fanfan and Qiu, Haibo and Yang, Siqi and Ruan, Delian and Shi, Peng and Ma, Lin},
  journal={arXiv preprint arXiv:2510.20519},
  year={2025}
}
```
