This repository contains the official implementation of the ICML 2025 paper *Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging*.
VLM_Merging explores techniques for merging large vision-language models (VLMs) to enhance their capabilities in perception and reasoning tasks. The project leverages model merging strategies to combine the strengths of different VLMs and Math LLMs, resulting in improved performance across various benchmarks.
- Implementation of several model merging techniques (a minimal sketch of the simplest one follows this feature list):
  - Base merging
  - Layer swapping
  - TIES merging
  - DARE-TIES and DARE-linear merging
- Evaluation framework using VLMEvalKit for comprehensive assessment of merged models
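As a rough illustration of the simplest strategy, base merging interpolates the two models' parameters element-wise. The snippet below is a minimal, hypothetical sketch using Hugging Face `transformers`; it is not the repository's `merge.py`, and the paths are placeholders.

```python
# Minimal sketch of base (weighted-average) merging, for illustration only.
# Assumes both checkpoints share the same architecture and parameter names;
# the paths below are placeholders, not the repository's actual interface.
import torch
from transformers import AutoModelForCausalLM

alpha = 0.5  # interpolation weight, analogous to the --alpha flag of merge.py

model_a = AutoModelForCausalLM.from_pretrained("path/to/model_a", torch_dtype=torch.bfloat16)
model_b = AutoModelForCausalLM.from_pretrained("path/to/model_b", torch_dtype=torch.bfloat16)

state_b = model_b.state_dict()
merged_state = {}
for name, param_a in model_a.state_dict().items():
    # Element-wise interpolation: theta_merged = alpha * theta_a + (1 - alpha) * theta_b
    merged_state[name] = alpha * param_a + (1.0 - alpha) * state_b[name]

model_a.load_state_dict(merged_state)
model_a.save_pretrained("path/to/merged_model")
```

Note that in the setting of the paper the VLM and the math LLM share only the language-model backbone, so the actual merging code additionally has to match the corresponding parameter names across the two architectures; the sketch above glosses over that detail.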
```bash
git clone https://github.com/shiqichen17/VLM_Merging.git
cd VLM_Merging
conda create -n vlm_merging python=3.10 -y
conda activate vlm_merging
pip install -r requirements.txt
cd VLMEvalKit
pip install -e .
```

The main merging functionality is provided in `merge.py`. You can merge LLaVA-1.6 and DART-Math (both built on LLaMA 3) with the following command:
```bash
python merge.py --model1_path llava-hf/llama3-llava-next-8b-hf \
    --model2_path hkust-nlp/dart-math-llama3-8b-prop2diff \
    --output_dir /path/to/output \
    --alpha 0.5 \
    --mode base
```

The `--mode` argument selects the merging strategy:
- `base`: standard weighted averaging of model parameters
- `layerswap`: swap specific layers between models
- `ties`: task-vector merging using TIES
- `dareties`: DARE-TIES merging with sparse task vectors
- `darelinear`: DARE-linear merging with sparse task vectors
Additional arguments:
- `--alpha`: weighting factor for the merge (between 0 and 1)
- `--base_layer_num`: base layer number (required for `layerswap` mode)
- `--basemodel_path`: base model path (required for the TIES-based modes)
- `--density`: sparsity parameter for the DARE modes (default: 0.2)
- `--alpha2`: secondary alpha parameter for some merging modes (default: 0.2)
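To give a concrete sense of what `--density` controls, the snippet below sketches the DARE drop-and-rescale step on a single parameter tensor: the task vector (fine-tuned weights minus base weights) is randomly sparsified and rescaled so its expected value is preserved, then added back onto the base. This is a hypothetical illustration written from the general DARE recipe, not the exact logic of `merge.py`; in particular, how `--alpha` and `--alpha2` weight the two task vectors here is only an assumption.

```python
# Hypothetical sketch of DARE-linear merging on a single parameter tensor.
# Task vector = fine-tuned weights - base weights; DARE randomly drops a fraction
# of its entries and rescales the survivors so the expected update is unchanged.
import torch

def dare_sparsify(task_vector: torch.Tensor, density: float) -> torch.Tensor:
    """Keep each entry with probability `density` (> 0), zero the rest, rescale by 1/density."""
    mask = torch.bernoulli(torch.full_like(task_vector, density))
    return task_vector * mask / density

def dare_linear_merge(base: torch.Tensor,
                      finetuned_a: torch.Tensor,
                      finetuned_b: torch.Tensor,
                      density: float = 0.2,
                      alpha: float = 0.5,
                      alpha2: float = 0.2) -> torch.Tensor:
    """Merge two sparsified task vectors back onto the base weights (weighting is an assumption)."""
    tv_a = dare_sparsify(finetuned_a - base, density)
    tv_b = dare_sparsify(finetuned_b - base, density)
    return base + alpha * tv_a + alpha2 * tv_b
```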
Example scripts for model merging and evaluation are provided in:
- `scripts/merge/example.sh`: examples of different merging strategies
- `scripts/merge/example_details.sh`: full merging experiments for reproducing the paper
- `scripts/eval/evaluate_vlm.sh`: how to evaluate merged models on various benchmarks
The repository integrates with VLMEvalKit for comprehensive evaluation of merged models on various benchmarks. See the VLMEvalKit documentation for details on running evaluations.
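Before launching a full benchmark run, it can be useful to verify that the merged checkpoint loads cleanly. The snippet below is a hypothetical sanity check that assumes the merged model was saved in Hugging Face LLaVA-NeXT format and that `--output_dir` pointed to the directory shown; adjust the paths to your setup.

```python
# Hypothetical sanity check: load the merged checkpoint before evaluation.
import torch
from transformers import LlavaNextForConditionalGeneration, LlavaNextProcessor

merged_dir = "/path/to/output"  # directory passed to merge.py via --output_dir

model = LlavaNextForConditionalGeneration.from_pretrained(merged_dir, torch_dtype=torch.bfloat16)
# If merge.py does not copy processor files, load them from the original VLM instead.
processor = LlavaNextProcessor.from_pretrained("llava-hf/llama3-llava-next-8b-hf")

print(f"Loaded merged model with {sum(p.numel() for p in model.parameters()) / 1e9:.1f}B parameters")
```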
If you find this work helpful, please consider citing our paper.
```bibtex
@misc{chen2025bringreason,
      title={Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging},
      author={Shiqi Chen and Jinghan Zhang and Tongyao Zhu and Wei Liu and Siyang Gao and Miao Xiong and Manling Li and Junxian He},
      year={2025},
      eprint={2505.05464},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.05464},
}
```
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
- VLMEvalKit for providing the evaluation framework
- The authors of the original models used in our experiments