Introduction • Methods Provided • Benchmarks • Models • How to run • Acknowledgments • Contact
Welcome to MCITlib, your ultimate library for continual instruction tuning with multimodal large language models. MCITlib brings together a diverse set of cutting-edge methods into a unified, easy-to-use framework. Beyond method integration, MCITlib offers comprehensive evaluation results across a variety of benchmarks and model architectures, empowering researchers and practitioners to explore and innovate in this exciting field.
Why choose MCITlib?
- Pioneering Open Source: We are proud to be the first open-source repository to provide a complete codebase and benchmark suite dedicated to multimodal continual instruction tuning.
- Beginner-Friendly Design: MCITlib is designed with usability in mind, offering clear, step-by-step guidance to help newcomers quickly get started and make meaningful progress.
- Continuous Innovation: Our commitment doesn't stop here. We will regularly update MCITlib, integrating new methods and benchmarks to stay at the forefront of the field and provide lasting value to the community.
Whether you're a beginner seeking to learn or an expert aiming to innovate, MCITlib is your gateway to advancing multimodal continual instruction tuning research. Join us and contribute to shaping the future of this rapidly evolving domain!
We also have other multimodal continual instruction tuning projects that may interest you.
HiDe-LLaVA: Hierarchical Decoupling for Continual Instruction Tuning of Multimodal Large Language Model
Haiyang Guo, Fanhu Zeng, Ziwei Xiang, Fei Zhu, Da-Han Wang, Xu-Yao Zhang, Cheng-Lin Liu
Federated Continual Instruction Tuning
Haiyang Guo, Fanhu Zeng, Fei Zhu, Wenzhuo Liu, Da-Han Wang, Jian Xu, Xu-Yao Zhang, Cheng-Lin Liu
ModalPrompt: Towards Efficient Multimodal Continual Instruction Tuning with Dual-Modality Guided Prompt
Fanhu Zeng, Fei Zhu, Haiyang Guo, Xu-Yao Zhang, Cheng-Lin Liu
Continual Learning for Generative AI: From LLMs to MLLMs and Beyond
Haiyang Guo, Fanhu Zeng, Fei Zhu, Jiayi Wang, Xukai Wang, Jingang Zhou, Hongbo Zhao, Wenzhuo Liu, Shijie Ma, Da-Han Wang, Xu-Yao Zhang, Cheng-Lin Liu
MLLM-CL: Continual Learning for Multimodal Large Language Models
Hongbo Zhao, Fei Zhu, Haiyang Guo, Meng Wang, Rundong Wang, Gaofeng Meng, Zhaoxiang Zhang
LLaVA-c: Continual Improved Visual Instruction Tuning
Wenzhuo Liu, Fei Zhu, Haiyang Guo, Longhui Wei, Cheng-Lin Liu
- [2026.1.2] We have updated the paper in MCITlib with the latest results. Please feel free to check it out.
- [2025.10.14] MCITlib-v2 has been released! The latest version includes training and testing code for 8 mainstream multimodal continual instruction tuning methods, compatible with 2 base models and 3 continual instruction tuning datasets.
- [2025.09.16] We have updated the new version of the paper and attached the accuracy matrix of each method for reference.
- [2025.08.12] Initial MCITlib paper released!
- [2025.08.10] Initial version of MCITlib released.
- LoRA-FT: Baseline method which simply updates LoRA parameters on new tasks. [Paper]
- O-LoRA: Orthogonal subspace learning for language model continual learning. [Paper]
- MoELoRA: CoIN: A Benchmark of Continual Instruction Tuning for Multimodal Large Language Models. [Paper]
- ModalPrompt: Dual-Modality Guided Prompt for Continual Learning of Large Multimodal Models. [Paper]
- CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering. [Paper]
- HiDe: HiDe-LLaVA: Hierarchical Decoupling for Continual Instruction Tuning of Multimodal Large Language Model. [Paper]
- SEFE: Superficial and Essential Forgetting Eliminator for Multimodal Continual Instruction Tuning. [Paper]
- DISCO: Federated Continual Instruction Tuning. [Paper]
We currently report results on the UCIT, MLLM-DCL and MLLM-ACL benchmarks. Please refer to the provided links to download the corresponding images and instruction sets, and organize them in the following directory structure:
```
|-- your_path
    |-- Domain_data
        |-- AD
        |-- Med
        |-- RS
        |-- Sci
        |-- Fin
    |-- Ability_data
        |-- OCR
        |-- OCR_test
        |-- Math
        |-- Math_test
        |-- APP
        |-- APP_test
        |-- VP
        |-- VP_test
    |-- UCIT
        |-- datasets
            |-- ArxivQA
            |-- CLEVR-Math
            |-- Flickr30k
            |-- IconQA
            |-- ImageNet-R
            |-- VizWiz
```
Note: You need to modify the data path in all the scripts to your own path.
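Before editing the scripts, it can help to confirm that the data is actually laid out as expected. The following minimal Python sketch (the `/your_path` root is a placeholder; the folder names simply mirror the tree above) prints which expected directories are present or missing:

```python
import os

# Expected benchmark layout, mirroring the directory tree above.
EXPECTED = {
    "Domain_data": ["AD", "Med", "RS", "Sci", "Fin"],
    "Ability_data": ["OCR", "OCR_test", "Math", "Math_test",
                     "APP", "APP_test", "VP", "VP_test"],
    os.path.join("UCIT", "datasets"): ["ArxivQA", "CLEVR-Math", "Flickr30k",
                                       "IconQA", "ImageNet-R", "VizWiz"],
}

root = "/your_path"  # replace with your local data root

for parent, children in EXPECTED.items():
    for child in children:
        path = os.path.join(root, parent, child)
        status = "ok" if os.path.isdir(path) else "MISSING"
        print(f"{status:8s} {path}")
```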
We currently provide reproductions based on the LLaVA-1.5-7B and InternVL-Chat-7B models. Please download them to your local directory:
```bash
huggingface-cli download liuhaotian/llava-v1.5-7b --local-dir /your_path/llava-v1.5-7b
huggingface-cli download openai/clip-vit-large-patch14-336 --local-dir /your_path/clip-vit-large-patch14-336
huggingface-cli download OpenGVLab/InternVL-Chat-ViT-6B-Vicuna-7B --local-dir /your_path/Internvl-chat-7b
huggingface-cli download OpenGVLab/InternViT-6B-224px --local-dir /your_path/InternViT-6B-224px
```
We also plan to extend our reproduction to other MLLM architectures in the near future.
Note: To meet the requirements of certain methods, we need to apply additional processing to the config file in the downloaded model. The details are outlined below:
- Add `"mm_text_select_layer": -1` and `"mm_text_tower": "/your_path/clip-vit-large-patch14-336"` to the `config.py` in your local model weight paths `/your_path/llava-v1.5-7b` and `/your_path/Internvl-chat-7b`.
- Remove `"temperature": 0.9` and `"top_p": 0.6` in the `generation_config.json` of your local model weight path.
We provide reference `config.py` and `generation_config.json` files in `examples`.
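If you prefer to script the `generation_config.json` change rather than edit it by hand, a small sketch like the one below works with Python's standard `json` module (the model path is a placeholder; repeat it for each downloaded model). The `config.py` additions are easiest to copy from the reference files in `examples`:

```python
import json

# Placeholder path; repeat for /your_path/Internvl-chat-7b as well.
path = "/your_path/llava-v1.5-7b/generation_config.json"

with open(path, "r", encoding="utf-8") as f:
    cfg = json.load(f)

# Drop the sampling defaults mentioned above, if present.
for key in ("temperature", "top_p"):
    cfg.pop(key, None)

with open(path, "w", encoding="utf-8") as f:
    json.dump(cfg, f, indent=2)
```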
Note: Our experiment is conducted in a CUDA 11.8 environment, with most libraries in the setup aligned to this CUDA version. Therefore, we recommend using nvcc -V to check the CUDA version on your current server. If it does not match, please install CUDA 11.8 before proceeding.
```bash
git clone https://github.com/Ghy0501/MCITlib.git
cd MCITlib
conda create -n MCITlib python=3.10 -y
conda activate MCITlib
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.8 -c pytorch -c nvidia
cd LLaVA/LoRA-FT
pip install --upgrade pip
pip install -e .
pip install -e ".[train]"
```
For installing flash-attn, we recommend downloading version 2.6.3 from the official repository according to your CUDA and PyTorch versions, and placing it in a local directory for manual installation. For example:
```bash
pip install flash_attn-2.6.3+cu118torch2.0cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
```
We also provide an environment.yml file to help users identify missing dependencies and version mismatches. However, due to potential library conflicts, automatic installation may fail to install certain packages. We therefore recommend manually installing them based on the provided error messages and version specifications. For essential evaluation-related dependencies, please refer to the UCIT and MLLM-CL repositories.
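Once the environment is built, a short sanity check such as the one below (it only uses standard `torch` attributes plus an optional `flash_attn` import) confirms that the installed build matches the intended PyTorch 2.0.1 / CUDA 11.8 setup:

```python
import torch

print("PyTorch:", torch.__version__)               # expected: 2.0.1
print("CUDA (torch build):", torch.version.cuda)   # expected: 11.8
print("CUDA available:", torch.cuda.is_available())

try:
    import flash_attn
    print("flash-attn:", flash_attn.__version__)   # expected: 2.6.3
except ImportError:
    print("flash-attn is not installed")
```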
Before running, please set all the model paths to your local paths. The paths that need to be modified are listed below, and don't forget to update the dataset path as well.
- Change `/mnt/haiyangguo/mywork/CL-MLLM/MCITlib_v2` to `/your_path/MCITlib`.
- Change `/mnt/haiyangguo/mywork/FCIT/pre_trained/llava-v1.5-7b` to `/your_path/llava-v1.5-7b`.
- Change `/mnt/haiyangguo/mywork/CL-MLLM/pre_trained/Internvl-chat-7b` to `/your_path/Internvl-chat-7b`.
- Change `/mnt/ShareDB_6TB/models/clip-vit-large-patch14-336` to `/your_path/clip-vit-large-patch14-336`.
- Change `/mnt/ShareDB_6TB/models/InternViT-6B-224px` to `/your_path/InternViT-6B-224px`.
- Change `/mnt/ShareDB_6TB/datasets/MLLM_CL/checkpoint` to `/your_path/checkpoint`.
After adjusting the paths, users can modify parameters such as `gpu_num` based on their actual operating environment. All parameter settings are collected in the `configs/` folder.
Note: We recommend using the Find in Folder command in VS Code for search and replace operations.
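If you prefer scripting over an editor, a rough Python sketch like the following can batch-replace the placeholder paths across the repository. The old/new pairs mirror the list above; the set of file extensions searched is an assumption, so review the result (for example with `git diff`) afterwards:

```python
import pathlib

# Old -> new path mappings, taken from the list above; adjust /your_path first.
REPLACEMENTS = {
    "/mnt/haiyangguo/mywork/CL-MLLM/MCITlib_v2": "/your_path/MCITlib",
    "/mnt/haiyangguo/mywork/FCIT/pre_trained/llava-v1.5-7b": "/your_path/llava-v1.5-7b",
    "/mnt/haiyangguo/mywork/CL-MLLM/pre_trained/Internvl-chat-7b": "/your_path/Internvl-chat-7b",
    "/mnt/ShareDB_6TB/models/clip-vit-large-patch14-336": "/your_path/clip-vit-large-patch14-336",
    "/mnt/ShareDB_6TB/models/InternViT-6B-224px": "/your_path/InternViT-6B-224px",
    "/mnt/ShareDB_6TB/datasets/MLLM_CL/checkpoint": "/your_path/checkpoint",
}

repo = pathlib.Path("/your_path/MCITlib")  # root of the cloned repository

for path in repo.rglob("*"):
    # Only touch text-based files that may contain hard-coded paths (assumed extension set).
    if not path.is_file() or path.suffix not in {".sh", ".py", ".json", ".yaml", ".yml"}:
        continue
    text = path.read_text(encoding="utf-8", errors="ignore")
    new_text = text
    for old, new in REPLACEMENTS.items():
        new_text = new_text.replace(old, new)
    if new_text != text:
        path.write_text(new_text, encoding="utf-8")
        print("updated", path)
```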
We provide predefined training and testing hyperparameters in the `configs` files within each method's directory, which can be adjusted as needed. The corresponding training and testing scripts are located in the `scripts` directory. Once all paths are correctly configured, the scripts should execute without issues. For example:
```bash
cd LLaVA/LoRA-FT
sh scripts/MCITlib/Train/train_DCL.sh
```
The program will automatically perform both training and inference. However, for ModalPrompt (LLaVA version), training and inference must be executed separately. Please refer to its repository for detailed instructions.
```bibtex
@article{guo2025mcitlib,
  title={MCITlib: Multimodal Continual Instruction Tuning Library and Benchmark},
  author={Guo, Haiyang and Zhu, Fei and Zhao, Hongbo and Zeng, Fanhu and Liu, Wenzhuo and Ma, Shijie and Wang, Da-Han and Zhang, Xu-Yao},
  journal={arXiv preprint arXiv:2508.07307},
  year={2025}
}
```

We thank the following repositories for providing helpful functions used in our work.
If you have any questions or suggestions for new features, please open an issue or contact the author, Haiyang Guo ([email protected]).
