MCITlib: Multimodal Continual Instruction Tuning Library and Benchmark


✨ Introduction β€’ πŸ₯‡ Methods Provided β€’ 🏦 Benchmarks β€’ 🎨 Models
πŸƒ How to run β€’ 🀝 Acknowledgments β€’ πŸ™‚ Contact

If you like our project, please give us a star ⭐ on GitHub to stay up to date with the latest updates.

πŸ“‘ Paper (arXiv:2508.07307)

✨ Introduction

Welcome to MCITlib β€” your ultimate library for continual instruction tuning with multimodal large language models. MCITlib brings together a diverse set of cutting-edge methods into a unified, easy-to-use framework. Beyond method integration, MCITlib offers comprehensive evaluation results across a variety of benchmarks and model architectures, empowering researchers and practitioners to explore and innovate in this exciting field.

Why choose MCITlib?

  • πŸš€ Pioneering Open Source: We are proud to be the first open-source repository to provide a complete codebase and benchmark suite dedicated to multimodal continual instruction tuning.
  • 🌟 Beginner-Friendly Design: MCITlib is designed with usability in mind, offering clear, step-by-step guidance to help newcomers quickly get started and make meaningful progress.
  • πŸ”„ Continuous Innovation: Our commitment doesn’t stop here. We will regularly update MCITlib, integrating new methods and benchmarks to stay at the forefront of the field and provide lasting value to the community.

Whether you're a beginner seeking to learn or an expert aiming to innovate, MCITlib is your gateway to advancing multimodal continual instruction tuning research. Join us and contribute to shaping the future of this rapidly evolving domain!

🫰 We also have other multimodal continual instruction tuning projects that may interest you 🫰.

HiDe-LLaVA: Hierarchical Decoupling for Continual Instruction Tuning of Multimodal Large Language Model
Haiyang Guo, Fanhu Zeng, Ziwei Xiang, Fei Zhu, Da-Han Wang, Xu-Yao Zhang, Cheng-Lin Liu
github arXiv

Federated Continual Instruction Tuning
Haiyang Guo, Fanhu Zeng, Fei Zhu, Wenzhuo Liu, Da-Han Wang, Jian Xu, Xu-Yao Zhang, Cheng-Lin Liu
github arXiv

ModalPrompt: Towards Efficient Multimodal Continual Instruction Tuning with Dual-Modality Guided Prompt
Fanhu Zeng, Fei Zhu, Haiyang Guo, Xu-Yao Zhang, Cheng-Lin Liu
github arXiv

Continual Learning for Generative AI: From LLMs to MLLMs and Beyond
Haiyang Guo, Fanhu Zeng, Fei Zhu, Jiayi Wang, Xukai Wang, Jingang Zhou, Hongbo Zhao,
Wenzhuo Liu, Shijie Ma, Da-Han Wang, Xu-Yao Zhang, Cheng-Lin Liu
github arXiv

MLLM-CL: Continual Learning for Multimodal Large Language Models
Hongbo Zhao, Fei Zhu, Haiyang Guo, Meng Wang, Rundong Wang, Gaofeng Meng, Zhaoxiang Zhang
github arXiv

LLaVA-c: Continual Improved Visual Instruction Tuning
Wenzhuo Liu, Fei Zhu, Haiyang Guo, Longhui Wei, Cheng-Lin Liu
arXiv

πŸ“° News

  • [2026.01.02] πŸ”₯πŸ”₯πŸ”₯ We have updated the MCITlib paper with the latest results. Please feel free to check it out. πŸŽ‰πŸŽ‰πŸŽ‰
  • [2025.10.14] πŸ”₯πŸ”₯πŸ”₯ MCITlib-v2 has been updated! The latest version includes training and testing code for 8 mainstream multimodal continual instruction tuning methods, compatible with 2 base models and 3 continual instruction tuning datasets. πŸŽ‰πŸŽ‰πŸŽ‰
  • [2025.09.16] We have updated the new version of the paper and attached the accuracy matrix of each method for reference. πŸŽ‰
  • [2025.08.12] Initial MCITlib paper released! πŸŽ‰
  • [2025.08.10] Initial version of MCITlib is released. πŸŽ‰

πŸ₯‡ Methods Provided

  • LoRA-FT: A baseline that simply updates LoRA parameters on new tasks. [Paper]
  • O-LoRA: Orthogonal subspace learning for language model continual learning. [Paper]
  • MoELoRA: CoIN: A Benchmark of Continual Instruction Tuning for Multimodal Large Language Models [Paper]
  • ModalPrompt: ModalPrompt: Dual-Modality Guided Prompt for Continual Learning of Large Multimodal Models [Paper]
  • CL-MoE: CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering [Paper]
  • HiDe: HiDe-LLaVA: Hierarchical Decoupling for Continual Instruction Tuning of Multimodal Large Language Model [Paper]
  • SEFE: SEFE: Superficial and Essential Forgetting Eliminator for Multimodal Continual Instruction Tuning [Paper]
  • DISCO: Federated Continual Instruction Tuning [Paper]

🏦 Benchmarks

We currently report results on the UCIT, MLLM-DCL and MLLM-ACL benchmarks. Please refer to the provided links to download the corresponding images and instruction sets, and organize them in the following directory structure:

|-- your_path
    |-- Domain_data
        |-- AD
        |-- Med
        |-- RS
        |-- Sci
        |-- Fin
    |-- Ability_data
        |-- OCR
        |-- OCR_test
        |-- Math
        |-- Math_test
        |-- APP
        |-- APP_test
        |-- VP
        |-- VP_test
    |-- UCIT
        |-- datasets
        |-- ArxivQA
        |-- CLEVR-Math
        |-- Flickr30k
        |-- IconQA
        |-- ImageNet-R
        |-- VizWiz

Note: You need to change the data paths in all of the scripts to your own paths.
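
After downloading, a minimal Python sketch like the following can be used to sanity-check that the layout matches the tree above; DATA_ROOT is a placeholder for your own dataset root:

import os

# Placeholder: replace with your own dataset root from the tree above.
DATA_ROOT = "/your_path"

# Expected sub-folders, mirroring the directory structure shown above.
EXPECTED = {
    "Domain_data": ["AD", "Med", "RS", "Sci", "Fin"],
    "Ability_data": ["OCR", "OCR_test", "Math", "Math_test",
                     "APP", "APP_test", "VP", "VP_test"],
    "UCIT": ["datasets", "ArxivQA", "CLEVR-Math", "Flickr30k",
             "IconQA", "ImageNet-R", "VizWiz"],
}

missing = [
    os.path.join(DATA_ROOT, parent, child)
    for parent, children in EXPECTED.items()
    for child in children
    if not os.path.isdir(os.path.join(DATA_ROOT, parent, child))
]
print("All benchmark folders found." if not missing else f"Missing folders: {missing}")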

🎨 Models

We currently provide reproductions based on the LLaVA-1.5-7B and InternVL-Chat-7B models. Please download them to your local directory.

huggingface-cli download liuhaotian/llava-v1.5-7b --local-dir /your_path/llava-v1.5-7b
huggingface-cli download openai/clip-vit-large-patch14-336 --local-dir /your_path/clip-vit-large-patch14-336

huggingface-cli download OpenGVLab/InternVL-Chat-ViT-6B-Vicuna-7B --local-dir /your_path/Internvl-chat-7b
huggingface-cli download OpenGVLab/InternViT-6B-224px --local-dir /your_path/InternViT-6B-224px

We also plan to extend our reproduction to other MLLM architectures in the near future.

Note: To meet the requirements of certain methods, some additional processing of the config files of the downloaded models is required. The details are outlined below:

  1. Add "mm_text_select_layer": -1 and "mm_text_tower": "/your_path/clip-vit-large-patch14-336" to the config.py in your local model weight paths /your_path/llava-v1.5-7b and /your_path/Internvl-chat-7b.
  2. Remove "temperature": 0.9 and "top_p": 0.6 from the generation_config.json in your local model weight paths.

We provide reference config.py and generation_config.json files in examples.
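
If you prefer to apply these edits with a script rather than by hand, below is a minimal Python sketch. It assumes the model config is the standard JSON config.json shipped with the Hugging Face weights; if your local copy uses the config.py referenced above instead, apply the same two key changes there and compare against the reference files in examples. Repeat for each model directory:

import json
import os

# Placeholders: point these at your local model and CLIP weight directories.
MODEL_DIR = "/your_path/llava-v1.5-7b"      # repeat for /your_path/Internvl-chat-7b
TEXT_TOWER = "/your_path/clip-vit-large-patch14-336"

# 1. Add the text-tower entries to the model config (assumed to be JSON here).
cfg_path = os.path.join(MODEL_DIR, "config.json")
with open(cfg_path) as f:
    cfg = json.load(f)
cfg["mm_text_select_layer"] = -1
cfg["mm_text_tower"] = TEXT_TOWER
with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)

# 2. Remove the sampling defaults from generation_config.json.
gen_path = os.path.join(MODEL_DIR, "generation_config.json")
with open(gen_path) as f:
    gen = json.load(f)
gen.pop("temperature", None)
gen.pop("top_p", None)
with open(gen_path, "w") as f:
    json.dump(gen, f, indent=2)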

πŸƒ How to run

Note: Our experiments are conducted in a CUDA 11.8 environment, and most libraries in the setup are aligned with this CUDA version. We therefore recommend running nvcc -V to check the CUDA version on your server; if it does not match, please install CUDA 11.8 before proceeding.

1. Clone this repository

git clone https://github.com/Ghy0501/MCITlib.git
cd MCITlib

2. Install Package

conda create -n MCITlib python=3.10 -y
conda activate MCITlib
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.8 -c pytorch -c nvidia
cd LLaVA/LoRA-FT
pip install --upgrade pip
pip install -e .
pip install -e ".[train]"

To install flash-attn, we recommend downloading the version 2.6.3 wheel that matches your CUDA and PyTorch versions from the official repository, placing it in a local directory, and installing it manually. For example:

pip install flash_attn-2.6.3+cu118torch2.0cxx11abiFALSE-cp310-cp310-linux_x86_64.whl

We also provide an environment.yml file to help users identify missing dependencies and version mismatches. However, due to potential library conflicts, automatic installation may fail for certain packages. We therefore recommend installing them manually based on the reported error messages and version specifications. For essential evaluation-related dependencies, please refer to the UCIT and MLLM-CL repositories.

3. Modify path and parameter settings

Before running, please set all the model paths to your local paths. The paths that need to be modified are listed below, and don’t forget to update the dataset path as well.

  • Change /mnt/haiyangguo/mywork/CL-MLLM/MCITlib_v2 to /your_path/MCITlib.
  • Change /mnt/haiyangguo/mywork/FCIT/pre_trained/llava-v1.5-7b to /your_path/llava-v1.5-7b.
  • Change /mnt/haiyangguo/mywork/CL-MLLM/pre_trained/Internvl-chat-7b to /your_path/Internvl-chat-7b.
  • Change /mnt/ShareDB_6TB/models/clip-vit-large-patch14-336 to /your_path/clip-vit-large-patch14-336.
  • Change /mnt/ShareDB_6TB/models/InternViT-6B-224px to /your_path/InternViT-6B-224px.
  • Change /mnt/ShareDB_6TB/datasets/MLLM_CL/checkpoint to /your_path/checkpoint.

After adjusting the paths, you can modify parameters such as gpu_num to match your operating environment. All parameter settings are collected in the configs/ folder.

Note: We recommend using the Find in Folder command in VS Code for search and replace operations.
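
If you work outside VS Code, the bulk replacement can also be scripted. The following is a minimal Python sketch using the path pairs listed above; the repository root and the file-extension filter are assumptions you may need to adapt:

import pathlib

# Old hard-coded paths (from the list above) mapped to your local paths.
REPLACEMENTS = {
    "/mnt/haiyangguo/mywork/CL-MLLM/MCITlib_v2": "/your_path/MCITlib",
    "/mnt/haiyangguo/mywork/FCIT/pre_trained/llava-v1.5-7b": "/your_path/llava-v1.5-7b",
    "/mnt/haiyangguo/mywork/CL-MLLM/pre_trained/Internvl-chat-7b": "/your_path/Internvl-chat-7b",
    "/mnt/ShareDB_6TB/models/clip-vit-large-patch14-336": "/your_path/clip-vit-large-patch14-336",
    "/mnt/ShareDB_6TB/models/InternViT-6B-224px": "/your_path/InternViT-6B-224px",
    "/mnt/ShareDB_6TB/datasets/MLLM_CL/checkpoint": "/your_path/checkpoint",
}

# Walk the repository and rewrite any script or config that still contains an old path.
repo_root = pathlib.Path("/your_path/MCITlib")
for path in repo_root.rglob("*"):
    if path.suffix not in {".sh", ".py", ".json", ".yaml", ".yml"}:
        continue
    text = path.read_text(errors="ignore")
    new_text = text
    for old, new in REPLACEMENTS.items():
        new_text = new_text.replace(old, new)
    if new_text != text:
        path.write_text(new_text)
        print(f"Updated {path}")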

4. Training and Evaluation

We provide predefined training and testing hyperparameters in the config files within each method's directory, which can be adjusted as needed. The corresponding training and testing scripts are located in the scripts directory. Once all paths are correctly configured, the scripts should run without issues. For example:

cd LLaVA/LoRA-FT
sh scripts/MCITlib/Train/train_DCL.sh

The program will automatically perform both training and inference. However, for ModalPrompt (LLaVA version), training and inference must be executed separately. Please refer to its repository for detailed instructions.

Citation

@article{guo2025mcitlib,
  title={MCITlib: Multimodal Continual Instruction Tuning Library and Benchmark},
  author={Guo, Haiyang and Zhu, Fei and Zhao, Hongbo and Zeng, Fanhu and Liu, Wenzhuo and Ma, Shijie and Wang, Da-Han and Zhang, Xu-Yao},
  journal={arXiv preprint arXiv:2508.07307},
  year={2025}
}

🀝 Acknowledgments

We thank the following repositories for providing helpful functions used in our work.

πŸ™‚ Contact

If you have any questions or suggestions for new features, please open an issue or contact the author, Haiyang Guo ([email protected]).
