This repo is the official implementation of the ICCV 2025 paper: Federated Continual Instruction Tuning
Federated Continual Instruction Tuning
Haiyang Guo, Fanhu Zeng, Fei Zhu, Wenzhuo Liu, Da-Han Wang, Jian Xu, Xu-Yao Zhang, Cheng-Lin Liu
- [2025.07.22] We release the camera-ready version on arXiv. 🎉
- [2025.06.25] FCIT has been accepted by ICCV 2025! 🎉
A vast amount of instruction tuning data is crucial for the impressive performance of Large Multimodal Models (LMMs), but the associated computational costs and data collection demands during supervised fine-tuning make it impractical for most researchers. Federated learning (FL) has the potential to leverage all distributed data and training resources to reduce the overhead of joint training. However, most existing methods assume a fixed number of tasks, while in real-world scenarios, clients continuously encounter new knowledge and often struggle to retain old tasks due to memory constraints. In this work, we introduce the Federated Continual Instruction Tuning (FCIT) benchmark to model this real-world challenge. Our benchmark includes two realistic scenarios, encompassing four different settings and twelve carefully curated instruction tuning datasets. To address the challenges posed by FCIT, we propose dynamic knowledge organization to effectively integrate updates from different tasks during training and subspace selective activation to allocate task-specific output during inference. Extensive experimental results demonstrate that our proposed method significantly enhances model performance across varying levels of data heterogeneity and catastrophic forgetting.
The installation of our environment is the same as for CoIN and HiDe-LLaVA.
conda create -n FCIT python=3.10 -y
conda activate FCIT
pip install --upgrade pip
pip install -e .
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
To measure the metrics of the caption tasks, please install the following three packages:
pip install nltk==3.9.1
pip install pycocotools==2.0.8
pip install pycocoevalcap==1.2
We recommend replacing the eval.py file under the path /envs/FCIT/lib/python3.10/site-packages/pycocoevalcap/ in your environment with the eval.py file provided in this repository, to avoid unwanted errors and extra time overhead.
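A minimal sketch of that replacement, assuming the FCIT conda environment is activated and the provided eval.py sits at the repository root (both are assumptions; adjust to your setup):

```bash
# With the FCIT env activated, $CONDA_PREFIX points at .../envs/FCIT.
# Assumes the provided eval.py sits at the repository root -- adjust if not.
cp eval.py "$CONDA_PREFIX/lib/python3.10/site-packages/pycocoevalcap/eval.py"
```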
Technical issues can be reported and addressed through the official GitHub issue trackers for both projects: CoIN and LLaVA.
Please download the images from the constituent datasets:
| Image Source | Download Path |
|---|---|
| ArxivQA | images |
| ImageNet-R | images |
| IconQA | images |
| CLEVR-Math | images |
| super-CLEVR | images |
| Flickr30k | images |
| DVQA | images |
| Grounding, AOKVQA | train val test |
| OCR-VQA | images |
| TabMWP | images |
| FigureQA | images |
After downloading all of them, organize the data as follows:
|-- datasets
    |-- ArxivQA
        |-- images/
    |-- CLEVR
        |-- images
            |-- train/
            |-- test/
            |-- val/
    |-- Flickr30k
        |-- train/
        |-- val/
    |-- IconQA
        |-- iconqa_data/
            |-- iconqa/
    |-- ImageNet-R
        |-- train/
        |-- test/
    |-- COCO2014
        |-- train2014/
        |-- test2014/
        |-- val2014/
    |-- super-CLEVR
        |-- images/
    |-- FigureQA
        |-- images/
    |-- OCR-VQA
        |-- images/
    |-- DVQA
        |-- images/
    |-- TabMWP
        |-- tables/
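As an optional sanity check, a loop like the following flags any missing top-level folder; the ./datasets root is an assumption, adjust it to your actual path:

```bash
# Report any expected dataset folder that is missing (root path is an assumption).
for d in ArxivQA CLEVR Flickr30k IconQA ImageNet-R COCO2014 super-CLEVR FigureQA OCR-VQA DVQA TabMWP; do
    [ -d "./datasets/$d" ] || echo "Missing: ./datasets/$d"
done
```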
Please download the instructions and partitioned_data from our HuggingFace page (a download sketch follows the layout below), then organize the instructions as follows:
|-- instructions
    |-- ArxivQA
    |-- CLEVR-Math
    |-- Flickr30k-cap
    |-- IconQA
    |-- ImageNet-R
    |-- super-CLEVR
    |-- DVQA
    |-- FigureQA
    |-- Grounding
    |-- OCRVQA
    |-- AOKVQA
    |-- TabMWP
|-- partitioned_data
    |-- Capability-related
        |-- cap
    |-- Task-related
        |-- seq
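One way to fetch both folders is the huggingface_hub CLI; the repository id below is a placeholder, substitute the actual id from our HuggingFace page:

```bash
pip install -U huggingface_hub
# <ORG>/<REPO> is a placeholder -- replace it with the dataset id from our HuggingFace page.
huggingface-cli download <ORG>/<REPO> --repo-type dataset --local-dir ./
```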
Please download LLaVA and CLIP, and use the config.json provided in this repository to replace the original config.json in LLaVA.
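A minimal sketch of that replacement, assuming the LLaVA weights live under ./checkpoints/llava-v1.5-7b and the provided config.json sits at the repository root (both paths are assumptions):

```bash
# Overwrite LLaVA's original config.json with the one provided in this repo.
# NOTE: both paths are assumptions -- adjust them to your setup.
cp config.json ./checkpoints/llava-v1.5-7b/config.json
```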
The training scripts are in scripts/LLaVA/Train_FCIT.../train_all.sh. Before running, do not forget to change the paths in these files to your actual paths.
e.g., Task-related Homogeneous FCIT setting with beta=1.0
sh scripts/LLaVA/Train_FCIT_task_hom/train_all.sh 1.0
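If you want to run several heterogeneity levels back to back, the script's single beta argument can be looped over; the values below are illustrative and assume the corresponding partitioned data have been prepared:

```bash
# Illustrative sweep over the beta argument (values are examples only;
# each run assumes the matching partitioned data exists).
for beta in 0.5 1.0 5.0; do
    sh scripts/LLaVA/Train_FCIT_task_hom/train_all.sh $beta
done
```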
The evaluation scripts are in scripts/LLaVA/Eval_FCIT/. Before running, do not forget to change the paths in these files to your actual paths.
e.g., Task-related Homogeneous FCIT setting with beta=1.0
sh scripts/LLaVA/Eval_FCIT/Eval_FCIT_task_hom.sh 1.0
@article{guo2025federated,
title={Federated continual instruction tuning},
author={Guo, Haiyang and Zeng, Fanhu and Zhu, Fei and Liu, Wenzhuo and Wang, Da-Han and Xu, Jian and Zhang, Xu-Yao and Liu, Cheng-Lin},
journal={arXiv preprint arXiv:2503.12897},
year={2025}
}
This repository is built upon the LLaVA, CoIN, Shepherd, and OpenFedLLM projects. We sincerely thank the authors for their valuable contributions to the research community.
