- [2025/06/10] 🔥 We release the training code of Unicorn. Try training!
- [2025/04/15] 🔥 We release the Unicorn-1.2M & Unicorn-Instruction-471K datasets. [HF]
Our code is based on [Bunny](https://github.com/BAAI-DCAI/Bunny).
Create a conda virtual environment and activate it:
```shell
conda create -n bunny python=3.10
conda activate bunny
```

Basic requirements:

```shell
pip install --upgrade pip  # enable PEP 660 support
pip install transformers
pip install torch torchvision xformers --index-url https://download.pytorch.org/whl/cu118
```

Install apex:

```shell
# https://github.com/NVIDIA/apex#from-source
pip install ninja
git clone https://github.com/NVIDIA/apex
cd apex
# if pip >= 23.1 (ref: https://pip.pypa.io/en/stable/news/#v23-1) which supports multiple `--config-settings` with the same key...
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./
# otherwise
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --global-option="--cpp_ext" --global-option="--cuda_ext" ./
```

Install flash-attention:

```shell
# https://github.com/Dao-AILab/flash-attention?tab=readme-ov-file#installation-and-features
pip install packaging
pip install flash-attn --no-build-isolation
```

Install bunny and other requirements:

```shell
git clone https://github.com/BAAI-DCAI/Bunny.git
cd Bunny
pip install -e .
```

Download the Unicorn-1.2M & Unicorn-Instruction-471K datasets. [HF]
Embed the captions from Unicorn-1.2M to get text embeddings:

```shell
python image_embed.py
```
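For orientation, caption embedding usually amounts to running each caption through a frozen text encoder and saving the resulting vectors. The following is a minimal sketch, not the repo's `image_embed.py`: the CLIP checkpoint, the `captions.txt` input file, the batch size, and the output path are all assumptions.

```python
# Sketch of caption embedding with a CLIP-style text encoder (assumed setup).
import pickle
import torch
from transformers import CLIPTextModelWithProjection, CLIPTokenizer

model_name = "openai/clip-vit-large-patch14"  # assumed encoder checkpoint
tokenizer = CLIPTokenizer.from_pretrained(model_name)
model = CLIPTextModelWithProjection.from_pretrained(model_name).eval()

with open("captions.txt") as f:  # hypothetical input: one caption per line
    captions = [line.strip() for line in f if line.strip()]

chunks = []
with torch.no_grad():
    for i in range(0, len(captions), 64):
        batch = tokenizer(captions[i:i + 64], padding=True,
                          truncation=True, return_tensors="pt")
        chunks.append(model(**batch).text_embeds)  # one vector per caption

with open("text_embeddings.pkl", "wb") as f:  # hypothetical output path
    pickle.dump(torch.cat(chunks).cpu(), f)
```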
Mean shift to get synthetic image embeddings:

```shell
python embed_mean.py
```
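The exact transformation lives in `embed_mean.py`; as one plausible reading of "mean shift", the sketch below centers each text embedding by subtracting the corpus mean and renormalizes, a common way to reduce the modality gap between text and image embeddings. The file names and the centering-only formulation are assumptions, not the repo's definitive method.

```python
# Sketch of a mean-shift step (assumed formulation, not necessarily embed_mean.py).
import pickle
import torch

with open("text_embeddings.pkl", "rb") as f:  # output of the previous sketch
    text_emb = pickle.load(f)  # tensor of shape (N, D)

mean = text_emb.mean(dim=0, keepdim=True)      # shared corpus-wide component
shifted = text_emb - mean                      # remove it from every embedding
synthetic = shifted / shifted.norm(dim=-1, keepdim=True)  # back to unit norm

with open("synthetic_image_embeddings.pkl", "wb") as f:  # hypothetical output
    pickle.dump(synthetic, f)
```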
Then, change the embedding path in `data_utils.py`:

```python
folder_path = ''
```

Note: the same `.pkl` file is used in both the pretraining and instruction-tuning stages.
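Illustratively, the training code would then read the embeddings from that folder; the file name and structure below are assumptions, since `data_utils.py` defines the actual layout.

```python
# Hypothetical illustration of loading the shared embeddings at train time.
import os
import pickle

folder_path = "/path/to/embeddings"  # assumed value; point this at your pkl folder
with open(os.path.join(folder_path, "synthetic_image_embeddings.pkl"), "rb") as f:
    embeddings = pickle.load(f)  # reused unchanged in both training stages
```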
Pretraining:

```shell
sh script/train/pretrain.sh
```

Instruction tuning:

```shell
sh script/train/finetune_full.sh
```
If you find this repository helpful, please cite the paper below.
```bibtex
@article{yu2025unicorn,
  title={Unicorn: Text-only data synthesis for vision language model training},
  author={Yu, Xiaomin and Ding, Pengxiang and Zhang, Wenjie and Huang, Siteng and Gao, Songyang and Qin, Chengwei and Wu, Kejian and Fan, Zhaoxin and Qiao, Ziyue and Wang, Donglin},
  journal={arXiv preprint arXiv:2503.22655},
  year={2025}
}
```

If you have any questions, please contact: [email protected]