
UV-CoT: Unsupervised Visual Chain-of-Thought Reasoning via Preference Optimization

News 🎉🎉🎉

Our paper has been accepted by ICCV 2025. The latest version of the paper is available at https://arxiv.org/abs/2504.18397.

Links

  1. Project page: link
  2. We have released the UV-CoT checkpoint on our Hugging Face page: link

Overview

Figure 1: UV-CoT Overview

Visualization

Qualitative results are shown in fig5_v1.2.pdf and fig6_v1.2.pdf.

Install

  1. Clone this repository and navigate to the UV-CoT folder, or download the code.
git clone https://github.com/kesenzhao/UV-CoT
cd UV-CoT
  2. Install the package:
conda create -n uv-cot python=3.10 -y
conda activate uv-cot
pip install -e .
  3. Install the required spaCy model:
wget https://github.com/explosion/spacy-models/releases/download/en_core_web_trf-3.7.3/en_core_web_trf-3.7.3.tar.gz
pip install en_core_web_trf-3.7.3.tar.gz
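
To sanity-check the environment, you can load the model once (a minimal check, assuming a standard spaCy setup):

# Verify that the en_core_web_trf pipeline loads and tags a sentence.
import spacy

nlp = spacy.load("en_core_web_trf")
doc = nlp("A dog is chasing a ball in the park.")
print([(token.text, token.pos_) for token in doc])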

Preference Data Curation

  1. Environment Setup

Please download the fine-tuned Llama3 8B models (the split model and the question transformation model) and store them in the ./models/llama3_split and ./models/llama3_changeq folders, respectively.
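
A minimal sketch for preparing the expected folder layout (the downloads themselves are manual; only the directory names come from the instructions above):

# Create the directories the data pipeline expects; place the downloaded
# split model and question transformation model inside them.
from pathlib import Path

for d in ["./models/llama3_split", "./models/llama3_changeq"]:
    Path(d).mkdir(parents=True, exist_ok=True)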

  2. Model Feedback

The following script demonstrates using the LLaVA-v1.5-7b model to generate candidate answers and the OmniLMM 12B model to provide feedback.

mkdir ./results
bash ./script/data_gen/run_data_pipeline_llava15_omni.sh

If you want to evaluate based on final answers, please refer to:

bash ./script/data_gen/run_data_pipeline_llava15_omni_next.sh

If you have multi-step CoT, please refer to:

bash ./script/data_gen/run_data_pipeline_llava15_omni_divide.sh

If you want to use the self-evaluation method, please refer to:

bash ./script/data_gen/run_data_pipeline_llava15_self_evaluated.sh
  3. A Toy Example

We provide a toy example in the folder cot_one. Process your instruction set into the same format before generating the preference data.
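
As an illustration only, one instruction entry might look like the sketch below; the field names (id, image, question) are assumptions in the style of LLaVA-format data, so match them to the actual schema shown in cot_one:

# Illustrative sketch: write one instruction entry as JSON Lines.
# Field names here are assumptions; mirror the schema used in cot_one.
import json

entry = {
    "id": "toy_0001",
    "image": "flickr30k/images/12345.jpg",
    "question": "Which region of the image supports the answer?",
}
with open("my_instructions.jsonl", "w") as f:
    f.write(json.dumps(entry) + "\n")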

Train

  1. Prepare data

After downloading all of the datasets, organize them in ./playground/data as follows:

├── coco
│   ├── train2017
│   └── train2014
├── gqa
│   └── images
├── ocr_vqa
│   └── images
├── textvqa
│   └── train_images
├── v7w
│   └── images
├── flickr30k
│   └── images
└── cot
    ├── flickr30k
    ├── docvqa
    ├── gqa
    ├── infographicsvqa
    ├── textvqa
    ├── vsr
    ├── dude
    ├── sroie
    └── vstar
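
A quick check (a minimal sketch; the paths are taken from the tree above) that the layout is in place:

# Verify that every expected dataset folder exists under ./playground/data.
from pathlib import Path

root = Path("./playground/data")
expected = [
    "coco/train2017", "coco/train2014", "gqa/images", "ocr_vqa/images",
    "textvqa/train_images", "v7w/images", "flickr30k/images",
    "cot/flickr30k", "cot/docvqa", "cot/gqa", "cot/infographicsvqa",
    "cot/textvqa", "cot/vsr", "cot/dude", "cot/sroie", "cot/vstar",
]
missing = [p for p in expected if not (root / p).is_dir()]
print("missing:", missing or "none")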
  2. Training

Here, we provide a training script to train the model for one iteration. Adjust the max_step parameter according to the amount of your data.

Run the following command to start fully fine-tuning.

bash ./script/train/llava15_train.sh
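
As a rough guide for choosing max_step, the usual arithmetic is dataset size times epochs divided by the effective batch size. The sketch below uses placeholder numbers, not values from the paper:

# Back-of-the-envelope max_step calculation. All numbers are placeholders;
# substitute your own dataset size and batch settings.
num_examples = 20000   # preference pairs in your dataset (placeholder)
per_device_batch = 4   # per-device train batch size (placeholder)
grad_accum = 4         # gradient accumulation steps (placeholder)
num_gpus = 8           # GPUs used for training (placeholder)
epochs = 1             # this script trains for one iteration

effective_batch = per_device_batch * grad_accum * num_gpus  # 128
max_step = num_examples * epochs // effective_batch
print(max_step)  # 156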
  3. Iterative alignment

To reproduce the iterative training process in the paper, repeat the following steps 4 times:

  • S1. Data generation.

    Follow the instructions in Preference Data Curation to generate preference pairs for the base model. Convert the generated jsonl file to Hugging Face parquet (see the sketch after this list).

  • S2. Change training config.

    In the dataset code, replace data_path here with your data path.

    In the training script, replace --data_dir with a new directory, replace --model_name_or_path with the base model path, set --max_step to the number of steps for 4 epochs, and set --save_steps to the number of steps for 1/4 epoch.

  • S3. Do DPO training.

    Run the training script to train the base model.

  • S4. Choose the base model for the next iteration.
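
For the jsonl-to-parquet conversion in S1, a minimal sketch using the Hugging Face datasets library (file names are placeholders):

# Convert generated preference pairs from JSON Lines to parquet.
# File names are placeholders; point them at your actual outputs.
from datasets import Dataset

ds = Dataset.from_json("preference_pairs.jsonl")
ds.to_parquet("preference_pairs.parquet")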

Evaluation

  1. Inference on both training datasets and zero-shot datasets. UV-CoT can be replaced with the name of any other model saved in ./checkpoints/:
bash scripts/v1_5/eval/cot_benchmark.sh UV-CoT
  2. Inference for the ablation study:
bash scripts/v1_5/eval/cot_benchmark_ablations.sh UV-CoT
  3. Obtain the score using GPT-4o. The API key needs to be set in llava/eval/eval_cot_score.py:
bash scripts/v1_5/eval/cot_score.sh UV-CoT

Citation

If our work assists your research, feel free to give us a star ⭐ or cite us using:

@misc{zhao2025unsupervisedvisualchainofthoughtreasoning,
      title={Unsupervised Visual Chain-of-Thought Reasoning via Preference Optimization}, 
      author={Kesen Zhao and Beier Zhu and Qianru Sun and Hanwang Zhang},
      year={2025},
      eprint={2504.18397},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2504.18397}, 
}
