Haochen-Wang409/PairUni

PairUni: Unified Multimodal Training with GRPO

✨ Abstract

Unified Vision-Language Models (UVLMs) must perform both understanding and generation within a single architecture, but these tasks rely on heterogeneous data and supervision, making them difficult to balance during reinforcement learning (RL). We propose PairUni, a unified framework that reorganizes data into understanding–generation (UG) pairs and aligns optimization accordingly. We first use GPT-o3 to augment single-task data, generating captions for understanding samples and question-answer (QA) pairs for generation samples, forming aligned pairs from the same instance. In addition, for each generation sample we retrieve a semantically related understanding example to form a retrieved pair, linking different but related data points. These paired structures expose cross-task semantic correspondences and support consistent policy learning. To leverage this structure, we present Pair-GRPO, a pair-aware variant of Group Relative Policy Optimization. It assigns a similarity score to each pair to modulate the advantage, strengthening learning from well-aligned examples and reducing task interference. We curate a high-quality dataset of 16K UG pairs, named PairUG, for RL fine-tuning and evaluate PairUni on the powerful Janus-Pro UVLMs. Our approach achieves balanced improvements across understanding and generation, outperforming strong RL baselines for UVLMs.

📊 Method Overview

Figure: the PairUni data pipeline (augmented and retrieved UG pairs).

Figure: overview of the Pair-GRPO training framework.
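
The core idea of Pair-GRPO, as described above, is to modulate each group-relative advantage by the pair's similarity score. The following is a minimal, hypothetical sketch of that idea; the function name and exact weighting are our assumptions, and the actual implementation lives in open_r1/grpo.py:

```python
from statistics import mean, pstdev

def pair_weighted_advantages(rewards, similarity, eps=1e-6):
    """Group-relative advantages (reward standardized within the group),
    scaled by the UG pair's similarity so well-aligned pairs weigh more.

    NOTE: illustrative sketch only, not the repository's exact formulation.
    """
    mu, sigma = mean(rewards), pstdev(rewards)
    return [similarity * (r - mu) / (sigma + eps) for r in rewards]
```

Intuitively, a pair with similarity near 1 contributes its full standardized advantage, while a poorly aligned pair is down-weighted, which matches the paper's stated goal of reducing task interference.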

🚀 Quick Start

Environment Setup

  1. Install Python dependencies
pip install -r requirements.txt
  2. Install system dependencies
sudo apt-get install -y python3-tk
sudo apt-get install -y libgl1-mesa-glx
  3. Download the reward model weights
mkdir -p reward_weight
cd reward_weight
wget https://huggingface.co/xswu/HPSv2/resolve/main/HPS_v2.1_compressed.pt
cd ..
  4. Install the HPSv2 reward package
cd rewards/HPSv2
pip install -e .
cd ../../

Training

Run the training script:

bash train.sh

Or customize your training with:

torchrun --nproc_per_node=8 \
open_r1/grpo.py \
--deepspeed "configs/zero3.json" \
--output_dir ./checkpoints/your_run_name \
--model_name_or_path deepseek-ai/Janus-Pro-1B \
--pair_data_path data/your_data.jsonl \
--max_prompt_length 512 \
--num_generations_text 8 \
--num_generations_image 8 \
--per_device_train_batch_size 1 \
--gradient_accumulation_steps 2 \
--bf16 true \
--max_steps 8000 \
--learning_rate 1e-6
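
For reference, the effective global batch size implied by these flags is the product of nproc_per_node, per_device_train_batch_size, and gradient_accumulation_steps (this counts prompts per optimizer step and ignores the per-prompt generation groups set by num_generations_text and num_generations_image):

```python
# Effective batch size for the example command above.
nproc_per_node = 8
per_device_train_batch_size = 1
gradient_accumulation_steps = 2

effective_batch = (nproc_per_node
                   * per_device_train_batch_size
                   * gradient_accumulation_steps)
print(effective_batch)  # 16
```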

📁 Project Structure

PairUni/
├── janus/                  # Janus model implementations
│   ├── models/            # Core model architectures
│   └── janusflow/         # Flow-based generation
├── open_r1/               # PairGRPO training framework
│   ├── grpo.py           # Main training script
│   ├── dataset.py        # Dataset loader
│   └── trainer/          # Custom trainer implementation
├── rewards/               # Reward models
│   ├── HPSv2/            # Image quality reward
│   ├── reward_understand.py
│   └── reward_generate.py
├── configs/               # Training configurations
└── train.sh              # Training launch script

🎯 Dataset Format

The training data should be in JSONL format with paired examples:

{
    "similarity": 0.88, 
    "generate_ann": {
        "image_path": "data/images/geneval_train_e52c9d7d6c674fd8b2c8b5d2ec43efac.png", 
        "prompt": "a photo of a towel and a zebra", 
        "question": "Which statement best describes the contrast between the material draped on the animal and the animal’s own surface pattern?\nA. The fabric is smooth and plain, whereas the coat shows bold stripes.\nB. Both the fabric and the coat display identical striping.\nC. The fabric is covered with polka dots, while the coat is entirely plain.\nD. The fabric appears coarse and burlap-like, while the coat looks scaly.\n\nAnswer with the option's letter from the given choices directly.", 
        "answer": "A", 
        "tag": "geneval_train"
    }, 
    "understand_ann": {
        "image_path": "data/images/detection_f2436089737d4f0181f246926c8a2558.png", 
        "prompt": "In open savanna grassland, a small cluster of five plains zebras stands closely together, black-and-white striped bodies angling different directions amid tall yellowish grass under daylight, with erect manes and ears.", 
        "question": "What type of pattern dominates the animals’ coats?\nA. Stripes\nB. Polka dots\nC. Solid gray\nD. Checkered\n\nAnswer with the option's letter from the given choices directly.", 
        "answer": "A", 
        "tag": "detection"
    }
}
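
A minimal sketch for loading and sanity-checking records in this format, assuming each JSONL line carries the similarity, generate_ann, and understand_ann fields shown above (the helper names here are illustrative; the repository's actual loader is open_r1/dataset.py):

```python
import json

# Keys each annotation side carries in the example record above.
REQUIRED_ANN_KEYS = {"image_path", "prompt", "question", "answer", "tag"}

def validate_pair(pair):
    """Check that one JSONL record has the expected paired structure."""
    assert isinstance(pair.get("similarity"), (int, float)), "missing similarity"
    for side in ("generate_ann", "understand_ann"):
        missing = REQUIRED_ANN_KEYS - pair[side].keys()
        assert not missing, f"{side} missing keys: {missing}"
    return pair

def load_pairs(path):
    """Read one UG pair per line from a JSONL file, validating as we go."""
    with open(path, encoding="utf-8") as f:
        return [validate_pair(json.loads(line)) for line in f if line.strip()]
```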

📝 Citation

If you find this work useful, please cite:

@article{pairuni2024,
  title={PairUni: Unified Multimodal Training with GRPO},
  author={Your Name},
  journal={arXiv preprint arXiv:XXXX.XXXXX},
  year={2024}
}

📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

🙏 Acknowledgments

  • Janus for the base model architecture
  • HPSv2 for the image quality reward model
  • TRL for the GRPO training framework
