- Paper: arXiv
- Model: 🤗 Hugging Face (coming soon)
- Data: 🤗 Data (coming soon)
Unified Vision-Language Models (UVLMs) must perform both understanding and generation within a single architecture, but these tasks rely on heterogeneous data and supervision, making them difficult to balance during reinforcement learning (RL). We propose PairUni, a unified framework that reorganizes data into understanding–generation (UG) pairs and aligns optimization accordingly. We first use GPT-o3 to augment single-task data, generating captions for understanding samples and question–answer (QA) pairs for generation samples, forming aligned pairs from the same instance. In addition, for each generation sample we retrieve a semantically related understanding example to form a retrieved pair, linking different but related data points. These paired structures expose cross-task semantic correspondences and support consistent policy learning. To leverage this structure, we present Pair-GRPO, a pair-aware variant of Group Relative Policy Optimization. It assigns a similarity score to each pair to modulate the advantage, strengthening learning from well-aligned examples and reducing task interference. We curate a high-quality dataset of 16K UG pairs, named PairUG, for RL fine-tuning, and evaluate PairUni on the powerful Janus-Pro UVLMs. Our approach achieves balanced improvements across tasks, outperforming strong UVLM RL baselines.
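To make the pair-aware advantage modulation concrete, here is a rough illustrative sketch (not the repository's implementation; see the paper for the exact formulation). It computes standard group-relative advantages for one group of rollouts and then scales them by the pair's similarity score, so well-aligned UG pairs contribute more strongly to the update:

```python
import statistics

def group_relative_advantages(rewards, similarity, eps=1e-6):
    """Similarity-modulated group-relative advantages (illustrative sketch).

    `rewards` are the scalar rewards of one group of rollouts for the same
    prompt; `similarity` is the UG pair's alignment score from the data.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # Standard GRPO step: normalize each rollout's reward within its group.
    advantages = [(r - mean) / (std + eps) for r in rewards]
    # Pair-aware modulation: scale by the pair's similarity score, so
    # well-aligned pairs drive larger policy updates.
    return [similarity * a for a in advantages]
```

The modulated advantages stay mean-centered within each group; the similarity score only rescales their magnitude.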
- Install Python dependencies:

```bash
pip install -r requirements.txt
```

- Install system dependencies:

```bash
sudo apt-get install -y python3-tk
sudo apt-get install -y libgl1-mesa-glx
```

- Download reward model weights:

```bash
mkdir -p reward_weight
cd reward_weight
wget https://huggingface.co/xswu/HPSv2/resolve/main/HPS_v2.1_compressed.pt
cd ..
```

- Install the HPSv2 reward package:

```bash
cd rewards/HPSv2
pip install -e .
cd ../..
```

Run the training script:

```bash
bash train.sh
```

Or customize your training with:
```bash
torchrun --nproc_per_node=8 \
    open_r1/grpo.py \
    --deepspeed "configs/zero3.json" \
    --output_dir ./checkpoints/your_run_name \
    --model_name_or_path deepseek-ai/Janus-Pro-1B \
    --pair_data_path data/your_data.jsonl \
    --max_prompt_length 512 \
    --num_generations_text 8 \
    --num_generations_image 8 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 2 \
    --bf16 true \
    --max_steps 8000 \
    --learning_rate 1e-6
```

```
PairUni/
├── janus/                    # Janus model implementations
│   ├── models/               # Core model architectures
│   └── janusflow/            # Flow-based generation
├── open_r1/                  # Pair-GRPO training framework
│   ├── grpo.py               # Main training script
│   ├── dataset.py            # Dataset loader
│   └── trainer/              # Custom trainer implementation
├── rewards/                  # Reward models
│   ├── HPSv2/                # Image-quality reward
│   ├── reward_understand.py  # Understanding-task reward
│   └── reward_generate.py    # Generation-task reward
├── configs/                  # Training configurations
└── train.sh                  # Training launch script
```
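The reward files above split supervision by task: HPSv2 scores image quality for generation, while the understanding side is graded against the multiple-choice answers in the data. As a hypothetical sketch (the actual logic lives in `rewards/reward_understand.py`, which may differ), a rule-based understanding reward for this QA format could simply check the predicted option letter:

```python
import re

def mcq_reward(completion: str, answer: str) -> float:
    """Rule-based understanding reward (hypothetical sketch, not the
    repository's implementation): 1.0 if the model's reply contains the
    correct option letter, else 0.0.

    The dataset's questions ask the model to answer with the option's
    letter directly, so we look for a standalone A-D in the reply.
    """
    match = re.search(r"\b([A-D])\b", completion.strip())
    return 1.0 if match and match.group(1) == answer else 0.0
```

For example, `mcq_reward("A", "A")` returns 1.0, while a reply naming the wrong letter returns 0.0.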
The training data should be in JSONL format, one paired example per line (pretty-printed here for readability):

```json
{
  "similarity": 0.88,
  "generate_ann": {
    "image_path": "data/images/geneval_train_e52c9d7d6c674fd8b2c8b5d2ec43efac.png",
    "prompt": "a photo of a towel and a zebra",
    "question": "Which statement best describes the contrast between the material draped on the animal and the animal’s own surface pattern?\nA. The fabric is smooth and plain, whereas the coat shows bold stripes.\nB. Both the fabric and the coat display identical striping.\nC. The fabric is covered with polka dots, while the coat is entirely plain.\nD. The fabric appears coarse and burlap-like, while the coat looks scaly.\n\nAnswer with the option's letter from the given choices directly.",
    "answer": "A",
    "tag": "geneval_train"
  },
  "understand_ann": {
    "image_path": "data/images/detection_f2436089737d4f0181f246926c8a2558.png",
    "prompt": "In open savanna grassland, a small cluster of five plains zebras stands closely together, black-and-white striped bodies angling different directions amid tall yellowish grass under daylight, with erect manes and ears.",
    "question": "What type of pattern dominates the animals’ coats?\nA. Stripes\nB. Polka dots\nC. Solid gray\nD. Checkered\n\nAnswer with the option's letter from the given choices directly.",
    "answer": "A",
    "tag": "detection"
  }
}
```

If you find this work useful, please cite:
```bibtex
@article{pairuni2024,
  title={PairUni: Unified Multimodal Training with GRPO},
  author={Your Name},
  journal={arXiv preprint arXiv:XXXX.XXXXX},
  year={2024}
}
```

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

