STAGE: Stable and Generalizable GRPO for Autoregressive Image Generation

ArXiv   Huggingface Weights   Project Page

Introduction

STAGE is the first work to systematically study the stability and generalization of GRPO-based autoregressive visual generation.

Built on the original Janus-Pro-7B, STAGE improves the GenEval score from 0.78 to 0.89 (≈14% improvement) while avoiding degradation in image quality, and its effectiveness and generalization are further validated across a wide range of benchmarks, including T2I-Compbench and ImageReward.

We also discuss rewards related to human preference and text rendering.

Detailed Introduction

Reinforcement learning has recently been explored to improve text-to-image generation, yet applying existing GRPO algorithms to autoregressive (AR) image models remains challenging. Training instability easily disrupts the pretrained model's capabilities over long runs, resulting in marginal gains, degraded image quality, and poor generalization. In this work, we revisit GRPO for AR image generation and identify two key issues: contradictory gradients from unnecessary tokens and unstable policy-entropy dynamics. To address these, we introduce STAGE, a stable and generalizable framework that leverages two targeted solutions:
  1. Advantage/KL reweighting. Similarity-aware reweighting to alleviate conflicting updates; and

  2. Entropy reward. An entropy-based reward computed against the reference model to stabilize learning.

By alleviating conflicts between tokens and stabilizing training with the entropy reward, we reduce disruption of the pretrained distribution and mitigate reward hacking, which in turn improves generalization and transfer to other benchmarks. Experiments across multiple benchmarks show that STAGE consistently improves visual quality, stability, and cross-task generalization compared to baseline GRPO.
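To make the two components concrete, here is a minimal PyTorch sketch of what similarity-aware advantage reweighting and a reference-anchored entropy reward could look like. The function names, the weighting rule, and the coefficient are illustrative assumptions, not the repository's actual implementation:

```python
import torch

def reweight_advantages(advantages, sim):
    # Similarity-aware reweighting (sketch): down-weight samples that are
    # highly similar to the rest of their group, since near-duplicates with
    # different rewards push contradictory gradients onto shared tokens.
    # advantages: (G,) group-relative advantages for G sampled images
    # sim:        (G, G) pairwise similarity matrix between the samples
    redundancy = sim.mean(dim=1)                         # per-sample redundancy
    weights = (1.0 - redundancy).clamp_min(0.0)          # hypothetical rule
    weights = weights / weights.mean().clamp_min(1e-8)   # keep update scale
    return advantages * weights

def entropy_reward(policy_logits, ref_logits, coef=0.01):
    # Entropy reward (sketch): penalize the policy's token entropy for
    # drifting away from the frozen reference model's entropy, the
    # instability the paper ties to reward hacking.
    ent_policy = torch.distributions.Categorical(logits=policy_logits).entropy()
    ent_ref = torch.distributions.Categorical(logits=ref_logits).entropy()
    return -coef * (ent_policy - ent_ref).abs().mean()
```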

Qualitative Performance


Quantitative Performance


Generalization Comparison


Get Started

Quick Inference

Clone the repository:

git clone https://github.com/krennic999/STAGE.git
cd STAGE/src

Download models from Huggingface:

| Reward Type | Hugging Face Link |
| --- | --- |
| GenEval | 🤗Huggingface |
| HPS+GiT+Gdino | 🤗Huggingface |
| OCR | 🤗Huggingface |

cd into grpo/src/infer, and use:

 python reason_inference.py \
 --model_path YOUR_MODEL_CKPT \
 --data_path test_data.txt 
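Here test_data.txt is presumably a plain-text prompt list; assuming one text-to-image prompt per line (an unverified guess at the expected format), a minimal way to create one:

```python
# Placeholder prompts; assumes reason_inference.py reads one prompt
# per line from test_data.txt.
prompts = [
    "a photo of a red cube on top of a blue sphere",
    "two cats sitting on a wooden bench",
]
with open("test_data.txt", "w") as f:
    f.write("\n".join(prompts) + "\n")
```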

Or run reason_inference_xxx.py for inference on GenEval/HPS/DrawBench/T2I-Compbench/OCR:

 torchrun --nnodes=1 --nproc_per_node=8 --node_rank=0 --master_port=29500 reason_inference_xxx.py --model_path your-janus-pro-model --save_root your-save-root

Set up the reward model environment for training

Install GroundingDINO if you want to use the Object Detector reward:

cd grpo/src/utils/GroundingDINO
pip install -e .

Install LLaVA if you want to use the ORM reward:

cd grpo/src/utils/LLaVA-NeXT
pip install -e ".[train]"

Detailed install instructions can be found in T2I-R1.

Prepare reward model checkpoints

Download the corresponding models to your local path, and update the paths in grpo/configs/reward_paths.json accordingly (a path sanity-check sketch follows the table):

| Reward Model | Link | Download Command |
| --- | --- | --- |
| HPS | HPS_v2.1_compressed.pt | `wget https://huggingface.co/xswu/HPSv2/resolve/main/HPS_v2.1_compressed.pt` |
| GIT | git-large-vqav2 | `huggingface-cli download microsoft/git-large-vqav2 --repo-type model --local-dir git-large-vqav2` |
| GroundingDINO | groundingdino_swint_ogc.pth | `wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth` |
| ORM | ORM-T2I-R1 | `huggingface-cli download CaraJ/ORM-T2I-R1 --repo-type model --local-dir ORM-T2I-R1` |
| CLIP | open-clip | `huggingface-cli download laion/CLIP-ViT-H-14-laion2B-s32B-b79K --repo-type model --local-dir open-clip` |
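After downloading, a quick check can confirm that every configured path resolves on disk. This sketch assumes grpo/configs/reward_paths.json is a flat name-to-path mapping; adapt the loop if the file is nested:

```python
import json
import os

# Verify that every path listed in grpo/configs/reward_paths.json exists
# before launching training (assumes a flat {name: path} layout).
with open("grpo/configs/reward_paths.json") as f:
    reward_paths = json.load(f)

for name, path in reward_paths.items():
    status = "ok" if os.path.exists(str(path)) else "MISSING"
    print(f"{status:8s} {name}: {path}")
```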

How to prepare the GenEval reward:

Based on Flow-GRPO, we reimplemented an offline version of the GenEval reward:

 git clone https://github.com/djghosh13/geneval.git
 cd geneval
 conda env create -f environment.yml
 conda activate geneval
 ./evaluation/download_models.sh "<OBJECT_DETECTOR_FOLDER>/"

 git clone https://github.com/open-mmlab/mmdetection.git
 cd mmdetection; git checkout 2.x
 pip install -v -e .

See GenEval for detailed install instructions.

Then replace the corresponding paths in grpo/configs/reward_paths.json with your own:

 default_config = "/your-mmdet-code/mmdetection/configs/mask2former/mask2former_swin-s-p4-w7-224_lsj_8x2_50e_coco.py"
 default_ckpt = "/your-mmdet-ckpt/mmdetection"
 default_clip = "your-clip-ckpt/timm/vit_large_patch14_clip_224.openai/open_clip_pytorch_model.bin"
 default_obj  = "/your-stage-root/STAGE/src/grpo/src/utils/reward-server/reward_server/object_names.txt"

How to prepare the OCR reward: following Flow-GRPO, install PaddleOCR first, and then set ocr_base_root in grpo/configs/reward_paths.json to the path of your pre-downloaded model:

 ocr_base_root = "/your-paddleocr-root/paddleocr/whl"
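One way to pre-download the models is simply to instantiate PaddleOCR once, which populates its local cache. This assumes the paddleocr 2.x API and its default ~/.paddleocr/whl cache layout; point ocr_base_root at that directory afterwards:

```python
from paddleocr import PaddleOCR

# First instantiation downloads the detection/recognition/classifier
# weights into the local cache (by default ~/.paddleocr/whl).
ocr = PaddleOCR(use_angle_cls=True, lang="en")
```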

Training

For training with different rewards:

cd grpo/src
python run_training_xxx.py

Evaluation on benchmarks

cd into grpo/src/infer, and run reason_inference_xxx.py for inference on GenEval/HPS/DrawBench/T2I-Compbench/OCR:

 torchrun --nnodes=1 --nproc_per_node=8 --node_rank=0 --master_port=29500 reason_inference_xxx.py --model_path your-janus-pro-model --save_root your-save-root

We provide a useful tool for calculating the T2I-Compbench score:

First prepare T2I-Compbench based on the official repo. Then cd into grpo/src/infer, replace T2I_COMP_CODE_ROOT in ./cal_t2i_compbench_value.sh, and use:

 ./cal_t2i_compbench_value.sh your-save-root

Results can be found in the corresponding .txt file.

For evaluation on OCR, use:

 python cal_ocr_score.py --image_root your-save-root

Acknowledgements

We thank T2I-R1 and Flow-GRPO for their great work, upon which our repo is built.

Cite

@misc{ma2025stage,
      title={STAGE: Stable and Generalizable GRPO for Autoregressive Image Generation}, 
      author={Xiaoxiao Ma and Haibo Qiu and Guohui Zhang and Zhixiong Zeng and Siqi Yang and Lin Ma and Feng Zhao},
      year={2025},
      eprint={2509.25027},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2509.25027}, 
}

