- [2025-09] We have released the official codebase and weights on Hugging Face!
- [2025-09] The STAGE technical report is released.
STAGE is the first system to systematically study the stability and generalization of GRPO-based autoregressive visual generation.
Built on the original Janus-Pro-7B, STAGE improves the GenEval score from 0.78 to 0.89 (≈14% relative improvement) without degrading image quality, and its effectiveness and generalization are further validated across a wide range of benchmarks, including T2I-CompBench and ImageReward.
We also discuss rewards related to human preference and text rendering.
Reinforcement learning has recently been explored to improve text-to-image generation, yet applying existing GRPO algorithms to autoregressive (AR) image models remains challenging. Training instability easily disrupts the pretrained model's capabilities during long runs, resulting in marginal gains, degraded image quality, and poor generalization. In this work, we revisit GRPO for AR image generation and identify two key issues: contradictory gradients from unnecessary tokens and unstable policy-entropy dynamics. To address these, we introduce STAGE, a stable and generalizable framework built on two targeted solutions:
- Advantage/KL reweighting: similarity-aware reweighting to alleviate conflicting updates; and
- Entropy reward: an entropy-based reward, computed with respect to the reference model, to stabilize learning.
By alleviating conflicts between tokens and stabilizing training with the entropy reward, we reduce disruption of the pretrained distribution and mitigate reward hacking, which in turn improves generalization and transfer to other benchmarks. Experiments across multiple benchmarks show that STAGE consistently improves visual quality, stability, and cross-task generalization compared to baseline GRPO.
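To make the two components concrete, below is a minimal, illustrative PyTorch sketch. The function names, the cosine-similarity weighting, and the absolute entropy-gap reward are assumptions made for illustration only and do not reproduce the exact formulation in the paper.

```python
import torch
import torch.nn.functional as F

def reweight_advantages(advantages, policy_logits, ref_logits):
    """Similarity-aware reweighting (illustrative sketch, not the official formula).

    Tokens whose policy distribution stays close to the reference model are treated
    as less relevant to the reward and are down-weighted, so their contradictory
    gradients do not dominate the update.
    advantages:    (B, T)    per-token advantages broadcast from the group reward
    policy_logits: (B, T, V) logits of the policy model
    ref_logits:    (B, T, V) logits of the frozen reference model
    """
    p = F.softmax(policy_logits, dim=-1)
    q = F.softmax(ref_logits, dim=-1)
    sim = F.cosine_similarity(p, q, dim=-1)   # (B, T), high = close to reference
    weight = 1.0 - sim                        # hypothetical weighting choice
    return advantages * weight

def entropy_reward(policy_logits, ref_logits):
    """Entropy-based reward w.r.t. the reference model (illustrative sketch).

    Penalizes the policy for drifting away from the reference model's entropy,
    one way to keep policy-entropy dynamics stable during GRPO training.
    Returns a per-sample scalar reward of shape (B,).
    """
    def entropy(logits):
        logp = F.log_softmax(logits, dim=-1)
        return -(logp.exp() * logp).sum(-1)   # (B, T)

    return -(entropy(policy_logits) - entropy(ref_logits)).abs().mean(-1)
```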
Clone the repository:
```bash
git clone https://github.com/krennic999/STAGE.git
cd STAGE/src
```
Download the models from Hugging Face:
| Reward Type | Hugging Face Link |
|---|---|
| GenEval | 🤗Huggingface |
| HPS+GiT+Gdino | 🤗Huggingface |
| OCR | 🤗Huggingface |
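The table entries link to Hugging Face repositories in the rendered README. As a hedged alternative to the `huggingface-cli download` commands used later in this README, the weights can also be fetched programmatically; the repo ID below is a placeholder, not the actual STAGE repository:

```python
from huggingface_hub import snapshot_download

# "<STAGE_REPO_ID>" is a placeholder; substitute the repository linked in the table above.
snapshot_download(repo_id="<STAGE_REPO_ID>", local_dir="./ckpts/stage-geneval")
```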
cd into grpo/src/infer and run:
```bash
python reason_inference.py \
    --model_path YOUR_MODEL_CKPT \
    --data_path test_data.txt
```
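The expected format of test_data.txt is not documented in this README; a plausible minimal example, assuming one text prompt per line (the prompts below are placeholders only):

```python
# Hypothetical prompt file for reason_inference.py: one prompt per line (assumed format).
prompts = [
    "a photo of a red apple on a wooden table",
    "two cats sitting next to a blue ball",
]
with open("test_data.txt", "w") as f:
    f.write("\n".join(prompts) + "\n")
```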
Or run reason_inference_xxx.py for multi-GPU inference on GenEval/HPS/DrawBench/T2I-CompBench/OCR:
```bash
torchrun --nnodes=1 --nproc_per_node=8 --node_rank=0 --master_port=29500 reason_inference_xxx.py --model_path your-janus-pro-model --save_root your-save-root
```
Install GroundingDINO if you want to use the Object Detector reward:
```bash
cd grpo/src/utils/GroundingDINO
pip install -e .
```
Install LLaVA if you want to use the ORM reward:
```bash
cd grpo/src/utils/LLaVA-NeXT
pip install -e ".[train]"
```
Detailed installation instructions can be found in T2I-R1.
Download the corresponding models to your local paths, and update those paths in grpo/configs/reward_paths.json:
| Reward Model | Link | Download Command |
|---|---|---|
| HPS | HPS_v2.1_compressed.pt | wget https://huggingface.co/xswu/HPSv2/resolve/main/HPS_v2.1_compressed.pt |
| GIT | git-large-vqav2 | huggingface-cli download microsoft/git-large-vqav2 --repo-type model --local-dir git-large-vqav2 |
| GroundingDINO | groundingdino_swint_ogc.pth | wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth |
| ORM | ORM-T2I-R1 | huggingface-cli download CaraJ/ORM-T2I-R1 --repo-type model --local-dir ORM-T2I-R1 |
| CLIP | open-clip | huggingface-cli download laion/CLIP-ViT-H-14-laion2B-s32B-b79K --repo-type model --local-dir open-clip |
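A quick sanity check can catch path typos before training. The snippet below assumes reward_paths.json is a flat mapping from names to local paths, which is an assumption about the config layout rather than a documented fact:

```python
import json
import os

# Illustrative check: flag any path in reward_paths.json that does not exist on disk.
with open("grpo/configs/reward_paths.json") as f:
    paths = json.load(f)

for name, path in paths.items():
    status = "OK" if os.path.exists(str(path)) else "MISSING"
    print(f"{name:>16}: {status}  ({path})")
```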
How to prepare the GenEval reward:
Based on Flow-GRPO, we reimplement an offline version of the GenEval reward:
```bash
git clone https://github.com/djghosh13/geneval.git
cd geneval
conda env create -f environment.yml
conda activate geneval
./evaluation/download_models.sh "<OBJECT_DETECTOR_FOLDER>/"

git clone https://github.com/open-mmlab/mmdetection.git
cd mmdetection; git checkout 2.x
pip install -v -e .
```
See GenEval for detailed installation instructions.
Then replace the corresponding paths in grpo/configs/reward_paths.json with your own:
```
default_config = "/your-mmdet-code/mmdetection/configs/mask2former/mask2former_swin-s-p4-w7-224_lsj_8x2_50e_coco.py"
default_ckpt = "/your-mmdet-ckpt/mmdetection"
default_clip = "your-clip-ckpt/timm/vit_large_patch14_clip_224.openai/open_clip_pytorch_model.bin"
default_obj = "/your-stage-root/STAGE/src/grpo/src/utils/reward-server/reward_server/object_names.txt"
```
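As a hedged sanity check, the Mask2Former detector referenced by default_config/default_ckpt should load under the mmdetection 2.x API; the checkpoint filename below is hypothetical, so use whatever ./evaluation/download_models.sh actually placed in your checkpoint folder:

```python
from mmdet.apis import init_detector  # mmdetection 2.x API

# Paths mirror the reward_paths.json entries above; the checkpoint filename is hypothetical.
config = "/your-mmdet-code/mmdetection/configs/mask2former/mask2former_swin-s-p4-w7-224_lsj_8x2_50e_coco.py"
checkpoint = "/your-mmdet-ckpt/mmdetection/mask2former_swin-s.pth"
model = init_detector(config, checkpoint, device="cuda:0")
print(type(model).__name__)
```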
How to prepare the OCR reward:
Following Flow-GRPO, install PaddleOCR first, and then set ocr_base_root in grpo/configs/reward_paths.json to the path of your pre-downloaded models:
```
ocr_base_root = "/your-paddleocr-root/paddleocr/whl"
```
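For reference, here is a minimal sketch of loading local models, assuming PaddleOCR 2.x and its standard whl download layout; the subdirectory and model names below are assumptions about how the models were saved, not something this README specifies:

```python
from paddleocr import PaddleOCR

# Illustrative: point PaddleOCR at locally downloaded detection/recognition models
# instead of letting it fetch them at runtime. Subdirectory names are assumed.
ocr_base_root = "/your-paddleocr-root/paddleocr/whl"
ocr = PaddleOCR(
    det_model_dir=f"{ocr_base_root}/det/en/en_PP-OCRv3_det_infer",
    rec_model_dir=f"{ocr_base_root}/rec/en/en_PP-OCRv4_rec_infer",
    use_angle_cls=False,
    lang="en",
)
result = ocr.ocr("sample.png", cls=False)
```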
For training on different rewards:
```bash
cd grpo/src
python run_training_xxx.py
```
cd into grpo/src/infer and run reason_inference_xxx.py for inference on GenEval/HPS/DrawBench/T2I-CompBench/OCR:
```bash
torchrun --nnodes=1 --nproc_per_node=8 --node_rank=0 --master_port=29500 reason_inference_xxx.py --model_path your-janus-pro-model --save_root your-save-root
```
We provide a useful tool for calculating the T2I-CompBench score:
First prepare T2I-CompBench based on the official repo. Then cd into grpo/src/infer, replace T2I_COMP_CODE_ROOT in ./cal_t2i_compbench_value.sh, and run:
```bash
./cal_t2i_compbench_value.sh your-save-root
```
Results can be found in the corresponding txt files.
For evaluation on OCR, use:
```bash
python cal_ocr_score.py --image_root your-save-root
```
We thank T2I-R1 and Flow-GRPO for their great work, upon which our repo is built.
```bibtex
@misc{ma2025stage,
      title={STAGE: Stable and Generalizable GRPO for Autoregressive Image Generation},
      author={Xiaoxiao Ma and Haibo Qiu and Guohui Zhang and Zhixiong Zeng and Siqi Yang and Lin Ma and Feng Zhao},
      year={2025},
      eprint={2509.25027},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2509.25027},
}
```



