[ICLR'25] HERO: Human-Feedback-Efficient Reinforcement Learning for Online Diffusion Model Finetuning
This repository houses the official PyTorch implementation of the paper "HERO: Human-Feedback-Efficient Reinforcement Learning for Online Diffusion Model Finetuning", presented at ICLR 2025.
TL;DR: HERO efficiently fine-tunes text-to-image diffusion models with minimal online human feedback (<1K) across various tasks.
- Project Page: https://hero-dm.github.io/
- arXiv: https://arxiv.org/pdf/2410.05116
- OpenReview: https://openreview.net/forum?id=yMHe9SRvxk
Requirements:
- Python 3.10+
- PyTorch
- Accelerate
- Diffusers
- WandB
- Other dependencies as listed in `setup.py`
Installation:
- Clone the repository:

  ```bash
  git clone <your-repo-url>
  cd HERO
  ```

- Install dependencies:

  ```bash
  pip install -e .
  cd rl4dgm
  pip install -e .
  ```
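After installation, an optional sanity check is to confirm that the core dependencies import cleanly and that a GPU is visible (this assumes a CUDA-capable machine and is not a HERO-specific step):

```bash
# Optional sanity check: core dependencies import and a GPU is visible.
python -c "import torch, diffusers, accelerate; print(torch.cuda.is_available())"
```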
The main training code is implemented in `train_hero.py`.
To start training, use the following command:
```bash
accelerate launch --num-processes 1 --dynamo_backend no --gpu_ids 1 train_hero.py
```

- `--num-processes 1`: run on a single process (single GPU).
- `--dynamo_backend no`: disable the torch dynamo backend.
- `--gpu_ids 1`: use GPU 1 (change as needed).
- `train_hero.py`: the main training script (make sure this file exists and is configured).
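For example, to target a different GPU, only the `--gpu_ids` flag needs to change; the sketch below keeps the rest of the command as-is and assumes GPU 0 is available:

```bash
# Same launch command as above, targeting GPU 0 instead of GPU 1.
accelerate launch --num-processes 1 --dynamo_backend no --gpu_ids 0 train_hero.py
```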
You may need to adjust the arguments or configuration files according to your experiment setup.
Training and model parameters are managed via Hydra config files; see `HERO/config/hydra_configs` for more details.
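Because the configs are Hydra-based, individual parameters can typically be overridden directly on the command line, provided `train_hero.py` is a standard Hydra entry point. The keys below (`train.batch_size`, `train.learning_rate`) are hypothetical placeholders; check the files in `HERO/config/hydra_configs` for the actual names:

```bash
# Hypothetical Hydra overrides appended to the launch command.
# The key names are assumptions; see HERO/config/hydra_configs for the real structure.
accelerate launch --num-processes 1 --dynamo_backend no --gpu_ids 1 train_hero.py \
    train.batch_size=8 \
    train.learning_rate=1e-5
```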
Logging:
- Training progress and metrics are logged to Weights & Biases (WandB).
- Images generated during training are saved to `HERO/real_human_ui_images`.
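If Weights & Biases has not been set up on the machine before, a one-time authentication with the standard WandB CLI is usually required before launching training:

```bash
# One-time authentication with Weights & Biases (standard wandb CLI, not HERO-specific).
wandb login
```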
For more details, please refer to the code and comments in `ddpo_trainer.py`.
Contact:
- Ayano Hiranaka: [email protected]
- Shang-Fu Chen: [email protected]
- Chieh-Hsin Lai: [email protected]