[ICLR'25] HERO: Human-Feedback-Efficient Reinforcement Learning for Online Diffusion Model Finetuning
This repository houses the official PyTorch implementation of the paper "HERO: Human-Feedback-Efficient Reinforcement Learning for Online Diffusion Model Finetuning", presented at ICLR 2025.
TL;DR: HERO efficiently fine-tunes text-to-image diffusion models with minimal online human feedback (<1K) across various tasks.
- Project Page: https://hero-dm.github.io/
- arXiv: https://arxiv.org/pdf/2410.05116
- OpenReview: https://openreview.net/forum?id=yMHe9SRvxk
Requirements:
- Python 3.10+
- PyTorch
- Accelerate
- Diffusers
- WandB
- Other dependencies as listed in `setup.py`
Installation:
- Clone the repository:

  ```bash
  git clone <your-repo-url>
  cd HERO
  ```

- Install dependencies:

  ```bash
  pip install -e .
  cd rl4dgm
  pip install -e .
  ```
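After installation, an optional sanity check is to confirm that the core dependencies import cleanly and that a GPU is visible (this assumes a CUDA-capable machine and is not a HERO-specific step):

```bash
# Optional sanity check: core dependencies import and a GPU is visible.
python -c "import torch, diffusers, accelerate; print(torch.cuda.is_available())"
```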
The main training code is implemented in `train_hero.py`.
To start training, use the following command:
```bash
accelerate launch --num-processes 1 --dynamo_backend no --gpu_ids 1 train_hero.py
```

- `--num-processes 1`: run on a single process (single GPU).
- `--dynamo_backend no`: disable the torch dynamo backend.
- `--gpu_ids 1`: use GPU 1 (change as needed).
- `train_hero.py`: the main training script (make sure this file exists and is configured).
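For example, to target a different GPU, only the `--gpu_ids` flag needs to change; the sketch below keeps the rest of the command as-is and assumes GPU 0 is available:

```bash
# Same launch command as above, targeting GPU 0 instead of GPU 1.
accelerate launch --num-processes 1 --dynamo_backend no --gpu_ids 0 train_hero.py
```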
You may need to adjust the arguments or configuration files according to your experiment setup.
Training and model parameters are managed via Hydra config files; see `HERO/config/hydra_configs` for more details.
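Because the configs are Hydra-based, individual parameters can typically be overridden directly on the command line, provided `train_hero.py` is a standard Hydra entry point. The keys below (`train.batch_size`, `train.learning_rate`) are hypothetical placeholders; check the files in `HERO/config/hydra_configs` for the actual names:

```bash
# Hypothetical Hydra overrides appended to the launch command.
# The key names are assumptions; see HERO/config/hydra_configs for the real structure.
accelerate launch --num-processes 1 --dynamo_backend no --gpu_ids 1 train_hero.py \
    train.batch_size=8 \
    train.learning_rate=1e-5
```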
Logging:
- Training progress and metrics are logged to Weights & Biases (WandB).
- Images generated during training are saved to `HERO/real_human_ui_images`.
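If Weights & Biases has not been set up on the machine before, a one-time authentication with the standard WandB CLI is usually required before launching training:

```bash
# One-time authentication with Weights & Biases (standard wandb CLI, not HERO-specific).
wandb login
```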
For more details, please refer to the code and comments in `ddpo_trainer.py`.
Contact:
- Ayano Hiranaka: [email protected]
- Shang-Fu Chen: [email protected]
- Chieh-Hsin Lai: [email protected]