EditAR: Unified Conditional Generation with Autoregressive Models
Jiteng Mu, Nuno Vasconcelos, Xiaolong Wang
University of California, San Diego
Diffusion models have made significant advances in text-guided synthesis tasks. Recent progress in controllable image generation and editing is largely driven by diffusion-based methods. Although diffusion models perform exceptionally well in specific tasks with tailored designs, establishing a unified model is still challenging. In contrast, autoregressive models inherently feature a unified tokenized representation, which simplifies the creation of a single foundational model for various tasks. In this work, we propose EditAR, a single unified autoregressive framework for a variety of conditional image generation tasks, e.g., image editing, depth-to-image, edge-to-image, and segmentation-to-image. The model takes both images and instructions as inputs, and predicts the edited image tokens in a vanilla next-token paradigm. To enhance text-to-image alignment, we further propose to distill knowledge from foundation models into the autoregressive modeling process. We evaluate its effectiveness across diverse tasks on established benchmarks, showing competitive performance with various state-of-the-art task-specific methods.
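For intuition, the sketch below shows (in illustrative code, not the released implementation) how conditional generation reduces to plain next-token prediction: condition-image tokens and the encoded instruction form the prefix, and the model is supervised with cross-entropy on the target image tokens. All names here are hypothetical.

```python
import torch.nn.functional as F

def next_token_loss(model, cond_tokens, text_emb, target_tokens):
    """Unified conditional generation as next-token prediction (illustrative).

    cond_tokens:   VQ indices of the conditioning image, shape (B, Lc)
    text_emb:      encoded instruction (e.g., from flan-t5-xl), shape (B, Lt, D)
    target_tokens: VQ indices of the edited/target image, shape (B, Li)
    """
    # The prefix (condition tokens + instruction embedding) is consumed first;
    # the transformer then predicts each target token from everything to its left.
    logits = model(cond_tokens, text_emb, target_tokens)  # (B, Li, vocab_size)
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           target_tokens.reshape(-1))
```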
The codebase is implemented in PyTorch 2.2.1 with Python 3.10 and tested on Ubuntu 20.04.6 LTS.
- Environment Setup. Please follow `install.sh` to install the packages listed in `requirements.txt`, then download all pre-trained checkpoints as instructed below (a scripted download sketch follows this list).
- Download the text encoder model flan-t5-xl and put it at `./pretrained_models/t5-ckpt/flan-t5-xl`. Download the vqvae model vq_ds16_t2i.pt from LlamaGen and put it at `./pretrained_models/vq_ds16_t2i.pt`.
- (Required for training) Download the pre-trained text-to-image model t2i_XL_stage2_512.pt from LlamaGen and put it at `./pretrained_models/t2i_XL_stage2_512.pt`.
- (Required for inference) Download our trained model editar_release.pt and put it at `./checkpoints/editar/editar_release.pt`.
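If you prefer to script these downloads, here is a minimal sketch using `huggingface_hub`. The repo ids (`google/flan-t5-xl`, `FoundationVision/LlamaGen`) are our assumptions about where the checkpoints are hosted; please verify them against the respective release pages.

```python
from huggingface_hub import snapshot_download, hf_hub_download

# Text encoder: mirror the full flan-t5-xl repo into the expected folder.
snapshot_download(repo_id="google/flan-t5-xl",
                  local_dir="./pretrained_models/t5-ckpt/flan-t5-xl")

# LlamaGen checkpoints (repo id assumed; check the LlamaGen README).
hf_hub_download(repo_id="FoundationVision/LlamaGen",
                filename="vq_ds16_t2i.pt",
                local_dir="./pretrained_models")
hf_hub_download(repo_id="FoundationVision/LlamaGen",
                filename="t2i_XL_stage2_512.pt",  # only required for training
                local_dir="./pretrained_models")
```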
To edit a single image, put the source image and instruction text in `./examples` as demonstrated, then run:
```
python3 autoregressive/sample/sample_edit_example.py --gpt-ckpt ./checkpoints/editar/editar_release.pt --cfg-scale 3 --seed 83
```
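Here `--cfg-scale` sets the classifier-free guidance strength and `--seed` fixes the sampling seed. For reference, below is a minimal sketch of the standard CFG combination on next-token logits; it illustrates what such a scale typically controls and is not necessarily the exact released implementation.

```python
import torch

def cfg_logits(cond_logits: torch.Tensor,
               uncond_logits: torch.Tensor,
               cfg_scale: float = 3.0) -> torch.Tensor:
    """Standard classifier-free guidance on next-token logits.

    The model is run twice per step, with and without the conditioning;
    cfg_scale extrapolates from the unconditional toward the conditional
    prediction. cfg_scale = 1.0 recovers the conditional logits.
    """
    return uncond_logits + cfg_scale * (cond_logits - uncond_logits)
```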
Data Preparation (Training). For image editing, download SEED-Data-Edit-Unsplash and the PIPE Dataset. For image translation, we follow ControlNet++ to download the depth, canny, and segmentation (COCOStuff) train sets. Each parquet dataset is then processed with process_data_HF.py by specifying the source path and target path.
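As a rough illustration of this conversion step (the actual logic lives in process_data_HF.py; the schema below is hypothetical), processing one parquet shard might look like:

```python
import io
from pathlib import Path

import pandas as pd
from PIL import Image

def process_parquet(src_path: str, target_dir: str) -> None:
    """Illustrative parquet -> folder conversion; column names are assumed."""
    out = Path(target_dir)
    out.mkdir(parents=True, exist_ok=True)
    df = pd.read_parquet(src_path)
    for i, row in df.iterrows():
        # Hypothetical columns: 'source_image' / 'target_image' hold encoded
        # image bytes and 'instruction' holds the edit text; adjust these to
        # the dataset's actual schema.
        Image.open(io.BytesIO(row["source_image"])).save(out / f"{i:08d}_src.png")
        Image.open(io.BytesIO(row["target_image"])).save(out / f"{i:08d}_tgt.png")
        (out / f"{i:08d}.txt").write_text(row["instruction"])
```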
The folder ends up looking like:

```
./data/
    Seedx_Unsplash_HF/
    PIPE_HF/
    MultiGen-20M_depth_HF/
    Captioned_COCOStuff_HF/
```

Training. We provide an example in train.sh. Please modify train.sh accordingly to run on your system.
Data Preparation (Evaluation). For image editing, please refer to Direct Inversion to download the PIE-Bench dataset. For image translation, we follow ControlNet++ to download the depth, canny, and segmentation (COCOStuff) validation sets. Each parquet dataset is then processed with process_data_HF.py by specifying the source path and target path.
The folder ends up looking like:

```
./data/
    PIE_Bench_Dataset/
    MultiGen-20M_depth_eval_HF/
    Captioned_COCOStuff_eval_HF/
```

Evaluation. Please replace $TESTSET with one of `PIE-bench`, `depth`, `canny`, or `segmentation` to evaluate on the different benchmark datasets:
```
python3 autoregressive/sample/sample_edit_folder.py --gpt-ckpt ./checkpoints/editar/editar_release.pt --cfg-scale 3 --testset $TESTSET
```
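To sweep every benchmark in one run, a small driver can loop over the $TESTSET values above (a convenience sketch; the flags mirror the command shown):

```python
import subprocess

# Run the evaluation command once per benchmark split.
for testset in ["PIE-bench", "depth", "canny", "segmentation"]:
    subprocess.run([
        "python3", "autoregressive/sample/sample_edit_folder.py",
        "--gpt-ckpt", "./checkpoints/editar/editar_release.pt",
        "--cfg-scale", "3",
        "--testset", testset,
    ], check=True)
```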
The implementation is mainly built on top of LlamaGen. We also thank the authors of ControlNetPlusPlus, ControlAR, SmartEdit, and Dino-v2 for releasing their code.
The majority of this project is licensed under the MIT License. Portions of the project are under the separate licenses of the referenced projects.
```bibtex
@article{mu2025editAR,
  title={EditAR: Unified Conditional Generation with Autoregressive Models},
  author={Mu, Jiteng and Vasconcelos, Nuno and Wang, Xiaolong},
  journal={arXiv preprint arXiv:2501.04699},
  year={2025}
}
```