
EditAR: Unified Conditional Generation with Autoregressive Models (CVPR 2025)

project page · arXiv

JitengMu, Nuno Vasconcelos, Xiaolong Wang
University of California, San Diego

🌿 Introduction

Diffusion models have made significant advances in text-guided synthesis tasks. Recent progress in controllable image generation and editing is largely driven by diffusion-based methods. Although diffusion models perform exceptionally well in specific tasks with tailored designs, establishing a unified model remains challenging. In contrast, autoregressive models inherently feature a unified tokenized representation, which simplifies the creation of a single foundational model for various tasks. In this work, we propose EditAR, a single unified autoregressive framework for a variety of conditional image generation tasks, e.g., image editing, depth-to-image, edge-to-image, and segmentation-to-image. The model takes both images and instructions as inputs, and predicts the edited image tokens in a vanilla next-token paradigm. To enhance text-to-image alignment, we further propose distilling knowledge from foundation models into the autoregressive modeling process. We evaluate its effectiveness across diverse tasks on established benchmarks, showing performance competitive with various state-of-the-art task-specific methods.

The codebase is implemented in PyTorch 2.2.1 with Python 3.10 and tested on Ubuntu 20.04.6 LTS.

🔧 Preparation

  • Environment Setup. Follow install.sh to install the packages listed in requirements.txt, then download all pre-trained checkpoints as instructed below.

  • Download the text encoder model flan-t5-xl and place it at ./pretrained_models/t5-ckpt/flan-t5-xl. Download the VQ-VAE model vq_ds16_t2i.pt from LlamaGen and place it at ./pretrained_models/vq_ds16_t2i.pt.

  • (Required for training) Download the pre-trained text-to-image model t2i_XL_stage2_512.pt from LlamaGen and place it at ./pretrained_models/t2i_XL_stage2_512.pt.

  • (Required for inference) Download our trained model editar_release.pt and place it at ./checkpoints/editar/editar_release.pt.
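
Before running anything, it can help to confirm that all checkpoints from the steps above are in place. This is a small convenience sketch (not part of the released code); the paths are exactly those listed above:

```python
# Sanity-check that the pre-trained checkpoints described above exist.
# Run from the repository root. This helper is not part of the EditAR
# codebase; it only mirrors the paths listed in this README.
from pathlib import Path

REQUIRED = [
    "./pretrained_models/t5-ckpt/flan-t5-xl",    # flan-t5-xl text encoder
    "./pretrained_models/vq_ds16_t2i.pt",        # LlamaGen VQ-VAE tokenizer
    "./pretrained_models/t2i_XL_stage2_512.pt",  # required for training
    "./checkpoints/editar/editar_release.pt",    # required for inference
]

def missing_checkpoints(root="."):
    """Return the subset of REQUIRED paths that do not exist under root."""
    return [p for p in REQUIRED if not (Path(root) / p).exists()]

if __name__ == "__main__":
    for p in missing_checkpoints():
        print("missing:", p)
```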

🚀 Demo

Run the following script to edit a single image. Place the source image and instruction text in ./examples as demonstrated, then run:

python3 autoregressive/sample/sample_edit_example.py --gpt-ckpt ./checkpoints/editar/editar_release.pt --cfg-scale 3 --seed 83

🚀 Training

Data Preparation. For image editing, download SEED-Data-Edit-Unsplash and the PIPE dataset. For image translation, we follow ControlNet++ to download the depth, canny, and segmentation (COCOStuff) training sets. Each parquet dataset is then processed with process_data_HF.py by specifying the source path and target path.

The data folder should end up looking like:

./data/
  Seedx_Unsplash_HF/
  PIPE_HF/
  MultiGen-20M_depth_HF/
  Captioned_COCOStuff_HF/

We provide an example training script in train.sh; please modify it as needed for your system.

🚀 Evaluation

Data Preparation. For image editing, refer to Direct Inversion to download the PIE-Bench dataset. For image translation, we follow ControlNet++ to download the depth, canny, and segmentation (COCOStuff) validation sets. Each parquet dataset is then processed with process_data_HF.py by specifying the source path and target path.

The data folder should end up looking like:

./data/
  PIE_Bench_Dataset/
  MultiGen-20M_depth_eval_HF/
  Captioned_COCOStuff_eval_HF/

Replace $TESTSET with one of PIE-bench, depth, canny, or conditionsegmentation to evaluate on the corresponding benchmark:

python3 autoregressive/sample/sample_edit_folder.py --gpt-ckpt ./checkpoints/editar/editar_release.pt --cfg-scale 3 --testset $TESTSET
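
To run all four benchmarks back to back, the command above can be wrapped in a loop. The sketch below only prints each command (drop the leading `echo` to actually execute them, which requires a GPU and the downloaded checkpoints):

```shell
# Print the evaluation command for each benchmark in turn.
# Remove the leading "echo" to execute for real.
for TESTSET in PIE-bench depth canny conditionsegmentation; do
    echo python3 autoregressive/sample/sample_edit_folder.py \
        --gpt-ckpt ./checkpoints/editar/editar_release.pt \
        --cfg-scale 3 --testset "$TESTSET"
done
```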

Acknowledgement

The implementation is mainly built on top of LlamaGen. We also thank the authors of ControlNetPlusPlus, ControlAR, SmartEdit, and Dino-v2 for their code releases.

License

The majority of this project is licensed under the MIT License. Portions of the project fall under the separate licenses of the referenced projects.

BibTeX

@article{mu2025editAR,
  title={EditAR: Unified Conditional Generation with Autoregressive Models},
  author={Mu, Jiteng and Vasconcelos, Nuno and Wang, Xiaolong},
  journal={arXiv preprint arXiv:2501.04699},
  year={2025}
}
