Official implementation of "Faster Diffusion: Rethinking the Role of the Encoder for Diffusion Model Inference" for LDM (NeurIPS'24)
This is the official codebase for FasterDiffusion, which accelerates LDM inference with a ~2.36x speedup.
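For intuition, the speedup comes from the paper's observation that the UNet encoder's features change only slowly across adjacent denoising timesteps, so they can be computed at a few key timesteps, cached, and reused in between, leaving only the decoder to run at the remaining steps. Below is a minimal conceptual sketch of this idea; the `encode`, `decode`, and `ddim_update` helpers are hypothetical names for illustration, not this repo's actual API:

```python
# Conceptual sketch of encoder propagation, NOT the repo's implementation.
# At "key" timesteps the full UNet runs and its encoder features are cached;
# at all other timesteps the cached features are reused and only the
# decoder half of the UNet is evaluated.
def denoise(unet, x, timesteps, key_timesteps):
    enc_cache = None
    for t in timesteps:
        if t in key_timesteps or enc_cache is None:
            enc_cache = unet.encode(x, t)   # hypothetical helper: UNet encoder half
        eps = unet.decode(enc_cache, x, t)  # hypothetical helper: UNet decoder half
        x = ddim_update(x, eps, t)          # standard DDIM step, assumed available
    return x
```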
A suitable conda environment named `ldm-faster-diffusion` can be created and activated with:

```bash
conda env create -f environment.yaml
conda activate ldm-faster-diffusion
```
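After activation, a quick sanity check that the environment has GPU-enabled PyTorch (the sampling script below assumes multiple GPUs via `mpiexec`):

```python
import torch

# The multi-GPU sampling script below expects CUDA to be available.
print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("GPU count:", torch.cuda.device_count())
```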
Please follow the instructions in `latent_imagenet_diffusion.ipynb` to download the checkpoint (`models/ldm/cin256-v2/model.ckpt`, ~1.7 GB).
LDM provides a script for class-conditional ImageNet sampling with latent diffusion models. Our sampling code is modified from that script, and the 50k samples are saved in the same .npz format as ADM. The evaluation code is taken from ADM's TensorFlow evaluation suite, and its dependencies are already included in the `ldm-faster-diffusion` environment. The full sampling-and-evaluation pipeline is shown below; a quick way to inspect the generated samples file follows the script.
```bash
#!/bin/bash
export NCCL_P2P_DISABLE=1

CONFIG_PATH=configs/latent-diffusion/cin256-v2.yaml
MODEL_PATH=models/ldm/cin256-v2/model.ckpt
NUM_GPUS=8

# Baseline: class-conditional LDM sampling (50k images) plus ADM evaluation.
echo 'Class-conditional LDM sampling for ImageNet 256x256:'
export OPENAI_LOGDIR=output_ldm_eval
MODEL_FLAGS="--batch_size 16 --num_samples 50000 --classifier_scale 1.5 --ddim_eta 0.0 --tqdm_disable True --use_faster_diffusion False"
mpiexec -n $NUM_GPUS python scripts/image_sample.py $MODEL_FLAGS --config_path $CONFIG_PATH --model_path $MODEL_PATH
python evaluations/evaluator.py evaluations/VIRTUAL_imagenet256_labeled.npz $OPENAI_LOGDIR/samples_50000x256x256x3.npz

# FasterDiffusion: same pipeline with --use_faster_diffusion True.
echo 'Class-conditional LDM with FasterDiffusion sampling for ImageNet 256x256:'
export OPENAI_LOGDIR=output_ldm_fdiffusion_eval
MODEL_FLAGS="--batch_size 4 --num_samples 50000 --classifier_scale 6 --ddim_eta 0.0 --tqdm_disable True --use_faster_diffusion True"
mpiexec -n $NUM_GPUS python scripts/image_sample.py $MODEL_FLAGS --config_path $CONFIG_PATH --model_path $MODEL_PATH
python evaluations/evaluator.py evaluations/VIRTUAL_imagenet256_labeled.npz $OPENAI_LOGDIR/samples_50000x256x256x3.npz
```
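Before running the evaluator, you can sanity-check the generated samples file. A minimal sketch, assuming ADM's .npz layout (`arr_0` holds the uint8 images, `arr_1` the class labels):

```python
import numpy as np

# Inspect the samples file produced by scripts/image_sample.py
# (assumes ADM's layout: arr_0 = images, arr_1 = class labels).
data = np.load("output_ldm_eval/samples_50000x256x256x3.npz")
images, labels = data["arr_0"], data["arr_1"]
print(images.shape, images.dtype)                # expected: (50000, 256, 256, 3) uint8
print(labels.shape, labels.min(), labels.max())  # expected: labels in [0, 999]
```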
| Model | Dataset | Resolution | FID↓ | sFID↓ | IS↑ | Precision↑ | Recall↑ | s/image↓ |
|---|---|---|---|---|---|---|---|---|
| LDM | ImageNet | 256x256 | 3.60 | -- | 247.67 | 0.870 | 0.480 | -- |
| LDM* | ImageNet | 256x256 | 3.39 | 5.14 | 204.57 | 0.825 | 0.534 | 7.951 |
| LDM w/ FasterDiffusion | ImageNet | 256x256 | 4.09 | 5.99 | 207.49 | 0.848 | 0.482 | 3.373 |
\* denotes our reproduced results.
Alternatively, run `python infer_ldm.py` to generate images with FasterDiffusion.
If you find this work useful, please consider citing:

```bibtex
@article{li2024faster,
  title={Faster Diffusion: Rethinking the Role of the Encoder for Diffusion Model Inference},
  author={Li, Senmao and Hu, Taihang and van de Weijer, Joost and Shahbaz Khan, Fahad and Liu, Tao and Li, Linxuan and Yang, Shiqi and Wang, Yaxing and Cheng, Ming-Ming and others},
  journal={Advances in Neural Information Processing Systems},
  volume={37},
  pages={85203--85240},
  year={2024}
}

@misc{rombach2021highresolution,
  title={High-Resolution Image Synthesis with Latent Diffusion Models},
  author={Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Björn Ommer},
  year={2021},
  eprint={2112.10752},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```
This codebase is built on LDM and references both the ADM and MDT codebases. Many thanks to their authors.
