Armando Fortes Tianyi Wei Shangchen Zhou Xingang Pan
S-lab, Nanyang Technological University
SIGGRAPH Asia 2025
Bokeh Diffusion enables precise, scene-consistent bokeh transitions in text-to-image diffusion models
🎥 For more visual results, check out our project page.
- [2025.09] The model checkpoint and inference code are released.
- [2025.08] Bokeh Diffusion is conditionally accepted at SIGGRAPH Asia 2025! 😄🎉
- [2025.03] This repo is created.
- [ ] Release Dataset
- [x] Release Model Weights
- [x] Release Inference Code
- [ ] Release Training Code
Our environment has been tested on CUDA 12.6.
```bash
git clone https://github.com/atfortes/BokehDiffusion.git
cd BokehDiffusion
conda create -n bokehdiffusion -c conda-forge python=3.10
conda activate bokehdiffusion
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu126
pip install flash-attn==2.7.4.post1 --no-build-isolation
pip install -r requirements.txt
```
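Before running inference, a quick sanity check can confirm that the CUDA build of PyTorch and the flash-attn wheel installed correctly. This is a minimal sketch; the expected versions are the ones pinned above:

```python
# Quick sanity check for the pinned environment above.
import torch
import flash_attn

print(f"torch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")
print(f"CUDA runtime: {torch.version.cuda}")   # should report 12.6 for the cu126 wheel
print(f"flash-attn {flash_attn.__version__}")  # should report 2.7.4.post1

assert torch.cuda.is_available(), "No CUDA device visible; inference will fail or be very slow."
```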
Unbounded image generation from a text prompt and a target bokeh level:
```bash
python inference_flux.py \
    --prompt "a well-loved book lies forgotten on a park bench beneath a towering tree, its pages gently ruffling in the wind" \
    --bokeh_target 15.0
```
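In unbounded mode, the CLI takes a single bokeh level per call, so a sweep over several levels can be scripted by invoking the command repeatedly. A minimal sketch, where the sweep values (drawn from the ranges used in the examples here) are illustrative and not part of the released script:

```python
# Illustrative sweep over bokeh levels using the unbounded CLI above.
import subprocess

PROMPT = (
    "a well-loved book lies forgotten on a park bench beneath a towering tree, "
    "its pages gently ruffling in the wind"
)

for level in [0.0, 4.0, 8.0, 15.0, 28.0]:
    subprocess.run(
        [
            "python", "inference_flux.py",
            "--prompt", PROMPT,
            "--bokeh_target", str(level),
        ],
        check=True,  # stop the sweep if any run fails
    )
```

Note that without grounding, each call samples independently, so the scene may differ across levels; the grounded mode below is what enforces scene consistency.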
Grounded image generation for scene consistency:
```bash
python inference_flux.py \
    --prompt "a well-loved book lies forgotten on a park bench beneath a towering tree, its pages gently ruffling in the wind" \
    --bokeh_target 0.0 4.0 8.0 12.0 18.0 28.0 \
    --bokeh_pivot 15.0 \
    --num_grounding_steps 24
```
Refer to the inference script for further input options (e.g., seed, inference steps, guidance scale).
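As a rough illustration of combining those options with the grounded mode, the sketch below builds the full command programmatically. The flag names `--seed`, `--num_inference_steps`, and `--guidance_scale` are assumptions based on the options listed above, not confirmed by the repository docs; check `python inference_flux.py --help` for the actual names.

```python
# Illustrative wrapper around the grounded-generation command.
# The --seed / --num_inference_steps / --guidance_scale flag names are
# assumed, not confirmed by the repository docs.
import subprocess

cmd = [
    "python", "inference_flux.py",
    "--prompt",
    "a well-loved book lies forgotten on a park bench beneath a towering tree, "
    "its pages gently ruffling in the wind",
    "--bokeh_target", "0.0", "4.0", "8.0", "12.0", "18.0", "28.0",
    "--bokeh_pivot", "15.0",
    "--num_grounding_steps", "24",
    "--seed", "42",                  # assumed flag: fixes sampling for reproducibility
    "--num_inference_steps", "28",   # assumed flag: number of denoising steps
    "--guidance_scale", "3.5",       # assumed flag: classifier-free guidance strength
]
subprocess.run(cmd, check=True)
```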
If you find our work useful, please cite the following paper:
```bibtex
@article{fortes2025bokeh,
  title   = {Bokeh Diffusion: Defocus Blur Control in Text-to-Image Diffusion Models},
  author  = {Fortes, Armando and Wei, Tianyi and Zhou, Shangchen and Pan, Xingang},
  journal = {arXiv preprint arXiv:2503.08434},
  year    = {2025},
}
```

This project is licensed under the NTU S-Lab License 1.0. Redistribution and use should follow the terms of this license.
We would like to thank the following projects that made this work possible:
- Megalith-10M is used as the base dataset for collecting real in-the-wild photographs.
- BokehMe provides the synthetic blur rendering engine for generating defocus augmentations.
- Depth-Pro is used to estimate metric depth maps.
- RMBG v2.0 is used to generate foreground masks.
- FLUX, Realistic-Vision, and Cyber-Realistic are used as the base models for generating the samples in the paper.
