Armando Fortes Tianyi Wei Shangchen Zhou Xingang Pan
S-lab, Nanyang Technological University
SIGGRAPH Asia 2025
Bokeh Diffusion enables precise, scene-consistent bokeh transitions in text-to-image diffusion models
🎥 For more visual results, check out our project page.
- [2025.09] The model checkpoint and inference code are released.
- [2025.08] Bokeh Diffusion is conditionally accepted at SIGGRAPH Asia 2025! 😄🎉
- [2025.03] This repo is created.
- [ ] Release Dataset
- [x] Release Model Weights
- [x] Release Inference Code
- [ ] Release Training Code
Our environment has been tested on CUDA 12.6.
```bash
git clone https://github.com/atfortes/BokehDiffusion.git
cd BokehDiffusion
conda create -n bokehdiffusion -c conda-forge python=3.10
conda activate bokehdiffusion
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu126
pip install flash-attn==2.7.4.post1 --no-build-isolation
pip install -r requirements.txt
```
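Before running inference, a quick sanity check can confirm that the CUDA build of PyTorch and the flash-attn wheel installed correctly. This is a minimal sketch; the expected versions are the ones pinned above:

```python
# Quick sanity check for the pinned environment above.
import torch
import flash_attn

print(f"torch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")
print(f"CUDA runtime: {torch.version.cuda}")   # should report 12.6 for the cu126 wheel
print(f"flash-attn {flash_attn.__version__}")  # should report 2.7.4.post1

assert torch.cuda.is_available(), "No CUDA device visible; inference will fail or be very slow."
```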
Unbounded image generation from a text prompt and a target bokeh level:
```bash
python inference_flux.py \
    --prompt "a well-loved book lies forgotten on a park bench beneath a towering tree, its pages gently ruffling in the wind" \
    --bokeh_target 15.0
```
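In unbounded mode, the CLI takes a single bokeh level per call, so a sweep over several levels can be scripted by invoking the command repeatedly. A minimal sketch, where the sweep values (drawn from the ranges used in the examples here) are illustrative and not part of the released script:

```python
# Illustrative sweep over bokeh levels using the unbounded CLI above.
import subprocess

PROMPT = (
    "a well-loved book lies forgotten on a park bench beneath a towering tree, "
    "its pages gently ruffling in the wind"
)

for level in [0.0, 4.0, 8.0, 15.0, 28.0]:
    subprocess.run(
        [
            "python", "inference_flux.py",
            "--prompt", PROMPT,
            "--bokeh_target", str(level),
        ],
        check=True,  # stop the sweep if any run fails
    )
```

Note that without grounding, each call samples independently, so the scene may differ across levels; the grounded mode below is what enforces scene consistency.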
Grounded image generation for scene consistency:
```bash
python inference_flux.py \
    --prompt "a well-loved book lies forgotten on a park bench beneath a towering tree, its pages gently ruffling in the wind" \
    --bokeh_target 0.0 4.0 8.0 12.0 18.0 28.0 \
    --bokeh_pivot 15.0 \
    --num_grounding_steps 24
```
Refer to the inference script for further input options (e.g., seed, inference steps, guidance scale).
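As a rough illustration of combining those options with the grounded mode, the sketch below builds the full command programmatically. The flag names `--seed`, `--num_inference_steps`, and `--guidance_scale` are assumptions based on the options listed above, not confirmed by the repository docs; check `python inference_flux.py --help` for the actual names.

```python
# Illustrative wrapper around the grounded-generation command.
# The --seed / --num_inference_steps / --guidance_scale flag names are
# assumed, not confirmed by the repository docs.
import subprocess

cmd = [
    "python", "inference_flux.py",
    "--prompt",
    "a well-loved book lies forgotten on a park bench beneath a towering tree, "
    "its pages gently ruffling in the wind",
    "--bokeh_target", "0.0", "4.0", "8.0", "12.0", "18.0", "28.0",
    "--bokeh_pivot", "15.0",
    "--num_grounding_steps", "24",
    "--seed", "42",                  # assumed flag: fixes sampling for reproducibility
    "--num_inference_steps", "28",   # assumed flag: number of denoising steps
    "--guidance_scale", "3.5",       # assumed flag: classifier-free guidance strength
]
subprocess.run(cmd, check=True)
```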
If you find our work useful, please cite the following paper:
```bibtex
@article{fortes2025bokeh,
  title   = {Bokeh Diffusion: Defocus Blur Control in Text-to-Image Diffusion Models},
  author  = {Fortes, Armando and Wei, Tianyi and Zhou, Shangchen and Pan, Xingang},
  journal = {arXiv preprint arXiv:2503.08434},
  year    = {2025},
}
```

This project is licensed under the NTU S-Lab License 1.0. Redistribution and use should follow the terms of this license.
We would like to thank the following projects that made this work possible:
- Megalith-10M is used as the base dataset for collecting real in-the-wild photographs.
- BokehMe provides the synthetic blur rendering engine for generating defocus augmentations.
- Depth-Pro is used to estimate metric depth maps.
- RMBG v2.0 is used to generate foreground masks.
- FLUX, Realistic-Vision, and Cyber-Realistic are used as the base models for generating the samples in the paper.
