We introduce a new algorithm for sampling under novel constraints from pre-trained diffusion models.
Instead of performing gradient descent steps (as in DPS), which require expensive backward passes through the denoiser network, we propose inexact Newton steps, which can be computed with forward passes alone and produce results of comparable quality.
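The computational trick can be illustrated with a toy example: when the denoiser Jacobian is symmetric, the backward-mode product J&#8314;v that gradient descent needs can be replaced by a forward-difference Jacobian-vector product, costing only one extra forward pass. The sketch below is our own illustration (the `forward_only_direction` helper and the toy denoiser are assumptions for demonstration, not the repository's code):

```python
import torch

def forward_only_direction(denoiser, x, v, eps=1e-3):
    # Finite-difference JVP: (D(x + eps*v) - D(x)) / eps ~= J(x) @ v.
    # If the denoiser Jacobian is (approximately) symmetric, this equals
    # the VJP J^T v that gradient descent would need a backward pass for.
    return (denoiser(x + eps * v) - denoiser(x)) / eps

# Toy denoiser with an exactly symmetric Jacobian (a linear gradient field)
torch.manual_seed(0)
A = torch.randn(8, 8)
W = 0.5 * (A + A.T)              # symmetric weight => symmetric Jacobian
denoiser = lambda x: x - W @ x

x = torch.randn(8, requires_grad=True)
v = torch.randn(8)

# Backward-pass VJP (what DPS-style gradient descent computes)
out = denoiser(x)
vjp = torch.autograd.grad(out, x, grad_outputs=v)[0]

# Forward-only JVP (the forward-pass-only ingredient)
jvp = forward_only_direction(denoiser, x.detach(), v)
print(torch.allclose(vjp, jvp, atol=1e-3))  # True for symmetric Jacobians
```

For a real denoiser the Jacobian is only approximately symmetric, which is why the resulting step is an *inexact* Newton direction.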
| Masked image | Inpainting using Stable Diffusion (~15s) |
|---|---|
| ![]() | ![]() |
We provide two implementations of the proposed inexact Newton sampling algorithm for linear and non-linear tasks:
- In `stable-diffusion` we provide an implementation based on the LDM repository.
  - `inpaint.ipynb` performs inpainting on a given image and mask.
  - `superres.ipynb` performs super-resolution on an image for a given downsampling rate.
  - `style.ipynb` generates an image from a given caption, following the style provided in the reference image. We utilize the second-layer features from a CLIP ViT-B/16 to compare the style between the generated and reference images.
- In `diffusers` we provide an implementation using the diffusers library.
  - `inpaint.ipynb` performs inpainting on a given image and mask.
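For the linear tasks above, the measurement operator is just a simple forward map; the sketch below shows what such operators typically look like (the names `inpaint_A` / `superres_A` are hypothetical, not the notebooks' API):

```python
import torch
import torch.nn.functional as F

def inpaint_A(x, mask):
    """Inpainting measurement: keep observed pixels (mask == 1)."""
    return x * mask

def superres_A(x, factor):
    """Super-resolution measurement: average-pool downsampling."""
    return F.avg_pool2d(x, factor)

x = torch.randn(1, 3, 64, 64)
mask = (torch.rand(1, 1, 64, 64) > 0.5).float()
y_inp = inpaint_A(x, mask)   # same shape as x, masked-out pixels zeroed
y_lr = superres_A(x, 4)      # downsampled by a factor of 4
print(y_lr.shape)            # torch.Size([1, 3, 16, 16])
```

Constrained sampling then seeks a latent whose decoded image matches the measurement `y` under the chosen operator.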
We have also experimented with rectified flow models (e.g., InstaFlow, Stable Diffusion 3). The extension is straightforward, and we will add a code implementation for such models as well.
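For context, rectified flow models replace the noise predictor with a velocity field that is integrated with simple Euler steps; a constrained correction can be applied after each step in the same spirit. A minimal sketch with a toy velocity field (our own illustration, not the forthcoming implementation):

```python
import torch

def euler_rectified_flow(velocity, x0, steps=10):
    """Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 (data).
    A constrained correction would be inserted after each Euler step."""
    x, dt = x0, 1.0 / steps
    for i in range(steps):
        t = torch.full((x.shape[0],), i * dt)
        x = x + dt * velocity(x, t)
    return x

# Toy linear velocity field: contracts the state by 0.9 per step
v = lambda x, t: -x
x = euler_rectified_flow(v, torch.ones(2, 4))
print(x[0, 0].item())  # ~ 0.9**10 ~ 0.349
```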
In `mnist/train_diffusion.ipynb` we showcase the comparison between the inexact and exact Newton steps on a small-scale MNIST diffusion model.
In `stable-diffusion/jacobian_exact_vs_gd.ipynb` we demonstrate the qualitative differences between the proposed inexact Newton step and gradient descent.
In theory, the denoiser Jacobian should be symmetric, making the two update directions equivalent. In practice, we find fundamental differences between them: the inexact Newton direction retains shapes better and shows stronger global coherence.
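The (a)symmetry can be probed numerically by comparing the forward-mode product Jv against the backward-mode product J&#8314;v along the same direction; the `jacobian_asymmetry` helper below is our own illustration (not the notebook's code):

```python
import torch
from torch.autograd.functional import jvp, vjp

def jacobian_asymmetry(f, x, v):
    """Relative gap between J v (forward mode) and J^T v (backward mode).
    Zero along v iff the Jacobian acts symmetrically on v."""
    _, Jv = jvp(f, x, v)
    _, JTv = vjp(f, x, v)
    return ((Jv - JTv).norm() / JTv.norm()).item()

torch.manual_seed(0)
A = torch.randn(6, 6)
sym = lambda x: (A + A.T) @ x    # symmetric Jacobian
skew = lambda x: (A - A.T) @ x   # maximally asymmetric Jacobian

x, v = torch.randn(6), torch.randn(6)
print(jacobian_asymmetry(sym, x, v))   # ~ 0
print(jacobian_asymmetry(skew, x, v))  # = 2, since J^T = -J
```

Applied to a real denoiser, this measures how far the network deviates from the theoretically symmetric Jacobian.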
We provide the code for our convergence analysis in `stable-diffusion/jacobian_analysis.ipynb`. We use Arnoldi iteration to compute the eigenvalues of the Jacobian (and of its symmetric and skew-symmetric parts).
Using the maximum computed eigenvalue, we test the convergence of different learning rates, showing that our dynamically chosen step size stays within the convergent regime.
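Arnoldi iteration needs only matrix-vector products, so it applies to a Jacobian accessed through JVPs without ever materializing the matrix. The standalone NumPy sketch below illustrates the idea on a diagonal test operator (our own illustration, not the notebook's code):

```python
import numpy as np

def arnoldi_eigs(matvec, n, k=30, seed=0):
    """Arnoldi iteration: estimate leading eigenvalues of an n x n
    linear operator available only through matrix-vector products."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    q = rng.standard_normal(n)
    Q[:, 0] = q / np.linalg.norm(q)
    for j in range(k):
        w = matvec(Q[:, j])
        for i in range(j + 1):            # Gram-Schmidt against the basis
            H[i, j] = Q[:, i] @ w
            w -= H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-12:           # invariant subspace found
            break
        Q[:, j + 1] = w / H[j + 1, j]
    # Ritz values of the small Hessenberg matrix approximate the
    # operator's extremal eigenvalues.
    return np.linalg.eigvals(H[:j + 1, :j + 1])

A = np.diag(np.arange(1.0, 51.0))         # known spectrum 1..50
eigs = arnoldi_eigs(lambda v: A @ v, 50)
print(np.max(eigs.real))                  # ~ 50, the largest eigenvalue
```

For a symmetric operator, step sizes below 2/&lambda;_max keep a gradient-style update on the induced quadratic stable; a bound of this kind is what motivates testing learning rates against the maximum computed eigenvalue.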
In `stable-diffusion/superres_vae_newton.ipynb` we show an implementation of super-resolution that also avoids backpropagating through the Stable Diffusion decoder, using a second Newton approximation in the VAE latent space.
This is a central idea of our paper ZoomLDM, where backpropagating through the VAE is prohibitive due to memory constraints.
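The intuition behind a Newton approximation in VAE space can be seen with a linear toy "VAE", where re-encoding an image-space correction recovers the least-squares latent update with forward passes only. This is a sketch under our own simplifying assumptions (a real decoder/encoder pair is only approximately inverse, and nonlinear):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 4))   # toy linear "decoder": x = W z
Wp = np.linalg.pinv(W)             # toy "encoder": approximate inverse of W

z = rng.standard_normal(4)         # current latent
y = rng.standard_normal(16)        # image-space target

# Newton-style update without decoder backprop: decode, apply the
# image-space correction, then re-encode the corrected image.
x = W @ z
z_new = Wp @ (x + (y - x))

# For a linear decoder this lands exactly on the least-squares solution.
z_star = np.linalg.lstsq(W, y, rcond=None)[0]
print(np.allclose(z_new, z_star))  # True
```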
@article{graikos2024fast,
  title={Fast constrained sampling in pre-trained diffusion models},
  author={Graikos, Alexandros and Jojic, Nebojsa and Samaras, Dimitris},
  journal={arXiv preprint arXiv:2410.18804},
  year={2024}
}