Parallel Diffusion Solver via
Residual Dirichlet Policy Optimization

¹AGI Lab, Westlake University, ²University of Illinois Urbana-Champaign,
³Nanyang Technological University, ⁴Shanghai Jiao Tong University,
⁵University of Science and Technology of China

Our work extends EPD-Solver (ICCV 2025). A video explanation of EPD-Solver is also provided below.

EPD-Solver Teaser

EPD-Solver leverages parallel gradient evaluations to reduce truncation errors, achieving high-fidelity generation at low latency.

Abstract

Diffusion models (DMs) have achieved state-of-the-art generative performance but suffer from high sampling latency due to their sequential denoising nature. Existing solver-based acceleration methods often face significant image quality degradation under a low-latency budget, primarily due to accumulated truncation errors arising from the inability to capture high-curvature trajectory segments.

In this paper, we propose the Ensemble Parallel Direction solver (EPD-Solver), a novel ODE solver that mitigates these errors by incorporating multiple parallel gradient evaluations in each step. Motivated by the geometric insight that sampling trajectories are largely confined to a low-dimensional manifold, EPD-Solver leverages the Mean Value Theorem for vector-valued functions to approximate the integral solution more accurately. Importantly, since the additional gradient computations are independent of one another, they can be fully parallelized, preserving the low-latency nature of sampling.
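To make the idea concrete, here is a minimal toy sketch (our own illustration, not the paper's exact parameterization): one solver step mixes several gradient evaluations taken at intermediate times, where each evaluation is predicted from the current state alone, so all model calls within a step could run in parallel. The intermediate times `taus` and mixing `weights` stand in for the learnable solver parameters.

```python
import numpy as np

def epd_style_step(f, x_t, t, t_next, taus, weights):
    """Advance x from t to t_next by mixing K independent gradient evaluations.

    taus    : K intermediate times between t and t_next (learnable in EPD)
    weights : K mixing coefficients (learnable in EPD)
    """
    h = t_next - t
    g0 = f(x_t, t)  # shared predictor direction, computed once
    # Each intermediate point depends only on x_t -> the K evaluations
    # are mutually independent and thus parallelizable.
    grads = [f(x_t + (tau - t) * g0, tau) for tau in taus]
    return x_t + h * sum(w * g for w, g in zip(weights, grads))

# Toy linear ODE dx/dt = -x from t=0 to t=1 in two coarse steps,
# compared against a plain one-evaluation Euler step.
f = lambda x, t: -x
x_mix, x_euler = 1.0, 1.0
for t in (0.0, 0.5):
    h = 0.5
    x_mix = epd_style_step(f, x_mix, t, t + h,
                           taus=[t + 0.25 * h, t + 0.75 * h],
                           weights=[0.5, 0.5])
    x_euler = x_euler + h * f(x_euler, t)
```

With two symmetric intermediate evaluations the mixed direction matches a second-order midpoint rule on this toy problem, so the error against the exact solution exp(-1) is much smaller than Euler's at the same step size.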

We introduce a two-stage optimization framework. In the first stage, EPD-Solver optimizes a small set of learnable parameters via a distillation-based approach. In the second stage, we propose a parameter-efficient Reinforcement Learning (RL) fine-tuning scheme that reformulates the solver as a stochastic Dirichlet policy. Unlike traditional methods that fine-tune the massive backbone, our RL approach operates strictly within the low-dimensional solver space, effectively mitigating reward hacking while enhancing performance on complex text-to-image (T2I) generation tasks.

Extensive experiments demonstrate the effectiveness of EPD-Solver. On validation benchmarks, at the same latency budget of 5 NFE (number of function evaluations), the distilled EPD-Solver achieves state-of-the-art FID scores of 4.47 on CIFAR-10, 7.97 on FFHQ, 8.17 on ImageNet, and 8.26 on LSUN Bedroom. On T2I benchmarks, our RL-tuned EPD-Solver significantly improves human preference scores on both Stable Diffusion v1.5 and SD3-Medium. Notably, it outperforms the official 28-step baseline of SD3-Medium with only 20 steps.

Method Overview

Our method consists of two stages: (1) Distillation-Based Parameter Optimization, where we optimize learnable solver parameters to approximate high-precision teacher trajectories; and (2) Residual Dirichlet Policy Optimization, where we reformulate the solver as a stochastic policy and optimize it using RL to align with human preferences.

Method Pipeline

Visual Results

Results on Stable Diffusion v1.5

SD1.5 Results

Results on SD3-Medium (512x512)

SD3 512 Results

Results on SD3-Medium (1024x1024)

SD3 1024 Results

Quantitative Results

Table 1
Table 2
Table 3

BibTeX

@misc{wang2025paralleldiffusionsolverresidual,
      title={Parallel Diffusion Solver via Residual Dirichlet Policy Optimization}, 
      author={Ruoyu Wang and Ziyu Li and Beier Zhu and Liangyu Yuan and Hanwang Zhang and Xun Yang and Xiaojun Chang and Chi Zhang},
      year={2025},
      eprint={2512.22796},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2512.22796}, 
}
@inproceedings{zhu2025distilling,
      title={Distilling Parallel Gradients for Fast ODE Solvers of Diffusion Models},
      author={Zhu, Beier and Wang, Ruoyu and Zhao, Tong and Zhang, Hanwang and Zhang, Chi},
      booktitle={International Conference on Computer Vision (ICCV)},
      year={2025}
}