Parallel Diffusion Solver via Residual Dirichlet Policy Optimization
Wang, Ruoyu ; Li, Ziyu ; Zhu, Beier ; Yuan, Liangyu ; Zhang, Hanwang ; Yang, Xun ; Chang, Xiaojun ; Zhang, Chi
Department
Computer Vision
Type
Journal article
Language
English
Abstract
Diffusion models (DMs) have achieved state-of-the-art generative performance but suffer from high sampling latency due to their sequential denoising nature. Existing solver-based acceleration methods often face significant image quality degradation under a low-latency budget, primarily due to accumulated truncation errors arising from the inability to capture high-curvature trajectory segments. In this paper, we propose the Ensemble Parallel Direction solver (dubbed EPD-Solver), a novel ODE solver that mitigates these errors by incorporating multiple parallel gradient evaluations in each step. Motivated by the geometric insight that sampling trajectories are largely confined to a low-dimensional manifold, EPD-Solver leverages the Mean Value Theorem for vector-valued functions to approximate the integral solution more accurately. Importantly, since the additional gradient computations are independent, they can be fully parallelized, preserving the low-latency nature of sampling. We introduce a two-stage optimization framework. First, EPD-Solver optimizes a small set of learnable parameters via a distillation-based approach. Second, we propose a parameter-efficient Reinforcement Learning (RL) fine-tuning scheme that reformulates the solver as a stochastic Dirichlet policy. Unlike traditional methods that fine-tune the massive backbone, our RL approach operates strictly within the low-dimensional solver space, effectively mitigating reward hacking while enhancing performance in complex text-to-image (T2I) generation tasks. In addition, our method is flexible and can serve as a plugin (EPD-Solverplugin) to improve existing ODE samplers. Extensive experiments demonstrate the effectiveness of EPD-Solver. On validation benchmarks, at the same latency level of 5 NFE, the distilled EPD-Solver achieves state-of-the-art FID scores of 4.47 on CIFAR-10, 7.97 on FFHQ, 8.17 on ImageNet, and 8.26 on LSUN Bedroom, surpassing existing learning-based solvers by a significant margin.
On T2I benchmarks, our RL-tuned EPD-Solver significantly improves human preference scores on both Stable Diffusion v1.5 and SD3-Medium. Notably, it outperforms the official 28-step baseline of SD3-Medium with only 20 steps, effectively bridging the gap between inference efficiency and high-fidelity generation.
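The idea of combining several independent gradient evaluations within one step can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`epd_step`, `integrate`), the Euler predictor used to reach the intermediate points, the fixed fractions `fracs`, the Dirichlet concentration values, and the toy ODE dx/dt = -x are all assumptions chosen for illustration. The weights lie on the probability simplex, which is why drawing them from a Dirichlet distribution is a natural way to make the solver stochastic.

```python
import numpy as np

def epd_step(f, x, t_cur, t_next, taus, weights):
    """One illustrative step for dx/dt = f(x, t) using K weighted directions."""
    weights = np.asarray(weights, dtype=float)
    assert np.isclose(weights.sum(), 1.0), "weights must lie on the simplex"
    d_cur = f(x, t_cur)  # shared evaluation at the current point
    # K evaluations at intermediate times; they do not depend on each other,
    # so in practice they could be batched or run in parallel.
    directions = [f(x + (tau - t_cur) * d_cur, tau) for tau in taus]
    combined = sum(w * d for w, d in zip(weights, directions))
    return x + (t_next - t_cur) * combined

def integrate(f, x0, ts, fracs, weights):
    """Apply epd_step over the time grid ts (hypothetical helper)."""
    x = x0
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        taus = [t_cur + r * (t_next - t_cur) for r in fracs]
        x = epd_step(f, x, t_cur, t_next, taus, weights)
    return x

# Toy problem (assumption): dx/dt = -x on [0, 1], exact solution exp(-1).
f = lambda x, t: -x
ts = np.linspace(0.0, 1.0, 6)  # 5 steps
x0 = np.array([1.0])
exact = np.exp(-1.0)

# fracs=[0.0] with a single weight reduces to the plain Euler method.
euler = integrate(f, x0, ts, fracs=[0.0], weights=[1.0])
# Two directions at 25% and 75% of each step, equally weighted.
two_dir = integrate(f, x0, ts, fracs=[0.25, 0.75], weights=[0.5, 0.5])

# Stochastic variant: sample the simplex weights from a Dirichlet distribution.
rng = np.random.default_rng(0)
sampled_weights = rng.dirichlet([5.0, 5.0])
stochastic = integrate(f, x0, ts, [0.25, 0.75], sampled_weights)

print(abs(euler[0] - exact), abs(two_dir[0] - exact), abs(stochastic[0] - exact))
```

On this toy problem the two-direction step coincides with the midpoint method, so with the same number of steps its error is smaller than Euler's. In the setting described in the abstract, the intermediate times and weights would be learned (by distillation, then RL fine-tuning) rather than fixed by hand, and the time grid would run in the denoising direction.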
Citation
R. Wang, Z. Li, B. Zhu, L. Yuan, H. Zhang, X. Yang, et al., "Parallel Diffusion Solver via Residual Dirichlet Policy Optimization," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PP, no. 99, pp. 1-17, 2026, https://doi.org/10.1109/tpami.2026.3692227.
Source
IEEE Transactions on Pattern Analysis and Machine Intelligence
Keywords
46 Information and Computing Sciences, 4603 Computer Vision and Multimedia Computation, 4611 Machine Learning
Publisher
IEEE
