EasyControl: Adding Control to Video Diffusion for Controllable Video Generation and Interpolation

Wang, Cong
Gu, Jiaxi
Hu, Panwen
Dong, Xiao
Guo, Yuanfan
Xu, Hang
Liang, Xiaodan
Department
Computer Vision
Type
Conference proceeding
Date
2025
Language
English
Abstract
Diffusion models are widely used for both controllable video generation and video interpolation. Because each task has its own specific problems, it is difficult to develop a single model that handles both. Moreover, most existing works support only image conditions and require redesigning the model structure to accommodate other condition types. Even then, they still suffer from frame flickering when an image is used as the condition, due to the strong alignment of image pixels. To tackle these problems, we are the first to propose a unified diffusion framework, EasyControl, for both controllable video generation and interpolation with different types of conditions. EasyControl introduces a condition adapter that extracts condition features, which are then injected into an interchangeable fundamental text-to-video model to guide video generation. To alleviate frame flicker, we propose a module named VideoInit that integrates the low-frequency band of the input condition images, ensuring smoother generation. Experimental results on four benchmarks suggest that our method outperforms previous methods on each task.
Citation
C. Wang et al., "EasyControl: Adding Control to Video Diffusion for Controllable Video Generation and Interpolation," ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Hyderabad, India, 2025, pp. 1-5, doi: 10.1109/ICASSP49660.2025.10889997.
Source
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Keywords
Interpolation, Adaptation Models, Fluctuations, Signal Processing, Benchmark Testing, Feature Extraction, Diffusion Models, Text To Video, Speech Processing, Faces, Video Generation, Video Interpolation, Controllable Video Generation, Diffusion Model, Computer Vision
Publisher
IEEE