UniAdapter: All-in-One Control for Flexible Video Generation
Wang, Cong; Hu, Panwen; Zhao, Haoyu; Guo, Yuanfan; Gu, Jiaxi; Dong, Xiao; Han, Jianhua; Xu, Hang; Liang, Xiaodan
Department
Computer Vision
Type
Journal article
Date
2025
Language
English
Abstract
Condition-based video generation aims to create video content based on given information that describes specific subjects. However, most existing works can only utilize a single condition to guide the denoising process, thereby limiting their applicability to specific scenarios. Although some works attempt to accommodate multiple conditions within one framework, they often require multiple encoders, leading to inefficiencies in integrating multi-condition features. In this work, we present a framework that, with the support of the proposed Unified Adapter (UniAdapter), enables simultaneous multi-condition control of video generation within a single model. To effectively merge these conditions, we propose a novel Probabilistic Multi-condition Concatenator (PMC) module, which employs a unified encoder to accommodate multiple conditions and concatenate condition features at the pixel level to achieve fine-grained control. Following the PMC module, we employ 2D down-sampling blocks to refine features for injection into the Video Diffusion Model (VDM). Moreover, our UniAdapter is designed to be model-agnostic and compatible with any U-Net-based VDM, offering a versatile solution for improving video generation quality. Experimental results on public benchmarks UCF-101 and MSR-VTT show that our method achieves superior results in both quantitative and qualitative evaluations.
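The pixel-level concatenation and 2D down-sampling mentioned in the abstract can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration: the function names `concat_conditions` and `downsample_2x`, the `keep_prob` parameter, the reading of "probabilistic" as randomly dropping each condition, and the use of average pooling in place of learned down-sampling blocks. The abstract does not specify these details, so this is not the authors' implementation.

```python
import numpy as np


def concat_conditions(conditions, keep_prob=0.5, rng=None):
    """Concatenate condition maps along the channel axis, pixel by pixel.

    conditions: dict mapping a name to an array of shape (C, H, W);
    all maps must share the same H and W.
    Each condition is kept with probability keep_prob and zeroed otherwise.
    This is an ASSUMED reading of "probabilistic", not taken from the paper.
    """
    if rng is None:
        rng = np.random.default_rng()
    parts = []
    for name in sorted(conditions):  # fixed order keeps the channel layout stable
        cond = conditions[name]
        keep = rng.random() < keep_prob
        parts.append(cond if keep else np.zeros_like(cond))
    return np.concatenate(parts, axis=0)


def downsample_2x(features):
    """2x2 average pooling over the spatial axes of a (C, H, W) array.

    A stand-in for the learned 2D down-sampling blocks (assumption).
    """
    c, h, w = features.shape
    assert h % 2 == 0 and w % 2 == 0, "spatial size must be even"
    return features.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


# Two hypothetical single-channel conditions on an 8x8 grid.
depth = np.ones((1, 8, 8))
sketch = np.full((1, 8, 8), 2.0)

merged = concat_conditions({"depth": depth, "sketch": sketch}, keep_prob=1.0)
pooled = downsample_2x(merged)
```

With `keep_prob=1.0` both conditions are always kept, so `merged` has one channel per condition and `pooled` halves each spatial dimension. In a real adapter the pooled features would be refined by learned layers before injection into the VDM.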
Citation
C. Wang et al., "UniAdapter: All-in-One Control for Flexible Video Generation," in IEEE Transactions on Circuits and Systems for Video Technology, doi: 10.1109/TCSVT.2025.3532495
Source
IEEE Transactions on Circuits and Systems for Video Technology
Keywords
Diffusion models, Circuits and systems, Adaptation models, Transformers, Text to video, Modeling, Training, Feature extraction, Data models, Computational modeling
Publisher
IEEE
