EXPOTION: FACIAL EXPRESSION AND MOTION CONTROL FOR MULTIMODAL MUSIC GENERATION

Izzati, Fathinah Asma
Li, Xinyue
Xia, Gus
Department
Machine Learning
Type
Conference proceeding
Date
2025
Language
English
Abstract
We propose EXPOTION (Facial Expression and Motion Control for Multimodal Music Generation), a generative model leveraging multimodal visual controls, specifically human facial expressions and upper-body motion, as well as text prompts to produce expressive and temporally accurate music. We adopt parameter-efficient fine-tuning (PEFT) on the pretrained text-to-music generation model, enabling fine-grained adaptation to the multimodal controls using a small dataset. To ensure precise synchronization between video and music, we introduce a temporal smoothing strategy to align multiple modalities. Experiments demonstrate that integrating visual features alongside textual descriptions enhances the overall quality of generated music in terms of musicality, creativity, beat-tempo consistency, temporal alignment with the video, and text adherence, surpassing both proposed baselines and existing state-of-the-art video-to-music generation models. Additionally, we introduce a novel dataset consisting of 7 hours of synchronized video recordings capturing expressive facial and upper-body gestures aligned with corresponding music, providing significant potential for future research in multimodal and interactive music generation. Code, demo and dataset are available at https://github.com/xinyueli2896/Expotion.git.
Citation
Fathinah Izzati, Xinyue Li, and Gus Xia, “Expotion: Facial Expression and Motion Control for Multimodal Music Generation”, in Proceedings of the 26th International Society for Music Information Retrieval Conference, Daejeon, South Korea and Online, Sep. 2025, pp. 368–376. doi: 10.5281/zenodo.17706414.
Source
Proceedings of the International Society for Music Information Retrieval Conference
Conference
26th International Society for Music Information Retrieval Conference (ISMIR 2025)
Publisher
International Society for Music Information Retrieval