PgM: Partitioner Guided Modal Learning Framework

Hu, Guimin
Xin, Yi
Hu, Lijie
Zhu, Zhihong
Seifi, Hasti
Department
Machine Learning
Type
Conference proceeding
Date
2025
Language
English
Abstract
Multimodal learning benefits from information in multiple modalities, and each learned modal representation can be divided into uni-modal features, which can be learned from uni-modal training, and paired-modal features, which can be learned from cross-modal interaction. Building on this perspective, we propose a partitioner-guided modal learning framework, PgM, which consists of a modal partitioner, a uni-modal learner, a paired-modal learner, and a uni-paired modal decoder. The modal partitioner segments the learned modal representation into uni-modal and paired-modal features. The modal learner incorporates two dedicated components for uni-modal and paired-modal learning. The uni-paired modal decoder reconstructs the modal representation from the uni-modal and paired-modal features. PgM offers three key benefits: 1) thorough learning of uni-modal and paired-modal features, 2) flexible distribution adjustment for uni-modal and paired-modal representations to suit diverse downstream tasks, and 3) different learning rates across modalities and partitions. Extensive experiments demonstrate the effectiveness of PgM across four multimodal tasks and further highlight its transferability to existing models. Additionally, we visualize the distribution of uni-modal and paired-modal features across modalities and tasks, offering insights into their respective contributions.
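The partition-then-reconstruct idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify how the partitioner or decoder are built, so the sigmoid gate, the additive decoder, the function names (`partition`, `decode`, `reconstruction_error`), and the learning-rate values below are all assumptions chosen only to make the data flow concrete. In the actual framework these components are presumably learned networks.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def partition(representation, gate_logits):
    """Hypothetical soft partitioner (an assumption, not the paper's design).

    Each dimension of a modal representation is split into a uni-modal
    share (gate * x) and a paired-modal share ((1 - gate) * x).
    """
    gates = [sigmoid(g) for g in gate_logits]
    uni = [g * x for g, x in zip(gates, representation)]
    paired = [(1.0 - g) * x for g, x in zip(gates, representation)]
    return uni, paired


def decode(uni, paired):
    """Hypothetical decoder: rebuild the representation by summing both parts."""
    return [u + p for u, p in zip(uni, paired)]


def reconstruction_error(original, reconstructed):
    """Mean squared error between the original and rebuilt representation."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


random.seed(0)
# Stand-in for a learned representation of one modality (e.g. text).
text_repr = [random.gauss(0.0, 1.0) for _ in range(8)]
# Stand-in for partitioner parameters; in practice these would be trained.
gate_logits = [random.gauss(0.0, 1.0) for _ in range(8)]

uni_feat, paired_feat = partition(text_repr, gate_logits)
recon = decode(uni_feat, paired_feat)
err = reconstruction_error(text_repr, recon)

# Illustrative (made-up) learning rates per modality and partition,
# mirroring the third benefit listed in the abstract.
learning_rates = {
    ("text", "uni"): 1e-3,
    ("text", "paired"): 1e-4,
}
```

With this additive gate the reconstruction is lossless by construction, so `err` is zero up to floating-point rounding. A learned decoder would instead be trained to minimize such a reconstruction error, and the gate values would indicate how much of each modality's representation is treated as uni-modal versus paired-modal.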
Citation
G. Hu, Y. Xin, L. Hu, Z. Zhu, and H. Seifi, “PgM: Partitioner Guided Modal Learning Framework,” Proceedings of the 33rd ACM International Conference on Multimedia, pp. 2362–2371, Oct. 2025, doi: 10.1145/3746027.3754788
Source
MM '25: Proceedings of the 33rd ACM International Conference on Multimedia
Conference
The 33rd ACM International Conference on Multimedia
Keywords
Modal Learning, Modal Laziness, Multimodal Learning Framework, Uni-modal, Paired-modal
Publisher
Association for Computing Machinery