Towards Robust Multimodal Domain Generalization via Modality-Domain Joint Adversarial Training

Li, Hongzhao
Wan, Hualei
Zhang, Liangzhi
Jiu, Mingyuan
Li, Shupan
Xu, Mingliang
Khan, Muhammad Haris
Department
Computer Vision
Type
Conference proceeding
Date
2025
Language
English
Abstract
Multimodal Domain Generalization (MMDG) aims to enhance the robustness of multimodal models against distribution shifts in unseen target domains. Unlike unimodal domain generalization methods, which primarily focus on mitigating domain bias within individual modalities, MMDG faces unique challenges, notably modality heterogeneity (divergent feature spaces) and stability discrepancy (varying sensitivity to domain shifts). To tackle these challenges, we propose Modality-Domain Joint Adversarial Training, a unified framework built on two key innovations: (1) a tri-discriminator adversarial module that mitigates domain biases in both modality-specific and multimodal representations while suppressing modality-heterogeneous patterns in the representation space; and (2) a stability-aware dynamic weighting mechanism that adaptively balances modality contributions based on cross-domain stability, reducing reliance on unstable modalities. Additionally, we provide the first theoretical error bound for MMDG, offering a theoretical foundation that supports the effectiveness of our approach. Our approach achieves state-of-the-art performance on the EPIC-Kitchens and HAC datasets while using 75.2% fewer parameters than previous MMDG methods. The source code is available at https://github.com/lihongzhao99/MMDG-Joint-Adversarial-Training.
Citation
H. Li et al., “Towards Robust Multimodal Domain Generalization via Modality-Domain Joint Adversarial Training,” Proceedings of the 33rd ACM International Conference on Multimedia, pp. 180–188, Oct. 2025, doi: 10.1145/3746027.3754954
Source
MSMA '25: Proceedings of the 1st International Workshop on Multi-Sensorial Media and Applications
Conference
The 33rd ACM International Conference on Multimedia
Keywords
Multimodal, Adversarial Training, Domain Generalization
Publisher
Association for Computing Machinery