Title: Chameleon: A Multimodal Learning Framework Robust to Missing Modalities
Authors: Muhammad Zaigham Zaheer; Karthik Nandakumar; Muhammad Haris Khan
Type: Journal article
Journal: International Journal of Multimedia Information Retrieval, vol. 14, no. 2, pp. 1–14, Jun. 2025
ISSN: 2192-6611
DOI: 10.1007/s13735-025-00370-y
Handle: https://hdl.handle.net/20.500.14634/984
Record date: 2025-06-26
Language: English
Keywords: Multimodal learning
Citation: M. I. Liaqat et al., "Chameleon: A Multimodal Learning Framework Robust to Missing Modalities," Int. J. Multimed. Inf. Retr., vol. 14, no. 2, pp. 1–14, Jun. 2025, doi: 10.1007/s13735-025-00370-y.

Abstract: Multimodal learning has demonstrated remarkable performance improvements over unimodal architectures. However, multimodal learning methods often exhibit deteriorated performance if one or more modalities are missing. This may be attributed to the commonly used multi-branch design containing modality-specific components, which makes such approaches reliant on the availability of a complete set of modalities. In this work, we propose a robust multimodal learning framework, Chameleon, that adapts a common-space visual learning network to align all input modalities. To enable this, we unify the input modalities into one format by encoding any non-visual modality into visual representations, thus making the framework robust to missing modalities. Extensive experiments are performed on the multimodal classification task using four textual-visual (Hateful Memes, UPMC Food-101, MM-IMDb, and Ferramenta) and two audio-visual (avMNIST, VoxCeleb) datasets. Chameleon not only achieves superior performance when all modalities are present at train/test time but also demonstrates notable resilience in the case of missing modalities. © The Author(s) 2025
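
Note: The abstract's central idea, encoding non-visual modalities into visual representations so that one shared visual network processes every input, can be illustrated with a minimal sketch. The code below is not the authors' implementation: the text-rendering function, the 224x224 canvas, and the use of a torchvision ResNet-18 as a stand-in for the common-space visual network are all assumptions made purely for illustration.

# Minimal sketch (assumed, not the paper's code): render a non-visual modality
# (text) as an image so the same visual encoder can serve every modality.
import torch
from PIL import Image, ImageDraw
from torchvision import models, transforms

def text_to_image(text: str, size: int = 224) -> Image.Image:
    """Encode a text string into a visual representation by rendering it
    onto a blank canvas (one simple way to 'visualize' a non-visual modality)."""
    canvas = Image.new("RGB", (size, size), color="white")
    draw = ImageDraw.Draw(canvas)
    # Wrap the text into short lines so longer captions stay legible.
    words, lines, line = text.split(), [], ""
    for w in words:
        if len(line) + len(w) + 1 > 28:
            lines.append(line)
            line = w
        else:
            line = f"{line} {w}".strip()
    lines.append(line)
    draw.text((8, 8), "\n".join(lines), fill="black")
    return canvas

# A shared visual backbone stands in for the common-space visual network.
to_tensor = transforms.ToTensor()
backbone = models.resnet18(weights=None)
backbone.fc = torch.nn.Identity()  # use the backbone as a feature extractor
backbone.eval()

caption_img = text_to_image("a meme caption paired with an image")
photo_img = Image.new("RGB", (224, 224), color="gray")  # placeholder photo

# Both the rendered caption and the photo pass through the *same* encoder,
# so a missing modality does not leave an unused modality-specific branch idle.
with torch.no_grad():
    features = backbone(torch.stack([to_tensor(caption_img), to_tensor(photo_img)]))
print(features.shape)  # torch.Size([2, 512])

For the audio-visual datasets mentioned in the abstract, the audio stream would presumably be converted to a visual form in a similar spirit (e.g., a spectrogram-like image), so that a single network sees all modalities in the same space; the specific conversion used by the paper is not detailed in this record.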