Disentangled Noisy Correspondence Learning
Dang, Zhuohang; Luo, Minnan; Wang, Jihong; Jia, Chengyou; Han, Haochen; Wan, Herun; Dai, Guang; Chang, Xiaojun; Wang, Jingdong
Department
Computer Vision
Type
Journal article
Date
2025
Language
English
Abstract
Cross-modal retrieval is crucial for understanding latent correspondences across modalities. However, existing methods implicitly assume well-matched training data, which is impractical, as real-world data inevitably involve imperfect alignments, i.e., noisy correspondences. Although some works explore similarity-based strategies to address such noise, they suffer from sub-optimal similarity predictions influenced by modality-exclusive information (MEI), e.g., background noise in images and abstract definitions in texts. This issue arises because MEI is not shared across modalities, so aligning it during training can markedly mislead similarity predictions. Moreover, while intuitive, directly applying previous cross-modal disentanglement methods yields limited noise tolerance and disentanglement efficacy. Inspired by the robustness of information bottlenecks against noise, we introduce DisNCL, a novel information-theoretic framework for feature Disentanglement in Noisy Correspondence Learning, which adaptively balances the extraction of modality-invariant information (MII) and MEI with certifiable optimal cross-modal disentanglement efficacy. DisNCL then enhances similarity predictions in the modality-invariant subspace, thereby greatly boosting similarity-based alleviation strategies for noisy correspondences. Furthermore, DisNCL introduces soft matching targets to model the noisy many-to-many relationships inherent in multi-modal inputs, enabling noise-robust and accurate cross-modal alignment. Extensive experiments confirm DisNCL's efficacy, yielding a 2% average recall improvement. Mutual information estimation and visualization results show that DisNCL learns meaningful MII/MEI subspaces, validating our theoretical analyses.
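As a minimal, hypothetical sketch of the two ingredients named in the abstract — an information-bottleneck penalty limiting the modality-exclusive (MEI) branches, and soft matching targets that relax hard one-to-one alignment — the following PyTorch snippet may help fix ideas. All names, head architectures, and hyperparameters (beta, tau, the 0.5 blending weight) are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

def disentangled_ib_loss(img_feat, txt_feat, heads, beta=1e-2, tau=0.05):
    """img_feat, txt_feat: (B, D) features from each modality.
    heads: 'img_mii'/'txt_mii' map (B, D) -> (B, d) embeddings;
    'img_mei'/'txt_mei' map (B, D) -> (B, 2d) Gaussian mean and log-variance."""
    # Modality-invariant subspace: similarities are predicted from MII only,
    # so MEI (e.g., background noise) cannot mislead them.
    z_img = F.normalize(heads['img_mii'](img_feat), dim=-1)
    z_txt = F.normalize(heads['txt_mii'](txt_feat), dim=-1)
    sim = z_img @ z_txt.t() / tau  # (B, B) similarity logits

    # Soft matching targets: blend the one-hot identity target with the
    # model's own similarity distribution, so noisy pairs are not forced
    # into hard one-to-one alignment.
    eye = torch.eye(sim.size(0), device=sim.device)
    with torch.no_grad():
        soft_i2t = 0.5 * eye + 0.5 * sim.softmax(dim=-1)
        soft_t2i = 0.5 * eye + 0.5 * sim.t().softmax(dim=-1)
    align = F.cross_entropy(sim, soft_i2t) + F.cross_entropy(sim.t(), soft_t2i)

    # Information bottleneck on the MEI branches: a KL term to a unit
    # Gaussian prior caps how much exclusive information each branch keeps.
    kl = 0.0
    for key, feat in (('img_mei', img_feat), ('txt_mei', txt_feat)):
        mu, logvar = heads[key](feat).chunk(2, dim=-1)
        kl = kl + 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1).sum(-1).mean()

    return align + beta * kl

# Toy usage with linear heads on 512-d features.
if __name__ == "__main__":
    D, d, B = 512, 128, 32
    heads = {'img_mii': nn.Linear(D, d), 'txt_mii': nn.Linear(D, d),
             'img_mei': nn.Linear(D, 2 * d), 'txt_mei': nn.Linear(D, 2 * d)}
    loss = disentangled_ib_loss(torch.randn(B, D), torch.randn(B, D), heads)
    print(loss.item())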
Citation
Z. Dang et al., "Disentangled Noisy Correspondence Learning," in IEEE Transactions on Image Processing, vol. 34, pp. 2602-2615, 2025, doi: 10.1109/TIP.2025.3559457
Source
IEEE Transactions on Image Processing
Keywords
Cross-modal retrieval, Noisy correspondence, Disentangled representation learning, Information bottleneck
Publisher
IEEE
