MedCutMix: A Data-Centric Approach to Improve Radiology Vision-Language Pre-Training with Disease Awareness
Wang, Sinuo ; Xie, Yutong ; Liu, Yuyuan ; Wu, Qi
Type
Conference proceeding
License
http://creativecommons.org/licenses/by/4.0/
Abstract
Vision-Language Pre-training (VLP) is drawing increasing interest for its ability to minimize manual annotation requirements while enhancing semantic understanding in downstream tasks. However, its reliance on image-text datasets poses challenges due to privacy concerns and the high cost of obtaining paired annotations. Data augmentation emerges as a viable strategy to address this issue, yet existing methods often fall short of capturing the subtle and complex variations in medical data due to limited diversity. To this end, we propose MedCutMix, a novel multi-modal disease-centric data augmentation method. MedCutMix performs diagnostic sentence CutMix within medical reports and establishes the cross-attention between the diagnostic sentence and medical image to guide attentive manifold mix within the imaging modality. Our approach surpasses previous methods across four downstream radiology diagnosis datasets, highlighting its effectiveness in enhancing performance and generalizability in radiology VLP.
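As a rough illustration of the two steps the abstract names, the sketch below swaps one diagnostic sentence between two reports and then mixes image patch features under the guidance of attention weights. It is not the authors' implementation: the keyword list `DISEASE_TERMS`, the helpers `is_diagnostic`, `sentence_cutmix`, and `attentive_feature_mix`, the top-k selection rule, and the mixing ratio are all assumptions made for this example, and the attention weights are taken as given rather than computed by a cross-attention module.

```python
import random
from typing import List, Optional, Sequence, Tuple

# Hypothetical keyword list; a real pipeline would use a proper disease tagger.
DISEASE_TERMS = ("effusion", "pneumonia", "cardiomegaly", "atelectasis")


def is_diagnostic(sentence: str) -> bool:
    """Treat a sentence as diagnostic if it mentions one of the assumed terms."""
    lowered = sentence.lower()
    return any(term in lowered for term in DISEASE_TERMS)


def sentence_cutmix(
    report_a: Sequence[str], report_b: Sequence[str], rng: random.Random
) -> Tuple[List[str], Optional[str]]:
    """Replace one diagnostic sentence of report_a with one from report_b.

    Returns the mixed report and the inserted sentence. If either report has
    no diagnostic sentence, report_a is returned unchanged with None.
    """
    slots = [i for i, s in enumerate(report_a) if is_diagnostic(s)]
    donors = [s for s in report_b if is_diagnostic(s)]
    if not slots or not donors:
        return list(report_a), None
    mixed = list(report_a)
    donor = rng.choice(donors)
    mixed[rng.choice(slots)] = donor
    return mixed, donor


def attentive_feature_mix(
    feats_a: Sequence[Sequence[float]],
    feats_b: Sequence[Sequence[float]],
    attn_b: Sequence[float],
    k: int,
) -> Tuple[List[List[float]], float]:
    """Copy the k most-attended patch features of image B into image A.

    attn_b[j] is the (given) attention of the inserted sentence on patch j of
    image B. Returns the mixed feature map and the fraction kept from A.
    """
    n = len(attn_b)
    if n == 0 or len(feats_a) != n or len(feats_b) != n:
        raise ValueError("feature maps and attention must be non-empty and aligned")
    k = max(0, min(k, n))
    top = sorted(range(n), key=lambda j: attn_b[j], reverse=True)[:k]
    mixed = [list(f) for f in feats_a]
    for j in top:
        mixed[j] = list(feats_b[j])
    return mixed, 1.0 - k / n


if __name__ == "__main__":
    rng = random.Random(0)
    report_a = ["Heart size is normal.", "Small left pleural effusion."]
    report_b = ["Right lower lobe pneumonia.", "No pneumothorax."]
    print(sentence_cutmix(report_a, report_b, rng))
```

In this sketch the mixing happens on patch features rather than raw pixels, which is one plausible reading of "manifold mix"; the returned fraction could be used to weight a mixed training target, though the abstract does not say how the paper does this.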
Citation
S. Wang, Y. Xie, Y. Liu, Q. Wu, "MedCutMix: A Data-Centric Approach to Improve Radiology Vision-Language Pre-Training with Disease Awareness," 2026, pp. 6291-6295.
Source
Conference
ICASSP 2026 - 2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Keywords
46 Information and Computing Sciences, 47 Language, Communication and Culture, 4704 Linguistics, 3 Good Health and Well Being
Publisher
IEEE
