Cross-Modal Conditioned Reconstruction for Language-Guided Medical Image Segmentation
Huang, Xiaoshuang ; Li, Hongxiang ; Cao, Meng ; Chen, Long ; You, Chenyu ; An, Dong
Department
Computer Vision
Type
Journal article
Date
2025
Language
English
Abstract
Recent developments underscore the potential of textual information in enhancing learning models for a deeper understanding of medical visual semantics. However, language-guided medical image segmentation still faces a challenging issue. Previous works employ implicit architectures to embed textual information. This leads to segmentation results that are inconsistent with the semantics represented by the language, sometimes even diverging significantly. To this end, we propose a novel cross-modal conditioned Reconstruction for Language-guided Medical Image Segmentation (RecLMIS) to explicitly capture cross-modal interactions, which assumes that well-aligned medical visual features and medical notes can effectively reconstruct each other. We introduce conditioned interaction to adaptively predict patches and words of interest. Subsequently, they are utilized as conditioning factors for mutual reconstruction to align with regions described in the medical notes. Extensive experiments demonstrate the superiority of our RecLMIS, surpassing LViT by 3.74% mIoU on the MosMedData+ dataset and 1.89% mIoU on the QATA-CoV19 dataset. More importantly, we achieve a relative reduction of 20.2% in parameter count and a 55.5% decrease in computational load. The code will be available at https://github.com/ShawnHuang497/RecLMIS.
Citation
X. Huang, H. Li, M. Cao, L. Chen, C. You and D. An, "Cross-Modal Conditioned Reconstruction for Language-Guided Medical Image Segmentation," in IEEE Transactions on Medical Imaging, vol. 44, no. 4, pp. 1821-1835, April 2025, doi: 10.1109/TMI.2024.3523333
Source
IEEE Transactions on Medical Imaging
Keywords
Language-guided segmentation, Medical image segmentation, Vision, Language
Publisher
IEEE
