Multi-semantic depth collaborative multi-modality image fusion network
Zhu, Huayi ; Li, Xuyang ; Chen, Rui ; Liu, Qiao ; Chang, Xiaojun ; Yuan, Di
Department
Computer Vision
Type
Journal article
Language
English
Abstract
Multi-modality image fusion aims to integrate complementary data from diverse sensors into a more comprehensive representation of the imaged scene. Existing methods focus primarily on surface-level interactions between pixel-level features and visual or textual semantics, and do not fully exploit the global consistency among text modalities, image fusion, and downstream tasks. Without this deep collaboration between multiple semantics, they cannot effectively capture multi-modal information or the fine-grained association between fusion and downstream tasks. We therefore propose a Multi-Semantic depth collaborative Image Fusion (MSIF) network. First, unlike the traditional cascaded network structure, MSIF adopts a dual-task parallel structure that efficiently establishes deep correlations between the fusion and segmentation tasks. Second, MSIF enhances the semantic consistency between fusion and downstream tasks by using deeper, more detailed semantic information from text modalities to guide image fusion and segmentation. Third, MSIF uses a purpose-designed Multi-Semantic Deep Synergistic Module (MSDSM) to deeply integrate text features, pixel-level features, and segmentation semantic features, so that multi-modal information is fused effectively and the semantic requirements of downstream tasks are met. Finally, the combined semantic features are fed into a fusion decoder to yield the final fusion result. Extensive quantitative and qualitative experiments demonstrate that our approach achieves state-of-the-art (SOTA) performance on multimodal datasets.
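The data flow described in the abstract can be illustrated with a toy sketch. The abstract gives no layer definitions, so everything below is an assumption made for illustration: the function names, the use of flat lists as "images", and the gating formula standing in for MSDSM are all hypothetical and are not the authors' implementation. The only point shown is the parallel layout, in which the segmentation branch and the fusion branch both read the source images directly, instead of segmentation consuming the fused output as in a cascaded design.

```python
# Toy data-flow sketch only. Every name and formula is a hypothetical
# stand-in chosen for illustration, not the MSIF architecture itself.
from typing import List, Tuple

Vector = List[float]


def pixel_features(img_a: Vector, img_b: Vector) -> Vector:
    """Stand-in pixel-level features: element-wise mean of the two sources."""
    return [(a + b) / 2.0 for a, b in zip(img_a, img_b)]


def segmentation_features(img_a: Vector, img_b: Vector) -> Vector:
    """Stand-in segmentation semantics: element-wise max of the two sources."""
    return [max(a, b) for a, b in zip(img_a, img_b)]


def synergy(pixel: Vector, seg: Vector, text: Vector) -> Vector:
    """Hypothetical stand-in for MSDSM: text values in [0, 1] gate between
    segmentation features and pixel-level features at each position."""
    return [t * s + (1.0 - t) * p for p, s, t in zip(pixel, seg, text)]


def fusion_decoder(combined: Vector) -> Vector:
    """Stand-in decoder: clamp combined features to the [0, 1] image range."""
    return [min(1.0, max(0.0, c)) for c in combined]


def segmentation_head(seg: Vector, text: Vector) -> List[int]:
    """Stand-in segmentation branch: text-weighted threshold at 0.5."""
    return [1 if s * t >= 0.5 else 0 for s, t in zip(seg, text)]


def run(img_a: Vector, img_b: Vector, text: Vector) -> Tuple[Vector, List[int]]:
    """Parallel layout: both branches start from the same source images."""
    pix = pixel_features(img_a, img_b)
    seg = segmentation_features(img_a, img_b)
    fused = fusion_decoder(synergy(pix, seg, text))  # fusion branch
    mask = segmentation_head(seg, text)              # segmentation branch
    return fused, mask


if __name__ == "__main__":
    fused, mask = run([0.0, 1.0], [1.0, 0.5], [1.0, 0.0])
    print(fused, mask)
```

In this toy version the text vector plays both roles the abstract assigns to text semantics: it weights the segmentation branch and it steers how the fusion branch mixes pixel-level and segmentation features.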
Citation
H. Zhu, X. Li, R. Chen, Q. Liu, X. Chang, D. Yuan, "Multi-semantic depth collaborative multi-modality image fusion network," Knowledge-Based Systems, vol. 341, Art. no. 115756, 2026, https://doi.org/10.1016/j.knosys.2026.115756.
Source
Knowledge-Based Systems
Keywords
46 Information and Computing Sciences, 4605 Data Management and Data Science
Publisher
Elsevier
