Scene Graph Knowledge Enhanced Hashing with Contrastive Learning for Image-Text Retrieval

Meng, Hui
Zhang, Huaxiang
Liu, Li
Zhu, Lei
Chang, Xiaojun
Guo, Xinru
Department
Computer Vision
Type
Journal article
Abstract
Cross-modal hashing retrieval has emerged as a prominent research focus owing to its high efficiency and low storage requirements. Most existing works focus on aligning objects or entities across modalities and often ignore the critical semantic relationships between them. However, a thorough understanding of these relationships is pivotal for achieving fine-grained cross-modal alignment. To address this issue, we propose a novel method called Scene Graph Knowledge Enhanced Hashing (SGKEH) with contrastive learning for cross-modal retrieval. Specifically, SGKEH uses textual scene graphs to generate semantic hard negatives through entity swap and antonymic predicate replacement strategies, thereby simulating text variants that are structurally plausible yet semantically divergent. These hard negatives are then used to fine-tune a CLIP encoder through contrastive learning, enhancing its ability to discriminate subtle relational semantics. Moreover, a Multi-level Attention Fusion (MAF) module is proposed to bridge structural scene graph knowledge with global CLIP features via cross-modal attention, enabling explicit modeling of relation-aware semantics. Finally, semantic consistency constraints are imposed to facilitate cross-modal semantic alignment. Extensive experiments on the Flickr30K and MS-COCO datasets demonstrate that SGKEH achieves excellent retrieval performance, verifying the effectiveness and superiority of our method.
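The hard-negative construction described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify SGKEH's exact rules, antonym lexicon, caption templates, or loss, so everything below is an assumption: a caption is represented as a list of (subject, predicate, object) triplets, entity swap exchanges the subject and object of one triplet, antonymic predicate replacement looks up a small hypothetical table (PREDICATE_ANTONYMS), and the resulting captions enter a standard InfoNCE-style loss as extra negatives. The helper names (render, entity_swap, antonym_replace, hard_negative_captions, info_nce_with_hard_negatives) and the temperature tau = 0.07 are illustrative choices, not the authors' code.

```python
import math
from typing import Dict, List, Tuple

# A scene-graph triplet: (subject, predicate, object), e.g. ("man", "riding", "horse").
Triplet = Tuple[str, str, str]

# HYPOTHETICAL antonym table for spatial predicates; the source does not say
# which lexicon SGKEH uses, and a real system would need a far larger one.
PREDICATE_ANTONYMS: Dict[str, str] = {
    "on": "under",
    "under": "on",
    "above": "below",
    "below": "above",
    "in front of": "behind",
    "behind": "in front of",
    "left of": "right of",
    "right of": "left of",
}


def render(triplets: List[Triplet]) -> str:
    """Linearize triplets into a simple caption-like string (assumed template)."""
    return ", ".join(f"{s} {p} {o}" for s, p, o in triplets)


def entity_swap(triplets: List[Triplet]) -> List[List[Triplet]]:
    """One negative per triplet: swap subject and object, keep the predicate."""
    negatives = []
    for i, (s, p, o) in enumerate(triplets):
        if s == o:
            continue  # swapping identical entities would not change the meaning
        variant = list(triplets)
        variant[i] = (o, p, s)
        negatives.append(variant)
    return negatives


def antonym_replace(triplets: List[Triplet]) -> List[List[Triplet]]:
    """One negative per triplet whose predicate has an entry in the antonym table."""
    negatives = []
    for i, (s, p, o) in enumerate(triplets):
        if p in PREDICATE_ANTONYMS:
            variant = list(triplets)
            variant[i] = (s, PREDICATE_ANTONYMS[p], o)
            negatives.append(variant)
    return negatives


def hard_negative_captions(triplets: List[Triplet]) -> List[str]:
    """All textual hard negatives, deduplicated, with the true caption excluded."""
    original = render(triplets)
    seen, out = {original}, []
    for variant in entity_swap(triplets) + antonym_replace(triplets):
        text = render(variant)
        if text not in seen:
            seen.add(text)
            out.append(text)
    return out


def info_nce_with_hard_negatives(sim_pos: float, sim_negs: List[float], tau: float = 0.07) -> float:
    """-log(exp(s_pos/tau) / (exp(s_pos/tau) + sum_k exp(s_k/tau))) for one image.

    sim_pos is the image/true-caption similarity; sim_negs are similarities to
    negatives (in-batch captions plus the generated hard negatives).
    """
    logits = [sim_pos / tau] + [s / tau for s in sim_negs]
    m = max(logits)  # log-sum-exp shift for numerical stability
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denominator - logits[0]
```

For example, the graph [("man", "riding", "horse"), ("hat", "on", "man")] yields three negatives: "horse riding man, hat on man", "man riding horse, man on hat", and "man riding horse, hat under man". Each differs from the true caption in a single relation while keeping the same entities, so an encoder can only push these negatives' similarity down (and thereby reduce the loss) by attending to the relations rather than to which entities are present.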
Citation
H. Meng, H. Zhang, L. Liu, L. Zhu, X. Chang, X. Guo, "Scene Graph Knowledge Enhanced Hashing with Contrastive Learning for Image-Text Retrieval," IEEE Transactions on Multimedia, vol. PP, no. 99, pp. 1-15, 2026, https://doi.org/10.1109/tmm.2026.3673545.
Source
IEEE Transactions on Multimedia
Keywords
46 Information and Computing Sciences, 4605 Data Management and Data Science
Publisher
IEEE