LFSRM: Few-Shot Diagram-Sentence Matching via Local-Feedback Self-Regulating Memory
Zhang, Lingling ; Wu, Wenjun ; Liu, Jun ; Chang, Xiaojun ; Hu, Xin ; Zheng, Yuhui ; Wu, Yaqiang ; Zheng, Qinghua
Department
Computer Vision
Type
Journal article
Date
2025
Language
English
Abstract
Image-sentence matching, which aims to understand the correspondence between vision and language, has achieved significant progress with various deep methods trained under large-scale supervision. Unlike natural images taken by a camera, diagrams in textbooks contain more graphic objects, drawings, and natural objects, and diagram-sentence matching plays an important role in textbook understanding and question answering. However, existing matching models are poorly suited to the challenging task of matching diagrams with sentences, because the few-shot content and incomplete description problems are more severe. In this paper, we propose a novel local-feedback self-regulating memory framework (LFSRM) for diagram-sentence matching. On the one hand, LFSRM includes an external memory that stores useful multi-modal information, especially uncommon information, to overcome the few-shot content problem; the memory is updated flexibly according to local feedback from visual-textual alignment scores. On the other hand, LFSRM applies an attention mechanism to local-level alignment scores and a strengthening factor to the sentence-to-diagram matching direction to alleviate the incomplete description problem. Extensive experiments on three datasets show that LFSRM achieves satisfactory results on conventional image-sentence matching and outperforms state-of-the-art (SOTA) methods on few-shot image/diagram-sentence matching by a large margin. The diagram-sentence matching dataset, called AI2D#, and the LFSRM code are available on GitHub at https://github.com/TeamResearchWork/LFSRM.
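The two ideas named in the abstract, attention over local-level alignment scores and a memory updated by local feedback, can be illustrated with a small toy sketch. This is not the authors' implementation (see the linked repository for that). The function names, the cosine-similarity scoring, the softmax temperature, and the threshold-based write rule are all assumptions made for illustration, and the strengthening factor is not modeled.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def local_scores(words, regions):
    """Assumed local alignment: each word's best similarity to any diagram region."""
    return [max(cosine(w, r) for r in regions) for w in words]

def attended_score(scores, temperature=0.1):
    """Softmax-weighted mean of local scores, so well-aligned words dominate.

    Assumption: this stands in for 'attention on local-level alignment scores'.
    """
    m = max(scores)
    weights = [math.exp((s - m) / temperature) for s in scores]
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, scores)) / total

def update_memory(memory, words, scores, threshold=0.5, capacity=4):
    """Hypothetical feedback rule: store word vectors whose local score is low.

    Poorly aligned (possibly uncommon) content is written to the memory, and
    the oldest entries are dropped once `capacity` is exceeded.
    """
    for w, s in zip(words, scores):
        if s < threshold:
            memory.append(list(w))
    if len(memory) > capacity:
        del memory[:len(memory) - capacity]
    return memory

# Toy data: two word vectors and two region vectors (made up for illustration).
words = [[1.0, 0.0], [0.0, 1.0]]
regions = [[1.0, 0.0], [1.0, 0.1]]
scores = local_scores(words, regions)
sentence_score = attended_score(scores)
memory = update_memory([], words, scores)
```

In this toy setup the second word has no well-matching region, so the attention-weighted score stays close to the best local score instead of being dragged down by the plain average, and the poorly aligned word vector is the one written to memory.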
Citation
L. Zhang et al., "LFSRM: Few-Shot Diagram-Sentence Matching via Local-Feedback Self-Regulating Memory," in IEEE Transactions on Pattern Analysis and Machine Intelligence, doi: 10.1109/TPAMI.2025.3528723
Source
IEEE Transactions on Pattern Analysis and Machine Intelligence
Keywords
Training, Visualization, Semantics, Electronic mail, Attention mechanisms, Pattern matching, Translation, Question answering (information retrieval), Organisms
Publisher
IEEE
