VQA4CIR: Boosting Composed Image Retrieval with Visual Question Answering
Feng, Chun-Mei; Bai, Yang; Luo, Tao; Li, Zhen; Khan, Salman; Zuo, Wangmeng; Goh, Rick Siow Mong; Liu, Yong
Department
Computer Vision
Type
Conference proceeding
Date
2025
Language
English
Abstract
Although progress has been made in Composed Image Retrieval (CIR), we empirically find that a certain percentage of failed retrieval results are not consistent with their relative captions. To address this issue, this work approaches CIR from a Visual Question Answering (VQA) perspective to boost its performance. The resulting VQA4CIR is a post-processing approach that can be directly plugged into existing CIR methods. Given the top-C images retrieved by a CIR method, VQA4CIR aims to reduce the adverse effect of failed retrieval results that are inconsistent with the relative caption. To identify retrieved images that are inconsistent with the relative caption, we resort to a “QA generation → VQA” self-verification pipeline. For QA generation, we suggest fine-tuning an LLM (e.g., LLaMA) to generate several question-answer pairs from each relative caption. We then fine-tune an LVLM (e.g., LLaVA) to obtain the VQA model. By feeding a retrieved image and a question to the VQA model, an image can be identified as inconsistent with the relative caption when the VQA answer disagrees with the answer in the QA pair. Consequently, CIR performance can be boosted by modifying the ranks of the inconsistently retrieved images. Experimental results show that our proposed method outperforms state-of-the-art CIR methods on the CIRR and Fashion-IQ datasets.
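The re-ranking step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not state the exact rank-modification rule, so the sketch assumes one: stable-sorting the top-C images by how many generated QA pairs their VQA answers contradict. It also assumes a simple case-insensitive answer match. The names `QAPair`, `count_inconsistencies`, `rerank_top_c`, and the `vqa_answer(image_id, question)` callable are hypothetical stand-ins for the fine-tuned LLaMA (QA generation) and LLaVA (VQA) models.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical interface standing in for the fine-tuned LVLM:
# vqa_answer(image_id, question) -> free-form answer string.
VQAFn = Callable[[str, str], str]


@dataclass
class QAPair:
    """One question-answer pair generated from a relative caption (hypothetical type)."""
    question: str
    answer: str


def normalize(answer: str) -> str:
    # Assumed matching rule: case- and whitespace-insensitive string equality.
    return " ".join(answer.lower().split())


def count_inconsistencies(image_id: str, qa_pairs: Sequence[QAPair], vqa_answer: VQAFn) -> int:
    """Count QA pairs whose VQA answer disagrees with the caption-derived answer."""
    return sum(
        normalize(vqa_answer(image_id, qa.question)) != normalize(qa.answer)
        for qa in qa_pairs
    )


def rerank_top_c(
    ranked_ids: Sequence[str],
    qa_pairs: Sequence[QAPair],
    vqa_answer: VQAFn,
    top_c: int = 10,
) -> List[str]:
    """Demote inconsistent images within the top-C; ranks below C are left untouched."""
    head = list(ranked_ids[:top_c])
    tail = list(ranked_ids[top_c:])
    # list.sort is stable, so images with equal mismatch counts keep their CIR order.
    head.sort(key=lambda img: count_inconsistencies(img, qa_pairs, vqa_answer))
    return head + tail


# Toy example: a lookup table stands in for the fine-tuned VQA model.
qa_pairs = [QAPair("Is the dress red?", "yes"), QAPair("Does it have sleeves?", "no")]
toy_answers = {
    ("img_a", "Is the dress red?"): "no",    # contradicts "yes"
    ("img_a", "Does it have sleeves?"): "no",
    ("img_b", "Is the dress red?"): "Yes",   # matches after normalization
    ("img_b", "Does it have sleeves?"): "no",
    ("img_c", "Is the dress red?"): "yes",
    ("img_c", "Does it have sleeves?"): "yes",  # contradicts "no"
}


def toy_vqa(image_id: str, question: str) -> str:
    return toy_answers[(image_id, question)]


print(rerank_top_c(["img_a", "img_b", "img_c", "img_d"], qa_pairs, toy_vqa, top_c=3))
# → ['img_b', 'img_a', 'img_c', 'img_d']
```

In the toy run, `img_b` answers both questions consistently and moves to the top, while `img_a` and `img_c` (one mismatch each) keep their original relative order; `img_d` lies outside the top-C and is never queried. Weighting mismatches or using answer confidences would be equally plausible variants, and none of them is confirmed by the abstract.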
Citation
C.-M. Feng et al., “VQA4CIR: Boosting Composed Image Retrieval with Visual Question Answering,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 3, pp. 2942–2950, Apr. 2025, doi: 10.1609/AAAI.V39I3.32301.
Source
Proceedings of the AAAI Conference on Artificial Intelligence
Conference
39th Annual AAAI Conference on Artificial Intelligence, AAAI 2025
Keywords
Adverse effect, Fine tuning, Performance, Post-processing, Processing approach, Question Answering, Retrieval methods, Retrieval performance, Retrieved images, State of the art
Publisher
Association for the Advancement of Artificial Intelligence
