Zero-Shot Reasoning with BLIP and SmolLM

Tosheva, Elena
Dimitrov, Dimitar
Koychev, Ivan
Nakov, Preslav
Department
Natural Language Processing
Type
Conference proceeding
Date
2025
Language
English
Abstract
This work was developed as part of the ImageCLEF 2025 competition. We adapted the BLIP-Base image-captioning model for the Multimodal Reasoning task, integrating the SmolLM-360M model for question answering and training on the MBZUAI EXAMS-V dataset (16 724 training and 4 208 validation examples). We then conducted a prompt-ablation study with three templates to evaluate their impact on answer-key accuracy, measured by case-insensitive substring matching against the correct option among the three to five answers provided for each question. Finally, we analyzed the distribution of generated caption lengths.
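The accuracy measure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function names `is_correct` and `answer_accuracy` are hypothetical, and the sketch assumes that the text of the correct option is searched for inside the generated answer. The abstract does not say whether the option text or the option key is matched, nor which direction the substring test runs.

```python
from typing import Sequence


def is_correct(generated: str, correct_option: str) -> bool:
    """Case-insensitive substring match (assumed direction: option inside output)."""
    needle = correct_option.strip().casefold()
    # An empty option would match every string, so it is treated as a miss.
    return bool(needle) and needle in generated.casefold()


def answer_accuracy(generated: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of generated answers that contain their correct option."""
    if len(generated) != len(gold):
        raise ValueError("generated and gold must have the same length")
    if not gold:
        return 0.0
    hits = sum(is_correct(g, c) for g, c in zip(generated, gold))
    return hits / len(gold)


if __name__ == "__main__":
    # Hypothetical outputs and gold options, for illustration only.
    outputs = ["The answer is PARIS.", "rome"]
    answers = ["Paris", "Madrid"]
    print(answer_accuracy(outputs, answers))
```

Substring matching is lenient: an output that lists several options would count as correct if the right one is among them, so scores computed this way are best read as an upper bound on exact-match accuracy.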
Citation
E. Tosheva, D. Dimitrov, I. Koychev, and P. Nakov, “Zero-Shot Reasoning with BLIP and SmolLM,” CEUR Workshop Proc., 2025. Accessed: Oct. 28, 2025.
Source
CEUR Workshop Proceedings
Conference
26th Working Notes of the Conference and Labs of the Evaluation Forum, CLEF 2025
Keywords
Image Captioning, ImageCLEF 2025, Multimodal, Multimodal Reasoning
Publisher
CEUR-WS