Interpretable Zero-Shot Learning with Locally-Aligned Vision-Language Model

Chen, Shiming
Duan, Bowen
Khan, Salman
Khan, Fahad Shahbaz
Department
Computer Vision
Type
Conference proceeding
Abstract
Large-scale vision-language models (VLMs), such as CLIP, have achieved remarkable success in zero-shot learning (ZSL) by leveraging large-scale visual-text pair datasets. However, these methods often lack interpretability: they compute the similarity between an entire query image and the embedded category words, which makes their predictions difficult to explain. One approach to this issue is to build interpretable models by integrating language, with classifiers constructed from discrete attributes, similar to human perception. This introduces a new challenge: how to effectively align local visual features with their corresponding attributes using pre-trained VLMs. To tackle this, we propose LaZSL, a locally-aligned vision-language model for interpretable ZSL. LaZSL employs local visual-semantic alignment via optimal transport to model the interaction between visual regions and their associated attributes, enabling effective alignment and providing interpretable similarity without additional training. Extensive experiments demonstrate that our method offers several advantages, including enhanced interpretability, improved accuracy, and strong domain generalization. Code is available at: https://github.com/shiming-chen/LaZSL.
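The idea of aligning visual regions with attributes through optimal transport can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that). It assumes region and attribute embeddings are already available as plain lists of floats, uses uniform marginals, and solves the entropic-regularized problem with standard Sinkhorn iterations. The function names `cosine`, `sinkhorn`, and `ot_similarity`, and the parameters `eps` and `iters`, are hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def sinkhorn(cost, a, b, eps=0.1, iters=200):
    """Entropic-regularized optimal transport via Sinkhorn iterations.

    cost: n x m cost matrix; a, b: marginals of length n and m.
    Returns an n x m transport plan.
    """
    n, m = len(cost), len(cost[0])
    # Gibbs kernel: lower cost gives a larger kernel entry.
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def ot_similarity(regions, attributes, eps=0.1, iters=200):
    """Score one class by transporting region mass onto its attributes.

    regions: embeddings of local image regions (assumed precomputed).
    attributes: embeddings of the class's attribute texts (assumed precomputed).
    Returns (score, plan); plan[i][j] indicates how strongly region i
    is matched to attribute j, which is what makes the score inspectable.
    """
    sim = [[cosine(r, t) for t in attributes] for r in regions]
    cost = [[1.0 - s for s in row] for row in sim]
    a = [1.0 / len(regions)] * len(regions)        # uniform mass over regions
    b = [1.0 / len(attributes)] * len(attributes)  # uniform mass over attributes
    plan = sinkhorn(cost, a, b, eps, iters)
    score = sum(
        plan[i][j] * sim[i][j]
        for i in range(len(regions))
        for j in range(len(attributes))
    )
    return score, plan
```

Under these assumptions, a zero-shot prediction would score each candidate class with `ot_similarity` and pick the highest score, while the returned plan shows which regions contributed to which attributes. No parameters are learned in this sketch, which is consistent with the training-free setting the abstract describes, though the actual method may differ in its marginals, cost, and region selection.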
Citation
S. Chen, B. Duan, S. Khan, and F. S. Khan, "Interpretable Zero-Shot Learning with Locally-Aligned Vision-Language Model," in 2025 IEEE/CVF International Conference on Computer Vision (ICCV), 2026, pp. 478-487.
Source
2025 IEEE/CVF International Conference on Computer Vision (ICCV)
Conference
2025 IEEE/CVF International Conference on Computer Vision (ICCV)
Keywords
46 Information and Computing Sciences, 4611 Machine Learning
Publisher
IEEE