
AG-CLIP: Attribute-guided CLIP for Zero-shot Fine-grained Recognition

Ahmad, Jamil
Khan, Mustaqeem
Guiaeab, Wail
Elsaddik, Abdulmotaleb
De Masi, Giulia
Karray, Fakhri
Abstract
Zero-shot fine-grained recognition is challenging because classes are visually very similar and embedding models encode fine-grained features poorly. In this work, we present an attribute-guided Contrastive Language-Image Pre-training (AG-CLIP) model with an additional attribute encoder. Our approach first identifies relevant visual attributes from the textual class descriptions using an attribute mining module built on a large language model (LLM), GPT-4o. The attributes are then used to construct prompts for an open-vocabulary object/region detector, which extracts the corresponding image regions. The attribute text, together with these focused regions of the input, then guides the CLIP model toward the discriminative attributes during fine-tuning through a context-attribute fusion module. This attribute-guided attention mechanism allows CLIP to disambiguate fine-grained classes by highlighting their distinctive attributes, without requiring fine-tuning or additional training data on unseen classes. We evaluate our approach on the CUB-200-2011 and plant disease datasets, achieving 73.3% and 84.6% accuracy, respectively. Our method achieves state-of-the-art zero-shot performance, outperforming prior methods that rely on external knowledge bases or complex meta-learning strategies. These results demonstrate the effectiveness of injecting generic attribute awareness into powerful vision-language models like CLIP for zero-shot fine-grained recognition.
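The abstract does not give the form of the context-attribute fusion module, so the following is only a toy sketch of the general idea: CLIP-style zero-shot scoring by cosine similarity, where each embedding is a blend of a context vector and an attribute vector. The convex-combination `fuse` function, the `alpha` weight, the function names, and the 3-dimensional vectors are all assumptions made for illustration, not the authors' method or data.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a, b):
    """Cosine similarity, the score CLIP-style models use to match image and text."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))

def fuse(context, attribute, alpha=0.5):
    """Hypothetical fusion: a convex combination of context and attribute embeddings.
    (Assumption for illustration; the paper's fusion module is not specified here.)"""
    return [(1 - alpha) * c + alpha * a for c, a in zip(context, attribute)]

def zero_shot_predict(image_ctx, image_attr, class_embeddings, alpha=0.5):
    """Score the fused image embedding against each fused class embedding."""
    image_vec = fuse(image_ctx, image_attr, alpha)
    scores = {
        name: cosine(image_vec, fuse(text_ctx, text_attr, alpha))
        for name, (text_ctx, text_attr) in class_embeddings.items()
    }
    return max(scores, key=scores.get), scores

# Toy classes with identical context embeddings but different attribute embeddings.
classes = {
    "sparrow": ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]),
    "warbler": ([1.0, 0.0, 0.0], [0.0, 0.0, 1.0]),
}
image_ctx = [1.0, 0.0, 0.0]    # whole-image embedding (made-up values)
image_attr = [0.0, 0.9, 0.1]   # embedding of attribute-focused regions (made-up values)

pred, scores = zero_shot_predict(image_ctx, image_attr, classes)
_, context_only = zero_shot_predict(image_ctx, image_attr, classes, alpha=0.0)
```

In this toy setup the two classes are indistinguishable from context alone (`alpha=0.0` gives both the same score), while blending in the attribute vectors separates them, which is the intuition the abstract describes for visually similar fine-grained classes.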
Citation
J. Ahmad, M. Khan, W. Guiaeab, A. Elsaddik, G. De Masi, F. Karray, "AG-CLIP: Attribute-guided CLIP for Zero-shot Fine-grained Recognition," IEEE Open Journal of the Computer Society, vol. 7, no. 99, pp. 365-375, 2026, https://doi.org/10.1109/ojcs.2026.3654171.
Source
IEEE Open Journal of the Computer Society
Keywords
46 Information and Computing Sciences, 4605 Data Management and Data Science
Publisher
IEEE