
AgriCLIP: Adapting CLIP for Agriculture and Livestock via Domain-Specialized Cross-Model Alignment

Nawaz, Umair
Muhammad, Awais
Gani, Hanan
Naseer, Muzammal
Khan, Fahad Shahbaz
Khan, Salman
Anwer, Rao
Department
Computer Vision
Type
Conference proceeding
Date
2025
Language
English
Abstract
Capitalizing on a vast amount of image-text data, large-scale vision-language pre-training has demonstrated remarkable zero-shot capabilities and has been utilized in several applications. However, models trained on general everyday web-crawled data often exhibit suboptimal performance for specialized domains, likely due to domain shift. Recent works have tackled this problem for some domains (e.g., healthcare) by constructing domain-specialized image-text data. However, constructing a dedicated large-scale image-text dataset for the sustainable areas of agriculture and livestock remains an open research problem. Further, this domain requires fine-grained feature learning due to the subtle nature of the downstream tasks (e.g., nutrient deficiency detection and livestock breed classification). To address this, we present AgriCLIP, a vision-language foundational model dedicated to the domain of agriculture and livestock. First, we propose a large-scale dataset named ALive that leverages a customized prompt generation strategy to overcome the scarcity of expert annotations. Our ALive dataset covers crops, livestock, and fishery, with around 600,000 image-text pairs. Second, we propose a training pipeline that integrates both contrastive and self-supervised learning to learn both global semantic and local fine-grained domain-specialized features. Experiments on a diverse set of 20 downstream tasks demonstrate the effectiveness of the AgriCLIP framework, achieving an absolute gain of 9.07% in terms of average zero-shot classification accuracy over the standard CLIP adaptation via the domain-specialized ALive dataset. Our ALive dataset and code are accessible on GitHub. © 2025 Association for Computational Linguistics.
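For readers unfamiliar with the zero-shot classification metric cited in the abstract, the following is a minimal sketch of how a CLIP-style model assigns a label: the image embedding is compared against the embedding of one text prompt per class, and the most similar prompt wins. It is not the authors' code. The function names, the prompt wording, and the three-dimensional toy vectors are all assumptions made for illustration; a real model would produce the embeddings with its image and text encoders.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def zero_shot_classify(image_embedding, prompt_embeddings):
    """Return the best-matching label and the similarity score of every label."""
    scores = {
        label: cosine_similarity(image_embedding, emb)
        for label, emb in prompt_embeddings.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical toy embeddings; real ones would come from trained encoders.
PROMPT_EMBEDDINGS = {
    "a photo of a nutrient-deficient leaf": [0.9, 0.1, 0.0],
    "a photo of a healthy leaf": [0.1, 0.9, 0.0],
    "a photo of a dairy cow": [0.0, 0.1, 0.9],
}
IMAGE_EMBEDDING = [0.8, 0.3, 0.1]

label, scores = zero_shot_classify(IMAGE_EMBEDDING, PROMPT_EMBEDDINGS)
print(label)
```

No class-specific training is involved in this step, which is why accuracy measured this way reflects how well the pre-trained image and text embeddings already align for the target domain.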
Citation
U. Nawaz et al., “AgriCLIP: Adapting CLIP for Agriculture and Livestock via Domain-Specialized Cross-Model Alignment,” Proceedings - International Conference on Computational Linguistics, COLING, vol. Part, pp. 9630–9639, Jan. 2025.
Source
31st International Conference on Computational Linguistics, COLING 2025
Keywords
Algae control, Self-supervised learning, Semantics, Visual languages
Publisher
Association for Computational Linguistics