
AgroGPT: Efficient Agricultural Vision-Language Model with Expert Tuning

Alharthi, Ali Husain Salem Abdulla
Department
Computer Vision
Type
Thesis
Date
2024
Language
English
Abstract
Significant progress has been made in advancing multimodal conversational models, capitalizing on vast repositories of image-text data available online. Despite this progress, these models often encounter substantial domain gaps, hindering their ability to engage in complex conversations across new domains. Recent efforts have aimed to mitigate this issue, albeit relying on domain-specific image-text data for instruction tuning. However, many domains, such as agriculture, lack such vision-language data. In this work, we propose a novel approach to construct instruction-tuning data that harnesses vision-only data for the agriculture domain. We employ diverse agricultural datasets spanning multiple subdomains, curate class-specific information, and use large language models (LLMs) to construct an instruction-tuning set. This results in a 70k agricultural instruction-tuning dataset named AgroInstruct. Subsequently, we perform expert tuning to create AgroGPT, an LLaVA-Phi-based LMM that can hold complex agriculture-related conversations and provide useful insights. We compare AgroGPT's performance with large open-source and closed-source models. Our AgroGPT excels at standard agricultural tasks, can act as an agricultural expert, and provides helpful information for agriculture-related questions. By fine-tuning with a bilingual dataset constructed by translating the original data, we produce a version of the model capable of conversing on agricultural topics in both Arabic and English.
Citation
A. H. Alharthi, "AgroGPT: Efficient Agricultural Vision-Language Model with Expert Tuning", M.S. Thesis, Computer Vision, MBZUAI, Abu Dhabi, UAE, 2024