Method and system for adapting a vision-language machine learning model for image recognition tasks

Khattak, Muhammad Uzair
Wasim, Syed Talal
Naseer, Muzammal
Khan, Salman
Khan, Fahad Shahbaz
Department
Computer Vision
Type
Patent
Date
2025
Language
English
Abstract
A computer-readable storage medium and method for adapting a vision-language (VL) machine learning model from a pre-trained VL model. The pre-trained VL model includes a pre-trained image encoder and a pre-trained textual encoder. The method is applicable to few-shot learning and training of learnable prompts. After feature extraction, a machine learning engine determines a cross-entropy loss from prompted image features and prompted textual features. The machine learning engine then determines a self-regularizing consistency loss from the prompted image features, pre-trained image features, the prompted textual features, and pre-trained textual features. The learnable text and image prompts are updated based on the cross-entropy loss in combination with the self-regularizing consistency loss. To preserve generalization for downstream tasks, the updating with the self-regularizing consistency loss is constrained to maximize mutual agreement between the prompted image and textual features and the pre-trained image and textual features.
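The combined objective described in the abstract can be illustrated with a minimal sketch. The abstract does not give the form of the consistency term, so everything here is an assumption made for illustration: the consistency loss is taken to be a mean absolute (L1) distance between prompted and pre-trained features, and the names `cross_entropy`, `consistency_loss`, `total_loss`, `temperature`, and `weight` are hypothetical. Features are plain Python lists and no encoder is modeled.

```python
import math

def l2_normalize(v):
    # Scale a feature vector to unit length (guards against a zero vector).
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def cross_entropy(image_feat, text_feats, label, temperature=0.01):
    # Cosine similarity between the prompted image feature and each class's
    # prompted text feature, divided by an assumed temperature.
    img = l2_normalize(image_feat)
    logits = [
        sum(a * b for a, b in zip(img, l2_normalize(t))) / temperature
        for t in text_feats
    ]
    # Numerically stable log-sum-exp, then negative log-likelihood of the label.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def l1_distance(a, b):
    # Mean absolute difference between two equal-length feature vectors.
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def consistency_loss(prompted_img, pretrained_img, prompted_txt, pretrained_txt):
    # Assumed form: L1 distance between prompted and pre-trained features,
    # summed over the image branch and the class-averaged text branch.
    image_term = l1_distance(prompted_img, pretrained_img)
    text_term = sum(
        l1_distance(p, f) for p, f in zip(prompted_txt, pretrained_txt)
    ) / len(prompted_txt)
    return image_term + text_term

def total_loss(prompted_img, pretrained_img, prompted_txt, pretrained_txt,
               label, weight=1.0):
    # Cross-entropy on prompted features plus the weighted consistency term;
    # `weight` is a hypothetical balancing hyperparameter.
    ce = cross_entropy(prompted_img, prompted_txt, label)
    reg = consistency_loss(prompted_img, pretrained_img,
                           prompted_txt, pretrained_txt)
    return ce + weight * reg
```

In this sketch the consistency term is zero when the prompted features equal the pre-trained ones and grows as the prompts pull the features away from them, which is one simple way to express the "mutual agreement" constraint. In a real training loop only the learnable prompts would receive gradients from `total_loss`, while the pre-trained encoders stay frozen.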
Citation
“US20250200931A1 - Method and system for adapting a vision-language machine learning model for image recognition tasks - Google Patents.” [Online]. Available: https://patents.google.com/patent/US20250200931A1/en
Source
US Patent. US20250200931A1
Publisher
Google Patents