Test-Time Adaptation of Vision-Language Models using Prompt Learning
Hassan, Jameel
Author
Supervisor
Department
Computer Vision
Embargo End Date
2024-01-01
Type
Thesis
Date
2024
License
Language
English
Abstract
Visual understanding has long been framed as assigning discrete labels to images. Recently, the computer vision field has shifted drastically with the emergence of foundational vision-language models, which bind language and vision together. Models such as CLIP show excellent zero-shot recognition and generalization capabilities. Prior works have therefore explored different techniques to adapt such models to numerous downstream tasks, among which prompt learning has gained significant prominence. Previous works have shown that test-time prompt tuning with entropy minimization can adapt text prompts to unseen domains. While effective, this overlooks the key cause of performance degradation on unseen domains: distribution shift. In this work, we explicitly address distribution shift at test time by aligning the out-of-distribution (OOD) test sample statistics to pre-computed statistics of the source data, using prompt tuning. We use a single test sample to adapt multi-modal prompts at test time, minimizing the feature distribution shift to bridge the gap to the test domain. The method is evaluated on domain generalization and cross-dataset generalization benchmarks against existing prompt learning methods for zero-shot image classification. Our source code and models are available at https://jameelhassan.github.io/promptalign/.
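The two test-time objectives mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not say which statistics are aligned, which distance is used, or how several feature vectors are obtained from a single test sample. The sketch assumes per-dimension mean and variance, an L1 distance, and features taken from several augmented views of the test image. The function names `feature_stats`, `alignment_loss`, and `entropy` are hypothetical, and the gradient step that would update the prompts is omitted.

```python
import math


def feature_stats(features):
    """Per-dimension mean and population variance over a list of feature vectors."""
    n = len(features)
    dim = len(features[0])
    mean = [sum(f[d] for f in features) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in features) / n for d in range(dim)]
    return mean, var


def alignment_loss(test_features, source_mean, source_var):
    """Assumed form: L1 gap between test and source statistics, averaged over dimensions."""
    mean, var = feature_stats(test_features)
    dim = len(mean)
    mean_gap = sum(abs(m - s) for m, s in zip(mean, source_mean)) / dim
    var_gap = sum(abs(v - s) for v, s in zip(var, source_var)) / dim
    return mean_gap + var_gap


def entropy(probs):
    """Shannon entropy of a class-probability vector (the objective used by prior work)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Toy features standing in for augmented views of one test sample (assumption).
views = [[1.0, 2.0], [3.0, 4.0]]
print(alignment_loss(views, [2.0, 3.0], [1.0, 1.0]))  # statistics match the source
print(alignment_loss(views, [0.0, 0.0], [0.0, 0.0]))  # statistics are shifted
print(entropy([0.5, 0.5]))
```

In this toy setting the loss is zero when the test statistics equal the pre-computed source statistics and grows as they drift apart, which is the quantity prompt tuning would try to reduce.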
Citation
J. Hassan, "Test-Time Adaptation of Vision-Language Models using Prompt Learning", M.S. Thesis, Computer Vision, MBZUAI, Abu Dhabi, UAE, 2024
