Transferability of Vision-Language models with Prompt Learning
Author: Khattak, Muhammad Uzair
Supervisor:
Department: Computer Vision
Embargo End Date:
Type: Thesis
Date: 2023
License:
Language: English
Abstract
Second, we explore prompt learning from the perspective of optimization and propose a novel self-regularizing framework that effectively addresses the prompt over-fitting issue. Conventionally trained with a task-specific objective, i.e., cross-entropy loss, prompts tend to overfit downstream data distributions and struggle to capture task-agnostic general features from the frozen CLIP. To address this issue, our work introduces a self-regularization framework for prompting that guides the prompts to optimize for both task-specific and task-agnostic general representations using a three-pronged approach. Specifically, our Prompting with Self-regulating Constraints (PromptSRC) approach comprises the following components: (a) regulating prompted representations via mutual agreement maximization with the frozen model, (b) regulating with a self-ensemble of prompts over the training trajectory to encode their complementary strengths, and (c) regulating with textual diversity to mitigate sample diversity imbalance with the visual branch. PromptSRC explicitly steers the prompts to learn a representation space that maximizes performance on downstream tasks without compromising CLIP generalization. We perform extensive experiments on 4 image-recognition benchmarks, where PromptSRC compares favorably with existing methods. Our code and models will be made public.
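The first two regularizers described above can be illustrated in miniature. The sketch below is a simplified NumPy rendering, not the thesis implementation: `mutual_agreement_loss` stands in for the feature-level agreement constraint between prompted and frozen CLIP representations, and `gaussian_epoch_weights` / `ensemble_prompts` stand in for the weighted self-ensembling of prompt vectors saved along the training trajectory. All function names, the L1 distance choice, and the Gaussian weighting parameters are illustrative assumptions.

```python
import numpy as np

def mutual_agreement_loss(prompted_feat, frozen_feat):
    # Penalize disagreement between features produced with learned prompts
    # and features from the frozen model (L1 distance as a simple proxy
    # for the mutual-agreement constraint; illustrative choice).
    return np.abs(prompted_feat - frozen_feat).mean()

def gaussian_epoch_weights(num_epochs, mean, std):
    # Gaussian weights over epochs for self-ensembling prompts along the
    # training trajectory: later, better-fitted prompts can be weighted
    # more heavily than early ones (mean/std are hypothetical knobs).
    epochs = np.arange(1, num_epochs + 1)
    w = np.exp(-((epochs - mean) ** 2) / (2 * std ** 2))
    return w / w.sum()  # normalize so the weights sum to 1

def ensemble_prompts(prompt_history, weights):
    # Weighted average of prompt vectors snapshotted at each epoch.
    # prompt_history: list of arrays of identical shape, one per epoch.
    return np.tensordot(weights, np.stack(prompt_history), axes=1)

# Toy usage: 20 epochs of 8-dimensional prompt vectors.
rng = np.random.default_rng(0)
history = [rng.normal(size=8) for _ in range(20)]
weights = gaussian_epoch_weights(num_epochs=20, mean=15, std=5)
final_prompt = ensemble_prompts(history, weights)
```

The ensembled prompt encodes complementary strengths of the per-epoch prompts in a single vector, so no extra parameters are kept at inference time.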
Citation
M.U. Khattak, "Transferability of Vision-Language models with Prompt Learning", M.S. Thesis, Computer Vision, MBZUAI, Abu Dhabi, UAE, 2023.
