Parameter-Efficient Multimodal Adaptation for Certified Robustness of Medical Vision-Language Models

Shamshad, Fahad
Hussein, Noor
Nandakumar, Karthik
Department
Computer Vision
Type
Conference proceeding
Date
2025
Language
English
Abstract
Medical vision-language models (Med-VLMs) have shown strong generalization in downstream tasks by leveraging large-scale image-text pretraining. However, their vulnerability to adversarial perturbations poses critical risks in safety-sensitive applications. While randomized smoothing provides certifiable guarantees against such perturbations, it requires the model to remain accurate under Gaussian noise—an assumption that fails in practice without specialized adaptation. Prior work, such as PromptSmooth, introduced text-only prompt tuning to improve robustness, but neglected visual adaptation and struggled under high noise levels. In this paper, we propose PromptSmooth++, a multimodal adaptation framework that enhances the certified robustness of frozen Med-VLMs under randomized smoothing. Our method introduces two complementary variants: (i) a few-shot strategy that jointly optimizes visual and textual prompts for noise-aware cross-modal alignment, and (ii) a zero-shot approach that performs test-time vision-side adaptation using lightweight Low-Rank Adapters (LoRA), optimized with a self-supervised entropy loss on noisy inputs. Extensive experiments across multiple Med-VLMs and multiple datasets spanning diverse medical modalities demonstrate that PromptSmooth++ significantly improves certified accuracy over existing baselines while maintaining high clean performance and computational efficiency. Our results show that modality-aware prompting and vision-side adaptation are both essential for certifiably robust medical imaging systems. Code is available at https://github.com/fahadshamshad/multimodal-promptsmooth.
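The abstract's certified-robustness guarantees rest on the standard randomized-smoothing prediction rule (a Monte-Carlo majority vote over Gaussian-perturbed copies of the input), which is why the base model must stay accurate under Gaussian noise. A minimal sketch of that rule is below; the toy classifier, `sigma`, and sample count are illustrative assumptions, not details from the paper.

```python
import numpy as np

def smoothed_predict(classifier, x, sigma=0.25, n_samples=100, seed=0):
    """Monte-Carlo estimate of the smoothed classifier
    g(x) = argmax_c P(classifier(x + eps) = c), eps ~ N(0, sigma^2 I).

    `classifier` maps an array to an integer class label."""
    rng = np.random.default_rng(seed)
    votes = {}
    for _ in range(n_samples):
        noisy = x + rng.normal(0.0, sigma, size=x.shape)
        label = classifier(noisy)
        votes[label] = votes.get(label, 0) + 1
    # Majority vote over the noisy copies.
    return max(votes, key=votes.get)

# Hypothetical base classifier: thresholds the mean intensity.
toy_classifier = lambda img: int(img.mean() > 0.5)
bright_image = np.full((8, 8), 0.8)
print(smoothed_predict(toy_classifier, bright_image))  # prints 1
```

If the base classifier's accuracy collapses under this noise, the vote (and hence the certificate) degrades, which is the failure mode PromptSmooth++'s noise-aware adaptation targets.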
Citation
F. Shamshad, N. Hussein, and K. Nandakumar, “Parameter-Efficient Multimodal Adaptation for Certified Robustness of Medical Vision-Language Models,” pp. 300–315, 2026, doi: 10.1007/978-3-031-98688-8_21.
Source
Annual Conference on Medical Image Understanding and Analysis
Conference
29th Annual Conference on Medical Image Understanding and Analysis
Keywords
Certified Robustness, Medical Vision-Language Models, Parameter-Efficient Adaptation, Randomized Smoothing
Publisher
Springer Nature