MedROV: Towards Real-Time Open-Vocabulary Detection Across Diverse Medical Imaging Modalities
Sheikh, Tooba Tehreem ; Lahoud, Jean ; Anwer, Rao Muhammad ; Khan, Fahad Shahbaz ; Khan, Salman ; Cholakkal, Hisham
Department
Computer Vision
Type
Conference proceeding
Abstract
Traditional object detection models in medical imaging operate within a closed-set paradigm, limiting their ability to detect objects of novel labels. Open-vocabulary object detection (OVOD) addresses this limitation but remains underexplored in medical imaging due to dataset scarcity and weak text-image alignment. To bridge this gap, we introduce MedROV, the first Real-time Open Vocabulary detection model for medical imaging. To enable open-vocabulary learning, we curate a large-scale dataset, Omnis, with 600K detection samples across nine imaging modalities and introduce a pseudo-labeling strategy to handle missing annotations from multi-source datasets. Additionally, we enhance generalization by incorporating knowledge from a large pre-trained foundation model. By leveraging contrastive learning and cross-modal representations, MedROV effectively detects both known and novel structures. Experimental results demonstrate that MedROV outperforms the previous state-of-the-art foundation model for medical image detection with an average absolute improvement of 40 mAP50, and surpasses closed-set detectors by more than 3 mAP50, while running at 70 FPS, setting a new benchmark in medical detection. Our source code, dataset, and trained model are available at MedROV.
Citation
T.T. Sheikh, J. Lahoud, R.M. Anwer, F.S. Khan, S. Khan, H. Cholakkal, "MedROV: Towards Real-Time Open-Vocabulary Detection Across Diverse Medical Imaging Modalities," 2026, pp. 8628-8638.
Source
2026 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
Conference
2026 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
Keywords
46 Information and Computing Sciences, 4603 Computer Vision and Multimedia Computation
Publisher
IEEE
