MedROV: Towards Real-Time Open-Vocabulary Detection Across Diverse Medical Imaging Modalities

Sheikh, Tooba Tehreem
Lahoud, Jean
Anwer, Rao Muhammad
Khan, Fahad Shahbaz
Khan, Salman
Cholakkal, Hisham
Department
Computer Vision
Type
Conference proceeding
Abstract
Traditional object detection models in medical imaging operate within a closed-set paradigm, limiting their ability to detect objects of novel labels. Open-vocabulary object detection (OVOD) addresses this limitation but remains underexplored in medical imaging due to dataset scarcity and weak text-image alignment. To bridge this gap, we introduce MedROV, the first Real-time Open Vocabulary detection model for medical imaging. To enable open-vocabulary learning, we curate a large-scale dataset, Omnis, with 600K detection samples across nine imaging modalities and introduce a pseudo-labeling strategy to handle missing annotations from multi-source datasets. Additionally, we enhance generalization by incorporating knowledge from a large pre-trained foundation model. By leveraging contrastive learning and cross-modal representations, MedROV effectively detects both known and novel structures. Experimental results demonstrate that MedROV outperforms the previous state-of-the-art foundation model for medical image detection with an average absolute improvement of 40 mAP50, and surpasses closed-set detectors by more than 3 mAP50, while running at 70 FPS, setting a new benchmark in medical detection. Our source code, dataset, and trained model are available at MedROV.
Citation
T.T. Sheikh, J. Lahoud, R.M. Anwer, F.S. Khan, S. Khan, H. Cholakkal, "MedROV: Towards Real-Time Open-Vocabulary Detection Across Diverse Medical Imaging Modalities," in 2026 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2026, pp. 8628-8638.
Source
2026 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
Conference
2026 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
Keywords
46 Information and Computing Sciences, 4603 Computer Vision and Multimedia Computation
Publisher
IEEE