
Semi-Automatic Stable Surgical Instrument Segmentation in Cataract Surgery Images via Point Prompts

Zhaksylyk, Nuren
Department
Computer Vision
Embargo End Date
2027-05-30
Type
Thesis
Date
2025
Language
English
Abstract
Accurate segmentation of surgical instruments in cataract surgery is crucial for clinical applications such as skill assessment, surgical guidance, and workflow optimization. Despite the transformative impact of deep learning on computer vision, medical image segmentation remains challenging due to the scarcity of high-quality annotated data. Recent foundation models such as SAM and SAM2 have shown promise by leveraging large-scale datasets for general segmentation tasks. In specialized domains such as surgical instrument segmentation, however, their direct application is limited by a dependence on precise prompt positioning: point prompts, although spatially informative, tend to produce inconsistent segmentation outputs when their placement varies. To address this issue, we introduce RP-SAM2, a robust extension of SAM2 designed to maintain segmentation stability and accuracy under variability in point prompt locations. RP-SAM2 incorporates a novel shift block that systematically adjusts the influence of point prompts, together with a compound loss function that enforces segmentation consistency. This dual approach minimizes the variance typically observed across different prompt placements, thereby stabilizing the segmentation output. The model is particularly beneficial in zero-shot annotation scenarios, where the limited availability of annotated data necessitates the automated generation of pseudo masks for fine-tuning and downstream applications. Experimental evaluation on the Cataract1k dataset shows that RP-SAM2 outperforms the baseline SAM2 model, with a 2% gain in mean Dice Similarity Coefficient (mDSC) and a 21.36% reduction in mean Hausdorff Distance at the 95th percentile (mHD95). These metrics indicate not only improved accuracy but also increased robustness, as the segmentation results vary less across point prompt positions.
Additionally, experiments on the CaDIS dataset show that pseudo masks generated by RP-SAM2 can effectively fine-tune SAM2's mask decoder, achieving superior segmentation outcomes compared to masks generated by the unmodified SAM2. Complementing these technical contributions, we developed a graphical user interface (GUI) that guides annotators toward optimal point prompt placements. The interface provides real-time feedback, reducing the need for precise manual input while ensuring that the best possible segmentation masks are generated.
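The record does not specify the form of the compound loss, but the idea it describes (a segmentation term combined with a penalty on disagreement between predictions from different point-prompt placements) can be illustrated with a minimal NumPy sketch. The function names (`dice_loss`, `compound_loss`), the variance-based consistency penalty, and the weighting factor `lam` are assumptions for illustration, not the thesis's actual formulation.

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss: 1 - 2|P∩G| / (|P| + |G|), with eps for stability."""
    inter = (pred * target).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def compound_loss(preds, target, lam=0.5):
    """Hypothetical compound loss over predictions from several prompt placements.

    preds  : list of soft masks (arrays in [0, 1]), one per point-prompt placement
    target : ground-truth mask
    lam    : assumed weight balancing accuracy against cross-prompt consistency
    """
    # Segmentation term: average Dice loss across placements.
    seg = np.mean([dice_loss(p, target) for p in preds])
    # Consistency term: pixel-wise variance across placements, pushing
    # predictions from different prompts toward the same mask.
    cons = np.stack(preds).var(axis=0).mean()
    return seg + lam * cons
```

With identical predictions the consistency term vanishes, so only segmentation accuracy contributes; predictions that disagree across prompt placements are penalized even when each is individually plausible, which is the stabilizing behavior the abstract attributes to RP-SAM2's training objective.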
Citation
Nuren Zhaksylyk, “Semi-Automatic Stable Surgical Instrument Segmentation in Cataract Surgery Images via Point Prompts,” Master of Science thesis, Computer Vision, MBZUAI, 2025.
Keywords
Foundation Models, Interactive Segmentation, Surgical Instruments, Point Prompts