Iterative Video Segmentation Framework and Benchmark Using Minimal User Annotations
Bermudez Venegas, Ariana Michelle
Department
Machine Learning
Type
Thesis
Date
2023
Language
English
Abstract
This study explores the feasibility of enhancing user interaction with video object segmentation (VOS), with the aim of creating an industry-driven environment. The primary objective is to examine baseline performance on common industry edge cases while developing a robust solution that saves time for people labeling frames from various types of videos. Applications span multiple industries, including special effects with compositing and rotoscoping, realistic augmented reality, sports performance monitoring, and medical analysis for more accurate treatment. The need for a time-saving frame selector is emphasized, and modifications to the baseline XMem backbone are investigated. An attention-based frame selector is proposed to recommend the most suitable frames for labeling, performing 27 times faster than experts and 32 times faster than non-experts. Potential future directions are suggested in light of the backbone modification results. Additionally, a benchmark featuring extreme object segmentation cases that reflect industry challenges is introduced, underscoring the need for model improvements to achieve robust VOS capabilities.
Citation
A.M.B. Venegas, "Iterative Video Segmentation Framework and Benchmark Using Minimal User Annotations", M.S. Thesis, Machine Learning, MBZUAI, Abu Dhabi, UAE, 2023.
