Video Instance Segmentation Without Using Mask and Identity Supervision

Li, Ge
Cao, Jiale
Sun, Hanqing
Anwer, Rao Muhammad
Xie, Jin
Khan, Fahad
Pang, Yanwei
Department
Computer Vision
Type
Journal article
Date
2025
Language
English
Abstract
Video instance segmentation (VIS) is a challenging vision problem in which the task is to simultaneously detect, segment, and track all object instances in a video. Most existing VIS approaches rely on pixel-level mask supervision within a frame as well as instance-level identity annotation across frames. However, obtaining these 'mask and identity' annotations is time-consuming and expensive. We propose the first mask-identity-free VIS framework, which neither uses mask annotations nor requires identity supervision. Accordingly, we introduce a query contrast and exchange network (QCEN) comprising instance query contrast and query-exchanged mask learning. The instance query contrast first performs cross-frame instance matching and then conducts query feature contrastive learning. The query-exchanged mask learning exploits both intra-video and inter-video query exchange properties: exchanging queries of an identical instance from different frames within a video results in consistent instance masks, whereas exchanging queries across videos results in all-zero background masks. Extensive experiments on three benchmarks (YouTube-VIS 2019, YouTube-VIS 2021, and OVIS) reveal the merits of the proposed approach, which significantly reduces the performance gap between the identity-free baseline and our mask-identity-free VIS method. On the YouTube-VIS 2019 validation set, our mask-identity-free approach achieves 91.4% of the stronger-supervision-based baseline performance when utilizing the same ImageNet pre-trained model.
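The two-step idea behind instance query contrast (match queries across frames, then contrast them) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the matching rule or the loss, so the cosine-similarity argmax matching, the InfoNCE-style loss, the temperature value, and the function names `match_queries` and `query_contrast_loss` are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two query feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)

def match_queries(frame_a, frame_b):
    """Assumed matching rule: pair each query in frame_a with its most
    similar query in frame_b (no identity labels are used)."""
    return [
        max(range(len(frame_b)), key=lambda j: cosine(q, frame_b[j]))
        for q in frame_a
    ]

def query_contrast_loss(frame_a, frame_b, temperature=0.1):
    """Assumed InfoNCE-style loss: the matched query is the positive,
    all other queries in frame_b act as negatives."""
    matches = match_queries(frame_a, frame_b)
    total = 0.0
    for q, pos in zip(frame_a, matches):
        logits = [cosine(q, k) / temperature for k in frame_b]
        peak = max(logits)  # subtract the max for numerical stability
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[pos]
    return total / len(frame_a)

# Toy example: two instance queries per frame, with swapped order in frame_b.
frame_a = [[1.0, 0.0], [0.0, 1.0]]
frame_b = [[0.1, 0.9], [0.9, 0.1]]
matches = match_queries(frame_a, frame_b)
loss = query_contrast_loss(frame_a, frame_b)
```

In this toy case the first query of `frame_a` is paired with the second query of `frame_b` and vice versa, and the loss shrinks as matched queries become more similar relative to unmatched ones.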
Citation
G. Li et al., “Video Instance Segmentation Without Using Mask and Identity Supervision,” IEEE Trans Multimedia, 2024, doi: 10.1109/TMM.2024.3521668.
Source
IEEE Transactions on Multimedia
Keywords
instance query contrast, mask-identity-free, query exchange, video instance segmentation
Publisher
IEEE