Item

System and method for video instance segmentation via recurrent encoder-based transformers

Thawakar, Omkar
Rivkind, Alexandre
Ahissar, Ehud
Khan, Fahad
Supervisor
Department
Computer Vision
Embargo End Date
Type
Patent
Date
2025
License
Language
English
Collections
Research Projects
Organizational Units
Journal Issue
Abstract
A method and system for video instance segmentation include a recurrent encoder-based network trained by knowledge distillation from a transformer encoder. Real-time performance is achieved by replacing the transformer encoder with the trained recurrent encoder for inference. The system includes a video camera to capture a sequence of video frames, a machine learning processing engine for video instance segmentation, and a video output for outputting a sequence of mask instances. The machine learning processing engine is configured with an interchangeable encoder module. During inference, the encoder module is configured with a recurrent encoder having a combination of convolutional and recurrent layers. The recurrent layers capture temporal relationships between the video frames. During training, the encoder module is configured with a teacher transformer encoder for training the recurrent encoder as a student through knowledge distillation. A transformer decoder outputs video instance mask predictions.
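The teacher/student arrangement in the abstract can be illustrated with a minimal, dependency-free sketch. Everything here is an assumption for illustration and not the patented implementation: `teacher_encoder` is a toy stand-in for the transformer encoder, `RecurrentEncoder` reduces the convolutional and recurrent layers to two scalar weights, the feature-matching mean-squared-error loss is one common distillation objective (the abstract does not name the loss), and finite-difference gradients replace backpropagation so that the example runs without a deep learning library. The transformer decoder is omitted.

```python
from typing import Callable, List, Sequence

Frame = Sequence[float]        # toy stand-in for one frame's feature vector
Features = List[List[float]]   # one encoded vector per frame
Encoder = Callable[[Sequence[Frame]], Features]


def teacher_encoder(frames: Sequence[Frame]) -> Features:
    """Hypothetical teacher: blends each frame with the clip mean (global context)."""
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / len(frames) for d in range(dim)]
    return [[0.5 * f[d] + 0.5 * mean[d] for d in range(dim)] for f in frames]


class RecurrentEncoder:
    """Hypothetical student: a per-frame weight plus a hidden state carried across frames."""

    def __init__(self, w_in: float = 1.0, w_rec: float = 0.0) -> None:
        self.w_in = w_in    # stands in for the convolutional (per-frame) layers
        self.w_rec = w_rec  # stands in for the recurrent (temporal) layers

    def __call__(self, frames: Sequence[Frame]) -> Features:
        hidden = [0.0] * len(frames[0])
        outputs: Features = []
        for frame in frames:
            # The hidden state links the current frame to the previous ones.
            hidden = [self.w_in * x + self.w_rec * h for x, h in zip(frame, hidden)]
            outputs.append(list(hidden))
        return outputs


def distillation_loss(student: Encoder, teacher: Encoder,
                      frames: Sequence[Frame]) -> float:
    """Mean squared error between student and teacher features (assumed objective)."""
    diffs = [(a - b) ** 2
             for s_vec, t_vec in zip(student(frames), teacher(frames))
             for a, b in zip(s_vec, t_vec)]
    return sum(diffs) / len(diffs)


def distill(student: RecurrentEncoder, teacher: Encoder, frames: Sequence[Frame],
            steps: int = 50, lr: float = 0.1, eps: float = 1e-4) -> None:
    """Gradient descent on the student's weights using central finite differences."""
    for _ in range(steps):
        grads = {}
        for name in ("w_in", "w_rec"):
            base = getattr(student, name)
            setattr(student, name, base + eps)
            up = distillation_loss(student, teacher, frames)
            setattr(student, name, base - eps)
            down = distillation_loss(student, teacher, frames)
            setattr(student, name, base)
            grads[name] = (up - down) / (2 * eps)
        for name, grad in grads.items():
            setattr(student, name, getattr(student, name) - lr * grad)


class SegmentationEngine:
    """Holds an interchangeable encoder module, as described in the abstract."""

    def __init__(self, encoder: Encoder) -> None:
        self.encoder = encoder

    def encode(self, frames: Sequence[Frame]) -> Features:
        return self.encoder(frames)


frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
student = RecurrentEncoder()
loss_before = distillation_loss(student, teacher_encoder, frames)
distill(student, teacher_encoder, frames)
loss_after = distillation_loss(student, teacher_encoder, frames)

engine = SegmentationEngine(teacher_encoder)  # training-time configuration
engine.encoder = student                      # inference-time configuration
features = engine.encode(frames)
```

In this sketch the engine's `encoder` attribute is the only thing that changes between training and inference, which mirrors the interchangeable encoder module: the teacher is needed only to produce target features, and the distilled student processes frames one at a time with a carried hidden state.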
Citation
Omkar Thawakar, Alexandre Rivkind, Ehud Ahissar, and Fahad Khan, “System and method for video instance segmentation via recurrent encoder-based transformers,” U.S. Patent Application US 2025/0252740 A1, filed Feb. 7, 2024, published Aug. 7, 2025.
Source
US Patent
Conference
Keywords
Subjects
Publisher
Google Patents
DOI
Full-text link