System and method for modeling local and global spatio-temporal context in video for video recognition

Wasim, Syed Talal
Khattak, Muhammad Uzair
Naseer, Muzammal
Khan, Salman
Khan, Fahad Shahbaz
Department
Computer Vision
Type
Patent
Date
2025
Language
English
Abstract
A system and a method for modeling local and global spatio-temporal context in a video for video recognition are disclosed. The method includes obtaining an input feature map and transforming it using linear functions to generate a spatial feature map and a temporal feature map corresponding to a video. The method further includes generating, from the spatial feature map and the temporal feature map, hierarchical contextual feature maps that represent the context of the video at multiple levels of granularity. The method further includes aggregating the hierarchical contextual feature maps based on gating weights to obtain a spatial modulator and a temporal modulator, each representative of the aggregated context across the multiple levels. The method further includes obtaining an output spatio-temporal feature map based on the spatial modulator, the temporal modulator, and a query token associated with the video.
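The steps named in the abstract can be illustrated with a minimal NumPy sketch. It is not the patented implementation. The following are all assumptions made for illustration: the (T, H, W, C) tensor layout, the bias-free projections, the 3-tap mean filter standing in for whatever context operator the patent uses, the softmax over gating weights, the element-wise product used to combine query and modulators, and every function and parameter name.

```python
import numpy as np

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def smooth(z, axes):
    """3-tap mean filter along each axis in `axes`, with edge padding.
    Assumption: a stand-in for the (unspecified) context operator."""
    for ax in axes:
        pad = [(0, 0)] * z.ndim
        pad[ax] = (1, 1)
        p = np.pad(z, pad, mode="edge")
        n = z.shape[ax]
        z = sum(np.take(p, np.arange(k, k + n), axis=ax) for k in range(3)) / 3.0
    return z

def hierarchical_contexts(z, axes, levels):
    """Return levels + 1 context maps: repeated smoothing widens the
    receptive field per level, and the last map is a global average."""
    ctxs, cur = [], z
    for _ in range(levels):
        cur = smooth(cur, axes)
        ctxs.append(cur)
    ctxs.append(np.broadcast_to(cur.mean(axis=axes, keepdims=True), z.shape))
    return ctxs

def aggregate(ctxs, gates):
    """Gate-weighted sum of the context maps; gates has shape (..., levels + 1)."""
    return sum(c * gates[..., i:i + 1] for i, c in enumerate(ctxs))

def init_params(rng, channels, levels):
    """Hypothetical random projection weights (no biases, for brevity)."""
    shapes = {
        "w_q": (channels, channels),     # query projection
        "w_s": (channels, channels),     # spatial feature map projection
        "w_t": (channels, channels),     # temporal feature map projection
        "w_gs": (channels, levels + 1),  # spatial gating weights
        "w_gt": (channels, levels + 1),  # temporal gating weights
    }
    return {k: rng.standard_normal(s) / np.sqrt(channels) for k, s in shapes.items()}

def focal_block(x, p, levels):
    """x: input feature map of shape (T, H, W, C). Returns the same shape."""
    q = x @ p["w_q"]
    z_s = x @ p["w_s"]
    z_t = x @ p["w_t"]
    g_s = softmax(x @ p["w_gs"])  # assumption: gates normalized over levels
    g_t = softmax(x @ p["w_gt"])
    # Spatial context is gathered over (H, W); temporal context over T.
    m_s = aggregate(hierarchical_contexts(z_s, (1, 2), levels), g_s)
    m_t = aggregate(hierarchical_contexts(z_t, (0,), levels), g_t)
    # Assumption: query and modulators are combined element-wise.
    return q * m_s * m_t

rng = np.random.default_rng(0)
params = init_params(rng, channels=8, levels=2)
video_features = rng.standard_normal((4, 5, 5, 8))
out = focal_block(video_features, params, levels=2)
print(out.shape)
```

In this sketch, each additional level sees a wider neighborhood than the one before it, and the final level summarizes the whole frame (spatial branch) or the whole clip (temporal branch), which is one way to read "multiple levels of granularity".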
Citation
“US20250232583A1 - System and method for modeling local and global spatio-temporal context in video for video recognition - Google Patents.” [Online]. Available: https://patents.google.com/patent/US20250232583A1/en
Source
US Patent App. 18/411,928, 2025
Publisher
Google Patents