System and method for modeling local and global spatio-temporal context in video for video recognition
Wasim, Syed Talal ; Khattak, Muhammad Uzair ; Naseer, Muzammal ; Khan, Salman ; Khan, Fahad Shahbaz
Department
Computer Vision
Type
Patent
Date
2025
Language
English
Abstract
A system and a method for modeling local and global spatio-temporal context in a video for video recognition are described. The method includes obtaining an input feature map and transforming it using linear functions to generate a spatial feature map and a temporal feature map corresponding to the video. From the spatial and temporal feature maps, the method generates hierarchical contextual feature maps that represent the context of the video at multiple levels of granularity. It then aggregates the hierarchical contextual feature maps using gating weights to obtain a spatial modulator and a temporal modulator, each representing the aggregated context across the multiple levels. Finally, the method obtains an output spatio-temporal feature map from the spatial modulator, the temporal modulator, and a query token associated with the video.
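The steps in the abstract can be illustrated with a minimal NumPy sketch. It is not the patented implementation. The abstract does not specify any of the following, so they are assumptions made only for illustration: the tensor layout `(T, H, W, C)`, the moving-average filters used in place of learned context operators, the softmax over gating weights, the element-wise product used to combine the query with the two modulators, and all function and parameter names (`smooth`, `hierarchy`, `aggregate`, `init_params`, `modulate`).

```python
import numpy as np

def smooth(x, axis, k):
    """Moving average of odd width k along `axis`, edge-padded to keep the shape.
    Assumption: stands in for a learned local context operator."""
    pad = [(0, 0)] * x.ndim
    pad[axis] = (k // 2, k // 2)
    xp = np.pad(x, pad, mode="edge")
    n = x.shape[axis]
    return sum(np.take(xp, np.arange(i, i + n), axis=axis) for i in range(k)) / k

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def hierarchy(feat, axes, levels):
    """Return levels + 1 context maps: progressively wider local averages
    (each built on the previous one) plus one global average over `axes`."""
    maps, ctx = [], feat
    for level in range(levels):
        for ax in axes:
            ctx = smooth(ctx, ax, 3 + 2 * level)  # window grows with the level
        maps.append(ctx)
    maps.append(np.broadcast_to(ctx.mean(axis=axes, keepdims=True), feat.shape))
    return maps

def aggregate(maps, gates):
    """Gate-weighted sum of context maps; gates has one channel per map."""
    return sum(m * gates[..., i:i + 1] for i, m in enumerate(maps))

def init_params(channels, levels, seed=0):
    """Random weights for the linear maps (hypothetical names, illustration only)."""
    rng = np.random.default_rng(seed)
    w = lambda n_out: rng.normal(0.0, channels ** -0.5, (channels, n_out))
    return {"q": w(channels), "s": w(channels), "t": w(channels),
            "gs": w(levels + 1), "gt": w(levels + 1),
            "hs": w(channels), "ht": w(channels)}

def modulate(x, p, levels=2):
    """x: (T, H, W, C) input feature map -> (T, H, W, C) output feature map."""
    q = x @ p["q"]                       # query tokens
    spatial = x @ p["s"]                 # spatial feature map
    temporal = x @ p["t"]                # temporal feature map
    gs = softmax(x @ p["gs"])            # spatial gating weights (assumed softmax)
    gt = softmax(x @ p["gt"])            # temporal gating weights (assumed softmax)
    mod_s = aggregate(hierarchy(spatial, (1, 2), levels), gs)   # context over H, W
    mod_t = aggregate(hierarchy(temporal, (0,), levels), gt)    # context over T
    # Assumed combination: query modulated element-wise by both modulators.
    return q * (mod_s @ p["hs"]) * (mod_t @ p["ht"])

x = np.random.default_rng(1).normal(size=(4, 6, 6, 8))  # 4 frames, 6x6 grid, 8 channels
y = modulate(x, init_params(channels=8, levels=2), levels=2)
print(y.shape)
```

In this sketch the spatial branch aggregates context only across the height and width axes and the temporal branch only across the frame axis, so each modulator summarizes one kind of context before both are applied to the query.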
Citation
“US20250232583A1 - System and method for modeling local and global spatio-temporal context in video for video recognition - Google Patents.” [Online]. Available: https://patents.google.com/patent/US20250232583A1/en
Source
US Patent App. 18/411,928, 2025
Publisher
Google Patent
