Beyond Simple Edits: Composed Video Retrieval with Dense Modifications

Thawakar, Omkar
Demidov, Dmitry
Thawkar, Ritesh
Anwer, Rao Muhammad
Shah, Mubarak
Khan, Salman
Khan, Fahad Shahbaz
Department
Computer Vision
Type
Conference proceeding
Abstract
Composed video retrieval is a challenging task that aims to retrieve a target video given a query video and a textual description detailing specific modifications. Standard retrieval frameworks typically struggle with the complexity of fine-grained compositional queries and with variations in temporal understanding, which limits their retrieval ability in the fine-grained setting. To address this issue, we introduce a novel dataset that captures both fine-grained and composed actions across diverse video segments, enabling more detailed compositional changes in retrieved video content. The proposed dataset, named Dense-WebVid-CoVR, consists of 1.6 million samples with dense modification text that is around seven times more than that of its existing counterpart. We further develop a new model that integrates visual and textual information through Cross-Attention (CA) fusion using a grounded text encoder, enabling precise alignment between dense query modifications and target videos. The proposed model achieves state-of-the-art results, surpassing existing methods on all metrics. Notably, it achieves 71.3% Recall@1 in the visual+text setting and outperforms the previous state of the art by 3.4%, highlighting its efficacy in leveraging detailed video descriptions and dense modification texts. Our proposed dataset, code, and model are available at: https://github.com/OmkarThawakar/BSE-CoVR.
Citation
O. Thawakar, D. Demidov, R. Thawkar, R.M. Anwer, M. Shah, S. Khan, et al., "Beyond Simple Edits: Composed Video Retrieval with Dense Modifications," 2026, pp. 20435-20444.
Source
2025 IEEE/CVF International Conference on Computer Vision (ICCV)
Conference
2025 IEEE/CVF International Conference on Computer Vision (ICCV)
Keywords
46 Information and Computing Sciences, 4603 Computer Vision and Multimedia Computation, 4605 Data Management and Data Science
Publisher
IEEE