
Video Spatial Reasoning with Object-Centric 3D Rollout

Tang, Haoran
Cao, Meng
Liu, Ruyang
Liang, Xiaoxi
Li, Linglong
Li, Ge
Liang, Xiaodan
Department
Computer Vision
Type
Conference proceeding
Abstract
Recent advances in Multi-modal Large Language Models (MLLMs) have showcased remarkable capabilities in vision-language understanding. However, enabling robust video spatial reasoning—the ability to comprehend object locations, orientations, and inter-object relationships in dynamic 3D scenes—remains a key unsolved challenge. Existing approaches primarily rely on spatially grounded supervised fine-tuning or reinforcement learning, yet we observe that such models often exhibit query-locked reasoning, focusing narrowly on objects explicitly mentioned in the prompt while ignoring critical contextual cues. To address this limitation, we propose Object-Centric 3D Rollout (OCR), a novel strategy that introduces structured perturbations to the 3D geometry of selected objects during training. By degrading object-specific visual cues and projecting the altered geometry into 2D space, OCR compels the model to reason holistically across the entire scene. We further design a rollout-based training pipeline that jointly leverages vanilla and region-noisy videos to optimize spatial reasoning trajectories. Experiments demonstrate state-of-the-art performance: our 3B-parameter model achieves 47.5% accuracy on VSI-Bench, outperforming several 7B baselines. Ablations confirm OCR’s superiority over prior rollout strategies (e.g., T-GRPO, NoisyRollout).
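The perturb-then-project step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the form of the perturbation, so isotropic Gaussian noise is assumed here, along with a pinhole camera model. All function names and parameters (`perturb_points`, `project`, `noisy_region`, `sigma`, `fx`, `fy`, `cx`, `cy`) are hypothetical.

```python
import math
import random


def perturb_points(points, sigma, rng):
    """Add isotropic Gaussian noise to each 3D point (assumed perturbation)."""
    return [tuple(c + rng.gauss(0.0, sigma) for c in p) for p in points]


def project(points, fx, fy, cx, cy):
    """Pinhole projection of camera-frame points; points with z <= 0 are dropped."""
    return [(fx * x / z + cx, fy * y / z + cy) for x, y, z in points if z > 0]


def noisy_region(pixels, width, height):
    """Axis-aligned pixel box covering the projected points, clipped to the frame.

    Returns (u_min, v_min, u_max, v_max), or None if nothing lands in the frame.
    """
    if not pixels:
        return None
    us = [u for u, _ in pixels]
    vs = [v for _, v in pixels]
    u0 = max(0, math.floor(min(us)))
    v0 = max(0, math.floor(min(vs)))
    u1 = min(width - 1, math.ceil(max(us)))
    v1 = min(height - 1, math.ceil(max(vs)))
    if u0 > u1 or v0 > v1:
        return None
    return (u0, v0, u1, v1)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical 3D points belonging to one selected object (camera frame, metres).
    obj = [(0.0, 0.0, 2.0), (1.0, 0.5, 2.0)]
    degraded = perturb_points(obj, sigma=0.1, rng=rng)
    pixels = project(degraded, fx=100.0, fy=100.0, cx=50.0, cy=40.0)
    print(noisy_region(pixels, width=640, height=480))
```

In this sketch the returned box marks the 2D region of a frame whose object-specific cues would be degraded; how the region-noisy video is then rendered and combined with the vanilla video during rollout is left to the paper itself.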
Citation
H. Tang, M. Cao, R. Liu, X. Liang, L. Li, G. Li, et al., "Video Spatial Reasoning with Object-Centric 3D Rollout," 2026, pp. 9395-9403.
Source
Proceedings of the AAAI Conference on Artificial Intelligence
Conference
The Fortieth AAAI Conference on Artificial Intelligence
Keywords
46 Information and Computing Sciences, 4602 Artificial Intelligence
Publisher
Association for the Advancement of Artificial Intelligence