Constraint-Aware Zero-Shot Vision-Language Navigation in Continuous Environments
Chen, Kehan; An, Dong; Huang, Yan; Xu, Rongtao; Su, Yifei; Ling, Yonggen; Reid, Ian; Wang, Liang
Department
Computer Vision
Type
Journal article
Date
2025
Language
English
Abstract
We address the task of Vision-Language Navigation in Continuous Environments (VLN-CE) under the zero-shot setting. Zero-shot VLN-CE is particularly challenging because there are no expert demonstrations for training and only minimal structural priors about the environment to guide navigation. To address these challenges, we propose a Constraint-Aware Navigator (CA-Nav), which reframes zero-shot VLN-CE as a sequential, constraint-aware sub-instruction completion process. CA-Nav continuously translates sub-instructions into navigation plans using two core modules: the Constraint-Aware Sub-instruction Manager (CSM) and the Constraint-Aware Value Mapper (CVM). CSM defines the completion criteria of the decomposed sub-instructions as constraints and tracks navigation progress by switching sub-instructions in a constraint-aware manner. CVM, guided by CSM's constraints, generates a value map on the fly and refines it using superpixel clustering to improve navigation stability. CA-Nav achieves state-of-the-art performance on two VLN-CE benchmarks, surpassing the previous best method by 12% and 13% in Success Rate on the validation unseen splits of R2R-CE and RxR-CE, respectively. Moreover, CA-Nav demonstrates its effectiveness in real-world robot deployments across various indoor scenes and instructions.
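The idea of switching sub-instructions only when their completion constraints are met can be illustrated with a small sketch. It assumes, purely for illustration, that the instruction has already been decomposed into sub-instructions and that each completion constraint is a boolean predicate over a simple observation dictionary. The class names, the observation keys (`room`, `sofa_dist`), and the one-switch-per-step rule are hypothetical and are not taken from the paper. The sketch does not reproduce the authors' CSM, and it leaves out the value mapper (CVM) entirely.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical observation format: a dict of facts reported by a perception stack.
Observation = Dict[str, object]
# A completion constraint is modeled here as a predicate over one observation.
Constraint = Callable[[Observation], bool]


@dataclass
class SubInstruction:
    text: str
    # Completion criteria: the sub-instruction counts as done only when every constraint holds.
    constraints: List[Constraint] = field(default_factory=list)


class SubInstructionManager:
    """Illustrative constraint-driven progress tracker (not the authors' CSM)."""

    def __init__(self, subs: List[SubInstruction]) -> None:
        self.subs = subs
        self.index = 0  # position of the sub-instruction currently being executed

    def current(self) -> Optional[SubInstruction]:
        return self.subs[self.index] if self.index < len(self.subs) else None

    def step(self, obs: Observation) -> Optional[SubInstruction]:
        sub = self.current()
        # Advance at most one sub-instruction per step, and only when all of the
        # current sub-instruction's constraints are satisfied by this observation.
        if sub is not None and all(check(obs) for check in sub.constraints):
            self.index += 1
        return self.current()

    def done(self) -> bool:
        return self.index >= len(self.subs)


# Hypothetical two-part instruction: "Exit the bedroom, then stop at the sofa."
demo = SubInstructionManager([
    SubInstruction("exit the bedroom", [lambda o: o.get("room") == "hallway"]),
    SubInstruction("stop at the sofa", [lambda o: o.get("sofa_dist", float("inf")) < 1.0]),
])

observations = [
    {"room": "bedroom"},
    {"room": "hallway"},
    {"room": "living room", "sofa_dist": 2.5},
    {"room": "living room", "sofa_dist": 0.6},
]

trace = []
for obs in observations:
    sub = demo.step(obs)
    trace.append(sub.text if sub is not None else "STOP")
print(trace)
```

Because `all()` over an empty list is `True`, a sub-instruction with no constraints is treated as complete on the next step in this sketch. Once every sub-instruction has been completed, the manager returns `None`, which the loop above maps to a stop action.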
Citation
K. Chen et al., "Constraint-Aware Zero-Shot Vision-Language Navigation in Continuous Environments," in IEEE Transactions on Pattern Analysis and Machine Intelligence, doi: 10.1109/TPAMI.2025.3594204
Source
IEEE Transactions on Pattern Analysis and Machine Intelligence
Keywords
Constraint-Aware, Value Map, Vision-Language Navigation, Zero-Shot
Publisher
IEEE
