
EvolveNav: Empowering LLM-Based Vision-Language Navigation via Self-Improving Embodied Reasoning

Lin, Bingqian
Nie, Yunshuang
Zai, Khun Loun
Wei, Ziming
Han, Mingfei
Xu, Rongtao
Niu, Minzhe
Han, Jianhua
Zhang, Hanwang
Lin, Liang
... and 3 more authors
Abstract
Recent studies have shown that training open-source Large Language Models (LLMs) can unleash their reasoning ability to enhance vision-language navigation (VLN) performance while mitigating the domain gap between the LLMs' training corpus and the VLN task. However, these approaches predominantly adopt straightforward input-output mapping paradigms, which makes the mapping difficult to learn and the navigational decisions unexplainable. Chain-of-Thought (CoT) training is a promising way to improve both navigational decision accuracy and interpretability, but the complexity of the navigation task makes perfect CoT labels unavailable, and pure CoT supervised fine-tuning may lead to overfitting. To address these issues, we propose EvolveNav, a novel sElf-improving embodied reasoning paradigm that realizes adaptable and generalizable navigational reasoning for boosting LLM-based vision-language Navigation. Specifically, EvolveNav involves a two-stage training process: (1) Formalized CoT Supervised Fine-Tuning, where we train the model with curated formalized CoT labels to first activate the model's navigational reasoning capabilities while also increasing the reasoning speed; (2) Self-Reflective Post-Training, where the model is iteratively trained with its own reasoning outputs as self-enriched CoT labels to enhance the supervision diversity. A self-reflective auxiliary task is also designed to encourage the model to learn correct reasoning patterns by contrasting them with wrong ones. Experimental results under both task-specific and cross-task training paradigms demonstrate the consistent superiority of EvolveNav over previous LLM-based VLN approaches on various popular benchmarks, including R2R, REVERIE, CVDN, and SOON. EvolveNav opens avenues for exploring effective self-improving reasoning paradigms and for building self-evolving agents to advance LLM-based embodied AI research.
Citation
B. Lin, Y. Nie, K.L. Zai, Z. Wei, M. Han, R. Xu, et al., "EvolveNav: Empowering LLM-Based Vision-Language Navigation via Self-Improving Embodied Reasoning," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PP, no. 99, pp. 1-15, 2026, https://doi.org/10.1109/tpami.2026.3682514.
Source
IEEE Transactions on Pattern Analysis and Machine Intelligence
Keywords
46 Information and Computing Sciences, 4602 Artificial Intelligence, 4611 Machine Learning
Publisher
IEEE