A Vision-Language-Action Framework for End-to-End Robotic Manipulation Using Qwen2-VL-Instruct

Kashkash, Mariam
Guizani, Mohsen
Department
Machine Learning
Type
Conference proceeding
License
http://creativecommons.org/licenses/by/4.0/
Language
English
Abstract
Recent advances in Vision-Language Models (VLMs) have shown strong potential for enabling robots to interpret multimodal inputs for task planning and execution. However, applying these models to long-horizon robotic manipulation remains challenging because it requires structured perception, semantic planning, and fine-grained action control. This paper introduces a novel Vision-Language-Action (VLA) pipeline based on Qwen2-VL-Instruct, designed to bridge the gap between high-level task understanding and low-level action execution. The proposed approach consists of three core stages: object detection, text planning, and action generation, in which the instructions in the text plan are grounded into precise pick-and-place actions. The model was fine-tuned on multimodal demonstrations and evaluated on a variety of long-horizon tasks from the Ravens benchmark. The results show that the model generalizes to unseen task configurations and achieves significant improvements across all evaluation metrics. Specifically, the method surpasses existing baselines, including CLIPort in the action stage as well as SayCan and SayCanPay, in success rate on the studied tasks. These results highlight the effectiveness of the fine-tuned Qwen2-VL-Instruct model in performing coherent and reliable end-to-end robotic manipulation.
Citation
M. Kashkash, M. Guizani, "A Vision-Language-Action Framework for End-to-End Robotic Manipulation Using Qwen2-VL-Instruct," 2026, pp. 214-219.
Source
Proceedings of the 2025 9th International Conference on Advances in Artificial Intelligence
Conference
Proceedings of the 2025 9th International Conference on Advances in Artificial Intelligence
Keywords
46 Information and Computing Sciences, 4602 Artificial Intelligence, 4605 Data Management and Data Science
Publisher
Association for Computing Machinery