GraspMamba: A Mamba-based Language-driven Grasp Detection Framework with Hierarchical Feature Learning
Nguyen, Huy Hoang ; Vuong, An ; Nguyen, Anh ; Reid, Ian ; Vu, Minh Nhat
Department
Computer Vision
Type
Conference proceeding
Abstract
Grasp detection is a fundamental robotic task critical to the success of many industrial applications. However, current language-driven models for this task often struggle with cluttered images, lengthy textual descriptions, or slow inference speed. We introduce GraspMamba, a new language-driven grasp detection method that employs hierarchical feature fusion with Mamba vision to tackle these challenges. By leveraging rich visual features of the Mamba-based backbone alongside textual information, our approach effectively enhances the fusion of multimodal features. GraspMamba represents the first Mamba-based grasp detection model to extract vision and language features at multiple scales, delivering robust performance and rapid inference time. Extensive experiments show that GraspMamba outperforms recent methods by a clear margin. We validate our approach through real-world robotic experiments, highlighting its fast inference speed.
Citation
H.H. Nguyen, A. Vuong, A. Nguyen, I. Reid, M.N. Vu, "GraspMamba: A Mamba-based Language-driven Grasp Detection Framework with Hierarchical Feature Learning," in 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2025, pp. 15808-15815.
Source
2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Conference
International Conference on Intelligent Robots and Systems (IROS)
Keywords
46 Information and Computing Sciences, 4605 Data Management and Data Science
Publisher
IEEE
