Item

Meta-Learning for Text-Based Object Localization in Robotic Manipulation with DNN and CLIP

Kashkash, Mariam
Guizani, Mohsen
Department
Machine Learning
Type
Conference proceeding
Date
2025
Language
English
Abstract
As robotics increasingly intersects with human environments, text-based guidance allows robots to perform tasks flexibly and intuitively, bridging the gap between human intentions and robot actions. However, enabling robots to interpret diverse natural language commands, detect and manipulate a variety of objects, and generalize effectively across different datasets poses significant challenges. This paper presents a novel meta-learning approach for robotic manipulation tasks, focusing on pick-and-place operations guided by natural language instructions. A Model-Agnostic Meta-Learning (MAML) framework is proposed to enable a robotic manipulator to generalize across varying tasks specified by text commands. Using a dataset of images paired with text instructions, a MAML-based model combining a Deep Neural Network (DNN) with CLIP is proposed to localize, within the images, the objects mentioned in the text commands. N-way K-shot learning allows the model to learn new tasks efficiently from minimal data. Experimental results reveal the impact of tuning the inner and outer learning rates, shedding light on their roles in the model's adaptability and accuracy. The approach is further validated on table-top and CLEVR datasets, where it achieves low loss values in object localization and task execution, demonstrating that it generalizes across diverse tasks within a few meta-steps. The proposed model significantly reduces support and query losses, particularly in high N-way and K-shot configurations, highlighting its ability to learn effectively with minimal data.
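The roles of the inner and outer learning rates and of the support and query losses mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It uses the first-order approximation of MAML rather than full MAML, a linear map in place of the DNN, and random feature vectors standing in for CLIP embeddings. All function names, dimensions, and hyperparameter values here are hypothetical.

```python
import numpy as np

def mse_and_grad(W, X, Y):
    """Mean squared error of the linear map X @ W and its gradient w.r.t. W."""
    err = X @ W - Y
    loss = float(np.mean(err ** 2))
    grad = 2.0 * X.T @ err / err.size
    return loss, grad

def make_task(rng, W_base, n_way, k_shot, noise=0.1):
    """Toy episode: a task-specific target map plus support and query sets.

    Assumption: each set holds n_way * k_shot random feature vectors (stand-ins
    for CLIP embeddings) with 4-value targets (stand-ins for box coordinates).
    """
    W_task = W_base + noise * rng.standard_normal(W_base.shape)

    def draw():
        X = rng.standard_normal((n_way * k_shot, W_base.shape[0]))
        return X, X @ W_task

    return draw(), draw()  # (support, query)

def fomaml_step(W, tasks, inner_lr, outer_lr, inner_steps=1):
    """One meta-step of first-order MAML over a batch of tasks."""
    meta_grad = np.zeros_like(W)
    query_losses = []
    for (Xs, Ys), (Xq, Yq) in tasks:
        # Inner loop: adapt a copy of the shared weights on the support set.
        W_fast = W.copy()
        for _ in range(inner_steps):
            _, g = mse_and_grad(W_fast, Xs, Ys)
            W_fast = W_fast - inner_lr * g
        # Outer objective: loss of the adapted weights on the query set.
        q_loss, q_grad = mse_and_grad(W_fast, Xq, Yq)
        meta_grad += q_grad  # first-order: second derivatives are ignored
        query_losses.append(q_loss)
    # Outer loop: update the shared initialization with the averaged gradient.
    W_new = W - outer_lr * meta_grad / len(tasks)
    return W_new, float(np.mean(query_losses))

rng = np.random.default_rng(0)
W_base = rng.standard_normal((8, 4))  # hypothetical shared structure across tasks
W = np.zeros((8, 4))                  # meta-learned initialization
history = []
for step in range(200):
    tasks = [make_task(rng, W_base, n_way=5, k_shot=5) for _ in range(4)]
    W, q = fomaml_step(W, tasks, inner_lr=0.05, outer_lr=0.1)
    history.append(q)
```

In this toy setting the mean query loss recorded in `history` falls as the meta-steps proceed. Changing `inner_lr` and `outer_lr` alters how quickly it falls, which is the kind of sensitivity the abstract reports studying.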
Citation
M. Kashkash and M. Guizani, “Meta-Learning for Text-Based Object Localization in Robotic Manipulation with DNN and CLIP,” 2025 3rd International Conference on Mechatronics, Control and Robotics, ICMCR 2025, pp. 12–18, 2025, doi: 10.1109/ICMCR64890.2025.10963286.
Source
2025 3rd International Conference on Mechatronics, Control and Robotics (ICMCR)
Conference
3rd International Conference on Mechatronics, Control and Robotics, ICMCR 2025
Keywords
Meta-Learning, MAML, N-way, K-Shot, Text-Based Localization
Publisher
IEEE