Training Effective Language Agents: From Instruction Following to Unified Automatic Task Completion
Wang, Renxi
Author
Supervisor
Department
Natural Language Processing
Embargo End Date
30/05/2025
Type
Thesis
Date
2025
Language
English
Abstract
Language agents use large language models (LLMs) to receive environmental observations, make decisions, and complete tasks automatically. This thesis investigates how to train effective language agents that solve problems with a high success rate. In general, the interfaces through which language agents interact with environments can be tools, APIs, or even embodied actions. We start from pure LLMs, a special type of language agent whose interface is a chat box: they use language to tackle tasks such as question answering, and they interact directly with human users. We first focus on instruction tuning, a critical component of LLM training and the foundation of agent training. LLMs have been shown to follow instructions after training on instruction-response pairs. We select three types of instructions, representing three critical LLM capabilities: general instruction following, code generation, and other task-completion capabilities. We find that although these instructions improve performance in their corresponding domains, mixing them together may transfer negatively to other domains. Building on instruction tuning, we extend it to multi-turn inference with tool utilization, and train models on agent-tool interaction trajectories. To mitigate the low-resource problem in agent scenarios, we further propose negative-aware agent tuning, which appends positive and negative prefixes and enables LLMs to learn from failed trajectories. Our method proves more effective than simple supervised fine-tuning (SFT) and exploration-based methods such as Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO). Finally, we scale language agents up to action spaces too large to fit into LLMs' context windows.
By virtualizing each action into a single token and applying a three-stage training strategy, we obtain a unified tool-retrieval-and-calling agent that operates purely through token generation. We successfully integrate 47,000 tools while surpassing previous retrieval-based agent paradigms.
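To make the negative-aware agent-tuning idea concrete, the following is a minimal sketch of how failed trajectories could be kept as training data by prepending a quality prefix. The prefix strings, function name, and data layout are illustrative assumptions for this sketch, not the thesis's exact implementation.

```python
# Hypothetical markers; the thesis's actual prefixes may differ.
POS_PREFIX = "[GOOD] "  # successful trajectory
NEG_PREFIX = "[BAD] "   # failed trajectory

def format_trajectory(observations, actions, success):
    """Interleave environment observations and agent actions into one
    training text, prefixed by a success/failure marker so the model
    can also learn from failed trajectories instead of discarding them."""
    prefix = POS_PREFIX if success else NEG_PREFIX
    turns = [
        f"Observation: {obs}\nAction: {act}"
        for obs, act in zip(observations, actions)
    ]
    return prefix + "\n".join(turns)

# Usage: a failed run still becomes a (negatively prefixed) training example.
example = format_trajectory(
    observations=["search results empty"],
    actions=["call_tool(query='LLM agents')"],
    success=False,
)
```

At inference time, generation would be conditioned on the positive prefix, steering the model toward successful behavior.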
Citation
Renxi Wang, “Training Effective Language Agents: From Instruction Following to Unified Automatic Task Completion,” Master of Science thesis, Natural Language Processing, MBZUAI, 2025.
Keywords
Large Language Model (LLM), Agent, Training, Task-completion
