NAT: Enhancing Agent Tuning with Negative Samples

Renxi Wang
Xudong Han
Yixuan Zhang
Timothy Baldwin
Haonan Li
Department
Natural Language Processing
Type
Conference proceeding
Date
2025
Language
English
Abstract
Interaction trajectories between agents and environments have proven effective in tuning LLMs into task-specific agents. However, constructing these trajectories, especially successful trajectories, is often computationally expensive and time-consuming due to the relatively low success rates of even the most advanced LLMs, such as GPT-4 and Claude. Additionally, common training paradigms like supervised fine-tuning (SFT) and reinforcement learning (RL) not only require large volumes of data but also place specific demands on the trajectories used. For instance, existing SFT approaches typically utilize only positive examples, limiting their efficiency in low-resource scenarios. To address this, we introduce Negative-Aware Training (NAT), a straightforward yet effective method that leverages both successful and failed trajectories for fine-tuning, maximizing the utility of limited resources. Experimental results demonstrate that NAT consistently surpasses existing methods, including SFT, DPO, and PPO, across various tasks.
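
The abstract describes NAT only at a high level: both successful and failed trajectories are used as fine-tuning data. As a minimal sketch of how that could look, the Python below conditions each training example on a quality marker so that failed trajectories still contribute signal, rather than being discarded as in positive-only SFT. The Trajectory fields, the POS_MARKER/NEG_MARKER strings, and the build_sft_example helper are illustrative assumptions, not the paper's actual implementation.

# Sketch of negative-aware SFT data construction (assumed scheme, not the
# paper's code): prefix each prompt with a marker reflecting trajectory
# quality, so failed trajectories become usable training examples.
from dataclasses import dataclass

@dataclass
class Trajectory:
    task: str       # task instruction given to the agent
    steps: str      # serialized agent-environment interaction trace
    success: bool   # whether the trajectory solved the task

POS_MARKER = "[GOOD]"  # hypothetical marker for successful trajectories
NEG_MARKER = "[BAD]"   # hypothetical marker for failed trajectories

def build_sft_example(traj: Trajectory) -> dict:
    """Turn any trajectory, successful or not, into an SFT example
    by prefixing the prompt with its quality marker."""
    marker = POS_MARKER if traj.success else NEG_MARKER
    return {"prompt": f"{marker} {traj.task}", "completion": traj.steps}

if __name__ == "__main__":
    trajs = [
        Trajectory("Find the cheapest flight to Tokyo.",
                   "search(flights) -> compare(prices) -> book(option_2)", True),
        Trajectory("Find the cheapest flight to Tokyo.",
                   "search(hotels) -> error: wrong tool", False),
    ]
    # Unlike positive-only SFT, both trajectories yield training data.
    for example in map(build_sft_example, trajs):
        print(example)
    # At inference time, prompts would carry POS_MARKER to steer the
    # model toward the behavior it learned from successful trajectories.

Under this scheme, the marker lets the model learn from failures without imitating them unconditionally; at test time, prompting with the positive marker elicits the successful behavior.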
Citation
R. Wang, X. Han, Y. Zhang, T. Baldwin, and H. Li, “NAT: Enhancing Agent Tuning with Negative Samples,” in Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2025. Accessed: May 05, 2025. [Online]. Available: https://aclanthology.org/2025.naacl-long.378/
Source
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies
Conference
NAACL 2025
Publisher
Association for Computational Linguistics