
EvoP: Robust LLM Inference via Evolutionary Pruning

Wu, Shangyu
Du, Hongchao
Xiong, Ying
Chen, Shuai
Kuo, Tei-Wei
Guan, Nan
Xue, Chun Jason
Department
Computer Science
Type
Conference proceeding
Abstract
Large Language Models (LLMs) have achieved remarkable success in natural language processing tasks, but their massive size and computational demands hinder their deployment in resource-constrained environments. Existing model pruning methods address this issue by removing redundant structures (e.g., elements, channels, layers) from the model. However, these methods rely on heuristic pruning strategies, which leads to suboptimal performance; they also ignore the characteristics of the data when pruning the model. To overcome these limitations, we propose EvoP, an evolutionary pruning framework for robust LLM inference. EvoP first presents a cluster-based calibration dataset sampling (CCDS) strategy for creating a more diverse calibration dataset. EvoP then introduces an evolutionary pruning pattern searching (EPPS) method to find the optimal pruning pattern. Compared to existing model pruning techniques, EvoP achieves the best performance while maintaining the best efficiency. Experiments across different LLMs and downstream tasks validate the effectiveness of the proposed EvoP, making it a practical and scalable solution for deploying LLMs in real-world applications.
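The evolutionary pruning pattern search described in the abstract can be illustrated with a minimal sketch: a genetic search over sets of layer indices to prune, where fitness is a loss measured on a calibration set. The function name, the selection/mutation scheme, and the toy fitness below are illustrative assumptions, not the paper's actual EPPS implementation:

```python
import random

def evolutionary_prune_search(num_layers, prune_count, fitness,
                              generations=30, pop_size=16, seed=0):
    """Evolve a pruning pattern (sorted tuple of layer indices to drop)
    that minimizes `fitness` (e.g., loss on a calibration dataset).
    Hypothetical sketch of an EPPS-style search, not the authors' code."""
    rng = random.Random(seed)

    def random_pattern():
        return tuple(sorted(rng.sample(range(num_layers), prune_count)))

    def mutate(pattern):
        # Swap one pruned layer for a currently unpruned one.
        p = set(pattern)
        p.remove(rng.choice(tuple(p)))
        p.add(rng.choice([i for i in range(num_layers) if i not in p]))
        return tuple(sorted(p))

    pop = [random_pattern() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)                 # lower loss = fitter
        survivors = pop[: pop_size // 2]      # selection: keep fitter half
        pop = survivors + [mutate(rng.choice(survivors)) for _ in survivors]
    return min(pop, key=fitness)

# Toy stand-in for a calibration loss: pretend pruning layers 2 and 5
# of an 8-layer model is optimal (a real fitness would run the pruned
# model on the sampled calibration data).
toy = lambda p: sum((i - 2) ** 2 * (i - 5) ** 2 for i in p)
best = evolutionary_prune_search(num_layers=8, prune_count=2, fitness=toy)
```

In practice the fitness evaluation dominates the cost, which is why a small but diverse calibration set (as produced by CCDS) matters for this kind of search.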
Source
Lecture Notes in Computer Science
Conference
CCF International Conference on Natural Language Processing and Chinese Computing
Keywords
46 Information and Computing Sciences, 4605 Data Management and Data Science, 4611 Machine Learning
Publisher
Springer Nature