A Survey of Post-Training Scaling in Large Language Models
Lai, Hanyu; Liu, Xiao; Gao, Junjie; Cheng, Jiale; Qi, Zehan; Xu, Yifan; Yao, Shuntian; Zhang, Dan; Du, Jinhua; Hou, Zhenyu; et al.
Department
Natural Language Processing
Type
Conference proceeding
Date
2025
Language
English
Abstract
Large language models (LLMs) have achieved remarkable proficiency in understanding and generating natural language, owing largely to the "scaling law" that characterizes the relationship among language-modeling loss, model parameters, and pre-training tokens. However, with high-quality internet corpora nearing exhaustion and computational demands rising, the sustainability of pre-training scaling must be addressed. This paper presents a comprehensive survey of post-training scaling, an emerging paradigm that aims to relieve the limitations of traditional pre-training by focusing on the alignment phase, which traditionally accounts for only a minor fraction of total training computation. Our survey categorizes post-training scaling into three key methodologies: Supervised Fine-tuning (SFT), Reinforcement Learning from Feedback (RLxF), and Test-time Compute (TTC). We provide an in-depth analysis of the motivation behind post-training scaling, the scalable variants of these methodologies, and a comparative discussion against traditional approaches. By examining the latest advancements, identifying promising application scenarios, and highlighting unresolved issues, we seek to provide a coherent understanding of the landscape of post-training scaling for LLMs and to map future research trajectories.
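The "scaling law" the abstract refers to relates language-modeling loss to parameter count and training-token count. As a minimal illustration (not taken from this survey), the sketch below uses the widely cited Chinchilla functional form, with fit constants approximately as reported by Hoffmann et al. (2022); the specific numeric inputs are hypothetical and for illustration only.

```python
# Illustrative sketch of a pre-training scaling law in the Chinchilla form:
#   L(N, D) = E + A / N**alpha + B / D**beta
# where N is the parameter count and D the number of training tokens.
# Constants are approximately the published Chinchilla fit; they are an
# assumption here, not values stated in this survey.

def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    """Predicted language-modeling loss under the Chinchilla-style fit."""
    E = 1.69            # irreducible loss
    A, alpha = 406.4, 0.34   # parameter-scaling term
    B, beta = 410.7, 0.28    # data-scaling term
    return E + A / n_params**alpha + B / n_tokens**beta

# Adding data (or parameters) lowers the predicted loss, with diminishing
# returns -- the motivation for seeking compute elsewhere once high-quality
# tokens run out.
base = chinchilla_loss(7e9, 1e12)       # hypothetical ~7B params, 1T tokens
more_data = chinchilla_loss(7e9, 2e12)  # same model, twice the tokens
assert more_data < base
```

The diminishing-returns behavior of both power-law terms is what makes continued pre-training scaling costly, which is the premise the survey's post-training methods respond to.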
Citation
H. Lai et al., “A Survey of Post-Training Scaling in Large Language Models,” vol. 1, pp. 2771–2791, Aug. 2025, doi: 10.18653/V1/2025.ACL-LONG.140.
Source
Proceedings of the Annual Meeting of the Association for Computational Linguistics
Conference
63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025
Keywords
Post-Training Scaling, Large Language Models, Supervised Fine-Tuning, Reinforcement Learning from Feedback, Test-time Compute, Model Alignment, Scaling Laws, Training Efficiency
Publisher
Association for Computational Linguistics
