CoDaPo: Confidence and difficulty-adaptive policy optimization for post-training language models
Zhou, Zhanke ; Lu, Xiangyu ; Cao, Chentao ; Miranda, Brando ; Liu, Tongliang ; Han, Bo ; Koyejo, Sanmi
Department
Machine Learning
Type
Poster
Date
2025
Language
English
Abstract
Large language models (LLMs) increasingly rely on reinforcement learning (RL) post-training to improve step-by-step reasoning. Among these methods, Group Relative Policy Optimization (GRPO) has emerged as a prevailing approach that avoids the need for fully supervised traces. However, GRPO can struggle with high-difficulty tasks, overfit to easy problems, and suffer from sensitivity to reward design. To diagnose these weaknesses, we introduce a general analysis framework that maps training trajectories onto an advantage-confidence plane, revealing three critical phenomena: (1) advantage contraction: reward-normalized advantages collapse as accuracy improves; (2) confidence saturation: policies become overconfident even on incorrect outputs; and (3) hierarchical convergence: easy problems are quickly mastered while harder ones lag. Based on these insights, we propose CoDaPO (Confidence- and Difficulty-Adaptive Policy Optimization), an RL algorithm that adopts a correctness-based reward and reweights advantages with respect to confidence and difficulty. Experiments on several benchmarks demonstrate that CoDaPO achieves higher reasoning accuracy and better generalization than existing RL approaches.
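The abstract describes group-relative advantage normalization (as in GRPO) combined with a confidence- and difficulty-based reweighting of advantages. The poster itself does not give the exact formulas, so the following is only a minimal illustrative sketch: the group normalization follows the standard GRPO recipe, while `codapo_weight` (its functional form, and the `alpha`/`beta` exponents) is a hypothetical stand-in for the reweighting the abstract mentions.

```python
import math

def group_advantages(rewards):
    """GRPO-style group-relative advantage: z-normalize rewards
    across a group of sampled completions for the same prompt."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + 1e-8) for r in rewards]

def codapo_weight(confidence, difficulty, alpha=1.0, beta=1.0):
    """Hypothetical reweighting factor (not from the poster):
    down-weight samples the policy is already confident on and
    up-weight harder problems."""
    return (1.0 - confidence) ** alpha * difficulty ** beta

def reweighted_advantages(rewards, confidences, difficulty):
    """Combine group-normalized advantages with the adaptive weight."""
    advs = group_advantages(rewards)
    return [a * codapo_weight(c, difficulty)
            for a, c in zip(advs, confidences)]

# Toy example: 4 sampled completions for one prompt.
rewards = [1.0, 0.0, 0.0, 1.0]        # correctness-based reward (0/1)
confidences = [0.9, 0.8, 0.6, 0.95]   # e.g. mean token probability
difficulty = 0.7                       # e.g. 1 - empirical pass rate
print(reweighted_advantages(rewards, confidences, difficulty))
```

Under this sketch, a correct answer the model was already near-certain about (confidence 0.95) contributes a much smaller positive advantage than a correct answer it was less sure of, which is one plausible way to counteract the confidence-saturation phenomenon the abstract identifies.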
Citation
Z. Zhou, X. Lu, C. Cao, B. Miranda, T. Liu, B. Han, and S. Koyejo, “CoDaPO: Confidence and Difficulty-Adaptive Policy Optimization for Post-Training Language Models,” poster presented at the 2nd AI for Math Workshop, held in conjunction with the Int. Conf. on Machine Learning (ICML), Vancouver, BC, Canada, Jul. 2025. [Online]. Available: https://icml.cc/virtual/2025/52428
Source
2nd AI for Math Workshop @ ICML 2025
Conference
42nd International Conference on Machine Learning (ICML)
Publisher
ICML
