
Reflection-Window Decoding: Text Generation with Selective Refinement

Authors
Tang, Zeyu
Chen, Zhenhao
Song, Xiangchen
Li, Loka
Deng, Yunlong
Shen, Yifan
Chen, Guangyi
Spirtes, Peter L.
Zhang, Kun
Department
Machine Learning
Type
Conference proceeding
Date
2025
Language
English
Abstract
Autoregressive decoding for text generation in large language models (LLMs), while widely used, is inherently suboptimal because it lacks a built-in mechanism to refine or correct previously generated content. In this paper, we consider optimality in terms of the joint probability over all tokens of the generated response, considered simultaneously. We theoretically characterize how far an autoregressively generated response can deviate from its globally optimal counterpart of the same length. Our analysis suggests that caution is warranted when noticeable uncertainty arises during text generation, as it may signal that the generation history is suboptimal. To address this pitfall of autoregressive decoding, we propose an approach that incorporates a sliding reflection window and a pausing criterion, so that refinement and generation can be carried out interchangeably as decoding proceeds. Our selective refinement framework strikes a balance between efficiency and optimality, and our extensive experimental results demonstrate the effectiveness of our approach.
Citation
Z. Tang et al., “Reflection-Window Decoding: Text Generation with Selective Refinement,” Oct. 06, 2025, PMLR. [Online]. Available: https://proceedings.mlr.press/v267/tang25a.html
Source
Proceedings of Machine Learning Research
Conference
42nd International Conference on Machine Learning, ICML 2025
Publisher
ML Research Press