Unconditional truthfulness: Learning unconditional uncertainty of large language models
Vazhentsev, Artem ; Fadeeva, Ekaterina ; Xing, Rui ; Kuzmin, Gleb ; Lazichny, Ivan ; Panchenko, Alexander ; Nakov, Preslav ; Baldwin, Timothy ; Panov, Maxim ; Shelmanov, Artem
Department
Machine Learning
Type
Conference proceeding
Date
2025
Language
English
Abstract
Uncertainty quantification (UQ) has emerged as a promising approach for detecting hallucinations and low-quality output of Large Language Models (LLMs). However, obtaining proper uncertainty scores is complicated by the conditional dependency between the generation steps of an autoregressive LLM, which is hard to model explicitly. Here, we propose to learn this dependency from attention-based features. In particular, we train a regression model that leverages LLM attention maps, probabilities at the current generation step, and recurrently computed uncertainty scores from previously generated tokens. To incorporate the recurrent features, we also propose a two-stage training procedure. Our experimental evaluation on ten datasets and three LLMs shows that the proposed method is highly effective for selective generation, achieving substantial improvements over rival unsupervised and supervised approaches.
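The abstract describes a regressor that scores each generation step from attention-map statistics, the current token probability, and the recurrently computed uncertainty of the preceding token. The following is a minimal sketch of that recurrent scoring loop, not the authors' implementation: the specific features, weights, and aggregation are hypothetical stand-ins, and the two-stage training procedure is not reproduced here.

```python
import math

# Minimal sketch (not the paper's implementation) of recurrent,
# regression-based uncertainty scoring. Feature choices, weights,
# and the mean aggregation below are hypothetical.

def step_features(attn_row, token_prob, prev_u):
    """Feature vector for one generation step: simple attention-map
    statistics, the current token probability, and the recurrently
    computed uncertainty score of the previous token."""
    attn_max = max(attn_row)
    attn_entropy = -sum(a * math.log(a + 1e-12) for a in attn_row)
    return [attn_max, attn_entropy, token_prob, prev_u]

def sequence_uncertainty(attn_rows, token_probs, weights, bias):
    """Apply a (pre-trained) linear regressor recurrently over the
    generated sequence, feeding each step's score back in as a
    feature for the next step, then aggregate to a sequence-level
    score usable for selective generation."""
    u, scores = 0.0, []
    for attn_row, p in zip(attn_rows, token_probs):
        x = step_features(attn_row, p, u)
        u = bias + sum(w * xi for w, xi in zip(weights, x))
        scores.append(u)
    return sum(scores) / len(scores)  # mean is one simple aggregation
```

With a negative weight on the token probability, lower-probability (less confident) generations yield higher sequence-level uncertainty, which is the behavior selective generation relies on.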
Citation
A. Vazhentsev et al., “Unconditional Truthfulness: Learning Unconditional Uncertainty of Large Language Models,” Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pp. 35661–35682, 2025, doi: 10.18653/V1/2025.EMNLP-MAIN.1807.
Source
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Conference
2025 Conference on Empirical Methods in Natural Language Processing
Keywords
Uncertainty Quantification, Large Language Models, Attention-based Features, Selective Generation, Conditional Dependency in Autoregressive Models, Regression-based UQ, Hallucination Detection, Two-Stage Training Procedure
Publisher
Association for Computational Linguistics
