
Rectifying Belief Space via Unlearning to Harness LLMs’ Reasoning

Author
Niwa, Ayana
Kaneko, Masahiro
Inui, Kentaro
Department
Natural Language Processing
Type
Conference proceeding
Date
2025
License
http://creativecommons.org/licenses/by/4.0/
Language
English
Abstract
Large Language Models (LLMs) exhibit sophisticated reasoning yet still generate incorrect answers. We attribute these errors to Spurious Beliefs, defined as propositions the model internally holds to be true despite being factually false. To reduce reasoning errors, we propose a belief-space rectification framework. Our method first identifies the beliefs invoked during inference via an explanation-based approach with Forward-Backward Beam Search (FBBS). We then apply unlearning via gradient ascent to suppress spurious beliefs and reinforce true ones, thereby rectifying the model's belief space. Experiments on three QA datasets and three LLMs show that our method significantly reduces erroneous reasoning and improves generalization.
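
The unlearning step described in the abstract can be illustrated with a short sketch. The snippet below is a minimal, hypothetical rendering of gradient-ascent unlearning on a single belief pair, assuming a HuggingFace causal LM. The model name, belief strings, loss combination, and hyperparameters are illustrative assumptions, not the authors' actual setup, and the FBBS-based belief identification step is omitted here.

```python
# Minimal sketch of unlearning via gradient ascent on a spurious belief,
# paired with gradient descent on the corresponding true belief.
# All names and hyperparameters below are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; the paper evaluates larger LLMs
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Hypothetical belief pair; in the paper these would come from FBBS.
spurious_belief = "Example of a factually false proposition the model holds."
true_belief = "Example of the corresponding factually true proposition."

def lm_loss(text: str) -> torch.Tensor:
    # Standard causal language-modeling loss over the token sequence.
    ids = tokenizer(text, return_tensors="pt").input_ids
    return model(ids, labels=ids).loss

model.train()
for _ in range(3):  # a few update steps; a real loop iterates over a dataset
    optimizer.zero_grad()
    # Negating the loss on the spurious belief performs gradient ascent
    # (raising its likelihood penalty), while the true-belief term is
    # ordinary descent that reinforces the correct proposition.
    loss = -lm_loss(spurious_belief) + lm_loss(true_belief)
    loss.backward()
    optimizer.step()
```

In practice, unconstrained gradient ascent can degrade unrelated capabilities, so the combined objective here (ascent plus descent on a paired true belief) is one common way to keep the update targeted; whether it matches the paper's exact objective is an assumption.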
Citation
A. Niwa, M. Kaneko, and K. Inui, "Rectifying Belief Space via Unlearning to Harness LLMs' Reasoning," in Findings of the Association for Computational Linguistics: ACL 2025, Association for Computational Linguistics, 2025, pp. 25060-25075.
Source
Proceedings of the Annual Meeting of the Association for Computational Linguistics
Conference
Findings of the Association for Computational Linguistics: ACL 2025
Publisher
Association for Computational Linguistics (ACL)