Understanding the Side Effects of Rank-One Knowledge Editing
Takahashi, Ryosuke; Kamoda, Go; Heinzerling, Benjamin; Sakaguchi, Keisuke; Inui, Kentaro
Department
Natural Language Processing
Type
Conference proceeding
Date
2025
Language
English
Abstract
This study conducts a detailed analysis of the side effects of rank-one knowledge editing, using language models with controlled knowledge. The analysis focuses on each element of knowledge triples (subject, relation, object) and examines two aspects: “knowledge that causes large side effects when edited” and “knowledge that is affected by those side effects.” Our findings suggest that editing knowledge whose subject is related to numerous objects, or is robustly embedded within the LM, may trigger extensive side effects. Furthermore, we demonstrate that the similarity between relation vectors, the density of object vectors, and the distortion of knowledge representations are closely related to how susceptible knowledge is to editing influences. These findings provide new insights into the mechanisms behind side effects in LM knowledge editing and point to concrete directions for developing more effective and reliable knowledge editing methods.
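To make the term concrete: a rank-one knowledge edit, as in ROME-style editors, modifies one weight matrix by adding a single outer product so that a chosen key vector (encoding a subject) maps to a new value vector (encoding an object). The sketch below is a minimal illustration of that idea, not the paper's implementation; the function name and the closed-form update `W' = W + (v − W k) kᵀ / (kᵀ k)` are simplified assumptions for exposition.

```python
import numpy as np

def rank_one_edit(W: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Return W' = W + (v - W k) k^T / (k^T k), a rank-one update.

    After the edit, W' @ k == v exactly, while W' agrees with W on all
    directions orthogonal to k -- which is why edits can still cause side
    effects for other knowledge whose key vectors overlap with k.
    """
    residual = v - W @ k                  # what W currently gets wrong for key k
    return W + np.outer(residual, k) / (k @ k)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))   # toy "layer" weights
k = rng.normal(size=3)        # key vector (stand-in for a subject encoding)
v = rng.normal(size=4)        # target value (stand-in for the new object)

W_edited = rank_one_edit(W, k, v)
print(np.allclose(W_edited @ k, v))   # the edited layer now maps k to v
```

Because the update has rank one, any other key with a component along `k` is also perturbed, which is the geometric root of the side effects the abstract analyzes.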
Citation
R. Takahashi, G. Kamoda, B. Heinzerling, K. Sakaguchi, and K. Inui, “Understanding the Side Effects of Rank-One Knowledge Editing,” Proceedings of the 8th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP, pp. 189–205, 2025, doi: 10.18653/v1/2025.blackboxnlp-1.11.
Source
Proceedings of the 8th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP
Conference
8th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP
Keywords
Knowledge Editing, Rank-One Updates, Side Effects Analysis, Large Language Models, Knowledge Triple Manipulation, Representation Distortion, Relation & Object Density, Reliable Model Intervention
Publisher
Association for Computational Linguistics
