
Weight-based Analysis of Detokenization in Language Models: Understanding the First Stage of Inference Without Inference

Kamoda, Go
Heinzerling, Benjamin
Inaba, Tatsuro
Kudo, Keito
Sakaguchi, Keisuke
Inui, Kentaro
Department
Natural Language Processing
Type
Conference proceeding
Date
2025
License
http://creativecommons.org/licenses/by/4.0/
Abstract
According to the stages-of-inference hypothesis, early layers of language models map their subword-tokenized input, which does not necessarily correspond to a linguistically meaningful segmentation, to more meaningful representations that form the model’s “inner vocabulary”. Prior analysis of this *detokenization* stage has predominantly relied on probing and interventions such as path patching, which involve selecting particular inputs, choosing a subset of components that will be patched, and then observing changes in model behavior. Here, we show that several important aspects of the detokenization stage can be understood purely by analyzing model weights, without performing any model inference steps. Specifically, we introduce an analytical decomposition of first-layer attention in GPT-2. Our decomposition yields interpretable terms that quantify the relative contributions of position-related, token-related, and mixed effects. By focusing on terms in this decomposition, we discover weight-based explanations of attention bias toward close tokens and attention for detokenization.
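The abstract describes decomposing first-layer attention into position-related, token-related, and mixed terms using only model weights. Below is a minimal sketch of one such decomposition: because GPT-2 adds learned position embeddings to token embeddings before the first attention layer, each query-key logit expands into token-token, token-position, position-token, and position-position contributions. This sketch ignores the pre-attention LayerNorm and the Q/K biases, and the chosen tokens, positions, and head index are arbitrary examples, so it illustrates the idea rather than reproducing the paper's exact formulation.

```python
# Sketch: weight-only decomposition of a GPT-2 first-layer attention logit
# into token/position bilinear terms. Assumptions: LayerNorm before attention
# and the Q/K biases are ignored; head 0 is analyzed; inputs are arbitrary.
import torch
from transformers import GPT2Model

model = GPT2Model.from_pretrained("gpt2")
d_model, n_head = 768, 12
d_head = d_model // n_head

wte = model.wte.weight            # token embedding matrix, shape (50257, 768)
wpe = model.wpe.weight            # position embedding matrix, shape (1024, 768)
c_attn = model.h[0].attn.c_attn   # fused Q/K/V projection of layer 0 (Conv1D)
W_q = c_attn.weight[:, :d_model]              # query projection, (768, 768)
W_k = c_attn.weight[:, d_model:2 * d_model]   # key projection, (768, 768)

head = 0                          # single attention head for illustration
Wq_h = W_q[:, head * d_head:(head + 1) * d_head]   # (768, 64)
Wk_h = W_k[:, head * d_head:(head + 1) * d_head]   # (768, 64)

def logit_terms(tok_q, pos_q, tok_k, pos_k):
    """Return the four bilinear contributions to one attention logit."""
    e_q, p_q = wte[tok_q], wpe[pos_q]
    e_k, p_k = wte[tok_k], wpe[pos_k]
    scale = d_head ** 0.5
    qk = lambda a, b: (a @ Wq_h) @ (b @ Wk_h) / scale
    return {
        "tok-tok": qk(e_q, e_k).item(),   # token identity attending to token identity
        "tok-pos": qk(e_q, p_k).item(),   # token identity attending to position
        "pos-tok": qk(p_q, e_k).item(),   # position attending to token identity
        "pos-pos": qk(p_q, p_k).item(),   # position attending to position
    }

# Example: compare contributions for a query at position 10 and a key one
# position earlier; a large pos-pos term would indicate a position-driven
# bias toward nearby tokens, independent of token identity.
with torch.no_grad():
    print(logit_terms(tok_q=1000, pos_q=10, tok_k=2000, pos_k=9))
```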
Citation
G. Kamoda, B. Heinzerling, T. Inaba, K. Kudo, K. Sakaguchi, K. Inui, "Weight-based Analysis of Detokenization in Language Models: Understanding the First Stage of Inference Without Inference," 2025, pp. 6324-6343.
Source
Findings of the Association for Computational Linguistics: NAACL 2025
Conference
Findings of the Association for Computational Linguistics: NAACL 2025
Keywords
46 Information and Computing Sciences, 47 Language, Communication and Culture, 4704 Linguistics
Publisher
Association for Computational Linguistics