Under the Shadow of Babel: How Language Shapes Reasoning in LLMs
Wang, Chenxi ; Zhang, Yixuan ; Gao, Lang ; Xu, Zixiang ; Song, Zirui ; Wang, Yanbo ; Chen, Xiuying
Department
Machine Learning
Type
Conference proceeding
License
http://creativecommons.org/licenses/by/4.0/
Abstract
Language is not only a tool for communication but also a medium for human cognition and reasoning. If, as linguistic relativity suggests, the structure of language shapes cognitive patterns, then large language models (LLMs) trained on human language may also internalize the habitual logical structures embedded in different languages. To examine this hypothesis, we introduce BICAUSE, a structured bilingual dataset for causal reasoning, which includes semantically aligned Chinese and English samples in both forward and reversed causal forms. Our study reveals three key findings: (1) LLMs exhibit typologically aligned attention patterns, focusing more on causes and sentence-initial connectives in Chinese, while showing a more balanced distribution in English. (2) Models internalize language-specific preferences for the order of causal components and often rigidly apply them to atypical inputs, leading to degraded performance, especially in Chinese. (3) When causal reasoning succeeds, model representations converge toward semantically aligned abstractions across languages, indicating a shared understanding beyond surface form. Overall, these results suggest that LLMs not only mimic surface linguistic forms but also internalize the reasoning biases shaped by language. This phenomenon, rooted in cognitive linguistic theory, is empirically verified here for the first time through structural analysis of model internals.
Citation
C. Wang, Y. Zhang, L. Gao, Z. Xu, Z. Song, Y. Wang, X. Chen, "Under the Shadow of Babel: How Language Shapes Reasoning in LLMs," 2025, pp. 24327-24344.
Source
Proceedings of the EMNLP 2025 - Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2025
Conference
Findings of the Association for Computational Linguistics: EMNLP 2025
Publisher
Association for Computational Linguistics
