
Can Language Models Learn Typologically Implausible Languages?

Xu, Tianyang
Kuribayashi, Tatsuki
Oseki, Yohei
Cotterell, Ryan
Warstadt, Alex
Department
Natural Language Processing
Type
Journal article
License
http://creativecommons.org/licenses/by/4.0/
Language
English
Abstract
Grammatical features across human languages exhibit intriguing correlations, often attributed to learning biases in humans. Language models (LMs) provide a scalable and naturalistic framework for studying artificial language learning, one not available in human research. We investigate how learnability varies across typologically plausible and implausible languages that closely follow the word order universals identified by linguistic typologists. Our study trains LMs on highly naturalistic counterfactual versions of English (head-initial) and Japanese (head-final). Compared to prior work, our datasets more precisely target the boundary between typological plausibility and implausibility. Our experiments show that LMs learn subtly implausible languages more slowly, though they eventually reach similar performance on some metrics regardless of typological plausibility. These findings suggest that LMs exhibit typologically aligned learning preferences and that certain typological patterns may emerge from general learning biases. Repository: https://github.com/sally-xu-42/Typological_Universals
Citation
T. Xu, T. Kuribayashi, Y. Oseki, R. Cotterell, A. Warstadt, "Can Language Models Learn Typologically Implausible Languages?," Transactions of the Association for Computational Linguistics, vol. 14, pp. 588-611, 2026, https://doi.org/10.1162/tacl.a.640.
Source
Transactions of the Association for Computational Linguistics
Keywords
47 Language, Communication and Culture; 4704 Linguistics
Publisher
MIT Press