GCG-based artificial languages for evaluating inductive biases of neural language models
El-Naggar, Nadine; Kuribayashi, Tatsuki; Briscoe, Ted
Department
Natural Language Processing
Type
Conference proceeding
Date
2025
Language
English
Abstract
Recent work has investigated whether extant neural language models (LMs) have an inbuilt inductive bias towards the acquisition of attested, typologically frequent grammatical patterns as opposed to infrequent, unattested, or impossible patterns, using artificial languages (White and Cotterell, 2021; Kuribayashi et al., 2024). The use of artificial languages facilitates isolation of specific grammatical properties from other factors such as lexical or real-world knowledge, but also risks oversimplification of the problem. In this paper, we examine the use of Generalized Categorial Grammars (GCGs) (Wood, 2014) as a general framework to create artificial languages with a wider range of attested word order patterns, including those where the subject intervenes between verb and object (VSO, OSV) and unbounded dependencies in object relative clauses. In our experiments, we exemplify our approach by extending White and Cotterell (2021) and report some significant differences from existing results.
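The idea that a categorial lexicon fixes word order through the direction of its slash categories can be illustrated with a small sketch. The Python below is not the authors' grammar or code: it is a hypothetical AB-style categorial recognizer (forward and backward application only, filled in a CKY-style chart), and the lexicon names SVO, VSO and OSV, the toy words, and the helper functions are all assumptions made for illustration. Only the verb's category differs between the three lexicons, yet each accepts a different word order, including orders where the subject intervenes between verb and object.

```python
from itertools import product

# Hypothetical encoding: atomic categories are strings ("S", "NP");
# complex categories are tuples (result, slash, argument), where "/"
# seeks its argument to the right and "\\" seeks it to the left.
def fwd(result, arg):
    return (result, "/", arg)

def bwd(result, arg):
    return (result, "\\", arg)

def combine(left, right):
    """Categories derivable from two adjacent categories by application."""
    out = set()
    # Forward application: X/Y  Y  =>  X
    if isinstance(left, tuple) and left[1] == "/" and left[2] == right:
        out.add(left[0])
    # Backward application: Y  X\Y  =>  X
    if isinstance(right, tuple) and right[1] == "\\" and right[2] == left:
        out.add(right[0])
    return out

def recognize(words, lexicon, goal="S"):
    """CKY-style chart: chart[i][j] holds categories spanning words[i:j]."""
    n = len(words)
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1] = set(lexicon[w])
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for a, b in product(chart[i][k], chart[k][j]):
                    chart[i][j] |= combine(a, b)
    return goal in chart[0][n]

# Toy lexicons (assumed): only the verb category changes between languages.
NOUNS = {"alice": ["NP"], "bob": ["NP"]}
# SVO: (S\NP)/NP takes the object on the right, then the subject on the left.
SVO = {**NOUNS, "sees": [fwd(bwd("S", "NP"), "NP")]}
# VSO: (S/NP)/NP takes the subject, then the object, both on the right.
VSO = {**NOUNS, "sees": [fwd(fwd("S", "NP"), "NP")]}
# OSV: (S\NP)\NP takes the subject, then the object, both on the left.
OSV = {**NOUNS, "sees": [bwd(bwd("S", "NP"), "NP")]}

if __name__ == "__main__":
    print(recognize("alice sees bob".split(), SVO))
    print(recognize("sees alice bob".split(), VSO))
    print(recognize("bob alice sees".split(), OSV))
```

Application alone cannot derive the unbounded dependencies of object relative clauses; GCG-style grammars typically add further rules, such as function composition, for that purpose, and this sketch deliberately omits them.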
Citation
N. El-Naggar, T. Kuribayashi, and T. Briscoe, “GCG-Based Artificial Languages for Evaluating Inductive Biases of Neural Language Models,” pp. 540–556, Aug. 2025, doi: 10.18653/V1/2025.CONLL-1.35
Source
Proceedings of the 29th Conference on Computational Natural Language Learning
Conference
29th Conference on Computational Natural Language Learning
Keywords
Artificial Languages, Generalised Categorial Grammar, Inductive Biases, Neural Language Models, Word-Order Variation, Unbounded Dependencies, Controlled Benchmarking, Syntax–Semantics Interface
Publisher
Association for Computational Linguistics
