Does Vision Accelerate Hierarchical Generalization in Neural Language Learners?

Kuribayashi, Tatsuki
Baldwin, Timothy
Department
Natural Language Processing
Type
Conference proceeding
Date
2025
Language
English
Abstract
Neural language models (LMs) are arguably less data-efficient than humans from a language acquisition perspective. One fundamental question is why this human-LM gap arises. This study explores the advantage of grounded language acquisition, specifically the impact of visual information - which humans can usually rely on but LMs largely do not have access to during language acquisition - on syntactic generalization in LMs. Our experiments, following the poverty-of-the-stimulus paradigm under two scenarios (using artificial vs. naturalistic images), demonstrate that if the alignments between the linguistic and visual components are clear in the input, access to vision data does help with the syntactic generalization of LMs, but if not, visual input does not help. This highlights the need for additional biases or signals, such as mutual gaze, to enhance cross-modal alignment and enable efficient syntactic generalization in multimodal LMs. © 2025 Association for Computational Linguistics.
Citation
T. Kuribayashi and T. Baldwin, “Does Vision Accelerate Hierarchical Generalization in Neural Language Learners?,” 2025. [Online]. Available: https://aclanthology.org/2025.coling-main.127/
Source
Proceedings - International Conference on Computational Linguistics, COLING
Keywords
Multimodal language acquisition, Syntactic generalization, Neural language models, Visual grounding, Cross-modal alignment
Publisher
Association for Computational Linguistics