Extracting General-use Transformers for Low-resource Languages via Knowledge Distillation
Cruz, Jan Christian Blaise ; Aji, Alham Fikri
Department
Natural Language Processing
Type
Workshop
Date
2025
Language
English
Abstract
In this paper, we propose the use of simple knowledge distillation to produce smaller and more efficient single-language transformers from Massively Multilingual Transformers (MMTs), alleviating the tradeoffs associated with using MMTs in low-resource settings. Using Tagalog as a case study, we show that these smaller single-language models perform on par with strong baselines on a variety of benchmark tasks while being far more efficient. Furthermore, we investigate additional steps during the distillation process that improve the soft supervision of the target language, and provide a number of analyses and ablations demonstrating the efficacy of the proposed method.
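The abstract describes distilling a smaller single-language student from an MMT teacher via soft supervision. The following is a minimal sketch of that kind of soft-label distillation, assuming a Hugging Face-style teacher/student pair over a shared vocabulary; the temperature, loss, and training-step structure are illustrative assumptions, not the authors' exact setup.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-label KD: KL divergence between temperature-scaled output distributions."""
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_soft_student = F.log_softmax(student_logits / t, dim=-1)
    # Scale by t^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (t ** 2)

def train_step(student, teacher, batch, optimizer, temperature=2.0):
    """One illustrative distillation step: teacher is the frozen MMT,
    student is the smaller single-language transformer (assumed setup)."""
    with torch.no_grad():
        teacher_logits = teacher(**batch).logits
    student_logits = student(**batch).logits
    loss = distillation_loss(student_logits, teacher_logits, temperature)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```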
Citation
J. C. B. Cruz and A. F. Aji, “Extracting General-use Transformers for Low-resource Languages via Knowledge Distillation,” 2025. Accessed: Mar. 25, 2025. [Online]. Available: https://aclanthology.org/2025.loreslm-1.17/
Source
Workshop on Language Models for Low-Resource Languages
Keywords
Knowledge distillation, Low-resource languages, Massively Multilingual Transformers (MMTs), Single-language transformers, Tagalog
Publisher
Association for Computational Linguistics
