MBZUAI at AMIYA Shared Task 2026: Adapting Open-Source LLMs for Dialectal Arabic
Gaber, Rana ; Allam, Yara ; Amin, Serag ; Aly, Ranwa ; Alhafni, Bashar
Files
2026.vardial-1.31.pdf
Adobe PDF, 1.05 MB
Department
Natural Language Processing
Type
Conference proceeding
License
http://creativecommons.org/licenses/by/4.0/
Abstract
This paper presents our contribution to the closed data track of the AMIYA Shared Task on Dialectal Arabic text generation. In this track, we train fully open-source Large Language Models (LLMs) on five Arabic dialects: Egyptian, Moroccan, Palestinian, Saudi, and Syrian, using the provided training datasets. We experiment with different base and instruct models using several pretraining and instruction tuning approaches. In total, five models were submitted, with three variants per dialect. Our best-performing models for the five dialects are ALLaM for Egyptian, LLaMa for Moroccan and Palestinian, and Aya for Saudi and Syrian.
Citation
R. Gaber, Y. Allam, S. Amin, R. Aly, B. Alhafni, "MBZUAI at AMIYA Shared Task 2026: Adapting Open-Source LLMs for Dialectal Arabic," 2026, pp. 373-384.
Source
Conference
Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects
Publisher
Association for Computational Linguistics
