ASR Models for Traditional Emirati Arabic: Challenges, Adaptations, and Performance Evaluation

Alblooki, Maha
Inui, Kentaro
Shehata, Shady
Department
Natural Language Processing
Type
Conference proceeding
Date
2025
Language
English
Abstract
Traditional Emirati Arabic, a culturally rich and linguistically distinct dialect, remains underrepresented in modern automatic speech recognition (ASR) systems. This paper addresses the gap by introducing a curated speech corpus derived from heritage broadcasts and literary sources, and by evaluating the performance of state-of-the-art ASR models on this low-resource dialect. We examine the zero-shot and fine-tuned performance of five pre-trained models on our traditional Emirati Arabic dataset: Wav2Vec2, XLS-R, Whisper, and Massively Multilingual Speech (MMS). Our results show that fine-tuning improves both Word Error Rate (WER) and Character Error Rate (CER), with MMS achieving the best results post-adaptation. Through detailed error analysis, we highlight challenges posed by dialectal morphology, phonology, and lexical variation, and propose targeted adaptations for dialect-specific ASR. This work establishes a foundational benchmark for traditional Emirati ASR and contributes to the broader goal of preserving linguistic heritage through speech technology.
Citation
M. AlBlooki, K. Inui, and S. Shehata, “ASR Models for Traditional Emirati Arabic: Challenges, Adaptations, and Performance Evaluation,” 2025. Accessed: Oct. 15, 2025. [Online]. Available: https://aclanthology.org/2025.icnlsp-1.5/
Source
Proceedings of the 8th International Conference on Natural Language and Speech Processing (ICNLSP-2025)
Conference
8th International Conference on Natural Language and Speech Processing
Publisher
Association for Computational Linguistics