ASR Models for Traditional Emirati Arabic: Challenges, Adaptations, and Performance Evaluation
Alblooki, Maha Mohammad Abdulla Abdulkarim
Department
Natural Language Processing
Embargo End Date
30/05/2026
Type
Thesis
Date
2025
Language
English
Collections
Research Projects
Abstract
The rapid advancement of Automatic Speech Recognition (ASR) technology has significantly improved human-computer interaction across many languages and dialects; however, traditional dialects, particularly the pure Emirati dialect, remain under-represented in modern ASR systems. This research addresses this gap by creating the first traditional Emirati Arabic speech dataset and developing a dialect-specific ASR model that can accurately transcribe traditional Emirati Arabic, a spoken variety with distinct phonetic, morphological, and syntactic characteristics. The study adopts a data-driven approach, beginning with the compilation of an Emirati dialect speech corpus sourced from heritage radio broadcasts and literary resources. To model this dialect, the research explores several pre-trained ASR architectures, namely the self-supervised Wav2Vec2, XLS-R, and MMS models and the weakly supervised Whisper model, and fine-tunes them to assess their adaptability to traditional Emirati Arabic speech, their performance across different linguistic variations, and their effectiveness in reducing Word Error Rate (WER) and Character Error Rate (CER) relative to the baseline models. A comparison of the average WER and CER of the baseline and fine-tuned models shows that fine-tuning these pre-trained models improves recognition accuracy for dialectal speech. The results indicate that domain-specific data and linguistic adaptations significantly enhance ASR performance, paving the way for more robust and inclusive Arabic dialectal speech technologies.
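For readers unfamiliar with the two metrics, the sketch below shows how WER and CER are conventionally computed. Each is the Levenshtein distance between reference and hypothesis (substitutions S, deletions D, insertions I), divided by the number N of words or characters in the reference: (S + D + I) / N. This is not the thesis's evaluation code. The function names are hypothetical, and splitting words on whitespace and counting spaces as characters are assumptions. Evaluation toolkits typically apply their own text normalization first.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution (or match if equal)
            )
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate, assuming whitespace tokenization (hypothetical helper)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate; assumption: spaces count as characters."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

For example, `wer("شحالك يا الغالي", "شحالك يالغالي")` returns 2/3. The merged word counts as one substitution and one deletion against three reference words, which shows how orthographic merges common in dialectal transcription inflate WER more than CER.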
Citation
Maha Mohammad Abdulla Abdulkarim Alblooki, “ASR Models for Traditional Emirati Arabic: Challenges, Adaptations, and Performance Evaluation,” Master of Science thesis, Natural Language Processing, MBZUAI, 2025.
Keywords
Emirati Arabic Dataset, Automatic Speech Recognition, Low-Resource Dialect, Gulf Speech Corpus
