Enhancing Automatic Speech Recognition for Emirati-English Code-Switched Speech

Author
Al-Ali, Maryam
Department
Machine Learning
Embargo End Date
2024-01-01
Type
Thesis
Date
2024
Language
English
Abstract
This thesis explores automatic speech recognition (ASR) for Emirati Arabic-English code-switching, a common phenomenon in the United Arab Emirates (UAE). The study addresses the challenges of transcribing and understanding code-switched speech, contributing to advances in multilingual ASR technology. The foundation of this research is the Mixat dataset, approximately 15 hours of audio derived from UAE-native podcasts. The dataset captures the complexities of Emirati Arabic-English code-switching, encompassing the diverse linguistic variations and code-switching patterns observed across the UAE. Using Mixat, baseline ASR models, including Whisper, MMS, and ArTST, were developed and fine-tuned for improved code-switched speech recognition. Our experiments showed notable improvements in ASR performance after fine-tuning, particularly in the podcast-based setting. Whisper was the top performer, reducing Word Error Rate (WER) from a baseline of 168.52 to 35.21 in this setting. The improvement also held when the model was evaluated on code-switching segments only, where WER fell from 121.78 to 37.43. Character Error Rate (CER) followed a similar trend.
Citation
M. Al-Ali, "Enhancing Automatic Speech Recognition for Emirati-English Code-Switched Speech," M.S. thesis, Machine Learning, MBZUAI, Abu Dhabi, UAE, 2024.