DysHub Augment: Addressing Data Scarcity Challenges For Dysarthria Assessment & Automatic Speech Recognition
Alhaddad, Rzan
Author
Department
Machine Learning
Embargo End Date
2026-05-21
Type
Thesis
Date
2024
Language
English
Abstract
Dysarthria is a motor speech disorder marked by significant difficulty articulating speech sounds, often due to muscle weakness or incoordination. Individuals with dysarthria face several obstacles, including the initial diagnosis process, determining the severity of their condition, and, perhaps most critically, communicating effectively with others and with the voice-activated technologies found in many devices. A key focus of our study is the prevalent issue of data scarcity in this field, which we address through several data augmentation techniques. While prior studies have relied on voice conversion (VC) and generative adversarial networks (GANs) applied to healthy speech to enlarge their training datasets, our set of novel augmentation techniques is applied directly to dysarthric data. One of these techniques, which we call back-generated synthetic dysarthric units (SDU), is inspired by the back-translation method used in machine translation. Additionally, we apply a range of standard augmentations, such as speed perturbation, noise addition, time masking, and frequency masking, to impaired rather than normal speech to produce dysarthria-like speech data. To the best of our knowledge, our pipeline for an automatic speech recognition (ASR) model for dysarthria represents a novel contribution to the field, distinguishing our work from existing research. Similarly, our application of HuBERT to the classification of dysarthria appears to be unprecedented. Our findings indicate that HuBERT performs promisingly on all three of our tasks: dysarthria severity assessment and detection, with accuracy scores of 99.47% and 99.5% respectively, and ASR. However, in the ASR task, introducing SDUs did not consistently enhance performance, especially at higher volumes of augmentation data. Interestingly, small amounts of augmented data did yield improvements over the baseline.
Our future work will explore speech normalization and speaker adaptation techniques to enhance ASR performance. As for the classification task, we will compare traditional and deep learning classifiers for HuBERT-extracted discrete units.
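The abstract names speed perturbation, noise addition, and time/frequency masking as augmentations applied to impaired speech. The thesis text is not reproduced here, so the following is only a minimal numpy sketch of these conventional techniques (the function names, SNR parameterization, and mask sizes are illustrative assumptions, not the thesis's actual implementation):

```python
import numpy as np

def speed_perturb(wave, factor):
    # Resample the waveform by linear interpolation; factor > 1 speeds it up.
    n_out = int(len(wave) / factor)
    idx = np.linspace(0, len(wave) - 1, n_out)
    return np.interp(idx, np.arange(len(wave)), wave)

def add_noise(wave, snr_db, rng):
    # Add white Gaussian noise scaled to a target signal-to-noise ratio (dB).
    noise = rng.standard_normal(len(wave))
    sig_pow = np.mean(wave ** 2)
    noise_pow = np.mean(noise ** 2)
    scale = np.sqrt(sig_pow / (noise_pow * 10 ** (snr_db / 10)))
    return wave + scale * noise

def mask_spec(spec, max_t, max_f, rng):
    # SpecAugment-style masking on a (freq_bins, time_frames) spectrogram:
    # zero out one random frequency band and one random time span.
    spec = spec.copy()
    f = int(rng.integers(0, max_f + 1))
    f0 = int(rng.integers(0, spec.shape[0] - f + 1))
    spec[f0:f0 + f, :] = 0.0
    t = int(rng.integers(0, max_t + 1))
    t0 = int(rng.integers(0, spec.shape[1] - t + 1))
    spec[:, t0:t0 + t] = 0.0
    return spec
```

The key point of the thesis's approach is that such transforms are run over dysarthric recordings to generate additional dysarthria-like samples, rather than over healthy speech as in VC- or GAN-based pipelines.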
Citation
R. Alhaddad, "DysHub Augment: Addressing Data Scarcity Challenges For Dysarthria Assessment & Automatic Speech Recognition", M.S. thesis, Machine Learning, MBZUAI, Abu Dhabi, UAE, 2024.
