ClearVoice: Dysarthric Speech Recognition using Speech-to-Text Model
Aboeitta, Ahmed
Author
Supervisor
Department
Natural Language Processing
Embargo End Date
20/05/2025
Type
Thesis
Date
2024
License
Language
English
Collections
Research Projects
Abstract
Dysarthric speech, which results from neurological disorders, presents a significant challenge for Automatic Speech Recognition (ASR) systems. Conventional ASR systems, designed with typical speech patterns in mind, often fail to accommodate the characteristics of dysarthric speech, such as its variability in tempo, clarity, and articulation. This limitation reduces the accuracy of these systems and restricts the accessibility of speech-based technologies for individuals with dysarthria. To address this gap, our research introduces ClearVoice, a speech-to-text encoder-decoder model tailored specifically to the recognition and transcription of dysarthric speech into accurate, readable text. ClearVoice integrates two components: (1) pretrained audio encoders that use discrete unit representations to capture the complex acoustic features of dysarthric speech, and (2) autoregressive text decoders designed to transcribe these features effectively. This combination allows the model to handle the nuances of dysarthric speech more accurately than existing ASR technologies in this domain. We evaluate ClearVoice on two widely used dysarthric speech datasets, TORGO and UASpeech. With an average Word Error Rate (WER) of 0.019 on UASpeech and 0.129 on TORGO, ClearVoice sets a new state of the art on both datasets, significantly outperforming prior models. These results highlight the model's ability to transcribe dysarthric speech accurately and mark a significant advance in speech recognition for this population. The development of ClearVoice represents a step towards bridging the communication gap faced by individuals with dysarthria.
By enhancing the accuracy and accessibility of ASR systems for dysarthric speech, this work opens new avenues for the use of speech-based technologies as effective communication aids. Furthermore, the insights gained from this research contribute to the broader understanding of speech recognition challenges and encourage the development of more inclusive and adaptable ASR technologies. Future directions for this work include refining the model's adaptability to individual speech patterns and extending its application to accommodate a wider range of speech impairments and low-resource languages.
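The abstract reports its results as Word Error Rate (WER). For readers unfamiliar with the metric, the following is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and a hypothesis, divided by the number of reference words. It is not taken from the thesis. The function name, the whitespace tokenization, and the example sentences are illustrative assumptions, and the thesis may apply text normalization or per-speaker averaging that is not shown here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level Levenshtein distance / number of reference words.

    Assumption: words are separated by whitespace and compared exactly
    (no casing or punctuation normalization).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missing)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences only; these are not drawn from TORGO or UASpeech.
score = word_error_rate("please open the door", "please open door")
```

Under this definition, a WER of 0.019 corresponds to roughly 2 word errors per 100 reference words, and 0.129 to roughly 13 per 100.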
Citation
A. Aboeitta, "ClearVoice: Dysarthric Speech Recognition using Speech-to-Text Model", M.S. Thesis, Natural Language Processing, MBZUAI, Abu Dhabi, UAE, 2024.
