
Bridging ASR and LLMs for Dysarthric Speech Recognition: Benchmarking Self-Supervised and Generative Approaches

Aboeitta, Ahmed
Sharshar, Ahmed
Nafea, Youssef
Shehata, Shady
Department
Natural Language Processing
Type
Conference proceeding
Date
2025
Language
English
Abstract
Dysarthric speech presents significant challenges for Automatic Speech Recognition (ASR) due to phoneme distortions and high variability. While self-supervised ASR models such as Wav2Vec, HuBERT, and Whisper have shown promise, their effectiveness on dysarthric speech remains unclear. This study systematically benchmarks these models under different decoding strategies, including CTC, seq2seq, and LLM-enhanced decoding (BART, GPT-2, Vicuna). Our contributions include (1) benchmarking ASR architectures for dysarthric speech, (2) introducing LLM-based decoding to improve intelligibility, (3) analyzing generalization across datasets, and (4) providing insights into recognition errors across severity levels. Our findings show that LLM-enhanced decoding improves dysarthric ASR by leveraging linguistic constraints for phoneme restoration and grammatical correction.
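To make the decoding setup concrete, the sketch below shows one common form of LLM-enhanced decoding: rescoring an ASR n-best list with a causal language model such as GPT-2. This is a minimal illustration under stated assumptions, not the authors' published pipeline; the Hugging Face model name and the toy hypothesis list are illustrative.

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def lm_log_prob(text: str) -> float:
    """Average token log-probability of `text` under the language model."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    # out.loss is the mean negative log-likelihood, so negate it.
    return -out.loss.item()

def rescore(hypotheses):
    """Return the n-best ASR hypothesis the LM scores as most fluent."""
    return max(hypotheses, key=lm_log_prob)

# Hypothetical n-best list standing in for a CTC/seq2seq ASR front end;
# the LM's linguistic prior favors the grammatically well-formed candidate.
nbest = ["the weather is nice today", "the wether his mice to day"]
print(rescore(nbest))

In practice such an LM score would typically be interpolated with the acoustic model's score rather than used on its own, which is how linguistic constraints can help restore distorted phonemes and correct grammar in dysarthric ASR output.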
Citation
A. Aboeitta, A. Sharshar, Y. Nafea, and S. Shehata, “Bridging ASR and LLMs for Dysarthric Speech Recognition: Benchmarking Self-Supervised and Generative Approaches,” Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, pp. 2123–2127, 2025, doi: 10.21437/Interspeech.2025-1994.
Source
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Conference
26th Interspeech Conference 2025
Keywords
dysarthria, dysarthric speech recognition
Publisher
International Speech Communication Association