Robust ASR for Russian Dysarthric Speech Recognition

Syzdykov, Bekassyl
Department
Natural Language Processing
Embargo End Date
30/05/2027
Type
Thesis
Date
2025
Language
English
Abstract
Hyperkinetic dysarthria degrades speech through involuntary muscle movements, making it difficult for automatic speech recognition (ASR) systems to transcribe. While English has many dysarthric speech datasets for ASR, resources for languages such as Russian are limited. This study establishes state-of-the-art results for Russian dysarthric speech recognition by fine-tuning multiple transformer-based models on the Russian Hyperkinetic Dysarthria dataset. We further augment the dataset with time-domain and frequency-domain techniques to evaluate the impact of data augmentation on ASR performance. Our analysis shows that Whisper-based models outperform traditional ASR baselines; the best model, Whisper-Large-V3-Turbo, achieves a word error rate (WER) of 18.36% and a character error rate (CER) of 9.16%, a significant improvement over existing methods. To the best of our knowledge, this is the first work to explore fine-tuning state-of-the-art transformer ASR models for Russian dysarthric speech recognition. Our findings underscore the potential of transformer-based ASR models for underrepresented languages and highlight the need for specialized dysarthric speech datasets and training strategies.
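The two metrics reported in the abstract, WER and CER, are both normalized edit distances between a reference transcript and a model hypothesis. The following is a minimal sketch of how such scores are commonly computed, not the evaluation code used in the thesis. The function names are illustrative, and the choices to split words on whitespace, to count spaces as characters in CER, and to apply no case or punctuation normalization are assumptions; the thesis may normalize text differently.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (illustrative helper)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()  # assumption: whitespace tokenization
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    # Assumption: spaces count as characters; no text normalization is applied.
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    ref = "привет мир"
    hyp = "привет мира"
    print(f"WER = {wer(ref, hyp):.2f}, CER = {cer(ref, hyp):.2f}")
```

Because CER operates on characters, a single wrong word ending moves it far less than WER, which is one reason the two scores are usually reported together for a morphologically rich language such as Russian.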
Citation
Bekassyl Syzdykov, “Robust ASR for Russian Dysarthric Speech Recognition,” Master of Science thesis, Natural Language Processing, MBZUAI, 2025.
Keywords
Automatic Speech Recognition, ASR encoder-decoder architecture, Hyperkinetic Dysarthric Speech