
An Effective Training Framework for Light-Weight Automatic Speech Recognition Models

Hannan, Abdul
Brutti, Alessio
Nawaz, Shah
Noman, Mubashir
Department
Computer Vision
Type
Conference proceeding
Date
2025
Language
English
Abstract
Recent advancements in deep learning have encouraged the development of large automatic speech recognition (ASR) models that achieve promising results while ignoring computational and memory constraints. However, deploying such models on low-resource devices is impractical despite their favorable performance. Existing approaches (pruning, distillation, layer skipping, etc.) transform large models into smaller ones at the cost of significant performance degradation, or require prolonged training of smaller models to achieve better performance. To address these issues, we introduce an efficacious two-step representation-learning-based approach capable of producing several small-sized models from a single large model while ensuring considerably better performance in a limited number of epochs. Comprehensive experimentation on ASR benchmarks reveals the efficacy of our approach, achieving a three-fold training speed-up and up to 12.54% word error rate improvement.
Citation
A. Hannan, A. Brutti, S. Nawaz, and M. Noman, “An Effective Training Framework for Light-Weight Automatic Speech Recognition Models,” Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, pp. 3613–3617, 2025, doi: 10.21437/INTERSPEECH.2025-1704.
Source
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Conference
26th Interspeech Conference 2025
Keywords
Automatic Speech Recognition, Efficient Training, Feature Representation Learning, Light-weight Model
Publisher
International Speech Communication Association