ArVoice: A Multi-Speaker Dataset for Arabic Speech Synthesis
Toyin, Hawau Olamide ; Marew, Rufael ; Alblooshi, Humaid ; Magdy, Samar M. ; Aldarmaki, Hanan
Department
Machine Learning
Type
Conference proceeding
Date
2025
Language
English
Abstract
We introduce ArVoice, a multi-speaker Modern Standard Arabic (MSA) speech corpus with diacritized transcriptions. It is intended for multi-speaker speech synthesis and can also be useful for other tasks such as speech-based diacritic restoration, voice conversion, and deepfake detection. ArVoice comprises (1) a new professionally recorded set from six voice talents with diverse demographics, (2) a modified subset of the Arabic Speech Corpus, and (3) high-quality synthetic speech from two commercial systems. The complete corpus totals 83.52 hours of speech across 11 voices, of which around 10 hours are human speech from 7 speakers. We train three open-source TTS systems and two voice conversion systems to illustrate the use cases of the dataset. The corpus is available for research use.
Citation
H. Toyin, R. Marew, H. Alblooshi, S. M. Magdy, and H. Aldarmaki, “ArVoice: A Multi-Speaker Dataset for Arabic Speech Synthesis,” Interspeech 2025, pp. 4808–4812, Aug. 2025, doi: 10.21437/INTERSPEECH.2025-1550
Source
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Conference
26th Interspeech Conference 2025
Publisher
International Speech Communication Association
