Diacritics Generation in Arabic Texts Using GPT Models: Bridging Technological Advances and Linguistic Depth
Author
Albreiki, Ahmed
Supervisor
Department
Machine Learning
Embargo End Date
2024-01-01
Type
Thesis
Date
2024
License
Language
English
Collections
Abstract
This thesis investigates the challenge of generating Arabic diacritics, a critical component in processing Arabic texts, leveraging the capabilities of Generative Pre-trained Transformer (GPT) models. Given the inherent complexity of Arabic, with its syntactic ambiguity and the nuanced role of diacritics in determining meaning, conventional natural language processing techniques often fall short. Through a comprehensive study, we evaluated the performance of various GPT models, including GPT-3.5, GPT-4, and specifically fine-tuned versions, across diverse text genres. The research adopted a methodology focusing on different model configurations and prompt designs to optimize diacritics generation. Quantitative analysis revealed significant improvements with fine-tuned GPT models, achieving a diacritics accuracy of up to 92.61% and word accuracy of 81.41%, markedly surpassing existing diacritics generation benchmarks. These findings underscore the potential of advanced AI models in enhancing Arabic text processing, offering insights into optimal strategies for implementing GPT models in diacritics generation. The study's implications extend to various applications, from text-to-speech synthesis to automated translation, demonstrating the critical role of fine-tuned GPT models in advancing Arabic linguistic research and technology.
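The abstract reports two metrics, diacritics accuracy and word accuracy, without defining them. The Python sketch below shows one common way such scores are computed, and is an assumption rather than the thesis's own evaluation code: a letter counts as correct when the set of marks attached to it matches the reference, and a word counts as correct when all of its letters do. The function names `split_marks` and `diacritization_accuracy` are hypothetical, and the mark set is limited to the basic Arabic harakat, tanween, shadda, and sukun (U+064B to U+0652).

```python
# Assumption: only the basic Arabic marks U+064B..U+0652 count as diacritics.
DIACRITICS = {chr(c) for c in range(0x064B, 0x0653)}


def split_marks(word):
    """Split a word into base letters and the marks attached to each letter."""
    bases, marks = [], []
    for ch in word:
        if ch in DIACRITICS:
            if marks:  # a mark with no preceding letter is ignored
                marks[-1] += ch
        else:
            bases.append(ch)
            marks.append("")
    return bases, marks


def diacritization_accuracy(reference, prediction):
    """Return (diacritics accuracy, word accuracy) for two aligned strings.

    Hypothetical definition: both strings must contain the same
    whitespace-separated words with identical base letters; only marks differ.
    """
    ref_words, pred_words = reference.split(), prediction.split()
    if not ref_words or len(ref_words) != len(pred_words):
        raise ValueError("reference and prediction must have the same, non-zero word count")

    letters_total = letters_ok = words_ok = 0
    for ref, pred in zip(ref_words, pred_words):
        ref_bases, ref_marks = split_marks(ref)
        pred_bases, pred_marks = split_marks(pred)
        if ref_bases != pred_bases:
            raise ValueError("base letters differ; the texts are not aligned")
        # Sort marks so that shadda+fatha equals fatha+shadda.
        matches = [sorted(r) == sorted(p) for r, p in zip(ref_marks, pred_marks)]
        letters_total += len(matches)
        letters_ok += sum(matches)
        words_ok += all(matches)

    if letters_total == 0:
        raise ValueError("no letters to score")
    return letters_ok / letters_total, words_ok / len(ref_words)
```

Under this definition, a prediction that gets a single case ending wrong still loses the whole word, which is one reason a reported word accuracy can sit well below the diacritics accuracy.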
Citation
A. Albreiki, "Diacritics Generation in Arabic Texts Using GPT Models: Bridging Technological Advances and Linguistic Depth", M.S. Thesis, Machine Learning, MBZUAI, Abu Dhabi, UAE, 2024.
