Diacritics Generation in Arabic Texts Using GPT Models: Bridging Technological Advances and Linguistic Depth

Albreiki, Ahmed
Department
Machine Learning
Embargo End Date
2024-01-01
Type
Thesis
Date
2024
Language
English
Abstract
This thesis investigates the challenge of generating Arabic diacritics, a critical component in processing Arabic texts, leveraging the capabilities of Generative Pre-trained Transformer (GPT) models. Given the inherent complexity of Arabic, with its syntactic ambiguity and the nuanced role of diacritics in determining meaning, conventional natural language processing techniques often fall short. Through a comprehensive study, we evaluated the performance of various GPT models, including GPT-3.5, GPT-4, and specifically finetuned versions, across diverse text genres. The research adopted a methodology focusing on different model configurations and prompt designs to optimize diacritics generation. Quantitative analysis revealed significant improvements with finetuned GPT models, achieving a diacritics accuracy of up to 92.61% and word accuracy of 81.41%, markedly surpassing existing diacritics generation benchmarks. These findings underscore the potential of advanced AI models in enhancing Arabic text processing, offering insights into optimal strategies for implementing GPT models in diacritics generation. The study's implications extend to various applications, from text-to-speech synthesis to automated translation, demonstrating the critical role of finetuned GPT models in advancing Arabic linguistic research and technology.
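The abstract reports two metrics, diacritics accuracy and word accuracy, without defining them here. The sketch below shows one common way such scores could be computed; it is an illustration under stated assumptions, not the thesis's evaluation code. It assumes that diacritics accuracy is the share of base letters whose attached marks match the reference (letters with no marks included), that word accuracy is the share of words in which every letter matches, and that the prediction keeps the reference's base letters and word count unchanged. The names `split_diacritics` and `diacritization_scores` are hypothetical.

```python
# Arabic diacritic marks (tashkeel): fathatan through sukun, U+064B..U+0652.
DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)}


def split_diacritics(word):
    """Split a word into its base letters and the marks attached to each letter."""
    bases, marks = [], []
    for ch in word:
        if ch in DIACRITICS and bases:
            marks[-1] += ch          # attach the mark to the preceding base letter
        else:
            bases.append(ch)
            marks.append("")
    return "".join(bases), marks


def diacritization_scores(reference, prediction):
    """Return (diacritics accuracy, word accuracy) as fractions in [0, 1].

    Assumption: both strings share the same base letters and word count;
    only the diacritic marks may differ.
    """
    ref_words, pred_words = reference.split(), prediction.split()
    if not ref_words or len(ref_words) != len(pred_words):
        raise ValueError("reference and prediction must have the same, non-zero word count")

    letters_total = letters_ok = words_ok = 0
    for ref_word, pred_word in zip(ref_words, pred_words):
        ref_base, ref_marks = split_diacritics(ref_word)
        pred_base, pred_marks = split_diacritics(pred_word)
        if ref_base != pred_base:
            raise ValueError("base letters differ between reference and prediction")
        # Sort marks so that shadda+vowel and vowel+shadda compare as equal.
        matches = sum(sorted(r) == sorted(p) for r, p in zip(ref_marks, pred_marks))
        letters_total += len(ref_marks)
        letters_ok += matches
        words_ok += matches == len(ref_marks)

    return letters_ok / letters_total, words_ok / len(ref_words)
```

For example, scoring a two-word sentence in which only the final case-ending mark is wrong gives 7 of 8 letters correct (0.875) but only 1 of 2 words correct (0.5). This gap is consistent with the abstract reporting a lower word accuracy than diacritics accuracy, since a single wrong mark invalidates the whole word.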
Citation
A. Albreiki, "Diacritics Generation in Arabic Texts Using GPT Models: Bridging Technological Advances and Linguistic Depth", M.S. thesis, Machine Learning, MBZUAI, Abu Dhabi, UAE, 2024.