Enhancing Arabic Automated Essay Scoring with Synthetic Data and Error Injection

Qwaider, Chatrine
Alhafni, Bashar
Chirkunov, Kirill
Habash, Nizar
Briscoe, Ted
Department
Natural Language Processing
Type
Conference proceeding
Date
2025
Language
English
Abstract
Automated Essay Scoring (AES) plays a crucial role in assessing language learners' writing quality, reducing grading workload, and providing real-time feedback. The lack of annotated essay datasets inhibits the development of Arabic AES systems. This paper leverages Large Language Models (LLMs) and Transformer models to generate synthetic Arabic essays for AES. We prompt an LLM to generate essays across the Common European Framework of Reference (CEFR) proficiency levels and introduce and compare two approaches to error injection. We create a dataset of 3,040 annotated essays with errors injected using our two methods. Additionally, we develop a BERT-based Arabic AES system calibrated to CEFR levels. Our experimental results demonstrate the effectiveness of our synthetic dataset in improving Arabic AES performance. We make our code and data publicly available.
Citation
C. Qwaider, B. Alhafni, K. Chirkunov, N. Habash, and T. Briscoe, “Enhancing Arabic Automated Essay Scoring with Synthetic Data and Error Injection,” pp. 549–563, Aug. 2025, doi: 10.18653/V1/2025.BEA-1.40.
Source
Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications
Conference
20th Workshop on Innovative Use of NLP for Building Educational Applications
Publisher
Association for Computational Linguistics