
A Bilingual Bimodal Benchmark for Arabic-English NLP across Grammatical Correction, Essay Scoring, Morphological Tagging, and Speech Recognition

Alhafni, Bashar
Hamed, Injy
Eryani, Fadhl
Palfreyman, David
Habash, Nizar
Department
Natural Language Processing
Type
Conference proceeding
License
http://creativecommons.org/licenses/by/4.0/
Language
English
Abstract
Building comprehensive datasets that support a variety of NLP tasks and cover a diversity of languages and domains is vital for NLP evaluation purposes. In this paper, we present ZAEBUC*, a dataset that builds upon and enriches prior corpora with new annotations and benchmarking experiments. ZAEBUC* serves as a benchmark for a range of NLP tasks, including grammatical error correction, automated essay scoring, automatic speech recognition, and morphological tagging, which includes tokenization, part-of-speech tagging, and lemmatization. The dataset covers Arabic and English in both written and spoken forms, offering a bilingual and bimodal resource. Furthermore, the corpus brings together a collection of resources gathered from a similar population, enabling cross-linguistic and cross-modal comparisons. We provide benchmarking results, demonstrating the performance of NLP models, including LLMs, across various tasks, languages, and modalities.
Citation
B. Alhafni, I. Hamed, F. Eryani, D. Palfreyman, and N. Habash, "A Bilingual Bimodal Benchmark for Arabic-English NLP across Grammatical Correction, Essay Scoring, Morphological Tagging, and Speech Recognition," in Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026), 2026, pp. 1732-1749.
Source
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Keywords
47 Language, Communication and Culture; 4703 Language Studies; 4704 Linguistics
Publisher
ELDA (Evaluations and Language resources Distribution Agency)