A Tale of Two Scripts: Transliteration and Post-Correction for Judeo-Arabic
Gonzalez, Juan Moreno ; Alhafni, Bashar ; Habash, Nizar
Files
2026.eacl-long.93.pdf
Adobe PDF, 433.58 KB
Department
Natural Language Processing
Type
Conference proceeding
License
http://creativecommons.org/licenses/by/4.0/
Abstract
Judeo-Arabic refers to Arabic variants historically spoken by Jewish communities across the Arab world, primarily during the Middle Ages. Unlike standard Arabic, it is written in Hebrew script by Jewish writers and for Jewish audiences. Transliterating Judeo-Arabic into Arabic script is challenging due to ambiguous letter mappings, inconsistent orthographic conventions, and frequent code-switching into Hebrew. In this paper, we introduce a two-step approach to automatically transliterate Judeo-Arabic into Arabic script: simple character-level mapping followed by post-correction to address grammatical and orthographic errors. We also present the first benchmark evaluation of LLMs on this task. Finally, we show that transliteration enables Arabic NLP tools to perform morphosyntactic tagging and machine translation, which would not have been feasible on the original texts. We make our code and data publicly available.
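The two-step approach described in the abstract can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the mapping table is a common one-to-one Hebrew-to-Arabic letter correspondence, not the table used in the paper, and `post_correct` is a hypothetical placeholder for the post-correction step, whose actual model is not described in this record.

```python
# Illustrative one-to-one letter mapping (an assumption, NOT the paper's table).
# Real Judeo-Arabic is ambiguous: one Hebrew letter may stand for more than
# one Arabic letter, which is why a post-correction step is needed.
HEBREW_TO_ARABIC = {
    "\u05D0": "\u0627",  # alef   -> alif
    "\u05D1": "\u0628",  # bet    -> ba
    "\u05D2": "\u062C",  # gimel  -> jim (could also be ghayn)
    "\u05D3": "\u062F",  # dalet  -> dal (could also be dhal)
    "\u05D4": "\u0647",  # he     -> ha
    "\u05D5": "\u0648",  # vav    -> waw
    "\u05D6": "\u0632",  # zayin  -> zay
    "\u05D7": "\u062D",  # het    -> ha (pharyngeal)
    "\u05D8": "\u0637",  # tet    -> ta (emphatic)
    "\u05D9": "\u064A",  # yod    -> ya
    "\u05DB": "\u0643",  # kaf    -> kaf (could also be kha)
    "\u05DA": "\u0643",  # final kaf
    "\u05DC": "\u0644",  # lamed  -> lam
    "\u05DE": "\u0645",  # mem    -> mim
    "\u05DD": "\u0645",  # final mem
    "\u05E0": "\u0646",  # nun    -> nun
    "\u05DF": "\u0646",  # final nun
    "\u05E1": "\u0633",  # samekh -> sin
    "\u05E2": "\u0639",  # ayin   -> ayn
    "\u05E4": "\u0641",  # pe     -> fa
    "\u05E3": "\u0641",  # final pe
    "\u05E6": "\u0635",  # tsadi  -> sad (could also be dad)
    "\u05E5": "\u0635",  # final tsadi
    "\u05E7": "\u0642",  # qof    -> qaf
    "\u05E8": "\u0631",  # resh   -> ra
    "\u05E9": "\u0634",  # shin   -> shin (could also be sin)
    "\u05EA": "\u062A",  # tav    -> ta (could also be tha)
}


def map_characters(text: str) -> str:
    """Step 1: replace each Hebrew letter; leave all other characters as-is."""
    return "".join(HEBREW_TO_ARABIC.get(ch, ch) for ch in text)


def post_correct(text: str) -> str:
    """Step 2 (hypothetical placeholder): a real system would fix
    orthographic and grammatical errors here. This stub returns its input."""
    return text


def transliterate(text: str) -> str:
    """Run the character mapping, then the post-correction step."""
    return post_correct(map_characters(text))
```

Under this illustrative table, the Hebrew-script string kaf-tav-alef-bet maps letter by letter to the Arabic-script kaf-ta-alif-ba. The comments marked "could also be" show where a fixed table has to guess, which is the kind of error the second step is meant to repair.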
Citation
J.M. Gonzalez, B. Alhafni, N. Habash, "A Tale of Two Scripts: Transliteration and Post-Correction for Judeo-Arabic," 2026, pp. 2100-2113.
Source
Conference
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Publisher
Association for Computational Linguistics
