Arabic Dataset for LLM Safeguard Evaluation

Ashraf, Yasser
Wang, Yuxia
Gu, Bin
Nakov, Preslav Ivanov
Baldwin, Timothy
Department
Natural Language Processing
Type
Conference proceeding
Date
2025
Language
English
Abstract
The growing use of large language models (LLMs) has raised concerns regarding their safety. While many studies have focused on English, the safety of LLMs in Arabic, with its linguistic and cultural complexities, remains under-explored. Here, we aim to bridge this gap. In particular, we present an Arab-region-specific safety evaluation dataset consisting of 5,799 questions, including direct attacks, indirect attacks, and harmless requests with sensitive words, adapted to reflect the socio-cultural context of the Arab world. To uncover the impact of different stances in handling sensitive and controversial topics, we propose a dual-perspective evaluation framework. It assesses the LLM responses from both governmental and opposition viewpoints. Experiments over five leading Arabic-centric and multilingual LLMs reveal substantial disparities in their safety performance. This reinforces the need for culturally specific datasets to ensure the responsible deployment of LLMs. Warning: this paper contains example data that may be offensive, harmful, or biased.
Citation
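Y. Ashraf, Y. Wang, B. Gu, P. Nakov, and T. Baldwin, “Arabic Dataset for LLM Safeguard Evaluation,” in Proceedings of the 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies: Long Papers, NAACL-HLT 2025, vol. 1, pp. 5529–5546, Jun. 2025, doi: 10.18653/v1/2025.naacl-long.285.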
Source
Proceedings of the 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies: Long Papers, NAACL-HLT 2025
Conference
2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2025
Publisher
Association for Computational Linguistics