BALSAM: A Platform for Benchmarking Arabic Large Language Models

Almatham, Rawan Nasser
Darwish, Kareem Mohamed
Al-Rasheed, Raghad
Alshammari, Waad Thuwaini
Alhoshan, Muneera
Almazrua, Amal
Al Wazrah, Asma
Alheraki, Mais
Alam, Firoj
Nakov, Preslav
Abstract
The impressive advancement of Large Language Models (LLMs) in English has not been matched across all languages. In particular, LLM performance in Arabic lags behind due to data scarcity, the linguistic diversity of Arabic and its dialects, morphological complexity, and other factors. Progress is further hindered by the limitations of existing Arabic benchmarks, which typically rely on static, publicly available data, lack comprehensive task coverage, or do not provide dedicated platforms with blind test sets. This makes it challenging to measure actual progress and to mitigate data contamination. Here, we aim to bridge these gaps. In particular, we introduce BALSAM, a comprehensive, community-driven benchmark aimed at advancing Arabic LLM development and evaluation. It includes 78 NLP tasks from 14 broad categories, with 52K examples (37K test and 15K development), and a centralized, transparent platform for blind evaluation. We envision BALSAM as a unifying platform that sets standards and promotes collaborative research to advance Arabic LLM capabilities.
Citation
R. N. Almatham et al., “BALSAM: A Platform for Benchmarking Arabic Large Language Models,” Proceedings of The Third Arabic Natural Language Processing Conference, pp. 258–277, 2025, doi: 10.18653/V1/2025.ARABICNLP-MAIN.21
Source
Proceedings of The Third Arabic Natural Language Processing Conference
Conference
Third Arabic Natural Language Processing Conference
Keywords
Arabic Large Language Models, Benchmarking Platform, Multitask Evaluation, Arabic NLP Tasks, Blind Test Sets, Leaderboard Design, Data Diversity, Community-Driven Benchmarking
Publisher
Association for Computational Linguistics