KazMMLU: Evaluating Language Models on Kazakh, Russian, and Regional Knowledge of Kazakhstan
Togmanov, Mukhammed; Mukhituly, Nurdaulet; Turmakhan, Diana; Mansurov, Jonibek; Goloburda, Maiya; Sakip, Akhmed; Xie, Zhuohan; Wang, Yuxia; Syzdykov, Bekassyl; Laiyk, Nurkhan; et al. (4 additional authors)
Department
Natural Language Processing
Type
Conference proceeding
Date
2025
Language
English
Abstract
Despite Kazakhstan's population of twenty million, its culture and language remain underrepresented in natural language processing. Although large language models (LLMs) continue to advance worldwide, progress on the Kazakh language has been limited, as reflected in the scarcity of dedicated models and benchmark evaluations. To address this gap, we introduce KazMMLU, the first MMLU-style dataset designed specifically for the Kazakh language. KazMMLU comprises 23,000 questions spanning multiple educational levels and subject areas, including STEM, the humanities, and the social sciences, sourced from authentic educational materials and manually validated by native speakers and educators. The dataset includes 10,969 Kazakh questions and 12,031 Russian questions, reflecting Kazakhstan's bilingual education system and rich local context. Our evaluation of several state-of-the-art multilingual models (Llama 3.1, Qwen2.5, GPT-4, and DeepSeek-V3) shows substantial room for improvement: even the best-performing models fall short of competitive performance in Kazakh and Russian, highlighting significant gaps relative to high-resource languages. We hope that our dataset will enable further research and development of Kazakh-centric LLMs.
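To make the benchmark setting concrete, below is a minimal sketch of an MMLU-style evaluation loop of the kind the abstract describes. It assumes the public release is hosted on the Hugging Face Hub under an identifier like "MBZUAI/KazMMLU" with per-row fields "question", options "A"-"D", and a gold letter "answer"; the identifier and field names are illustrative assumptions, not details confirmed by this record.

# Minimal sketch of an MMLU-style evaluation loop for KazMMLU.
# Assumptions (not confirmed by this record): the dataset is on the
# Hugging Face Hub under an identifier like "MBZUAI/KazMMLU", and each
# row carries a question, four options, and a gold letter "A"-"D".
from datasets import load_dataset

def format_prompt(row):
    """Render one multiple-choice question as a zero-shot prompt."""
    options = "\n".join(f"{letter}. {row[letter]}" for letter in "ABCD")
    return f"{row['question']}\n{options}\nAnswer with a single letter:"

def evaluate(dataset, predict):
    """Score `predict` (prompt -> letter) by exact-match accuracy."""
    correct = 0
    for row in dataset:
        if predict(format_prompt(row)).strip().upper().startswith(row["answer"]):
            correct += 1
    return correct / len(dataset)

if __name__ == "__main__":
    ds = load_dataset("MBZUAI/KazMMLU", split="test")  # hypothetical identifier
    # A trivial always-"A" baseline establishes the ~25% floor expected
    # of a four-option benchmark; swap in a real model client to reproduce
    # a zero-shot setup like the one the abstract reports.
    print(f"Always-A baseline accuracy: {evaluate(ds, lambda prompt: 'A'):.3f}")

Replacing the lambda with calls to any of the evaluated models (e.g., a Llama 3.1 or GPT-4 client) turns this into the kind of zero-shot multiple-choice evaluation summarized above.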
Citation
M. Togmanov et al., “KazMMLU: Evaluating Language Models on Kazakh, Russian, and Regional Knowledge of Kazakhstan,” in Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Aug. 2025, pp. 14403–14416, doi: 10.18653/v1/2025.acl-long.701.
Source
Proceedings of the Annual Meeting of the Association for Computational Linguistics
Conference
63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025
Keywords
Human-AI Collaboration, Large Language Models, Multi-modal Dialogue Systems, Instruction Adaptation, Efficiency-Aligned Learning, Few-Shot Task Generalisation, Contextualised Feedback Loops, Real-world Deployment Benchmarking
Publisher
Association for Computational Linguistics
