Towards Inclusive NLP: Assessing Compressed Multilingual Transformers Across Diverse Language Benchmarks
Alshehhi, Maitha ; Sharshar, Ahmed ; Guizani, Mohsen
Department
Machine Learning
Type
Conference proceeding
Date
2025
Language
English
Abstract
Although LLMs have attained significant success in high-resource languages, their capacity in low-resource linguistic environments such as Kannada and Arabic is not yet fully understood. This work benchmarks the performance of multilingual and monolingual Large Language Models (LLMs) across Arabic, English, and Indic languages, with particular emphasis on the effects of model compression strategies such as pruning and quantization. Findings show significant performance differences driven by linguistic diversity and resource availability on state-of-the-art LLMs such as BLOOMZ, AceGPT, Jais, LLaMA-2, XGLM, and AraGPT2. We find that multilingual versions of the models outperform their language-specific counterparts across the board, indicating substantial cross-lingual transfer benefits. Quantization (4-bit and 8-bit) is effective in maintaining model accuracy while promoting efficiency, but aggressive pruning significantly compromises performance, especially in larger models. Our findings pinpoint key strategies for constructing scalable and fair multilingual NLP solutions and underscore the need for interventions to address hallucination and generalization errors in low-resource settings.
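For context, the kind of compression the abstract describes can be reproduced with off-the-shelf tooling. The sketch below is not the authors' code; the model name and settings are illustrative assumptions. It loads a small member of one evaluated model family (BLOOMZ) with 4-bit quantization via Hugging Face Transformers and bitsandbytes:

```python
# Minimal sketch: 4-bit quantized inference with Hugging Face Transformers
# and bitsandbytes. Illustrative only; the paper's exact configuration is
# not specified in this record.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "bigscience/bloomz-560m"  # assumed stand-in for the evaluated models

# NF4 4-bit weight quantization with fp16 compute, a common inference recipe.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

prompt = "Translate to Arabic: Hello, world."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Replacing `load_in_4bit=True` with `load_in_8bit=True` gives the 8-bit variant the abstract also reports.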
Citation
M. Alshehhi, A. Sharshar, and M. Guizani, “Towards Inclusive NLP: Assessing Compressed Multilingual Transformers Across Diverse Language Benchmarks,” Communications in Computer and Information Science, vol. 2640 CCIS, pp. 108–126, 2025, doi: 10.1007/978-981-95-0988-1_8.
Source
Communications in Computer and Information Science
Conference
Generalizing from Limited Resources in the Open World, Third International Workshop, GLOW 2025
Keywords
Low-resource Languages, Model Compression, Model Evaluation and Benchmarking, Multilingual Large Language Models (LLMs), Computational Linguistics, Natural Language Processing Systems, Arabic-English, Language Models, Linguistic Environments, Performance, Quantization
Publisher
Springer Nature
