Probing the Limits of Multilingual Language Understanding: Low-Resource Language Proverbs as LLM Benchmark for AI Wisdom

Thapa, Surendrabikram
Rauniyar, Kritesh
Veeramani, Hariram
Adhikari, Surabhi
Razzak, Imran
Naseem, Usman
Department
Computational Biology
Type
Conference proceeding
Date
2025
Language
English
Abstract
Understanding and interpreting culturally specific language remains a significant challenge for multilingual natural language processing (NLP) systems, particularly for less-resourced languages. To address this problem, this paper introduces PRONE, a novel dataset of 2,830 Nepali proverbs, and evaluates the performance of various language models (LMs) on two tasks: (i) identifying the correct meaning of a proverb from multiple choices, and (ii) categorizing proverbs into predefined thematic categories. Both open-source and proprietary models were tested in zero-shot and few-shot settings with prompts in English and Nepali. While models such as GPT-4o demonstrated promising results and achieved the highest performance among the LMs tested, they still fell short of human-level accuracy in understanding and categorizing culturally nuanced content, highlighting the need for more inclusive NLP.
Citation
S. Thapa, K. Rauniyar, H. Veeramani, S. Adhikari, I. Razzak, and U. Naseem, “Probing the Limits of Multilingual Language Understanding: Low-Resource Language Proverbs as LLM Benchmark for AI Wisdom,” Proceedings of the 6th Workshop on Computational Approaches to Discourse, Context and Document-Level Inferences (CODI 2025), pp. 120–129, 2025, doi: 10.18653/V1/2025.CODI-1.11
Source
Proceedings of the 6th Workshop on Computational Approaches to Discourse, Context and Document-Level Inferences (CODI 2025)
Conference
6th Workshop on Computational Approaches to Discourse, Context and Document-Level Inferences
Keywords
Low-Resource Language Proverbs, Multilingual Language Understanding, Large Language Models, Cultural Wisdom Benchmark, Nepali Proverbs Dataset, Zero-Shot and Few-Shot Evaluation, Thematic Categorisation, AI Wisdom Assessment
Publisher
Association for Computational Linguistics