NusaDialogue: Dialogue Summarization and Generation for Underrepresented and Extremely Low-Resource Languages
Purwarianti, Ayu ; Adhista, Dea ; Baptiso, Agung ; Mahfuzh, Miftahul ; Sabila, Yusrina ; Adila, Aulia ; Cahyawijaya, Samuel ; Aji, Alham Fikri
Department
Natural Language Processing
Type
Workshop
Date
2025
Language
English
Abstract
Developing dialogue summarization for extremely low-resource languages is a challenging task. We introduce NusaDialogue, a dialogue summarization dataset for three underrepresented languages in the Malayo-Polynesian language family: Minangkabau, Balinese, and Buginese. NusaDialogue covers 17 topics and 185 subtopics, with annotations provided by 73 native speakers. We also conducted experiments that fine-tune a medium-sized language model designed specifically for Indonesian, and that apply zero- and few-shot learning to various multilingual large language models (LLMs). The results indicate that, for extremely low-resource languages such as Minangkabau, Balinese, and Buginese, fine-tuning yields significantly higher performance than zero- and few-shot prompting, even when the prompted LLMs have considerably more parameters.
Citation
A. Purwarianti et al., “NusaDialogue: Dialogue Summarization and Generation for Underrepresented and Extremely Low-Resource Languages,” 2025. Accessed: Mar. 12, 2025. [Online]. Available: https://aclanthology.org/2025.sealp-1.8/
Source
Proceedings of the Second Workshop in South East Asian Language Processing, 2025
Keywords
NusaDialogue dataset, Dialogue summarization, Low-resource languages, Malayo-Polynesian languages, Large Language Models (LLMs)
Publisher
Association for Computational Linguistics
